Use Cert-Manager in OpenShift
The following material covers Let's Encrypt certificate automation with cert-manager using AWS Route53.
cert-manager is a Kubernetes/OpenShift operator that issues and automatically renews SSL certificates. This tutorial demonstrates the steps to secure a DNS name.
Below are instructions on how to automatically issue and install wildcard certificates on the OpenShift Ingress Controller and API Server, covering all cluster Routes. To secure separate OpenShift Routes, please refer to the OpenShift Route Support project for cert-manager.
Prerequisites
- The cert-manager;
- OpenShift v4.7 - v4.11;
- Connection to the OpenShift Cluster;
- Enabled AWS IRSA;
- The latest `oc` utility. The `kubectl` tool can also be used for most of the commands.
Install Cert-Manager Operator
Install the `cert-manager` operator via OpenShift OperatorHub that uses Operator Lifecycle Manager (OLM):
- Go to the OpenShift Admin Console → OperatorHub, search for `cert-manager`, and click Install.
- Modify the `ClusterServiceVersion` OLM resource by selecting Update approval → Manual. If Update approval → Automatic is selected, the parameters in the `ClusterServiceVersion` will be reset to default after an automatic operator update.
  Note
  Installing an operator with Manual approval causes all operators installed in the `openshift-operators` namespace to follow the manual approval strategy. If Manual approval is chosen, review the manual installation plan and approve it.
- Navigate to Operators → Installed Operators and check that the operator status is Succeeded.
- In case of errors, troubleshoot the Operator issues.
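The troubleshooting step above can be sketched with a few read-only `oc` queries against the OLM objects. This is a minimal sketch, not the exact procedure of the original page: the Subscription name `cert-manager` and the helper name `troubleshoot_operator` are assumptions, so list the Subscriptions first if yours is named differently.

```shell
# Namespace used by OperatorHub installations in this guide.
OPERATOR_NS="${OPERATOR_NS:-openshift-operators}"

# Print the OLM objects that usually explain a failed operator installation.
# Assumption: the Subscription is named "cert-manager".
troubleshoot_operator() {
  # Subscriptions, install plans, and ClusterServiceVersions in the namespace.
  oc get subscriptions.operators.coreos.com,installplans,clusterserviceversions -n "$OPERATOR_NS"
  # Conditions and events of the cert-manager Subscription.
  oc describe subscriptions.operators.coreos.com cert-manager -n "$OPERATOR_NS"
  # Operator pods and their current state.
  oc get pods -n "$OPERATOR_NS"
}
```

Run `troubleshoot_operator` after logging in with `oc login`; a pending install plan typically means the manual approval has not been given yet.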
Create AWS Role for Route53
The `cert-manager` should be configured to validate wildcard certificates using the DNS-based method.
- Check the DNS Hosted zone ID in AWS Route53 for your domain.
- Create a Route53 permissions policy in AWS so that `cert-manager` can create DNS TXT records for the certificate validation. In this example, `cert-manager` permissions are given for a particular DNS zone only. Replace the Hosted zone ID XXXXXXXXXXXX in "Resource": "arn:aws:route53:::hostedzone/XXXXXXXXXXXX".

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "route53:GetChange",
          "Resource": "arn:aws:route53:::change/*"
        },
        {
          "Effect": "Allow",
          "Action": [
            "route53:ChangeResourceRecordSets",
            "route53:ListResourceRecordSets"
          ],
          "Resource": "arn:aws:route53:::hostedzone/XXXXXXXXXXXX"
        }
      ]
    }

- Create an AWS Role with a custom trust policy for the `cert-manager` service account to use the AWS IRSA feature, and then attach the created policy. Replace the following:
  - `${aws-account-id}` with the AWS account ID of the EKS cluster.
  - `${aws-region}` with the region where the EKS cluster is located.
  - `${eks-hash}` with the hash in the EKS API URL; this will be a random 32 character hex string, for example, 45DABD88EEE3A227AF0FA468BE4EF0B5.
  - `${namespace}` with the namespace where cert-manager is running.
  - `${service-account-name}` with the name of the ServiceAccount object created by cert-manager. By default, the full subject is "system:serviceaccount:openshift-operators:cert-manager" if `cert-manager` is installed via OperatorHub.
  - Attach the created permissions policy for Route53 to the Role.
  - Optionally, add a permissions boundary to the Role.

    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": "sts:AssumeRoleWithWebIdentity",
          "Principal": {
            "Federated": "arn:aws:iam::${aws-account-id}:oidc-provider/oidc.eks.${aws-region}.amazonaws.com/id/${eks-hash}"
          },
          "Condition": {
            "StringEquals": {
              "oidc.eks.${aws-region}.amazonaws.com/id/${eks-hash}:sub": "system:serviceaccount:${namespace}:${service-account-name}"
            }
          }
        }
      ]
    }

- Copy the created Role ARN.
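Substituting the five placeholders by hand is error-prone, so the trust policy can also be rendered from variables. The sketch below is only an illustration: the function name `render_trust_policy`, the output file name, the role name `cert-manager`, and all sample values are assumptions, not part of the original instructions.

```shell
# Render the IRSA trust policy for the cert-manager ServiceAccount.
# Arguments: AWS account ID, region, OIDC hash, namespace, ServiceAccount name.
render_trust_policy() {
  account_id="$1"; region="$2"; hash="$3"; ns="$4"; sa="$5"
  provider="oidc.eks.${region}.amazonaws.com/id/${hash}"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "sts:AssumeRoleWithWebIdentity",
      "Principal": {
        "Federated": "arn:aws:iam::${account_id}:oidc-provider/${provider}"
      },
      "Condition": {
        "StringEquals": {
          "${provider}:sub": "system:serviceaccount:${ns}:${sa}"
        }
      }
    }
  ]
}
EOF
}

# Example with placeholder values (hypothetical account ID and file name):
#   render_trust_policy 111122223333 eu-central-1 \
#     45DABD88EEE3A227AF0FA468BE4EF0B5 openshift-operators cert-manager > trust-policy.json
#   aws iam create-role --role-name cert-manager \
#     --assume-role-policy-document file://trust-policy.json
```

Keeping the OIDC provider string in one variable guarantees that the `Federated` principal and the `:sub` condition key always refer to the same provider.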
Configure Cert-Manager Integration With AWS Route53
- Annotate the `ServiceAccount` created by `cert-manager` (required for AWS IRSA), and restart the `cert-manager` pod.
  - Replace the `eks.amazonaws.com/role-arn` annotation value with your own Role ARN.
- Modify the `cert-manager` `Deployment` with the correct file system permissions, `fsGroup: 1001`, so that the `ServiceAccount` token can be read.
  Note
  In case the `ServiceAccount` token cannot be read and the operator is installed using the OperatorHub, add `fsGroup: 1001` via the OpenShift `ClusterServiceVersion` OLM resource. It should be set in the `cert-manager` controller spec. These actions are not required for OpenShift v4.10.
  Info
  A mutating admission controller will automatically modify all pods running with the service account:
  cert-manager controller pod

    apiVersion: v1
    kind: Pod
    # ...
    spec:
      # ...
      serviceAccountName: cert-manager
      serviceAccount: cert-manager
      containers:
        - name: ...
          # ...
          env:
            - name: AWS_ROLE_ARN
              value: >-
                arn:aws:iam::XXXXXXXXXXX:role/cert-manager
            - name: AWS_WEB_IDENTITY_TOKEN_FILE
              value: /var/run/secrets/eks.amazonaws.com/serviceaccount/token
          volumeMounts:
            - name: aws-iam-token
              readOnly: true
              mountPath: /var/run/secrets/eks.amazonaws.com/serviceaccount
      volumes:
        - name: aws-iam-token
          projected:
            sources:
              - serviceAccountToken:
                  audience: sts.amazonaws.com
                  expirationSeconds: 86400
                  path: token
            defaultMode: 420

- If you have separate public and private DNS zones for the same domain (split-horizon DNS), modify the `cert-manager` `Deployment` in order to validate DNS TXT records via public recursive nameservers.
  Note
  Otherwise, you will be getting an error during a record validation:

    Waiting for DNS-01 challenge propagation: NS ns-123.awsdns-00.net.:53 returned REFUSED for _acme-challenge.

  To avoid the error, add `--dns01-recursive-nameservers-only --dns01-recursive-nameservers=8.8.8.8:53,1.1.1.1:53` as ARGs to the `cert-manager` controller `Deployment`.

    labels:
      app: cert-manager
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: cert-manager
      app.kubernetes.io/name: cert-manager
      app.kubernetes.io/version: v1.9.1
    spec:
      containers:
        - args:
            - '--v=2'
            - '--cluster-resource-namespace=$(POD_NAMESPACE)'
            - '--leader-election-namespace=kube-system'
            - '--dns01-recursive-nameservers-only'
            - '--dns01-recursive-nameservers=8.8.8.8:53,1.1.1.1:53'

  Note
  The `Deployment` must be modified via the OpenShift `ClusterServiceVersion` OLM resource if the operator was installed using the OperatorHub. The OpenShift `ClusterServiceVersion` OLM resource includes several Deployments, and the ARGs must be modified only for the `cert-manager` controller.
  - Save the resource. After that, OLM will try to reload the resource automatically and save it to the YAML file. If OLM resets the config file, double-check the entered values.
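The annotation and restart from the first step of this section can be sketched as follows. The ServiceAccount and Deployment name `cert-manager` and the `openshift-operators` namespace are assumptions based on a default OperatorHub installation, and `annotate_cert_manager_sa` is a hypothetical helper name.

```shell
# Namespace where cert-manager runs (default for OperatorHub installs).
CERT_MANAGER_NS="${CERT_MANAGER_NS:-openshift-operators}"

# Annotate the cert-manager ServiceAccount with the IAM Role ARN and restart
# the controller so that the new pod receives the projected token.
# Argument: the Role ARN copied from AWS.
annotate_cert_manager_sa() {
  role_arn="$1"
  oc annotate serviceaccount cert-manager -n "$CERT_MANAGER_NS" \
    "eks.amazonaws.com/role-arn=${role_arn}" --overwrite
  # Assumption: the controller Deployment is named "cert-manager".
  oc rollout restart deployment/cert-manager -n "$CERT_MANAGER_NS"
}

# Example (placeholder ARN):
#   annotate_cert_manager_sa arn:aws:iam::XXXXXXXXXXX:role/cert-manager
```

After the restart, the controller pod spec should contain the `AWS_ROLE_ARN` and `AWS_WEB_IDENTITY_TOKEN_FILE` variables injected by the mutating admission controller.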
Configure ClusterIssuers
`ClusterIssuer` is available on the whole cluster.
- Create the `ClusterIssuer` resource for Let's Encrypt Staging and Prod environments that signs a Certificate using `cert-manager`.
  Note
  Let's Encrypt has a limit of duplicate certificates in the Prod environment. Therefore, a `ClusterIssuer` is also created for the Let's Encrypt Staging environment. By default, Let's Encrypt Staging certificates will not be trusted in your browser. The certificate validation cannot be tested in the Let's Encrypt Staging environment.
  - Replace `user@example.com` with your contact email.
  - Replace `hostedZoneID` XXXXXXXXXXX with the DNS Hosted zone ID in AWS for your domain.
  - Replace the region value `${region}`.
  - The secret under `privateKeySecretRef` will be created automatically by the `cert-manager` operator.

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-staging
    spec:
      acme:
        email: user@example.com
        server: https://acme-staging-v02.api.letsencrypt.org/directory
        privateKeySecretRef:
          name: letsencrypt-staging-issuer-account-key
        solvers:
          - dns01:
              route53:
                region: ${region}
                hostedZoneID: XXXXXXXXXXX

    apiVersion: cert-manager.io/v1
    kind: ClusterIssuer
    metadata:
      name: letsencrypt-prod
    spec:
      acme:
        email: user@example.com
        server: https://acme-v02.api.letsencrypt.org/directory
        privateKeySecretRef:
          name: letsencrypt-prod-issuer-account-key
        solvers:
          - dns01:
              route53:
                region: ${region}
                hostedZoneID: XXXXXXXXXXX

- Check the `ClusterIssuer` status.
- If the `ClusterIssuer` state is not ready, investigate the `cert-manager` controller pod logs.
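The status check and the log inspection can be sketched with standard `oc` queries. The issuer names come from the manifests in this section; the Deployment name `cert-manager`, the namespace default, and the helper names are assumptions.

```shell
# Namespace where cert-manager runs (default for OperatorHub installs).
CERT_MANAGER_NS="${CERT_MANAGER_NS:-openshift-operators}"

# List both issuers (the READY column should report True) and show the
# conditions of the staging issuer in detail.
check_cluster_issuers() {
  oc get clusterissuer letsencrypt-staging letsencrypt-prod
  oc describe clusterissuer letsencrypt-staging
}

# Print the most recent controller log lines.
# Assumption: the controller Deployment is named "cert-manager".
controller_logs() {
  oc logs deployment/cert-manager -n "$CERT_MANAGER_NS" --tail=100
}
```

A `ClusterIssuer` that is not ready usually reports the ACME account registration error in its `Status` conditions, which is why `oc describe` is included next to the plain listing.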
Configure Certificates
- In two different namespaces, create a `Certificate` resource for the OpenShift Router (Ingress controller for OpenShift) and for the OpenShift APIServer.
  - OpenShift Router supports a single wildcard certificate for Ingress/Route resources in different namespaces (the so-called default SSL certificate). The Ingress controller expects the certificates in a `Secret` to be created in the `openshift-ingress` namespace; the API Server, in the `openshift-config` namespace. The `cert-manager` operator will automatically create these secrets from the `Certificate` resource.
  - Replace `${DOMAIN}` with your domain name. It can be checked with `oc whoami --show-server`. Put domain names in quotes.

  The certificate for OpenShift Router in the `openshift-ingress` namespace

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: router-certs
      namespace: openshift-ingress
      labels:
        app: cert-manager
    spec:
      secretName: router-certs
      secretTemplate:
        labels:
          app: cert-manager
      duration: 2160h # 90d
      renewBefore: 360h # 15d
      subject:
        organizations:
          - Org Name
      commonName: '*.${DOMAIN}'
      privateKey:
        algorithm: RSA
        encoding: PKCS1
        size: 2048
        rotationPolicy: Always
      usages:
        - server auth
        - client auth
      dnsNames:
        - '*.${DOMAIN}'
        - '*.apps.${DOMAIN}'
      issuerRef:
        name: letsencrypt-staging
        kind: ClusterIssuer

  The certificate for OpenShift APIServer in the `openshift-config` namespace

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: api-certs
      namespace: openshift-config
      labels:
        app: cert-manager
    spec:
      secretName: api-certs
      secretTemplate:
        labels:
          app: cert-manager
      duration: 2160h # 90d
      renewBefore: 360h # 15d
      subject:
        organizations:
          - Org Name
      commonName: '*.${DOMAIN}'
      privateKey:
        algorithm: RSA
        encoding: PKCS1
        size: 2048
        rotationPolicy: Always
      usages:
        - server auth
        - client auth
      dnsNames:
        - '*.${DOMAIN}'
        - '*.apps.${DOMAIN}'
      issuerRef:
        name: letsencrypt-staging
        kind: ClusterIssuer

  Info
  - `cert-manager` supports ECDSA key pairs in the `Certificate` resource. To use it, change the `privateKey` algorithm from RSA to ECDSA.
  - `rotationPolicy: Always` is highly recommended since `cert-manager` does not rotate private keys by default.
  - The full `Certificate` spec is described in the `cert-manager` API documentation.
- Check that the certificates in the namespaces are ready.
- Check the details of the certificates via CLI.
- Check the `cert-manager` controller pod logs if the Staging Certificate condition is not ready for more than 7 minutes.
- When the certificate is ready, its private key will be put into the OpenShift `Secret` in the namespace indicated in the `Certificate` resource.
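The readiness, detail, and secret checks listed above can be sketched as follows. The resource names and namespaces are the ones used in the manifests of this section; the helper names are hypothetical, and the 420-second timeout simply mirrors the 7-minute threshold mentioned above.

```shell
# Block until both Certificates report the Ready condition (or time out
# after 7 minutes, the threshold used in this guide).
wait_for_certificates() {
  oc wait --for=condition=Ready certificate/router-certs -n openshift-ingress --timeout=420s
  oc wait --for=condition=Ready certificate/api-certs -n openshift-config --timeout=420s
}

# Show conditions and events of both Certificates.
describe_certificates() {
  oc describe certificate router-certs -n openshift-ingress
  oc describe certificate api-certs -n openshift-config
}

# Confirm that cert-manager created the TLS secrets.
check_certificate_secrets() {
  oc get secret router-certs -n openshift-ingress
  oc get secret api-certs -n openshift-config
}
```

If `wait_for_certificates` times out, `describe_certificates` is the natural next call, since the `Events` section usually names the failing step.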
Modify OpenShift Router and API Server Custom Resources
- Update the Custom Resource of your Router (Ingress controller). Patch the `defaultCertificate` object value with `{ "name": "router-certs" }`:

    oc patch ingresscontroller default -n openshift-ingress-operator --type=merge --patch='{"spec": { "defaultCertificate": { "name": "router-certs" }}}' --insecure-skip-tls-verify

  Info
  After updating the `IngressController` object, the OpenShift Ingress operator redeploys the router.
- Update the Custom Resource for the OpenShift API Server:
  - Export the name of `APIServer`.
  - Patch the `servingCertificate` object value with `{ "name": "api-certs" }`.
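The two API Server sub-steps can be sketched as below. This is an assumption-laden illustration rather than the original commands: it derives the API host name from the `oc whoami --show-server` URL and binds the `api-certs` secret to it through `spec.servingCerts.namedCertificates` of the `APIServer` resource named `cluster`. The helper names are hypothetical.

```shell
# Extract the host name from an API server URL, for example
# https://api.cluster.example.com:6443 -> api.cluster.example.com
api_hostname() {
  printf '%s\n' "$1" | sed -e 's|^https\{0,1\}://||' -e 's|[:/].*$||'
}

# Build the merge patch that binds the api-certs Secret to a host name.
apiserver_patch() {
  printf '{"spec":{"servingCerts":{"namedCertificates":[{"names":["%s"],"servingCertificate":{"name":"api-certs"}}]}}}\n' "$1"
}

# Read the server URL from the current session and apply the patch.
patch_apiserver() {
  host="$(api_hostname "$(oc whoami --show-server)")"
  oc patch apiserver cluster --type=merge --patch="$(apiserver_patch "$host")"
}
```

Building the patch in a separate function makes it easy to print and review the JSON before it is applied to the cluster.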
Move From Let's Encrypt Staging Environment to Prod
- Test the Staging certificate on the OpenShift Admin Console. When testing from the command line (for example, with `curl`), the `--insecure` flag is needed because Let's Encrypt Staging certificates are not trusted by default.
- Change `issuerRef` to `letsencrypt-prod` in both `Certificate` resources:

    oc edit certificate api-certs -n openshift-config
    oc edit certificate router-certs -n openshift-ingress

  Note
  In case the certificate reissue is not triggered after that, try to force the certificate renewal with cmctl. If this does not work, delete the `api-certs` and `router-certs` secrets. It should trigger the Prod certificates issuance.
  Please note that these actions will log your account out of the OpenShift Admin Console, since certificates will be deleted. Accept the certificate warning in the browser and log in again after that.
- Check the status of the Prod certificates.
- Check the web console and make sure it has a secure connection.
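As a non-interactive alternative to `oc edit`, the issuer switch and the follow-up checks can be sketched with `oc patch`. The helper names are hypothetical, and the console host name pattern `console-openshift-console.apps.<domain>` is an assumption based on the default OpenShift console route.

```shell
# Build the default web console URL for a cluster domain (assumed pattern).
console_url() {
  printf 'https://console-openshift-console.apps.%s\n' "$1"
}

# Point one Certificate at the production issuer.
# Arguments: Certificate name, namespace.
use_prod_issuer() {
  oc patch certificate "$1" -n "$2" --type=merge \
    --patch='{"spec":{"issuerRef":{"name":"letsencrypt-prod"}}}'
}

# Switch both Certificates from this guide to letsencrypt-prod.
switch_to_prod() {
  use_prod_issuer api-certs openshift-config
  use_prod_issuer router-certs openshift-ingress
}

# Last resort: delete the secrets so that cert-manager issues new ones.
delete_certificate_secrets() {
  oc delete secret api-certs -n openshift-config
  oc delete secret router-certs -n openshift-ingress
}

# Show readiness together with the issuer that signed each Certificate.
check_prod_certificates() {
  oc get certificate api-certs -n openshift-config -o wide
  oc get certificate router-certs -n openshift-ingress -o wide
}

# Example: curl --insecure --silent --head "$(console_url cluster.example.com)"
```

The `-o wide` output includes the issuer column, which makes it easy to confirm that the reissued certificates come from `letsencrypt-prod`.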
Troubleshoot Certificates
During validation, the `cert-manager` operator creates a DNS TXT `challenge` record in the AWS Route53 hosted zone.
Use the `nslookup` or `dig` tools to check if the DNS propagation for the TXT record is complete. Alternatively, use web tools like Google Admin Toolbox.
If the correct TXT value is shown (the value corresponds to the current TXT value in the DNS zone), the DNS propagation is complete and Let's Encrypt is able to access the record in order to validate it and issue a trusted certificate.
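The propagation check can be sketched with `dig`. The record name follows the ACME DNS-01 convention: `_acme-challenge.` is prefixed to the validated name, and a wildcard name is validated at its base domain. The helper names and the choice of 8.8.8.8 as the public resolver are assumptions.

```shell
# Derive the DNS-01 challenge record name for a certificate DNS name.
# The leading "*." of a wildcard name is dropped before prefixing.
acme_challenge_record() {
  printf '_acme-challenge.%s\n' "${1#\*.}"
}

# Query a public recursive resolver for the challenge TXT record.
# Argument: a DNS name from the Certificate, e.g. '*.apps.example.com'.
query_challenge_txt() {
  dig +short TXT "$(acme_challenge_record "$1")" @8.8.8.8
}
```

An empty answer from `query_challenge_txt` while a challenge is pending suggests that the record has not propagated to public resolvers yet.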
Note
If the DNS validation challenge self check fails, `cert-manager` will retry the self check with a fixed 10-second retry interval. Challenges that do not ever complete the self check will continue retrying until the user intervenes by either retrying the `Order` (by deleting the `Order` resource) or amending the associated `Certificate` resource to resolve any configuration errors.
As soon as the domain ownership has been verified, any validation TXT records created by `cert-manager` in the AWS Route53 DNS zone will be cleaned up.
Please find below the issues that may occur and their troubleshooting:
- When certificates are not issued for a long time, or a `cert-manager` resource is not in a Ready state, describing a resource may show the reason for the error.
- Basically, the `cert-manager` creates the following resources during a `Certificate` issuance: `CertificateRequest`, `Order`, and `Challenge`. Investigate each of them in case of errors.
- Use the cmctl tool to show the state of a `Certificate` and its associated resources.
- Check the `cert-manager` controller pod logs.
- Certificate error debugging:
  a. Decode the certificate chain located in the secrets:

    oc get secret router-certs -n openshift-ingress -o 'go-template={{index .data "tls.crt"}}' | base64 -d | while openssl x509 -noout -text; do :; done 2>/dev/null
    oc get secret api-certs -n openshift-config -o 'go-template={{index .data "tls.crt"}}' | base64 -d | while openssl x509 -noout -text; do :; done 2>/dev/null

    cmctl inspect secret router-certs -n openshift-ingress
    cmctl inspect secret api-certs -n openshift-config

  b. Check the SSL RSA private key consistency:

    oc get secret router-certs -n openshift-ingress -o 'go-template={{index .data "tls.key"}}' | base64 -d | openssl rsa -check -noout
    oc get secret api-certs -n openshift-config -o 'go-template={{index .data "tls.key"}}' | base64 -d | openssl rsa -check -noout

  c. Match the SSL certificate public key against its RSA private key. Their modulus must be identical:

    diff <(oc get secret api-certs -n openshift-config -o 'go-template={{index .data "tls.crt"}}' | base64 -d | openssl x509 -noout -modulus | openssl md5) <(oc get secret api-certs -n openshift-config -o 'go-template={{index .data "tls.key"}}' | base64 -d | openssl rsa -noout -modulus | openssl md5)
    diff <(oc get secret router-certs -n openshift-ingress -o 'go-template={{index .data "tls.crt"}}' | base64 -d | openssl x509 -noout -modulus | openssl md5) <(oc get secret router-certs -n openshift-ingress -o 'go-template={{index .data "tls.key"}}' | base64 -d | openssl rsa -noout -modulus | openssl md5)
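Walking the issuance chain (`Certificate`, `CertificateRequest`, `Order`, `Challenge`) mentioned in the list above can be sketched as follows. The helper names are hypothetical; the commands themselves are ordinary `oc` and `cmctl` queries.

```shell
# List every object cert-manager creates while issuing a Certificate.
# Argument: namespace, e.g. openshift-ingress.
list_issuance_chain() {
  oc get certificates,certificaterequests,orders,challenges -n "$1"
}

# Describe pending Challenges; their status and events usually carry
# the DNS-01 error text.
describe_challenges() {
  oc describe challenges -n "$1"
}

# Summarize one Certificate and its related resources with cmctl.
# Arguments: Certificate name, namespace.
certificate_status() {
  cmctl status certificate "$1" -n "$2"
}
```

For example, `certificate_status router-certs openshift-ingress` prints the Certificate conditions together with the state of the related request, order, and challenges in one report.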
Remove Obsolete Certificate Authority Data From Kubeconfig
After updating the certificates, access to the cluster via Lens or CLI will be denied because of untrusted certificate errors.
Such behavior appears because the `oc` tool references old CA data in the kubeconfig file.
Note
Examine the Certificate Authority data using the following command:

    oc config view --minify --raw -o jsonpath='{.clusters[].cluster.certificate-authority-data}' | base64 -d | openssl x509 -text

This certificate has the `CA:TRUE` parameter, which means that it is a CA certificate (here, the cluster's self-signed root CA).
To fix the error, remove the old CA data from your OpenShift kubeconfig file.
Since this field will be absent in the kubeconfig file, the system root SSL certificates will be used to validate the cluster certificate trust chain. On Ubuntu, Let's Encrypt OpenShift cluster certificates will be validated against the Internet Security Research Group root in `/etc/ssl/certs/ca-certificates.crt`.
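Removing the pinned CA can be sketched with `sed`. This is a minimal sketch under stated assumptions: the kubeconfig is a single file (not a colon-separated `KUBECONFIG` list), the CA is stored inline as `certificate-authority-data`, and `remove_kubeconfig_ca` is a hypothetical helper name.

```shell
# Delete every certificate-authority-data line from a kubeconfig file,
# keeping the original as <file>.bak.
# Argument (optional): path to the kubeconfig; defaults to $KUBECONFIG,
# then to ~/.kube/config.
remove_kubeconfig_ca() {
  file="${1:-${KUBECONFIG:-$HOME/.kube/config}}"
  sed -i.bak '/certificate-authority-data:/d' "$file"
}
```

The `.bak` copy allows restoring the previous CA data if the cluster certificates are ever switched back to the internal CA.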
Certificate Renewals
The `cert-manager` automatically renews the certificates based on the X.509 certificate's duration and the `renewBefore` value. The minimum value for `spec.duration` is 1 hour; for `spec.renewBefore`, 5 minutes. It is also required that `spec.duration` > `spec.renewBefore`.
Use the cmctl tool to manually trigger a single instant certificate renewal.
Otherwise, manually renew all certificates in all namespaces with the `app=cert-manager` label.
Run the `cmctl renew --help` command to get more details.
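Both renewal variants can be sketched with `cmctl renew`. The helper names are hypothetical; the certificate names and the `app=cert-manager` label are the ones used earlier in this guide.

```shell
# Trigger an immediate renewal of a single Certificate.
# Arguments: Certificate name, namespace.
renew_certificate() {
  cmctl renew "$1" -n "$2"
}

# Renew every Certificate in all namespaces that carries the
# app=cert-manager label.
renew_labeled_certificates() {
  cmctl renew --all-namespaces -l app=cert-manager
}

# Example: renew_certificate router-certs openshift-ingress
```

With the values used in this guide (`duration: 2160h`, `renewBefore: 360h`), automatic renewal starts about 1800 hours (75 days) after issuance, so manual renewal is normally needed only after configuration changes.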