Deploying the Platform in a public AWS cloud environment

On this page:
🌐 This document is available in both English and Ukrainian. Use the language toggle in the top right corner to switch between versions.

This guide provides detailed instructions on deploying the Platform in an Amazon Web Services (AWS) environment from scratch, from creating an AWS account to installing and configuring the Platform.

1. Prerequisites

Before installing and configuring the Platform, you must perform the following steps.

1.1. Elements required at the pre-deployment stage

Before you begin, make sure you have the resources that are required for further steps:

Documentation:
  • Updating document with additional steps for the selected Platform release version. It is required only for the Platform update procedure.

Digital signature certificates (digital-signature-ops certificates):
  • Key-6.dat — your organization’s private key;

  • allowed-key.yaml — a list of all issued keys. Initially includes only Key-6.dat. When the key is changed, information about the new key is added here without deleting the old one;

  • CAs.json — a list of all Accredited Key Certification Centers from the ІІТ website;

  • CACertificates.p7b - a public key from the ІІТ website.

Configuration files for the hardware cryptomodule:
  • 3 files with appropriate values (see attached examples):

  • sign.key.device-type — specify the device type for the key (file);

  • sign.key.file.issuer — specify the Accredited Key Certification Center that issued the key (change the value inside the file to match your issuer);

  • sign.key.file.password — specify the password for the file key (change the value inside the file to match your password).

    4 files with empty values (create 4 empty files with the following names):

  • sign.key.hardware.device — key device type (hardware);

  • sign.key.hardware.password —  hardware key password;

  • sign.key.hardware.type — key type;

  • osplm.ini — INI configuration.

To learn more about loading and updating the keys and digital signature certificates, see Updating the Platform digital signature keys and certificates.

1.2. Creating an AWS account

Before installing OpenShift Container Platform on AWS, you need to create an AWS account.

To learn how to do this, refer to the AWS documentation: How do I create and activate a new AWS account?

1.3. Setting up an AWS account

Before installing OpenShift Container Platform, you need to set up your AWS account.

1.3.1. Configuring Route 53

To install OpenShift Container Platform, you need to register a domain name. You can do this using the Amazon Route 53 service or any other domain name registrar.

Also, the AWS account you use must have a dedicated public hosted zone in your Route 53 service.

For details, refer to the Origin Kubernetes Distribution (OKD) documentation: Configuring Route 53.

1.3.2. Configuring an external domain

If you registered the domain name through an external registrar, not Route 53, you need to delegate the domain. To do this, perform these steps:

  1. Sign in to your AWS account and create a public hosted zone using the Route 53 service as described in Configuring Route 53. Use the same domain name as you registered externally.

  2. In the Route 53 console, go to the public hosted zone you created and check the NS type record (name servers process DNS requests for the domain name). The Value column contains a list of NS server names. Save them as they will be needed later.

  3. Go to the external domain name registrar where you created the domain name.

  4. Open the domain settings and find the settings related to NS servers.

  5. Provide the NS servers you copied from the public hosted zone in your AWS account.

1.3.3. AWS account limits

The OpenShift Container Platform cluster uses a number of AWS components, and the default service limits affect your ability to install a cluster.

To see a list of AWS components whose limits may impact your ability to install and run an OpenShift Container Platform cluster, refer to the OKD documentation: AWS account limits.

You must also increase the CPU limit for your Amazon on-demand virtual machines. For details, refer to the AWS documentation: How do I request an EC2 vCPU limit increase for my On-Demand Instance?

1.3.4. Creating an IAM user

  1. Before installing OpenShift Container Platform, create an IAM user. For details, refer to the AWS documentation: Creating an IAM user in your AWS account.

  2. In addition, perform these important steps:

    • Remove any Service control policies (SCPs) restrictions from your AWS account.

    When you create a cluster, an associated AWS OpenID Connect (OIDC) identity provider is also created. The OIDC provider configuration is based on the public key stored in the AWS region us-east-1. Customers using AWS SCP must allow the use of the region us-east-1 even if the cluster is deployed in a different region. If these policies are not configured correctly, permission errors may occur since the OKD installer verifies them.

    + TIP: For details, refer to section 1.1. DEPLOYMENT PREREQUISITES of the following document: Red Hat OpenShift Service on AWS 4. Prepare your environment.

    • Properly configure the permissions boundary for the IAM user you created.

      Here is an example of a permissions boundary policy. You can use it or completely remove any permissions boundary.

      Setting the permissions boundary policy
      {
          "Version": "2012-10-17",
          "Statement": [
              {
                  "Effect": "Allow",
                  "NotAction": [
                      "iam:*"
                  ],
                  "Resource": "*"
              },
              {
                  "Effect": "Allow",
                  "Action": [
                      "iam:Get*",
                      "iam:List*",
                      "iam:Tag*",
                      "iam:Untag*",
                      "iam:GenerateServiceLastAccessedDetails",
                      "iam:GenerateCredentialReport",
                      "iam:SimulateCustomPolicy",
                      "iam:SimulatePrincipalPolicy",
                      "iam:UploadSSHPublicKey",
                      "iam:UpdateServerCertificate",
                      "iam:CreateInstanceProfile",
                      "iam:CreatePolicy",
                      "iam:DeletePolicy",
                      "iam:CreatePolicyVersion",
                      "iam:DeletePolicyVersion",
                      "iam:SetDefaultPolicyVersion",
                      "iam:CreateServiceLinkedRole",
                      "iam:DeleteServiceLinkedRole",
                      "iam:CreateInstanceProfile",
                      "iam:AddRoleToInstanceProfile",
                      "iam:DeleteInstanceProfile",
                      "iam:RemoveRoleFromInstanceProfile",
                      "iam:UpdateRole",
                      "iam:UpdateRoleDescription",
                      "iam:DeleteRole",
                      "iam:PassRole",
                      "iam:DetachRolePolicy",
                      "iam:DeleteRolePolicy",
                      "iam:UpdateAssumeRolePolicy",
                      "iam:CreateGroup",
                      "iam:UpdateGroup",
                      "iam:AddUserToGroup",
                      "iam:RemoveUserFromGroup",
                      "iam:PutGroupPolicy",
                      "iam:DetachGroupPolicy",
                      "iam:DetachUserPolicy",
                      "iam:DeleteGroupPolicy",
                      "iam:DeleteGroup",
                      "iam:DeleteUserPolicy",
                      "iam:AttachUserPolicy",
                      "iam:AttachGroupPolicy",
                      "iam:PutUserPolicy",
                      "iam:DeleteUser",
                      "iam:CreateRole",
                      "iam:AttachRolePolicy",
                      "iam:PutRolePermissionsBoundary",
                      "iam:PutRolePolicy"
                  ],
                  "Resource": "*"
              },
              {
                  "Effect": "Allow",
                  "Action": [
                      "iam:CreateAccessKey",
                      "iam:DeleteAccessKey",
                      "iam:UpdateAccessKey",
                      "iam:CreateLoginProfile",
                      "iam:DeleteLoginProfile",
                      "iam:UpdateLoginProfile",
                      "iam:ChangePassword",
                      "iam:CreateVirtualMFADevice",
                      "iam:EnableMFADevice",
                      "iam:ResyncMFADevice",
                      "iam:DeleteVirtualMFADevice",
                      "iam:DeactivateMFADevice",
                      "iam:CreateServiceSpecificCredential",
                      "iam:UpdateServiceSpecificCredential",
                      "iam:ResetServiceSpecificCredential",
                      "iam:DeleteServiceSpecificCredential"
                  ],
                  "Resource": "*"
              }
          ]
      }
To learn more about creating an IAM user, refer to the OKD documentation: Creating an IAM user.

1.3.5. Required AWS permissions for the IAM user

To deploy all components of an OpenShift Container Platform cluster, the IAM user requires certain permissions that must be attached to that user.

To see an example of these permissions, refer to the OKD documentation: Required AWS permissions for the IAM user.

1.4. Creating additional accounts

Before installing OpenShift Container Platform on AWS, you need to create a Docker Hub and Red Hat account.

This is necessary to form a docker pull secret that will be used later.

1.4.1. Creating a Docker Hub account

  • Some services use images from Docker Hub repositories. To use them, you need an account. For details, refer to the Docker documentation: Docker ID accounts.

  • Additionally, the limit on the number of images uploaded per day may prevent the services from starting. To avoid this, you’ll need to upgrade your subscription to the Pro level. This will change the limit from 200 image pulls per 6 hours to 5,000 image pulls per day. For details, refer to the Docker documentation: Upgrade your subscription.

1.4.2. Creating a Red Hat account

To download the images required to install OpenShift Container Platform, you need a Red Hat account. For details, refer to the Red Hat documentation: Red Hat Login ID and Account.

This is necessary to download the generated pull secret later, as described in Preparing to install the OKD cluster on AWS. It will allow you to authenticate and download container images for OpenShift Container Platform components.

2. Deploying additional resources to install an OKD cluster on AWS

Launch the following resources in AWS to install the cluster and platform successfully. The diagram below illustrates the infrastructure scheme with these resources. This is done to simplify the platform installation and avoid unwanted errors associated with setting up from a local computer.

image

2.1. Description of additional resources

A more detailed description of the additional resources from the scheme is depicted below:

  • S3 Bucket is used to store the Terraform state.

  • DynamoDB is a table in DynamoDB that will be used to store information about the locking of the Terraform state.

  • NAT Gateway provides a private server with Internet access.

  • Bastion is an intermediary server that ensures secure and restricted access to the server in a private network. Subsequently, an SSH tunnel will be established through this bastion to the Initial-node.

  • Initial-node is a server in a private network, through which the cluster and platform will be installed.

These resources can be deployed using the prepared Terraform code in the following steps.

The following table provides the recommended settings for Bastion.

Table 1. Bastion settings
# Setting Value

1

Instance type

t2.nano

2

vCPUs

1

3

RAM

0.5 GiB

4

CPU Credits/hr

3

5

Platform

Ubuntu

6

AMI name

ubuntu-bionic-18.04-amd64-server-20210224

7

Volume

8 Gb

The following table provides the recommended settings for the secondary virtual machine.

Table 2. Secondary virtual machine settings
# Setting Value

1

Instance type

t2.medium

2

vCPUs

2

3

RAM

4 GiB

4

CPU Credits/hr

24

5

Platform

Ubuntu

6

AMI name

ubuntu-bionic-18.04-amd64-server-20210224

7

Volume

150 Gb

2.2. Additional settings

2.2.1. Installing necessary tools

For further actions, installing the required tools on your local computer is necessary.

You can check the correct installation of these tools using the following commands:

Example 1. Checking the installation of tools
Checking unzip
$ unzip -v
Checking aws cli
$ aws --version
Checking terraform
$ terraform version

2.2.2. Setting up AWS CLI

Authenticate your AWS account using AWS CLI. To do this, execute the following command:

AWS account authentication
$ aws configure
AWS Access Key ID [None]: AKIAIOSFODNN7EXAMPLE
AWS Secret Access Key [None]: JalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Default region name [None]: eu-central-1
Default output format [None]: json
For more detailed information on the AWS account authentication process using AWS CLI, refer to the official AWS documentation: Configure the AWS CLI.

2.2.3. Setting up AWS cross-account access

Before running the Terraform code, it must be downloaded. This requires access to an AWS S3 bucket where it is stored. This is possible only if a special IAM role is created. Follow these steps to do so:

  1. Create an AWS IAM role.

    $ aws iam create-role \
          --role-name UserCrossAccountRole \
          --description "Role for uploading terraform files from AWS S3" \
          --assume-role-policy-document '{
                "Version": "2012-10-17",
                "Statement": [
                    {
                        "Action": "sts:AssumeRole",
                        "Effect": "Allow",
                        "Principal": {
                            "AWS": "arn:aws:iam::<YourAccountId>:root"
                        }
                    }
                ]
              }'

    <YourAccountId> — insert your AWS account ID here

  2. Create an AWS IAM policy.

    $ aws iam create-policy \
          --policy-name UserCrossAccountPolicy \
          --policy-document '{
                "Version": "2012-10-17",
                "Statement": [
                    {
                        "Action": "sts:AssumeRole",
                        "Effect": "Allow",
                        "Resource": "arn:aws:iam::764324427262:role/CustomCrossAccountRole"
                    }
                ]
              }'
  3. Attach the policy to the role.

    $ aws iam attach-role-policy \
          --role-name UserCrossAccountRole \
          --policy-arn arn:aws:iam::<YourAccountId>:policy/UserCrossAccountPolicy
    <YourAccountId> — insert your AWS account ID here.
  4. Add the required values for the role to the config file.

    $ cat <<EOT >> ~/.aws/config
    [profile user-cross-account-role]
    role_arn = arn:aws:iam::764324427262:role/CustomCrossAccountRole
    source_profile = default
    EOT
  5. After this, you must contact the support team, who should add your AWS account ID to the Trust relationships for the CustomCrossAccountRole in their AWS account.

2.2.4. Downloading Terraform code

  1. Download the archive with the Terraform code.

    $ aws s3 cp s3://mdtu-ddm-platform-installer/terraform/terraform.zip terraform.zip  --profile user-cross-account-role
  2. Unzip the Terraform code into a separate directory.

    $ unzip terraform.zip -d ~/terraform

2.3. Description of the Terraform code

As an example of process automation, Terraform code was implemented, which can be customized to your parameters and used for deploying infrastructure.

2.3.1. Initial Terraform code

This Terraform code is designed to create resources for further steps. These resources include:

  • S3 Bucket

  • DynamoDB Table

Initial Code. Description of Terraform files:
  • main.tf – the main configuration Terraform file. It contains modules for creating:

    • S3 bucket

    • DynamoDB table

  • providers.tf — used to define the Terraform version, necessary plugins, and AWS provider parameters.

  • variables.tf — used for describing all the variables used in the Terraform configuration.

  • terraform.tfvars — contains values for specific variables defined in the Terraform configuration files. If necessary, change the values for the following parameters to the required ones:

    • region — this variable determines the AWS region in which the resources will be created.

    • tags — this variable adds tags (labels) to the resources.

2.3.2. Main Terraform code

The main Terraform code deploys all the necessary resources. Below is a description of the templates.

Main Code. Description of Terraform files:
  • main.tf — the main configuration Terraform file. It contains modules for creating:

    • VPC;

    • ec2_bastion;

    • ec2_instance;

    • key_pair.

  • providers.tf — used to define the Terraform version, necessary plugins, and AWS provider parameters. Be sure to change the values for the following parameters as needed:

    • bucket — this variable contains the name of the S3 bucket. Change <ACCOUNT_ID> to your AWS account ID.

  • iam-node-role.tf — used to create a special IAM role with the necessary permissions. This will enable setting up AWS cross-account resource access and downloading the Docker image for containers and the Installer.

  • elastic-ip.tf — used to create an AWS Elastic IP (EIP) resource using Terraform.

  • security-groups.tf — creates Security Groups, which allow SSH connections (TCP port 22) to the bastion and initial-node.

  • ssh-key.tf — contains code for creating an SSH private key, saving it to a file, and setting its access rights.

  • files/user_data.sh.tpl is a script template executed when creating or updating an EC2 instance in the AWS environment. This script will do the following for the initial-node:

    • install Docker;

    • install Unzip;

    • install AWS CLI v2;

    • additionally, set up AWS cross-account resource access.

  • variables.tf — used for describing all the variables used in the Terraform configuration.

  • terraform.tfvars — contains values for specific variables defined in the Terraform configuration files. If necessary, change the values for the following parameters as needed:

    • region — this variable determines the AWS region where resources will be created.

    • platform_name — this variable adds a name to the cluster and AWS resources.

    • ingress_cidr_blocks — Add the necessary IP address here to connect via SSH to an additional virtual machine.

    • prefix_list_ids — if multiple addresses need to be opened for connection, create a prefix-list and use its ID in this parameter.

    • tags — this variable adds tags (labels) to the resources.

2.4. Running Terraform code

After the changes made in the previous steps, the Terraform code is now ready to be executed.

2.4.1. Running the Initial Terraform code

  1. Sequentially execute the following commands to navigate to the directory with the initial Terraform code and initialize the working Terraform directory.

    $ cd ~/terraform/initCode
    
    $ terraform init
  2. Use the following command to apply the changes defined in the configuration files and create the resources.

    $ terraform apply -auto-approve
  3. Wait for the resources to be created.

2.4.2. Running the Main Terraform code

  1. Sequentially execute the following commands to navigate to the directory with the main Terraform code and initialize the working Terraform directory.

    $ cd ~/terraform/mainCode
    
    $ terraform init
  2. Use the following command to apply the changes defined in the configuration files and create the resources.

    $ terraform apply -auto-approve
  3. Wait for the resources to be created.

2.5. Connecting to the additional virtual machine

To connect to the additional virtual machine from your computer, you need to create an SSH tunnel. Use the following command:

Example 2. Creating an SSH tunnel
$ ssh -i <SSH_KEY> -L 1256:<NODE_PRIVATE_IP>:22 -N -f ubuntu@<BASTION_PUBLIC_IP>

After creating an SSH tunnel, you can connect to the additional virtual machine. Use the following command:

Example 3. Connecting via SSH
$ ssh -i <SSH_KEY> ubuntu@localhost -p 1256
Additional virtual machine purpose

You need to perform all subsequent steps on the additional virtual machine, namely the installation of the cluster and Platform.

2.6. Starting the openshift-install container

To install the cluster using the openshift-install Docker image, perform the following steps.

  1. Sign in to AWS Elastic Container Registry (ECR).

    $ sudo aws ecr get-login-password --profile cross-account-role --region eu-central-1 | docker login --username AWS --password-stdin 764324427262.dkr.ecr.eu-central-1.amazonaws.com
  2. Download the Docker image.

    $ docker pull 764324427262.dkr.ecr.eu-central-1.amazonaws.com/openshift-install:v3
  3. Tag the Docker image you downloaded.

    $ docker tag 764324427262.dkr.ecr.eu-central-1.amazonaws.com/openshift-install:v3 openshift-install:v3
  4. Create a new folder to keep all the cluster data.

    $ mkdir ~/openshift-cluster
  5. Switch to the folder you created.

    $ cd ~/openshift-cluster
  6. Run the openshift-install container.

    $ sudo docker run --rm -it --name openshift-install-v3 \
        --user root:$(id -g) \
        --net host \
        -v $(pwd):/tmp/openshift-cluster \
        --env AWS_ACCESS_KEY_ID=<КЛЮЧ_ДОСТУПУ> \
        --env AWS_SECRET_ACCESS_KEY=<secret-access-key> \
        openshift-install:v3 bash

3. Preparing to install the OKD cluster on AWS

In OpenShift Container Platform version 4.11, you can install a customized cluster on infrastructure that the installation program provisions on AWS.

OKD version

The recommended OKD version is 4.11.0-0.okd-2022-08-20-022919.

To install the cluster, perform the following steps:

  1. Inside the container, switch to the /tmp/openshift-cluster folder.

    $ cd /tmp/openshift-cluster
  2. Follow the steps described in the official documentation on the OKD website, Installing a cluster on AWS with customizations, up to the step Obtaining an AWS Marketplace image: Obtaining an AWS Marketplace image.

  3. Download the customized OKD installer that includes a fix for the blocker issue described at [OCPBUGS-11636](https://issues.redhat.com/browse/OCPBUGS-11636).

    $ aws s3 cp s3://mdtu-ddm-platform-installer/okd-installer/openshift-install-fix-aws-4.11.0-0.okd-2022-08-20-022-fix-aws.tar.gz openshift-install-fix-aws-4.11.0-0.okd-2022-08-20-022-fix-aws.tar.gz --profile cross-account-role
  4. Unzip the installation program from the downloaded archive.

    $ tar xvfz openshift-install-fix-aws-4.11.0-0.okd-2022-08-20-022-fix-aws.tar.gz
    To configure the installation, you need to create a install-config.yaml file and enter the necessary parameters before installing the cluster.
  5. Create a new directory for the cluster configuration files and the install-config.yaml file. Do this by executing the following commands in sequence:

    $ mkdir /tmp/openshift-cluster/cluster-state
    
    $ touch /tmp/openshift-cluster/cluster-state/install-config.yaml

    After creating the file, fill it with the necessary parameters. The created configuration file includes only the necessary parameters for a minimal cluster deployment. For customization of settings, refer to the official documentation.

    Recommended parameters for install-config.yaml:
    install-config.yaml
    apiVersion: v1
    baseDomain: <BASE_DOMAIN>(1)
    compute:
      - architecture: amd64
        hyperthreading: Enabled
        name: worker
        platform:
          aws:
            zones:
              - eu-central-1c
            rootVolume:
              size: 80
              type: gp3
            type: r5.2xlarge
            amiID: ami-094fe1584439e91dd
        replicas: 3
    controlPlane:
      architecture: amd64
      hyperthreading: Enabled
      name: master
      platform:
        aws:
          zones:
            - eu-central-1c
          rootVolume:
            size: 80
            type: gp3
          type: r5.xlarge
      replicas: 3
    metadata:
      name: <CLUSTER_NAME>(2)
    networking:
      clusterNetwork:
        - cidr: 10.128.0.0/14
          hostPrefix: 23
      machineNetwork:
        - cidr: 10.0.0.0/16
      networkType: OVNKubernetes
    platform:
      aws:
        region: eu-central-1
        userTags:
          'user:tag': <CLUSTER_NAME>(2)
    publish: External
    pullSecret: <PULL_SECRET>(4)
    sshKey: <SSHKEY>(3)
    • (1) <BASE_DOMAIN> — the domain name you created and configured earlier. For details, see Configuring Route 53 and Configuring an external domain.

    • (2) <CLUSTER_NAME> — the name of the future OKD cluster.

    • (3) <SSHKEY> — one or more SSH keys used to access the cluster machines. You can use the same key that was created during the OKD cluster installation, or any other key.

      For details, refer to the OKD documentation: Optional configuration parameters.
    • (4) <PULL_SECRET> — the secret you created earlier (for details, see Creating additional accounts). You need to get this secret from the Red Hat OpenShift Cluster Manager.

      To learn more, refer to step 5 on this OKD page: Obtaining the installation program.

      You need to add your Red Hat and Docker Hub credentials to the pull secret. A combined secret will look as follows:

      An example of a combined *pull secret*
      {
         "auths":{
            "cloud.openshift.com":{
               "auth":"b3Blb=",
               "email":"test@example.com"
            },
            "quay.io":{
               "auth":"b3Blb=",
               "email":"test@example.com"
            },
            "registry.connect.redhat.com":{
               "username":"test",
               "password":"test",
               "auth":"b3Blb=",
               "email":"test@example.com"
            },
            "registry.redhat.io":{
               "username":"test",
               "password":"test",
               "auth":"b3Blb=",
               "email":"test@example.com"
            },
            "index.docker.io/v2/":{
               "username":"test",
               "password":"test",
               "auth":"b3Blb=",
               "email":"test@example.com"
            }
         }
      }

      For convenience, the pull secret should be written to the install-config.yaml file in one line. The final secret will look as follows:

      An example of a one-line *pull secret*
      '{"auths":{"cloud.openshift.com":{"auth":"b3Blb=","email":"test@example.com"},"quay.io":{"auth":"b3Blb=","email":"test@example.com"},"registry.connect.redhat.com":{"username":"test","password":"test","auth":"b3Blb=","email":"test@example.com"},"registry.redhat.io":{"username":"test","password":"test","auth":"b3Blb=","email":"test@example.com"},"index.docker.io/v2/":{"username":"test","password":"test","auth":"b3Blb=","email":"test@example.com"}}}'
    The Installer deletes the install-config.yaml file when creating the cluster. We recommend backing up the install-config.yaml file if you need to deploy multiple clusters.
  6. Also, execute the following command to customize the OpenShift cluster version 4.11 installation. This variable allows you to specify a specific image used during the installation.

    $ export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE="quay.io/openshift/okd:4.11.0-0.okd-2022-08-20-022919"

4. Running the OKD4 installer and deploying an empty OKD4 cluster

After install-config.yaml is created, run the following command to deploy the OKD cluster:

Installing the OKD cluster
$ ./openshift-install create cluster --dir /tmp/openshift-cluster/cluster-state --log-level=info
The cluster deployment process usually takes up to 1 hour.

Upon successful deployment, the following cluster access and credential information displays in your terminal:

  • login;

  • password;

  • a link to the cluster’s web console

image

The installation program generates a series of cluster definition files for your cluster in the installation directory, which are necessary for its uninstallation.

To learn more, refer to the Prerequisites section on this OKD page: Uninstalling a cluster on AWS.

This directory will also contain the /auth folder with two authentication files: for working with the cluster through the web console and the OKD command line interface (OKD CLI).

5. Replacing self-signed certificates with trusted certificates

To replace self-signed certificates with trusted ones, you first need to obtain these certificates.

This section describes obtaining free certificates from Let’s Encrypt and installing them on your server.

Let’s Encrypt certificates are obtained using the acme.sh utility.

To learn about using Let’s Encrypt via the ACME protocol, refer to the Let’s Encrypt documentation: ACME Client Implementations.
To replace the certificates, perform the following steps:
  1. Set the environment variable. The variable must point to the kubeconfig file.

    $ export KUBECONFIG=cluster-state/auth/kubeconfig
  2. Create the letsencrypt.sh file and paste the following script into it:

    Certificate replacement script
    #!/bin/bash
    yum install -y openssl
    mkdir -p certificates
    export CERT_HOME=./certificates
    export CURDIR=$(pwd)
    cd $CERT_HOME
    
    # Clone the acme.sh utility from the GitHub repository
    git clone https://github.com/neilpang/acme.sh
    sed -i "2i AWS_ACCESS_KEY_ID=\"${AWS_ACCESS_KEY_ID}\"" ./acme.sh/dnsapi/dns_aws.sh
    sed -i "3i AWS_SECRET_ACCESS_KEY=\"${AWS_SECRET_ACCESS_KEY}\"" ./acme.sh/dnsapi/dns_aws.sh
    cd $CURDIR
    # Get API Endpoint URL
    export LE_API="$(oc whoami --show-server | cut -f 2 -d ':' | cut -f 3 -d '/' | sed 's/-api././')"
    # Get Wildcard Domain
    export LE_WILDCARD="$(oc get ingresscontroller default -n openshift-ingress-operator -o jsonpath='{.status.domain}')"
    ${CERT_HOME}/acme.sh/acme.sh --register-account -m user_${RANDOM}@example.com
    ${CERT_HOME}/acme.sh/acme.sh --issue -d ${LE_API} -d *.${LE_WILDCARD} --dns dns_aws
    export CERTDIR=$CERT_HOME/certificates
    mkdir -p ${CERTDIR}
    
    # Transfer certificates from the default acme.sh path to a more convenient directory using the --install-cert - key
    ${CERT_HOME}/acme.sh/acme.sh --install-cert -d ${LE_API} -d *.${LE_WILDCARD} --cert-file ${CERTDIR}/cert.pem --key-file ${CERTDIR}/key.pem --fullchain-file ${CERTDIR}/fullchain.pem --ca-file ${CERTDIR}/ca.cer
    # Create secret
    oc create secret tls router-certs --cert=${CERTDIR}/fullchain.pem --key=${CERTDIR}/key.pem -n openshift-ingress
    # Update Custom Resource for Router
    oc patch ingresscontroller default -n openshift-ingress-operator --type=merge --patch='{"spec": { "defaultCertificate": { "name": "router-certs" }}}'
  3. Make the script executable.

    $ chmod +x ./letsencrypt.sh
  4. Run the script.

    $ bash -x ./letsencrypt.sh
  5. Exit the container after running the script. To do this, use the following command. The container will be deleted automatically.

    Exiting the container
    $ exit

6. Preparing and launching the Installer to deploy and update the Platform on the OKD cluster

Before launching the Installer, you need to prepare the workstation where it will run.

6.1. Deploying from scratch

6.1.1. Prerequisites

Before running the Platform installation script, perform the following steps:

  1. Download the appropriate version of the Installer by running the following sequence of commands.

    $ mkdir ~/installer
    
    $ cd ~/installer
    
    $ sudo aws s3 cp --profile cross-account-role s3://mdtu-ddm-platform-installer/<VERSION>/mdtu-ddm-platform-<VERSION>.zip mdtu-ddm-platform-<VERSION>.zip
  2. Unpack the Installer to a separate directory.

    $ unzip mdtu-ddm-platform-(version).zip -d ./installer-<VERSION>
  3. Copy kubeconfig from the installed cluster.

    $ cp ~/openshift-cluster/cluster-state/auth/kubeconfig ./installer-<VERSION>
  4. Transfer the certificates and digital-signature-ops service support files to the certificates directory and go to the Installer directory.

    $ cp -r /path/to/folder/certificates/ ./installer-<VERSION>
    
    $ cd installer-<VERSION>

6.1.2. Configuring MinIO

When deploying the Platform from scratch, no additional configuration is required for MinIO.

6.1.3. Configuring Vault

When deploying the Platform from scratch, no additional configuration is required for Vault.

6.1.4. Deploying the Platform from the Installer

  1. Run the following commands:

    $ IMAGE_CHECKSUM=$(sudo docker load -i control-plane-installer.img | sed -r "s#.*sha256:(.*)#\1#" \| tr -d '\n')
    $ echo $IMAGE_CHECKSUM
    $ sudo docker tag ${IMAGE_CHECKSUM} control-plane-installer:<VERSION>
  2. Start the installation process of the new Platform with the images:

    $ sudo docker run --rm \
        --name control-plane-installer-<VERSION> \
        --user root:$(id -g) \
        --net host \
        -v $(pwd):/tmp/installer \
        --env KUBECONFIG=/tmp/installer/kubeconfig \
        --env idgovuaClientId=f90ab33dc272f047dc330c88e5663b75 \
        --env idgovuaClientSecret=cba49c104faac8c718e6daf3253bc55f2bf11d9e \
        --env deploymentMode=<DEPLOYMENT_MODE> \
        --entrypoint "/bin/sh" control-plane-installer:<VERSION> \
        -c "./install.sh -i"
    • --rm — this flag will automatically delete the container when it exits. You can remove the flag if you need to inspect the state and logs of the completed container, or if you have an unstable Internet connection.

    • DEPLOYMENT_MODE — this variable can be set to development or production.

6.1.5. Deployment status

The following log indicates the Platform update process was successful:

image

If you removed the --rm flag during the Deploying the Platform from the Installer step, you need to do the following:
  1. Run the following command to verify that the container has exited with a status of 0, which indicates that it has completed successfully.

    $ docker ps --all --latest

    image

  2. Remove the container using the following command:

    $ docker rm $(docker ps --latest -q)

6.1.6. Post-deployment required steps

  1. After installing the Platform, make sure the cluster-management pipeline has started and passed successfully (with a green status). Only after this the Platform will be ready for deploying registries. Without this action, the registries will not deploy.

    You can locate the cluster-management pipeline using the following path:

    OKD Web UI > control-plane NS > Routes > jenkins url > cluster-mgmt > MASTER-Build-cluster-mgmt.

  2. Request access to the IIT widget as described here: https://eu.iit.com.ua/sign-widget/v20200922/.

Additional resources state

After all the steps are completed, you can shut down Bastion and the additional virtual machine. They will not be needed until the next Platform update.

6.2. Updating

6.2.1. Prerequisites

Before running the Platform installation script, perform the following steps:

  1. Download the appropriate version of the Installer by running the following sequence of commands.

    $ mkdir ~/installer
    
    $ cd ~/installer
    
    $ sudo aws s3 cp --profile cross-account-role s3://mdtu-ddm-platform-installer/<VERSION>/mdtu-ddm-platform-<VERSION>.zip mdtu-ddm-platform-<VERSION>.zip
  2. Unpack the Installer to a separate directory.

    $ unzip mdtu-ddm-platform-(version).zip -d ./installer-<VERSION>
  3. Copy kubeconfig from the installed cluster.

    $ cp ~/openshift-cluster/cluster-state/auth/kubeconfig ./installer-<VERSION>
  4. Transfer the certificates and digital-signature-ops service support files to the certificates directory and go to the Installer directory.

    $ cp -r /path/to/folder/certificates/ ./installer-<VERSION>
    
    $ cd installer-<VERSION>

6.2.2. Configuring MinIO

  1. Copy Terraform state data for MinIO from the previous release.

    $ cp ~/installer/installer-<VERSION>/terraform/minio/aws/terraform.tfstate ./terraform/minio/aws/
  2. Copy the MinIO key from the previous release.

    $ cp ~/installer/installer-<VERSION>/terraform/minio/aws/private_minio.key ./terraform/minio/aws/

6.2.3. Configuring Vault

  1. Copy Terraform state data for Vault from the previous release.

    $ cp ~/installer/installer-<VERSION>/terraform/vault/aws/terraform.tfstate ./terraform/vault/aws/
  2. Copy the Vault key from the previous release.

    $ ~/installer/installer-<VERSION>/terraform/vault/aws/private.key ./terraform/vault/aws/

6.2.4. Updating the Platform from the Installer

  1. Run the following commands:

    $ IMAGE_CHECKSUM=$(sudo docker load -i control-plane-installer.img | sed -r "s#.*sha256:(.*)#\1#" \| tr -d '\n')
    $ echo $IMAGE_CHECKSUM
    $ sudo docker tag ${IMAGE_CHECKSUM} control-plane-installer:<VERSION>
  2. Update the Platform version with the images:

    $ sudo docker run --rm \
        --name control-plane-installer-<VERSION> \
        --user root:$(id -g) \
        --net host \
        -v $(pwd):/tmp/installer \
        --env KUBECONFIG=/tmp/installer/kubeconfig \
        --env idgovuaClientId=f90ab33dc272f047dc330c88e5663b75 \
        --env idgovuaClientSecret=cba49c104faac8c718e6daf3253bc55f2bf11d9e \
        --env deploymentMode=<DEPLOYMENT_MODE> \
        --entrypoint "/bin/sh" control-plane-installer:<VERSION> \
        -c "./install.sh -u"
    • --rm — this flag will automatically delete the container when it exits. You can remove the flag if you need to inspect the state and logs of the completed container, or if you have an unstable Internet connection.

    • DEPLOYMENT_MODE — this variable can be set to development or production, depending on the previous installation.

    Run the script twice if the generated log does NOT match the point Update status.

6.2.5. Update status

The following log indicates the Platform update process was successful:

image

If you removed the --rm flag during the Updating the Platform from the Installer step, you need to do the following:
  1. Run the following command to verify that the container has exited with a status of 0, which indicates that it has completed successfully.

    $ docker ps --all --latest

    image

  2. Remove the container using the following command:

    $ docker rm $(docker ps --latest -q)

6.2.6. Required steps after update

After updating the Platform from the Installer:
  1. Navigate to the Update section.

  2. Perform the necessary specific update steps for your version of the Platform.

  3. As part of the particular update steps, refresh the Platform’s infrastructure components via the Control Plane interface.

Status of additional resources

After completing all actions, the bastion and additional virtual machine can be turned off. They will not be needed until the next platform update.

7. Common errors during the Platform deployment

In this section, we review errors that may occur when deploying the Platform from scratch and provide methods to resolve them.

7.1. Bootstrap machine error when deploying the OKD cluster

Problem description

The following error occurs during cluster deployment:

Bootstrap virtual machine error
level=error msg=Attempted to gather ClusterOperator status after installation failure: listing ClusterOperator objects: Get "https://api.<CLUSTER_URL>:6443/apis/config.openshift.io/v1/clusteroperators": dial tcp <CLUSTER_IP>:6443: connect: connection refused
level=error msg=Bootstrap failed to complete: Get "https://api.<CLUSTER_URL>:6443/version": dial tcp <CLUSTER_IP>:6443: connect: connection refused
level=error msg=Failed waiting for Kubernetes API. This error usually happens when there is a problem on the bootstrap host that prevents creating a temporary control plane.

This error is related to the bootstrap virtual machine and usually happens when there is a problem on the bootstrap host that prevents creating a temporary Control Plane.

Solution

  1. Run the command to remove the cluster, leaving the --dir parameter the same.

    Removing the OKD cluster
    $ ./openshift-install destroy cluster --dir /tmp/openshift-cluster/cluster-state --log-level info
  2. Wait until the cluster is removed, then run the command to reinstall it.

    Reinstalling the cluster
    $ ./openshift-install create cluster --dir /tmp/openshift-cluster/cluster-state --log-level=info

7.2. Vault token error when deploying the Platform

Problem description

When deploying the Platform, during the Vault installation stage, an error may occur where the vault_root_token variable returns an empty value:

image

This error can be caused by Vault not starting successfully or skipping some of the Platform installation steps.

Solution

  1. Sign in to your AWS account and locate the platform-vault-<CLUSTER_NAME> virtual machine.

  2. Connect to the virtual machine using EC2 Instance Connect or SSH.

  3. Check the Vault status. The Initialized parameter must be set to true.

    Checking the Vault status
    $ vault status

    image

  4. If the status is different, restart Vault.

    Restarting Vault
    $ systemctl restart vault
  5. If this error occurred during the Platform update, check if the Vault key was copied from the previous release as described in Configuring Vault.

  6. Try running the Platform update process again as described in Updating the Platform from the Installer.

7.3. MinIO SSL certificate error when deploying the Platform

Problem description

When deploying the Platform, during the MinIO installation stage, the following error may occur:

image

Solution

  1. Go to the Installer directory and start the container for Platform installation using the following command:

    Running the container
    $ cd ~/installer/installer-<VERSION>
    $ sudo docker run -it --rm \
        --name control-plane-installer-<VERSION> \
        --user root:$(id -g) \
        --net host \
        -v $(pwd):/tmp/installer \
        --env KUBECONFIG=/tmp/installer/kubeconfig \
        --env idgovuaClientId=f90ab33dc272f047dc330c88e5663b75 \
        --env idgovuaClientSecret=cba49c104faac8c718e6daf3253bc55f2bf11d9e \
        --env deploymentMode=<DEPLOYMENT_MODE> control-plane-installer:<VERSION> bash
  2. Switch to the appropriate directory and set the environment variables.

    Setting the environment variables
    $ cd /tmp/installer/terraform/minio/aws
    $ export AWS_ACCESS_KEY_ID=$(oc get secret/aws-creds -n kube-system -o jsonpath='{.data.aws_access_key_id}' | base64 -d)
    $ export AWS_SECRET_ACCESS_KEY=$(oc get secret/aws-creds -n kube-system -o jsonpath='{.data.aws_secret_access_key}' | base64 -d)
    $ export CLUSTER_NAME=$(oc get node -l node-role.kubernetes.io/master -o 'jsonpath={.items[0].metadata.annotations.machine\.openshift\.io/machine}' | sed -r 's#.*/(.*)-master.*#\1#')
    $ export clusterNameShort="${CLUSTER_NAME::-6}"
    $ export baseDomain=$(oc get dns cluster --no-headers -o jsonpath='{.spec.baseDomain}')
    $ export route53HostedZone="${baseDomain/${clusterNameShort}./}"
  3. Remove MinIO using Terraform.

    Removing MinIO
    $ terraform init
    $ terraform destroy -var cluster_name="${clusterNameShort}" -var baseDomain="${route53HostedZone}" -auto-approve
  4. Wait until Minio is removed. Exit the container and retry the Platform installation process as described in Deploying the Platform from the Installer if you are deploying the platform from scratch, or Updating the Platform from the Installer, if you are updating the platform.

7.4. Error sending images to Nexus when deploying the Platform

Problem description

During Platform deployment, when sending images to Nexus, the following error may occur:

image

This error is related to skopeo, a tool that sends images to Nexus. If the image fails to load in 10 minutes, skopeo returns a timeout error.

Solution

Install the Platform from an additional virtual machine as described in Deploying additional resources to install an OKD cluster on AWS.