Relational database management subsystem


1. Overview

The Relational database management subsystem provides reliable and scalable data storage for registry services. The subsystem uses Crunchy PGO to manage PostgreSQL clusters.
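With Crunchy PGO, a PostgreSQL cluster is declared as a `PostgresCluster` custom resource. The following minimal manifest is a sketch only; the cluster name, PostgreSQL version, replica count, and storage sizes are placeholders, not the Platform's actual configuration:

```yaml
apiVersion: postgres-operator.crunchydata.com/v1beta1
kind: PostgresCluster
metadata:
  name: operational              # illustrative cluster name
spec:
  postgresVersion: 15            # placeholder version
  instances:
    - name: instance1
      replicas: 2                # one primary + one read-only replica
      dataVolumeClaimSpec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi        # placeholder volume size
  backups:
    pgbackrest:
      repos:
        - name: repo1
          volume:
            volumeClaimSpec:
              accessModes: ["ReadWriteOnce"]
              resources:
                requests:
                  storage: 20Gi  # placeholder backup volume size
```

The operator reconciles this declaration into the primary pod, replica pods, services, and backup resources described in the Technical design section.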

The subsystem provides user and schema management, supported by Liquibase's database schema version control. In addition, the subsystem supports geographic objects and geolocation queries using PostGIS, which enables analyzing and visualizing geospatial data.
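As an illustration of a geolocation query PostGIS makes possible, the sketch below finds objects within a given radius of a coordinate. The table and column names are hypothetical, not part of the registry schema:

```sql
-- Enable PostGIS in the database
CREATE EXTENSION IF NOT EXISTS postgis;

-- Hypothetical table of registry objects with a geographic point column
CREATE TABLE facility (
    id    bigserial PRIMARY KEY,
    name  text NOT NULL,
    geom  geography(Point, 4326)
);

-- Find facilities within 5 km of a given coordinate (lon, lat)
SELECT name
FROM facility
WHERE ST_DWithin(
    geom,
    ST_MakePoint(30.5234, 50.4501)::geography,  -- Kyiv, for example
    5000                                        -- radius in metres
);
```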

The subsystem is designed with scaling requirements in mind, allowing PostgreSQL clusters to scale horizontally to meet changing needs. To protect against data loss or corruption, the subsystem includes powerful backup and recovery features, with scheduled automatic backups and point-in-time recovery.
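In Crunchy PGO, backup schedules and point-in-time recovery are both expressed in the pgBackRest section of the `PostgresCluster` spec. The snippet below is a sketch with placeholder cron schedules, cluster name, and target timestamp; `dataSource` is the operator's mechanism for initializing a cluster from a backup at a chosen moment:

```yaml
spec:
  backups:
    pgbackrest:
      repos:
        - name: repo1
          schedules:
            full: "0 1 * * 0"            # weekly full backup (placeholder)
            differential: "0 1 * * 1-6"  # daily differential (placeholder)
  # Point-in-time recovery: initialize the cluster from a backup,
  # replayed up to a specific moment
  dataSource:
    postgresCluster:
      clusterName: operational           # illustrative source cluster
      repoName: repo1
      options:
        - --type=time
        - --target="2024-01-15 10:00:00+00"  # placeholder target time
```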

The subsystem has built-in monitoring and logging capabilities, with real-time metrics via Prometheus and Grafana.

Registry services use this subsystem to store their service data and registry data.

2. Subsystem functions

  • Storage of registry data, including:

    • Business data

    • Business data change history

    • Audit event logs

    • Global settings

    • User-specific settings

    • Business process execution state

  • Data integrity control

  • Support for geographic objects and geolocation queries

  • Data access control, authentication, and authorization

  • Backup and recovery, including point-in-time recovery (PITR)

  • Horizontal scaling

  • Comprehensive analytics and reporting

3. Technical design

Diagram: registry RDBMS

The Relational database management subsystem consists of two database management system (DBMS) instances: Operational Instance and Analytical Instance. The Operational Instance contains services and registry databases, while the Analytical Instance contains the registry analytical database populated by replicating the Operational Instance’s registry database.

The replication process is implemented using Postgres logical replication, which is managed and configured by the Registry deployment subsystem.
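Postgres logical replication is built on a publication on the source database and a subscription on the target. A minimal sketch of the mechanism, with hypothetical publication, subscription, and role names (the actual configuration is managed by the Registry deployment subsystem):

```sql
-- On the Operational Instance's registry database (publisher):
CREATE PUBLICATION registry_pub FOR ALL TABLES;

-- On the Analytical Instance's registry analytical database (subscriber):
CREATE SUBSCRIPTION registry_sub
    CONNECTION 'host=operational-primary dbname=registry user=replicator'
    PUBLICATION registry_pub;
```

The subscriber first copies the existing data, then continuously applies changes streamed from the publisher.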

The Analytical Instance is primarily used by the Registry analytical reporting subsystem for comprehensive analytics and reporting on the registry database data.

Each instance consists of the components presented in the following diagram:

Diagram: RDBMS component architecture

  • The PostgresCluster instance component represents a PostgreSQL cluster managed by the Crunchy Postgres operator. This operator is responsible for managing the following resources:

    • PostgreSQL primary pod: The primary PostgreSQL database pod used to write data and perform transactions.

    • PostgreSQL replica pods: Additional PostgreSQL database read-only replica pods. Created only when scaling to two or more pods.

    • PostgresCluster instance services: Kubernetes services that provide load balancing and a stable IP address and domain name.

    • Backup settings: Configuration options that define the database backup process and storage policy.

    • Metrics export settings: Configuration options that determine which database metrics to export for monitoring and analysis.

  • The PgPool II component is an intermediate layer between applications and the PostgreSQL cluster. It provides transparent and automatic separation of read and write requests, redirecting them to the corresponding services of the PostgresCluster instance. This component and its service are managed directly without the help of an operator.

To save resources, the PgPool II pod and the pool service are not created for the Analytical Instance of the DBMS since none of the services working with it require automatic separation of read and write requests.

Access to the DBMS instance is provided through the following services:

  • Primary: Directs requests to the primary pod, which handles database reads and writes. Service applications that need to make changes to the database but don’t require load balancing use this service to connect to the database — for example, the Database object management task.

  • Replica: Balances requests between replica pods, which provide read-only access to the database. Currently, only PgPool II uses this service, to automatically redirect read requests.

  • Pods: Balances requests between all pods, primary and replica. It is used by applications that only perform database reads — for example, the Registry analytical reporting subsystem.

  • Pool: Directs requests to PgPool II, which automatically balances read requests between all pods and directs write requests to the primary pod. It is used by applications that perform both database reads and writes and require load balancing, which includes most of the registry applications.

In a deployed environment, resource names are prefixed with the name of the DBMS instance — for example, operational-pool or analytical-pods.
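For illustration, a service application in the same namespace might connect through these services as follows. The hostnames follow the prefixing convention above, but the exact service names, database names, and users are assumptions, and the commands require a deployed cluster:

```shell
# Read-write traffic with load balancing, via PgPool II:
psql "host=operational-pool dbname=registry user=registry_app"

# Direct writes without load balancing (e.g., schema migrations):
psql "host=operational-primary dbname=registry user=registry_owner"

# Read-only analytical queries against all pods:
psql "host=analytical-pods dbname=registry user=analytics_reader"
```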

4. Subsystem components

| Component name | Registry representation | Source | Function |
|---|---|---|---|
| Operational DBMS Instance | operational, operational-pool | 3rd-party | A DBMS instance that processes operational service requests. Contains the services and registry operational databases. |
| Analytical DBMS Instance | analytical | 3rd-party | A DBMS instance that processes analytical requests of the Registry analytical reporting subsystem. Contains the registry analytical database. |
| Database object management task | run-db-scripts-job | origin | A component responsible for creating and updating databases, users, and DB service schemas. |
| Crunchy Postgres Operator | pgo, pgo-upgrade | 3rd-party | A tool responsible for deploying and configuring PostgreSQL cluster instances. |

5. Technological stack

The following technologies were used when designing and developing the subsystem:

  • PostgreSQL

  • Crunchy Postgres Operator (PGO)

  • PgPool II

  • PostGIS

  • Liquibase

  • Postgres exporter

6. Subsystem quality attributes

6.1. Scalability

The Relational database management subsystem supports both vertical and horizontal scaling under increased load: vertically, by allocating additional resources to subsystem pods, and horizontally, by increasing the number of PostgreSQL replica pods.
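With Crunchy PGO, both dimensions are expressed declaratively in the `PostgresCluster` spec: the replica count for horizontal scaling and container resource requests for vertical scaling. The values below are placeholders:

```yaml
spec:
  instances:
    - name: instance1
      replicas: 3          # horizontal: 1 primary + 2 read-only replicas
      resources:
        requests:
          cpu: "1"         # vertical: placeholder resource allocations
          memory: 2Gi
        limits:
          cpu: "2"
          memory: 4Gi
```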

6.2. Security

The Relational database management subsystem protects the cross-service communication channel using SSL/TLS traffic encryption. It also supports SSL/TLS traffic encryption when interacting with other subsystems.

Network policies configured for the subsystem allow network interaction only with the whitelisted services.
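A Kubernetes NetworkPolicy of the kind described might look like the following sketch. The policy name, pod labels, and whitelisted service are hypothetical; only the cluster selector label follows Crunchy PGO's labeling convention:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: operational-db-allow          # illustrative name
spec:
  podSelector:
    matchLabels:
      postgres-operator.crunchydata.com/cluster: operational
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: registry-rest-api  # hypothetical whitelisted service
      ports:
        - protocol: TCP
          port: 5432                  # PostgreSQL default port
```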

Separate database users are created for each service. They are given the minimal privileges required for work.
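The least-privilege pattern can be sketched in SQL as follows; the role, schema, and password are hypothetical placeholders:

```sql
-- One dedicated role per service, with login rights only
CREATE ROLE registry_app LOGIN PASSWORD 'change-me';

-- Grant only what the service needs in its own schema
GRANT USAGE ON SCHEMA registry TO registry_app;
GRANT SELECT, INSERT, UPDATE, DELETE
    ON ALL TABLES IN SCHEMA registry TO registry_app;

-- Apply the same privileges to tables created later
ALTER DEFAULT PRIVILEGES IN SCHEMA registry
    GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO registry_app;
```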

Data is stored in the Distributed data storage subsystem using its security capabilities.

6.3. Performance

High performance of the subsystem is achieved through the following:

  • Utilizing best practices in database modeling.

  • Using a separate database for analytical workloads.

  • Horizontal scaling.

6.4. Observability

The Relational database management subsystem logs incoming requests and collects performance metrics for analysis through the web interfaces of respective Platform subsystems.

The architecture includes the use of Postgres exporter and the Event monitoring and notification subsystem for monitoring and visualizing PostgreSQL database metrics.

Postgres exporter is a tool that collects metrics from the PostgreSQL server and outputs them in a format usable by the Prometheus service of the Event monitoring and notification subsystem.

Postgres exporter is installed on the PostgreSQL server and configured to collect the database metrics. In turn, the Event monitoring and notification subsystem has dashboards that visualize these metrics.
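For illustration, a Prometheus scrape job for Postgres exporter typically looks like the fragment below. The target service name is hypothetical; 9187 is the exporter's default metrics port:

```yaml
scrape_configs:
  - job_name: postgres
    static_configs:
      - targets: ["operational-exporter:9187"]  # hypothetical exporter service
```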

For details on the subsystem design, see:

6.5. Reliability

Reliability of the Relational database management subsystem is ensured by built-in backup and recovery features, with automatic backup schedules and point-in-time recovery.

In addition to these internal backup mechanisms, the Backup and restore subsystem performs file system backups of the database.