Initial loading of registry data
A PL/pgSQL database procedure is used for initial data loading.
Before you start the initial data loading, ensure that all the prerequisite steps described in Preparing data for migration have been performed.
1. PL/pgSQL procedure specification for the Platform database
CREATE OR REPLACE PROCEDURE p_load_table_from_csv(
p_table_name text
, p_file_name text
, p_table_columns text[]
, p_target_table_columns text[] DEFAULT NULL::text[]
);
Parameters description:

- p_table_name: Target table name.
- p_file_name: Full path to the CSV data file.
- p_table_columns: An array of the field names in the data file (the first row of the CSV file).
- p_target_table_columns: An array of target table fields. Leave it empty if the field list matches the fields in the CSV file.

The elements of the p_target_table_columns parameter can be provided in different formats:

- <col_name>: The field in the target table receives the value from the field with the same name in the CSV file.
- <col_name>::<some_calculation_based_on_source_cols>: The field in the target table receives the result of evaluating the expression. For details, see the Complex scenario 1 call example.
- <col_name>::ref(lookup_col:<lookup_col_name>, ref_table:<ref_table_name>, ref_col:<ref_col_name>, ref_id:<ref_id_col_name>): For this format, the system looks up the identifier in the reference table by running the following SQL query:

  SELECT <ref_id> INTO <col_name> FROM <ref_table> WHERE <ref_col> = <lookup_col>;

  Use this format when the field is a foreign key reference to a dictionary table and the value in the CSV file is a descriptive field rather than an identifier. The values of the descriptive field must be unique. For details, see the Complex scenario 2 call example.
- <col_name>::ref_array(lookup_col:<lookup_col_name>, ref_table:<ref_table_name>, ref_col:<ref_col_name>, ref_id:<ref_id_col_name>, delim:<delimiter_character>): This format differs from the previous one only in that the <lookup_col_name> field in the CSV file contains an array of descriptive values; an array of the matching identifiers is returned.
2. Procedure call examples
2.1. Simple scenario
CALL p_load_table_from_csv('research',
  'D:\PostgreSQL\csv\uuid\01-Typy_doslidgen_uuid.csv',
  array['uuid','research_type'], array['research_type']);
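For reference, the file in this call is expected to start with a header row matching p_table_columns, for example (hypothetical content; the real file may differ):

```csv
uuid,research_type
c7f1e9a0-1111-2222-3333-444455556666,Bacteriological research
```

Because p_target_table_columns lists only research_type, the uuid column is read from the file but is not loaded into the research table.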
2.2. Complex scenario 1
CALL p_load_table_from_csv('koatuu','${dataLoadPath}03-KOATUU_simple.csv'
, array['code','category','name']
, array['code','category','name'
,'level1::substring(code,1,2)||''00000000'''
,'level2::substring(code,1,5)||''00000'''
,'type::CASE WHEN code ~ ''[0-9]{2}0{8}'' AND code !~ ''(80|85)0{8}'' THEN ''О''
WHEN code ~ ''[0-9]{2}2[0-9]{2}0{5}'' AND code !~ ''[0-9]{2}20{7}'' THEN ''Р''
WHEN coalesce(category, ''Р'') != ''Р''
OR code IN (SELECT DISTINCT substring(code,1,5)||''00000'' FROM koatuu_csv k2 WHERE category = ''Р'') AND category IS NULL
OR code ~ ''(80|85)0{8}'' THEN ''НП''
ELSE NULL END']
);
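To illustrate the expression-based format, the level1 and level2 elements above are ordinary SQL expressions evaluated against each source row. For a hypothetical KOATUU code '0510100000' (sample value, not from the real file), they can be checked directly in PostgreSQL:

```sql
-- Derived columns for the hypothetical source value code = '0510100000':
SELECT substring('0510100000',1,2)||'00000000' AS level1  -- '0500000000' (first-level code)
     , substring('0510100000',1,5)||'00000'    AS level2; -- '0510100000' (second-level code)
```

The type element works the same way: the CASE expression classifies each code by matching it against the POSIX regular expressions with the ~ operator.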
2.3. Complex scenario 2
CALL p_load_table_from_csv('staff','D:\PostgreSQL\csv\staff.csv'
,array['uuid','laboratory','staff_status','researches','full_name','hygienist_flag','full_time_flag','salary','fixed_term_contract_flag','contract_end_date','specialization_date','specialization_end_date','dismissal_flag','dismissal_date']
,array['laboratory_id::ref(lookup_col:laboratory,ref_table:laboratory,ref_col:name,ref_id:laboratory_id)'
,'staff_status_id::ref(lookup_col:staff_status,ref_table:staff_status,ref_col:name,ref_id:staff_status_id)'
,'researches::ref_array(lookup_col:researches,ref_table:research,ref_col:research_type,ref_id:research_id,delim:#)'
,'full_name','hygienist_flag','full_time_flag','salary','fixed_term_contract_flag','contract_end_date','specialization_date','specialization_end_date','dismissal_flag','dismissal_date']
);
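For this call to work, the laboratory, staff_status, and researches columns in staff.csv must contain descriptive values rather than identifiers, and researches must hold an array whose elements are separated by the # character declared in delim:#. A hypothetical fragment showing only these three lookup columns (sample values, not from the real file):

```csv
laboratory,staff_status,researches
Central laboratory,Active,{Bacteriological research#Virological research}
```

During the load, each descriptive value is resolved to its identifier via the corresponding reference table (laboratory, staff_status, research).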
3. Loading data into the database
To load data into the database, use the standard Liquibase functionality.
3.1. An example of an XML template for loading data
<databaseChangeLog
xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.2.xsd
http://www.liquibase.org/xml/ns/dbchangelog-ext xsd/dbchangelog-ext.xsd">
<property name="dataLoadPath" value="D:\PostgreSQL\csv\"/>
<changeSet author="registry owner" id="load data to dictionaries">
<sql dbms="postgresql" endDelimiter=";" splitStatements="true" stripComments="true">
CALL p_load_table_from_csv('research','${dataLoadPath}01-Typy doslidgen.csv', array['code','research_type'], array['research_type']);
CALL p_load_table_from_csv('refusal_reason','${dataLoadPath}02-Prichiny vidmov.csv', array['code','document_type','name','constant_code'], array['document_type','name','constant_code']);
CALL p_load_table_from_csv('kopfg','${dataLoadPath}05-kopfg.csv', array['code','name']);
CALL p_load_table_from_csv('solution_type','${dataLoadPath}06-Typy rishen.csv', array['id','name','constant_code'], array['name','constant_code']);
CALL p_load_table_from_csv('factor','${dataLoadPath}14-Himichni_factory_OBRV.csv', array['code','name','col3','col4'], array['name','factor_type::''Хімічний: ОБРВ''']);
CALL p_load_table_from_csv('factor','${dataLoadPath}15-Himichni factory dovilni.csv', array['code','name'], array['name','factor_type::''Хімічний: довільні''']);
</sql>
</changeSet>
</databaseChangeLog>
For details on creating a physical data model of the registry using the Liquibase tool, see the Creating a physical data model section.
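Assuming the changelog above is saved as data-load.xml and the database connection details are known, the load can be executed with the standard Liquibase CLI (hypothetical file name, URL, and credentials):

```
liquibase \
  --changelog-file=data-load.xml \
  --url=jdbc:postgresql://localhost:5432/registry \
  --username=registry_owner \
  --password=secret \
  update
```

Liquibase records the executed changeset, so re-running the update does not load the same data twice.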
4. File requirements for automatic uploads to the database
The files used to upload data into the database must have a clear structure.
File requirements for automatic uploads to the database:

- The file format must be CSV.
- The file name must contain only Latin letters.
- Files must use UTF-8 encoding.
- The first row must contain field names.
- Starting from the second row, the file must contain field values; their structure must match the field names in the first row.
- One row in the file must contain the values necessary to create a record in the database.
- Values are separated by commas or semicolons.
- The absence of a value (NULL) is conveyed by a missing value, not a space. For example: …value 1,,value 3…
- For dictionaries, the file must contain at least one unique business field (for example, Name).
- Use curly brackets { } to denote arrays of descriptive values.
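A minimal file that satisfies these requirements might look like this (hypothetical dictionary content):

```csv
name,category,tags
First dictionary entry,,{Tag one#Tag two}
Second dictionary entry,A,{Tag three}
```

Note the empty category field in the second row, which is loaded as NULL, and the curly brackets around the array of descriptive values.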
5. Recommendations for loading large amounts of data
To load large CSV files (tens or hundreds of megabytes), standard SQL code can be used instead of the procedure. For proper registry operation, such SQL code must also create historical data (the _hst table) and populate the metadata fields (the ddm_ columns), essentially replicating the operations that the procedure would perform automatically. The following example demonstrates an efficient way to do this.
-- Create a temporary staging table
-- that matches the format of the CSV file.
CREATE TABLE account_csv_stage (username text, bank_number text);
-- Load data into the staging table from the CSV
COPY account_csv_stage (username,bank_number)
FROM '${dataLoadPath}account.csv'
WITH (HEADER, FORMAT CSV);
-- Insert data into the main and historical tables
WITH main_table_cte AS (
INSERT INTO account (
username
, bank_number
, ddm_created_by
, ddm_updated_by
)
SELECT username
, bank_number
, 'admin'
, 'admin'
FROM account_csv_stage
RETURNING *)
INSERT INTO account_hst (
id
, username
, bank_number
, ddm_created_by
, ddm_created_at
, ddm_dml_op
, ddm_system_id
, ddm_application_id
, ddm_business_process_id)
SELECT id
, username
, bank_number
, ddm_created_by
, CURRENT_TIMESTAMP
, 'I' as ddm_dml_op
, (SELECT ss.system_id
FROM ddm_source_system ss
WHERE ss.system_name ='initial load') ddm_system_id
, (SELECT sa.application_id
FROM ddm_source_application sa
WHERE sa.application_name ='initial load') ddm_application_id
, (SELECT sb.business_process_id
FROM ddm_source_business_process sb
WHERE sb.business_process_name ='initial load process') ddm_business_process_id
FROM main_table_cte;
-- Remove temporary staging table
DROP TABLE account_csv_stage;
This code can replace the procedure call in the XML template used for loading data. Whenever you load a file this way, create a separate changeset for it.
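One way to package such a script, assuming it is saved as account-initial-load.sql next to the changelog, is a dedicated changeset that references it via Liquibase's sqlFile tag (a sketch; adjust the path and attributes to your changelog layout):

```xml
<changeSet author="registry owner" id="load account data">
    <sqlFile dbms="postgresql"
             path="account-initial-load.sql"
             relativeToChangelogFile="true"
             splitStatements="true"
             endDelimiter=";"/>
</changeSet>
```

Keeping each large load in its own changeset makes it easy to track which files have already been applied and to re-run a single failed load.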