Initial loading of registry data

A PL/pgSQL database procedure is used for initial data loading.

Before you start the initial data loading, ensure that all prerequisite steps described in Preparing data for migration have been completed.

1. PL/pgSQL procedure specification for the Platform database

Procedure for initial loading
CREATE OR REPLACE PROCEDURE p_load_table_from_csv(
      p_table_name           text
    , p_file_name            text
    , p_table_columns        text[]
    , p_target_table_columns text[] DEFAULT NULL::text[]
);
Parameters description:
  • p_table_name: Target table name.

  • p_file_name: Full path to the CSV data file.

  • p_table_columns: An array of field names in the data file (first row in a CSV file).

  • p_target_table_columns: An array of target table fields. May be omitted (NULL) when the target field list matches the fields in the CSV file.

    The elements of the p_target_table_columns parameter can be provided in different formats:

    • <col_name>: The field in the target table receives the value from the corresponding field (with the same name) in the CSV file.

    • <col_name>::<some_calculation_based_on_source_cols>: The field in the target table receives the value that is the result of the expression.

      For details, jump to the Complex scenario 1 call example.
    • <col_name>::ref(lookup_col:<lookup_col_name>, ref_table:<ref_table_name>, ref_col:<ref_col_name>, ref_id:<ref_id_col_name>): For this format, the system runs the following SQL query, which searches the directory table by id:

      SELECT <ref_id> INTO <col_name> FROM <ref_table> WHERE <ref_col> = <lookup_col>;

      A query like this is used when the field is a foreign key reference to a directory table and the value in the CSV file is represented not by an identifier but by a descriptive field. The values of this descriptive field must be unique.

      For details, jump to Complex scenario 2 call example.
    • <col_name>::ref_array(lookup_col:<lookup_col_name>, ref_table:<ref_table_name>, ref_col:<ref_col_name>, ref_id:<ref_id_col_name>, delim:<delimiter_character>): This format differs from the previous one only in that the <lookup_col_name> field in the CSV file contains an array of descriptive values separated by the given delimiter. As a result, an array of matching ids is returned.
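
      For illustration, assume a CSV row whose researches field contains the value Blood#Urine and the format string specifies delim:#. The values and column names below are hypothetical, but the lookup then behaves roughly like the following query:

      SELECT array_agg(r.research_id)
        FROM unnest(string_to_array('Blood#Urine', '#')) AS v(research_type_name)
        JOIN research r ON r.research_type = v.research_type_name;

      Each descriptive value in the array is resolved to its id, and the resulting array of ids is stored in the target column.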

2. Procedure call examples

2.1. Simple scenario

CALL p_load_table_from_csv('research'
, 'D:\PostgreSQL\csv\uuid\01-Typy_doslidgen_uuid.csv'
, array['uuid','research_type']
, array['research_type']);
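
Conceptually (the procedure's actual implementation is not shown in this document), this call reads the CSV file into a staging area and copies only the research_type column into the target table. A minimal sketch of the equivalent plain SQL, assuming the staging table name and file layout shown here:

-- Hypothetical staging table matching the CSV header row
CREATE TEMP TABLE research_csv (uuid text, research_type text);

COPY research_csv (uuid, research_type)
FROM 'D:\PostgreSQL\csv\uuid\01-Typy_doslidgen_uuid.csv'
WITH (FORMAT csv, HEADER true, ENCODING 'UTF8');

-- Only the column listed in p_target_table_columns reaches the target table
INSERT INTO research (research_type)
SELECT research_type FROM research_csv;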

2.2. Complex scenario 1

CALL p_load_table_from_csv('koatuu','${dataLoadPath}03-KOATUU_simple.csv'
, array['code','category','name']
, array['code','category','name'
,'level1::substring(code,1,2)||''00000000'''
,'level2::substring(code,1,5)||''00000'''
,'type::CASE WHEN code ~ ''[0-9]{2}0{8}'' AND code !~ ''(80|85)0{8}'' THEN ''О''
            WHEN code ~ ''[0-9]{2}2[0-9]{2}0{5}'' AND code !~ ''[0-9]{2}20{7}'' THEN ''Р''
            WHEN coalesce(category, ''Р'') != ''Р''
            OR code IN (SELECT DISTINCT substring(code,1,5)||''00000'' FROM koatuu_csv k2 WHERE category = ''Р'') AND category IS NULL
            OR code ~ ''(80|85)0{8}'' THEN ''НП''
            ELSE NULL END']
);

2.3. Complex scenario 2

CALL p_load_table_from_csv('staff','D:\PostgreSQL\csv\staff.csv'

,array['uuid','laboratory','staff_status','researches','full_name','hygienist_flag','full_time_flag','salary','fixed_term_contract_flag','contract_end_date','specialization_date','specialization_end_date','dismissal_flag','dismissal_date']
  ,array['laboratory_id::ref(lookup_col:laboratory,ref_table:laboratory,ref_col:name,ref_id:laboratory_id)'
        ,'staff_status_id::ref(lookup_col:staff_status,ref_table:staff_status,ref_col:name,ref_id:staff_status_id)'
        ,'researches::ref_array(lookup_col:researches,ref_table:research,ref_col:research_type,ref_id:research_id,delim:#)'
        ,'full_name','hygienist_flag','full_time_flag','salary','fixed_term_contract_flag','contract_end_date','specialization_date','specialization_end_date','dismissal_flag','dismissal_date']
     );
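
Because ref() resolves a descriptive value to an id, each referenced directory value must be unique (see the parameter description above). Before running a call like this, it can be useful to check the directory for duplicates; in this sketch, laboratory and name correspond to the ref_table and ref_col used in the example:

SELECT name, count(*)
FROM laboratory
GROUP BY name
HAVING count(*) > 1;

If this query returns any rows, resolve the duplicates before loading, or the lookup result will be ambiguous.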

3. Loading data into the database

To load data into the database, use the standard Liquibase functionality.

3.1. An example of an XML template for loading data

<databaseChangeLog
        xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
        xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.2.xsd
        http://www.liquibase.org/xml/ns/dbchangelog-ext xsd/dbchangelog-ext.xsd">

    <property name="dataLoadPath" value="D:\PostgreSQL\csv\"/>

    <changeSet author="registry owner" id="load data to dictionaries">
        <sql dbms="postgresql" endDelimiter=";" splitStatements="true" stripComments="true">
            CALL p_load_table_from_csv('research','${dataLoadPath}01-Typy doslidgen.csv', array['code','research_type'], array['research_type']);
            CALL p_load_table_from_csv('refusal_reason','${dataLoadPath}02-Prichiny vidmov.csv', array['code','document_type','name','constant_code'], array['document_type','name','constant_code']);
            CALL p_load_table_from_csv('kopfg','${dataLoadPath}05-kopfg.csv', array['code','name']);
            CALL p_load_table_from_csv('solution_type','${dataLoadPath}06-Typy rishen.csv', array['id','name','constant_code'], array['name','constant_code']);
            CALL p_load_table_from_csv('factor','${dataLoadPath}14-Himichni_factory_OBRV.csv', array['code','name','col3','col4'], array['name','factor_type::''Хімічний: ОБРВ''']);
            CALL p_load_table_from_csv('factor','${dataLoadPath}15-Himichni factory dovilni.csv', array['code','name'], array['name','factor_type::''Хімічний: довільні''']);
        </sql>
    </changeSet>

</databaseChangeLog>
For details on creating a physical data model of the registry using the Liquibase tool, see the Creating a physical data model section.

4. File requirements for automatic uploads to the database

The files used to upload data into the database must have a clear structure.

File requirements for automatic uploads to the database:

  • The file format must be CSV.

  • The file name must contain only Latin letters.

  • Files must use UTF-8 encoding.

  • The first row must contain field names.

  • Starting from the second row, the file must contain field values. Their structure must match the field names in the first row.

  • One row in the file must contain the values necessary to create a record in the database.

  • Values are separated by commas or semicolons.

  • The absence of a value (NULL) is conveyed by a missing value, not a space. For example:

    …value 1,,value 3…

  • For directories, the file must contain at least one unique business field (for example, Name).

  • Use curly brackets { } to denote arrays of descriptive values.
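
A minimal file that satisfies these requirements might look like this (the field names and values are hypothetical; the second row shows a NULL conveyed by an empty value, and the last field shows an array of descriptive values in curly brackets):

name,category,tags
First directory entry,,{value A#value B}
Second directory entry,category 1,{value C}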

5. Recommendations for loading large amounts of data

If you need to load a large amount of data (over 1 million rows), we recommend temporarily changing the database configuration. Set the following timeout values for the connection between the replica and the primary database in the postgresql.conf file:

wal_sender_timeout = 900s
wal_receiver_timeout = 900s
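
As an alternative to editing postgresql.conf directly, the same parameters can be changed with standard PostgreSQL ALTER SYSTEM commands and applied with a configuration reload (this is a sketch of one possible approach; remember to revert the values after the load is complete):

-- Temporarily raise the replication timeouts for the bulk load
ALTER SYSTEM SET wal_sender_timeout = '900s';
ALTER SYSTEM SET wal_receiver_timeout = '900s';
SELECT pg_reload_conf();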