Initial loading of registry data
A PL/pgSQL database procedure is used for initial data loading.
Before you start the initial data load, make sure all the prerequisite steps described in Preparing data for migration have been completed.
1. PL/pgSQL procedure specification for the Platform database
CREATE OR REPLACE PROCEDURE p_load_table_from_csv(
p_table_name text
, p_file_name text
, p_table_columns text[]
, p_target_table_columns text[] DEFAULT NULL::text[]
);
- Parameters description:
  - p_table_name: Target table name.
  - p_file_name: Full path to the CSV data file.
  - p_table_columns: An array of field names in the data file (the first row of the CSV file).
  - p_target_table_columns: An array of target table fields. Leave it empty if the field list matches the fields in the CSV file.

The elements of the p_target_table_columns parameter can be provided in different formats:

  - <col_name>: The field in the target table receives the value from the corresponding field (with the same name) in the CSV file.
  - <col_name>::<some_calculation_based_on_source_cols>: The field in the target table receives the result of evaluating the expression. For details, jump to the Complex scenario 1 call example.
  - <col_name>::ref(lookup_col:<lookup_col_name>, ref_table:<ref_table_name>, ref_col:<ref_col_name>, ref_id:<ref_id_col_name>): For this format, the system runs the following SQL query, which searches the directory table by ID:

    SELECT <ref_id> INTO <col_name> FROM <ref_table> WHERE <ref_col> = <lookup_col>;

    A query like this is used when the field contains a foreign key reference to a directory table, and the value in the CSV file is a descriptive field rather than an identifier. The values of this descriptive field must be unique. For details, jump to the Complex scenario 2 call example.
  - <col_name>::ref_array(lookup_col:<lookup_col_name>, ref_table:<ref_table_name>, ref_col:<ref_col_name>, ref_id:<ref_id_col_name>, delim:<delimiter_character>): This format differs from the previous one only in that the <lookup_col_name> field in the CSV file contains an array of descriptive values. As a result, an array of matching IDs is returned.
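To make the parameter semantics concrete, the sketch below shows how a procedure of this kind can work for the simple case only (direct column mapping, no ref()/ref_array() or expression handling). This is a hypothetical illustration, not the Platform implementation; the procedure name p_load_simple_csv and the staging-table approach are assumptions.

```sql
-- Hypothetical sketch (NOT the Platform implementation): stage the CSV
-- with COPY, then insert the selected columns into the target table.
CREATE OR REPLACE PROCEDURE p_load_simple_csv(
    p_table_name text
    , p_file_name text
    , p_table_columns text[]
    , p_target_table_columns text[] DEFAULT NULL::text[]
)
LANGUAGE plpgsql
AS $$
DECLARE
    -- An empty target list means it matches the CSV field list
    v_cols text := array_to_string(
        coalesce(p_target_table_columns, p_table_columns), ', ');
BEGIN
    -- Stage the raw file as text; HEADER skips the first row (field names)
    EXECUTE format('CREATE TEMP TABLE csv_stage (%s)',
                   (SELECT string_agg(quote_ident(c) || ' text', ', ')
                    FROM unnest(p_table_columns) AS c));
    EXECUTE format('COPY csv_stage FROM %L WITH (FORMAT csv, HEADER true)',
                   p_file_name);
    -- Copy the requested columns into the target table
    EXECUTE format('INSERT INTO %I (%s) SELECT %s FROM csv_stage',
                   p_table_name, v_cols, v_cols);
    DROP TABLE csv_stage;
END;
$$;
```

The real procedure additionally interprets the ::<expression>, ::ref(...), and ::ref_array(...) element formats described above.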
2. Procedure call examples
2.1. Simple scenario
CALL p_load_table_from_csv('research',
    'D:\PostgreSQL\csv\uuid\01-Typy_doslidgen_uuid.csv',
    array['uuid','research_type'], array['research_type']);
2.2. Complex scenario 1
CALL p_load_table_from_csv('koatuu','${dataLoadPath}03-KOATUU_simple.csv'
, array['code','category','name']
, array['code','category','name'
,'level1::substring(code,1,2)||''00000000'''
,'level2::substring(code,1,5)||''00000'''
,'type::CASE WHEN code ~ ''[0-9]{2}0{8}'' AND code !~ ''(80|85)0{8}'' THEN ''О''
WHEN code ~ ''[0-9]{2}2[0-9]{2}0{5}'' AND code !~ ''[0-9]{2}20{7}'' THEN ''Р''
WHEN coalesce(category, ''Р'') != ''Р''
OR code IN (SELECT DISTINCT substring(code,1,5)||''00000'' FROM koatuu_csv k2 WHERE category = ''Р'') AND category IS NULL
OR code ~ ''(80|85)0{8}'' THEN ''НП''
ELSE NULL END']
);
2.3. Complex scenario 2
CALL p_load_table_from_csv('staff','D:\PostgreSQL\csv\staff.csv'
,array['uuid','laboratory','staff_status','researches','full_name','hygienist_flag','full_time_flag','salary','fixed_term_contract_flag','contract_end_date','specialization_date','specialization_end_date','dismissal_flag','dismissal_date']
,array['laboratory_id::ref(lookup_col:laboratory,ref_table:laboratory,ref_col:name,ref_id:laboratory_id)'
,'staff_status_id::ref(lookup_col:staff_status,ref_table:staff_status,ref_col:name,ref_id:staff_status_id)'
,'researches::ref_array(lookup_col:researches,ref_table:research,ref_col:research_type,ref_id:research_id,delim:#)'
,'full_name','hygienist_flag','full_time_flag','salary','fixed_term_contract_flag','contract_end_date','specialization_date','specialization_end_date','dismissal_flag','dismissal_date']
);
3. Loading data into the database
To load data into the database, use the standard Liquibase functionality.
3.1. An example of an XML template for loading data
<databaseChangeLog
xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.2.xsd
http://www.liquibase.org/xml/ns/dbchangelog-ext xsd/dbchangelog-ext.xsd">
<property name="dataLoadPath" value="D:\PostgreSQL\csv\"/>
<changeSet author="registry owner" id="load data to dictionaries">
<sql dbms="postgresql" endDelimiter=";" splitStatements="true" stripComments="true">
CALL p_load_table_from_csv('research','${dataLoadPath}01-Typy doslidgen.csv', array['code','research_type'], array['research_type']);
CALL p_load_table_from_csv('refusal_reason','${dataLoadPath}02-Prichiny vidmov.csv', array['code','document_type','name','constant_code'], array['document_type','name','constant_code']);
CALL p_load_table_from_csv('kopfg','${dataLoadPath}05-kopfg.csv', array['code','name']);
CALL p_load_table_from_csv('solution_type','${dataLoadPath}06-Typy rishen.csv', array['id','name','constant_code'], array['name','constant_code']);
CALL p_load_table_from_csv('factor','${dataLoadPath}14-Himichni_factory_OBRV.csv', array['code','name','col3','col4'], array['name','factor_type::''Хімічний: ОБРВ''']);
CALL p_load_table_from_csv('factor','${dataLoadPath}15-Himichni factory dovilni.csv', array['code','name'], array['name','factor_type::''Хімічний: довільні''']);
</sql>
</changeSet>
</databaseChangeLog>
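A changelog like the one above is applied with the standard Liquibase update command. The changelog file name and connection details below are illustrative, not values defined by the Platform:

```shell
liquibase --changelog-file=data-load-changelog.xml \
  --url=jdbc:postgresql://localhost:5432/registry \
  --username=registry_owner \
  update
```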
For details on creating a physical data model of the registry using the Liquibase tool, see the Creating a physical data model section.
4. File requirements for automatic uploads to the database
The files used to upload data into the database must have a well-defined structure and meet the following requirements:
- The file format must be CSV.
- The file name must contain only Latin letters.
- Files must use UTF-8 encoding.
- The first row must contain field names.
- Starting from the second row, the file must contain field values. Their structure must match the field names in the first row.
- One row in the file must contain the values necessary to create a record in the database.
- Values are separated by commas or semicolons.
- The absence of a value (NULL) is conveyed by a missing value, not a space. For example: …value 1,,value 3…
- For directories, the file must contain at least one unique business field (for example, Name).
- Use curly brackets {} to denote arrays of descriptive values.
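Before uploading, a file can be checked against the structural requirements above with a small script. The sketch below (the helper name validate_csv_for_upload is illustrative, not part of the Platform) verifies that the header row has non-empty field names and that every data row carries the same number of values as the header:

```python
import csv
import io

def validate_csv_for_upload(text, delimiter=","):
    """Check a CSV payload against the structural requirements:
    a header row with non-empty field names, and the same number of
    values in every subsequent row. Returns a list of problems found
    (an empty list means the structure is valid)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if not rows:
        return ["file is empty"]
    problems = []
    header = rows[0]
    if any(not name.strip() for name in header):
        problems.append("header row contains an empty field name")
    # Data rows start at line 2; each must match the header's field count
    for line_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {line_no}: expected {len(header)} values, got {len(row)}"
            )
    return problems
```

Note that a missing value (value 1,,value 3) still counts as one value per the NULL convention above, so such rows pass the check.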
5. Recommendations for loading large amounts of data
If you need to load a large amount of data (over 1 million rows), we recommend temporarily changing the database configuration. In the postgresql.conf file, set the following values for the connection timeout between the replica and the main database:

wal_sender_timeout = 900s
wal_receiver_timeout = 900s
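As an alternative to editing postgresql.conf by hand, the same settings (both are standard PostgreSQL parameters) can be applied from a superuser session and picked up with a configuration reload:

```sql
ALTER SYSTEM SET wal_sender_timeout = '900s';
ALTER SYSTEM SET wal_receiver_timeout = '900s';
SELECT pg_reload_conf();  -- re-read the configuration without a restart
```

After the load completes, revert the temporary values with ALTER SYSTEM RESET wal_sender_timeout; and ALTER SYSTEM RESET wal_receiver_timeout; followed by another pg_reload_conf().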