Initial loading of registry data
A PL/pgSQL database procedure is used for initial data loading.
Before you start the initial data loading, ensure that all the prerequisite steps described in Preparing data for migration have been performed.
1. PL/pgSQL procedure specification for the Platform database
CREATE OR REPLACE PROCEDURE p_load_table_from_csv(
p_table_name text
, p_file_name text
, p_table_columns text[]
, p_target_table_columns text[] DEFAULT NULL::text[]
);
Parameters description:

- p_table_name: Target table name.
- p_file_name: Full path to the CSV data file.
- p_table_columns: An array of the field names in the data file (the first row of the CSV file).
- p_target_table_columns: An array of target table fields. Leave it empty if the field list matches the fields in the CSV file.

The elements of the p_target_table_columns parameter can be provided in different formats:

- <col_name>: The field in the target table receives the value from the field with the same name in the CSV file.
- <col_name>::<some_calculation_based_on_source_cols>: The field in the target table receives the result of evaluating the expression. For details, see the Complex scenario 1 call example.
- <col_name>::ref(lookup_col:<lookup_col_name>, ref_table:<ref_table_name>, ref_col:<ref_col_name>, ref_id:<ref_id_col_name>): For this format, the system looks up the identifier in the reference table by running the following SQL query:

  SELECT <ref_id> INTO <col_name> FROM <ref_table> WHERE <ref_col> = <lookup_col>;

  Use this format when the field is a foreign key reference to a dictionary table and the value in the CSV file is a descriptive field rather than an identifier. The values of the descriptive field must be unique. For details, see the Complex scenario 2 call example.
- <col_name>::ref_array(lookup_col:<lookup_col_name>, ref_table:<ref_table_name>, ref_col:<ref_col_name>, ref_id:<ref_id_col_name>, delim:<delimiter_character>): This format differs from the previous one only in that the <lookup_col_name> field in the CSV file contains an array of descriptive values; an array of the matching identifiers is returned.
2. Procedure call examples
2.1. Simple scenario
CALL p_load_table_from_csv('research',
  'D:\PostgreSQL\csv\uuid\01-Typy_doslidgen_uuid.csv',
  array['uuid','research_type'], array['research_type']);
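For reference, the file in this call is expected to start with a header row matching p_table_columns, for example (hypothetical content; the real file may differ):

```csv
uuid,research_type
c7f1e9a0-1111-2222-3333-444455556666,Bacteriological research
```

Because p_target_table_columns lists only research_type, the uuid column is read from the file but is not loaded into the research table.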
2.2. Complex scenario 1
CALL p_load_table_from_csv('koatuu','${dataLoadPath}03-KOATUU_simple.csv'
, array['code','category','name']
, array['code','category','name'
,'level1::substring(code,1,2)||''00000000'''
,'level2::substring(code,1,5)||''00000'''
,'type::CASE WHEN code ~ ''[0-9]{2}0{8}'' AND code !~ ''(80|85)0{8}'' THEN ''О''
WHEN code ~ ''[0-9]{2}2[0-9]{2}0{5}'' AND code !~ ''[0-9]{2}20{7}'' THEN ''Р''
WHEN coalesce(category, ''Р'') != ''Р''
OR code IN (SELECT DISTINCT substring(code,1,5)||''00000'' FROM koatuu_csv k2 WHERE category = ''Р'') AND category IS NULL
OR code ~ ''(80|85)0{8}'' THEN ''НП''
ELSE NULL END']
);
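To illustrate the expression-based format, the level1 and level2 elements above are ordinary SQL expressions evaluated against each source row. For a hypothetical KOATUU code '0510100000' (sample value, not from the real file), they can be checked directly in PostgreSQL:

```sql
-- Derived columns for the hypothetical source value code = '0510100000':
SELECT substring('0510100000',1,2)||'00000000' AS level1  -- '0500000000' (first-level code)
     , substring('0510100000',1,5)||'00000'    AS level2; -- '0510100000' (second-level code)
```

The type element works the same way: the CASE expression classifies each code by matching it against the POSIX regular expressions with the ~ operator.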
2.3. Complex scenario 2
CALL p_load_table_from_csv('staff','D:\PostgreSQL\csv\staff.csv'
,array['uuid','laboratory','staff_status','researches','full_name','hygienist_flag','full_time_flag','salary','fixed_term_contract_flag','contract_end_date','specialization_date','specialization_end_date','dismissal_flag','dismissal_date']
,array['laboratory_id::ref(lookup_col:laboratory,ref_table:laboratory,ref_col:name,ref_id:laboratory_id)'
,'staff_status_id::ref(lookup_col:staff_status,ref_table:staff_status,ref_col:name,ref_id:staff_status_id)'
,'researches::ref_array(lookup_col:researches,ref_table:research,ref_col:research_type,ref_id:research_id,delim:#)'
,'full_name','hygienist_flag','full_time_flag','salary','fixed_term_contract_flag','contract_end_date','specialization_date','specialization_end_date','dismissal_flag','dismissal_date']
);
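For this call to work, the laboratory, staff_status, and researches columns in staff.csv must contain descriptive values rather than identifiers, and researches must hold an array whose elements are separated by the # character declared in delim:#. A hypothetical fragment showing only these three lookup columns (sample values, not from the real file):

```csv
laboratory,staff_status,researches
Central laboratory,Active,{Bacteriological research#Virological research}
```

During the load, each descriptive value is resolved to its identifier via the corresponding reference table (laboratory, staff_status, research).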
3. Loading data into the database
To load data into the database, use the standard Liquibase functionality.
3.1. An example of an XML template for loading data
<databaseChangeLog
xmlns="http://www.liquibase.org/xml/ns/dbchangelog"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:ext="http://www.liquibase.org/xml/ns/dbchangelog-ext"
xsi:schemaLocation="http://www.liquibase.org/xml/ns/dbchangelog http://www.liquibase.org/xml/ns/dbchangelog/dbchangelog-4.2.xsd
http://www.liquibase.org/xml/ns/dbchangelog-ext xsd/dbchangelog-ext.xsd">
<property name="dataLoadPath" value="D:\PostgreSQL\csv\"/>
<changeSet author="registry owner" id="load data to dictionaries">
<sql dbms="postgresql" endDelimiter=";" splitStatements="true" stripComments="true">
CALL p_load_table_from_csv('research','${dataLoadPath}01-Typy doslidgen.csv', array['code','research_type'], array['research_type']);
CALL p_load_table_from_csv('refusal_reason','${dataLoadPath}02-Prichiny vidmov.csv', array['code','document_type','name','constant_code'], array['document_type','name','constant_code']);
CALL p_load_table_from_csv('kopfg','${dataLoadPath}05-kopfg.csv', array['code','name']);
CALL p_load_table_from_csv('solution_type','${dataLoadPath}06-Typy rishen.csv', array['id','name','constant_code'], array['name','constant_code']);
CALL p_load_table_from_csv('factor','${dataLoadPath}14-Himichni_factory_OBRV.csv', array['code','name','col3','col4'], array['name','factor_type::''Хімічний: ОБРВ''']);
CALL p_load_table_from_csv('factor','${dataLoadPath}15-Himichni factory dovilni.csv', array['code','name'], array['name','factor_type::''Хімічний: довільні''']);
</sql>
</changeSet>
</databaseChangeLog>
For details on creating a physical data model of the registry using the Liquibase tool, see the Creating a physical data model section.
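Assuming the changelog above is saved as data-load.xml and the database connection details are known, the load can be executed with the standard Liquibase CLI (hypothetical file name, URL, and credentials):

```
liquibase \
  --changelog-file=data-load.xml \
  --url=jdbc:postgresql://localhost:5432/registry \
  --username=registry_owner \
  --password=secret \
  update
```

Liquibase records the executed changeset, so re-running the update does not load the same data twice.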
4. File requirements for automatic uploads to the database
The files used to upload data into the database must have a clear structure.
File requirements for automatic uploads to the database:

- The file format must be CSV.
- The file name must contain only Latin letters.
- Files must use UTF-8 encoding.
- The first row must contain field names.
- Starting from the second row, the file must contain field values; their structure must match the field names in the first row.
- One row in the file must contain the values necessary to create a record in the database.
- Values are separated by commas or semicolons.
- The absence of a value (NULL) is conveyed by a missing value, not a space. For example: …value 1,,value 3…
- For dictionaries, the file must contain at least one unique business field (for example, Name).
- Use curly brackets { } to denote arrays of descriptive values.
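A minimal file that satisfies these requirements might look like this (hypothetical dictionary content):

```csv
name,category,tags
First dictionary entry,,{Tag one#Tag two}
Second dictionary entry,A,{Tag three}
```

Note the empty category field in the second row, which is loaded as NULL, and the curly brackets around the array of descriptive values.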
5. Recommendations for loading large amounts of data
To load large CSV files (tens or hundreds of megabytes), standard SQL code can be used instead of the procedure. For proper registry operation, such SQL code must also create historical data (the _hst table) and populate the metadata fields (the ddm_ columns), essentially replicating the operations that the procedure would perform automatically. The following example demonstrates an efficient way to do this.
-- Create a temporary staging table
-- that matches the format of the CSV file.
CREATE TABLE account_csv_stage (username text, bank_number text);
-- Load data into the staging table from the CSV
COPY account_csv_stage (username,bank_number)
FROM '${dataLoadPath}account.csv'
WITH (HEADER, FORMAT CSV);
-- Insert data into the main and historical tables
WITH main_table_cte AS (
INSERT INTO account (
username
, bank_number
, ddm_created_by
, ddm_updated_by
)
SELECT username
, bank_number
, 'admin'
, 'admin'
FROM account_csv_stage
RETURNING *)
INSERT INTO account_hst (
id
, username
, bank_number
, ddm_created_by
, ddm_created_at
, ddm_dml_op
, ddm_system_id
, ddm_application_id
, ddm_business_process_id)
SELECT id
, username
, bank_number
, ddm_created_by
, CURRENT_TIMESTAMP
, 'I' as ddm_dml_op
, (SELECT ss.system_id
FROM ddm_source_system ss
WHERE ss.system_name ='initial load') ddm_system_id
, (SELECT sa.application_id
FROM ddm_source_application sa
WHERE sa.application_name ='initial load') ddm_application_id
, (SELECT sb.business_process_id
FROM ddm_source_business_process sb
WHERE sb.business_process_name ='initial load process') ddm_business_process_id
FROM main_table_cte;
-- Remove temporary staging table
DROP TABLE account_csv_stage;
This code can replace the procedure call in the XML template used for loading data. Whenever you load a file this way, create a separate changeset for it.
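One way to package such a script, assuming it is saved as account-initial-load.sql next to the changelog, is a dedicated changeset that references it via Liquibase's sqlFile tag (a sketch; adjust the path and attributes to your changelog layout):

```xml
<changeSet author="registry owner" id="load account data">
    <sqlFile dbms="postgresql"
             path="account-initial-load.sql"
             relativeToChangelogFile="true"
             splitStatements="true"
             endDelimiter=";"/>
</changeSet>
```

Keeping each large load in its own changeset makes it easy to track which files have already been applied and to re-run a single failed load.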