Creating a logical data model

1. Data modeling concept

Data modeling is the process of defining and analyzing the data requirements needed to support business processes within an information system. It typically involves professional data modelers working closely with business stakeholders and potential users of the system.

The general data modeling process is presented in the diagram below.

[Diagram: data model concept flow]

The diagram illustrates the way data models are developed and used today.

2. "As is" data model analysis

This section covers the principles and approaches to describing the "as is" data model and aligning it between the analysts and data developers.

Data models are typically designed and developed in the following main iterations:

  • analyzing raw data

  • creating a logical model

  • creating a physical model

  • going into production

The following diagram presents these iterations:

[Diagram: data model flow]

2.1. Raw data analysis

The first step in building a data model is to analyze the raw data against the following criteria:

  • data sources (regulatory documents, state standards, other registries)

  • data constraints (validation, calculation rules, and so on)

  • data inconsistencies and gaps (errors related to manual input, outdated data, and so on)

    • source values correction (by data owners)

As a result, the analyst should gain an understanding of the subject domain, homogeneous objects (entities), and their relationships.
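
The checks listed above can be partly automated. The following is a minimal sketch, assuming the raw data is already loaded as a list of dictionaries. The sample rows, the field names (`id`, `name`, `founded`), and the `profile` helper are hypothetical and only illustrate the idea of flagging gaps and manual-input errors before modeling starts.

```python
from collections import Counter

# Hypothetical raw records, for example rows exported from a legacy spreadsheet.
raw_rows = [
    {"id": "1", "name": "School No. 5", "founded": "1998"},
    {"id": "2", "name": "", "founded": "2004"},
    {"id": "2", "name": "Lyceum No. 12", "founded": "20o7"},
]


def profile(rows, key="id", year_field="founded"):
    """Return (row_index, problem) pairs for duplicates, gaps, and bad years."""
    problems = []
    key_counts = Counter(row[key] for row in rows)
    for index, row in enumerate(rows):
        # A key value that occurs more than once points to a duplicate record.
        if key_counts[row[key]] > 1:
            problems.append((index, f"duplicate {key}: {row[key]}"))
        # Empty values are gaps that the data owner has to fill in.
        for field_name, value in row.items():
            if value.strip() == "":
                problems.append((index, f"empty value in '{field_name}'"))
        # A year that is not numeric is most likely a manual input error.
        if not row[year_field].isdigit():
            problems.append((index, f"non-numeric year: {row[year_field]}"))
    return problems


for index, problem in profile(raw_rows):
    print(index, problem)
```

A report like this only finds the problems. Correcting the source values remains the responsibility of the data owners.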

2.2. Creating a logical model

The logical data model describes the concepts of the subject domain, their relationships, and data constraints imposed by the domain. A project analyst creates these models to define the elements and functionality of the system to be implemented.

Building a logical model is an iterative process that includes the following steps:

  • Visualizing the entities.

  • Defining the attributes (the properties that describe an entity).

  • Defining relationships between entities.

The rules and constraints of relationships between the entities are described using different types of relationships: for example, "one-to-one," "one-to-many," or "many-to-many." The relationship type is specified according to the entity-relationship model (ER model). For details, refer to this Wikipedia article: Entity–relationship model
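
To make the three steps more concrete, here is a minimal sketch of a logical model expressed as plain Python objects. The `Entity` and `Relationship` classes, the `validate` helper, and the `School` and `Teacher` entities are assumptions made for illustration; in practice, a logical model is usually drawn in an ER modeling tool rather than written as code.

```python
from dataclasses import dataclass, field
from typing import List

ALLOWED_KINDS = {"one-to-one", "one-to-many", "many-to-many"}


@dataclass
class Entity:
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Relationship:
    source: str
    target: str
    kind: str  # one of ALLOWED_KINDS


def validate(entities, relationships):
    """Return a list of human-readable errors found in the logical model."""
    names = {entity.name for entity in entities}
    errors = []
    for rel in relationships:
        if rel.kind not in ALLOWED_KINDS:
            errors.append(f"unknown relationship type: {rel.kind}")
        # Both ends of a relationship must be entities defined in the model.
        for end in (rel.source, rel.target):
            if end not in names:
                errors.append(f"relationship refers to undefined entity: {end}")
    return errors


entities = [
    Entity("School", ["name", "founded"]),
    Entity("Teacher", ["full_name"]),
]
relationships = [Relationship("School", "Teacher", "one-to-many")]

print(validate(entities, relationships))  # an empty list means no errors
```

Note that the attributes carry no data types here. At the logical level, it is enough to know which properties an entity has and how entities relate to each other.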

2.3. Creating a physical model

The physical data model describes all the database objects as they are implemented in a specific database management system (DBMS). Because DBMSs implement database objects differently (for example, the available data types vary from one DBMS to another), a single logical model can correspond to several different physical models.

While the logical model does not specify the exact data type an attribute should have, it is essential for the physical model to describe all the properties of specific physical objects, such as tables, columns, relationships between entities, indices, procedures, functions, and so on.
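
The sketch below illustrates how one logical model can yield different physical models. It renders the same hypothetical `school` entity as a `CREATE TABLE` statement for two DBMSs. The type mapping in `TYPE_MAP` and the `create_table` helper are simplified assumptions for illustration, not a complete or recommended mapping.

```python
# Logical attribute types mapped to column types in two DBMSs.
# The mapping is an illustrative assumption and is far from complete.
TYPE_MAP = {
    "postgresql": {"text": "TEXT", "integer": "INTEGER", "date": "DATE"},
    "oracle": {"text": "VARCHAR2(255)", "integer": "NUMBER(10)", "date": "DATE"},
}


def create_table(table, attributes, dbms):
    """Render a CREATE TABLE statement for the given DBMS."""
    types = TYPE_MAP[dbms]
    columns = ",\n".join(
        f"    {name} {types[logical_type]}" for name, logical_type in attributes
    )
    return f"CREATE TABLE {table} (\n{columns}\n);"


# One logical description of the entity: (attribute name, logical type).
school = [("id", "integer"), ("name", "text"), ("founded", "date")]

for dbms in TYPE_MAP:
    print(f"-- {dbms}")
    print(create_table("school", school, dbms))
```

A real physical model would also cover keys, indices, constraints, procedures, and functions, which this sketch leaves out.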

If you plan to upload raw data, work out the format of the upload files and make sure the files are filled out correctly. Do this after the logical model is validated and before development of the physical model begins.
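
As an illustration, an upload file can be checked against the agreed format before it is loaded. This is a minimal sketch that assumes the upload files are CSV files with a header row. The column names in `EXPECTED_COLUMNS` and the `check_upload` helper are hypothetical.

```python
import csv
import io

# Hypothetical column list agreed on when the logical model was validated.
EXPECTED_COLUMNS = ["id", "name", "founded"]


def check_upload(file_obj, expected=EXPECTED_COLUMNS):
    """Return a list of problems found in a CSV upload file."""
    reader = csv.DictReader(file_obj)
    header = reader.fieldnames or []
    missing = [column for column in expected if column not in header]
    if missing:
        # Without the expected columns, the rows cannot be checked at all.
        return [f"missing columns: {', '.join(missing)}"]
    problems = []
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        for column in expected:
            if not (row[column] or "").strip():
                problems.append(f"line {line_number}: empty '{column}'")
    return problems


sample = io.StringIO("id,name,founded\n1,School No. 5,1998\n2,,2004\n")
print(check_upload(sample))
```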

2.4. Going into production

Going into production consists of the following stages:

  • Deployment, the engineering part. Includes completing all Jenkins jobs to deploy the created model.

  • Finalizing the job.

3. Designing a "to be" data model

Data represents information in a formalized form suitable for transmission, communication, or processing. In simple terms, data is information organized according to specific rules.

At this stage, it is necessary to develop the "to be" data model of the registry database as thoroughly as possible.

The logical level of modeling refines the conceptual model by transforming it into a logical diagram where previously identified entities, attributes, and relationships are represented according to the modeling rules of a particular database type or even a specific DBMS.

A properly developed logical model must adequately represent the subject domain. This enables the registry to handle all the operations that are objectively required in the real-world scenarios for which the registry is intended.