What is data modeling?
Types of data models, the data modeling process, & why data modeling is so important.
Data is the fuel that drives critical business decisions, but massive amounts of raw, unstructured data aren’t very useful to business end-users.
Think about it: A list of support ticket IDs isn’t much help to the customer support team unless all they care about is the total number of tickets submitted. When that information can be analyzed in relation to things like first response time and average resolution time, the data becomes far more valuable.
In other words, you need to understand the relational nature of data if you want to use it to make educated, data-driven decisions.
The process of mapping out relational rules for data is known as data modeling. It gets complicated from a technology standpoint, so this article won’t dig deep into the implementation details. Instead, we’re going to focus on everything a non-technical leader who’s responsible for data analytics and business intelligence (BI) needs to know about data modeling:
- What is data modeling?
- Why is data modeling important?
- Types of data models
- Types of data modeling
- How does data modeling impact data analytics?
What is data modeling?
Data modeling is the process of creating a visual representation of the data that’s stored in your data warehouse (or other data storage system).
It provides a visual description of your organization’s data, the relationships between different datasets, and the underlying business logic that ties it all together.
Data modeling forces companies to analyze, understand, and communicate their data requirements, which is the first step toward having clean, structured data that business end-users can use to drive action and influence decision-making.
Basically, data models bring structure to your data, which leads to improved data analytics. There’s more to it than that, though, so let’s explore exactly why data modeling is such an important part of any data analytics and business intelligence program.
Why is data modeling important?
The data modeling process requires organizations to document what data they have and how they plan to use it. It also organically imposes good data governance practices.
With data modeling, your organization also gets:
1. Higher quality data
Data modeling imposes structure on your data, which leads to improved data quality all around. You’ll have more consistent naming conventions, improved data governance practices, fewer errors, and greater data integrity. During the data modeling process, developers will also define rules to help monitor data quality.
The data modeling process also includes transformation steps that clean badly formatted data, remove unnecessary fields and records, and define data types.
A good data model defines the metadata so the data itself can be appropriately queried, understood, and reported on. It improves data quality by enforcing editing rules, field constraints, domain definitions, and integrity of relationships.
Data modeling also improves data quality through the early detection of errors. Without it, data issues often aren't discovered until the process is already running. Data modeling builds an accurate view of how users interact with your business, which provides insights into where problems exist and how best to correct them.
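To make the idea of field constraints, domain definitions, and relationship integrity concrete, here is a minimal sketch using Python's built-in sqlite3 module. The tables and columns (customers, tickets, priority, first_response_minutes) are hypothetical and chosen only to echo the support ticket example from the introduction. A real data warehouse would express these rules in its own syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless it is enabled per connection.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: the rules live in the model, not in each report.
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE
);
CREATE TABLE tickets (
    ticket_id              INTEGER PRIMARY KEY,
    customer_id            INTEGER NOT NULL REFERENCES customers(customer_id),
    priority               TEXT NOT NULL CHECK (priority IN ('low', 'normal', 'high')),
    first_response_minutes INTEGER CHECK (first_response_minutes >= 0)
);
""")
conn.execute("INSERT INTO customers VALUES (1, 'ana@example.com')")


def try_insert(row):
    """Return 'ok' if the ticket row is accepted, otherwise the rejection reason."""
    try:
        conn.execute("INSERT INTO tickets VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


results = [
    try_insert((100, 1, "high", 12)),    # valid row
    try_insert((101, 1, "urgent", 5)),   # priority is outside the allowed domain
    try_insert((102, 99, "low", 5)),     # customer 99 does not exist
    try_insert((103, 1, "low", -3)),     # negative response time
]
for line in results:
    print(line)
```

Only the first row is stored. The other three are rejected at insert time, which is the kind of early error detection described above.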
2. Improved internal collaboration
Building data models is a forcing function that encourages a business to define exactly how data is generated and moved throughout applications, tools, and systems.
The process requires business stakeholders and the data engineering team to work together, because the data team needs to understand the business's needs to create usable data models. The result is structured collaboration between engineering/data teams and business teams.
Data modeling also gives non-technical stakeholders an easy-to-understand overview of how data moves through the organization.
3. Reduced cost
When building data models, the data modeler needs to think about what data they need, its relations, the overall data architecture, and other important details. It’s a future-focused process that will help you avoid costly data reconstruction efforts down the road.
As your organization goes through the data modeling process, your data team will likely find errors, inconsistencies, and issues. This is great news – it’s much better to find out about problems sooner, when they’re easier and more affordable to fix.
4. Enhanced system performance
A well-constructed database that’s full of clean, modeled data runs faster and performs better. This enables more efficient querying of your data (which also reduces costs).
5. More detailed documentation
It’s not uncommon for organizations to struggle with data-related documentation. Metrics change frequently, business objectives shift, and it’s all too easy to let proper documentation fall by the wayside.
The data modeling process naturally leads to better documentation because you’ll document what data you have, how the company uses it, and what your requirements are with regard to usage, security, and more.
6. Improved data analytics all-around
At the end of the data modeling process, you’ll have final tables that can then be used for analytics.
This is the biggest benefit of data modeling and the #1 reason it’s so important. Sure, your data may be usable straight from the source, but it’s not necessarily understandable. The data modeling process is how you make sure your data is not only usable but also understandable.
Now, let’s review the different data model types and the different kinds of data modeling. Don’t sweat the details, though. If you’re a non-technical data analytics and BI leader, having a solid grasp of these basic concepts will be enough.
Types of data models
While there are many different ways to model data, there are three primary types of data models:
- Conceptual data models
- Logical data models
- Physical data models
Let's briefly go over these data models and their purposes.
1. Conceptual data models
Conceptual data models, also known as domain models, represent the overall structure and content of data without going into technical details. These models are the typical starting points for data modeling, identifying the data flow and the various data sets in an organization.
Generally, conceptual models paint a "big picture" of what the system will contain, how it will be organized, and which rules are involved. These models provide a blueprint for developing logical and physical data models.
2. Logical data models
Logical data models are based on the conceptual data models and define the project's data elements and relationships.
These models identify the individual entities in a database as well as their core attributes. For example, in a logical ecommerce model, each product is identified by a product ID and has attributes like category, description, and unit price.
Data scientists and business analysts use the logical data model when implementing a database management system.
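As a rough illustration, the ecommerce example above could be written down as plain Python dataclasses. This is only a sketch: the Category entity, the field types, and the sample values are assumptions added for illustration, and a real logical model is usually captured in a diagram or modeling tool, not in code.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Category:
    """Hypothetical entity that groups products."""
    category_id: int
    name: str


@dataclass(frozen=True)
class Product:
    """A product, identified by product_id, with its core attributes."""
    product_id: int
    category: Category      # relationship: each product belongs to one category
    description: str
    unit_price: Decimal


footwear = Category(category_id=1, name="Footwear")
sneaker = Product(
    product_id=501,
    category=footwear,
    description="Canvas sneaker",
    unit_price=Decimal("49.90"),
)
print(sneaker.product_id, sneaker.category.name, sneaker.unit_price)
```

Note that nothing here says how or where the data is stored. That decision belongs to the physical data model.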
3. Physical data models
Things start to get more technical with a physical data model – this is the version of your logical data model that will actually be implemented by the data team.
A physical data model provides an internal schema for how the data will be stored within a data storage system. It specifies the data types you'll store along with the technical data requirements.
These models are also specifically designed for the data storage system they’ll be used in. If multiple data warehouses or data storage systems will be used, you may have multiple physical data models for a single logical data model.
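The sketch below shows how one logical definition can produce different physical definitions depending on the target system. It is an illustration under assumptions: the LOGICAL_PRODUCT dictionary, the abstract type names, and the specific column types picked for each system are hypothetical choices. Only the SQLite version is actually executed here.

```python
import sqlite3

# Logical model: attribute name -> abstract type, with no storage details.
LOGICAL_PRODUCT = {
    "product_id": "identifier",
    "category": "text",
    "description": "text",
    "unit_price": "money",
}

# Physical choices differ per storage system (illustrative picks, not recommendations).
TYPE_MAPS = {
    "sqlite": {"identifier": "INTEGER PRIMARY KEY", "text": "TEXT", "money": "REAL"},
    "postgresql": {
        "identifier": "BIGINT PRIMARY KEY",
        "text": "VARCHAR(255)",
        "money": "NUMERIC(12, 2)",
    },
}


def create_table_sql(table, logical, target):
    """Render a CREATE TABLE statement for one target storage system."""
    type_map = TYPE_MAPS[target]
    columns = ",\n    ".join(f"{name} {type_map[kind]}" for name, kind in logical.items())
    return f"CREATE TABLE {table} (\n    {columns}\n)"


sqlite_ddl = create_table_sql("products", LOGICAL_PRODUCT, "sqlite")
postgres_ddl = create_table_sql("products", LOGICAL_PRODUCT, "postgresql")

# Run the SQLite version and read back the columns it created.
conn = sqlite3.connect(":memory:")
conn.execute(sqlite_ddl)
column_names = [row[1] for row in conn.execute("PRAGMA table_info(products)")]

print(sqlite_ddl)
print(postgres_ddl)
print(column_names)
```

Both statements describe the same four attributes. What changes is how each system is told to store them.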
Types of data modeling
There are also three main types of data modeling in use by the majority of organizations:
- Relational data modeling
- Dimensional data modeling
- Entity-relationship (ER) data modeling
These aren’t the only kinds of data modeling, but they are the most common. There are plenty of other data modeling techniques that are not in wide use, including network, hierarchical, multi-value, and object-oriented data modeling.
We’re not going to worry about those right now, though.
1. Relational data modeling
In a relational data model, data assets are stored in tables, with specific columns (known as keys) pointing to information in other tables. Relational databases use Structured Query Language (SQL) for accessing and managing data.
This data modeling type is commonly used in point-of-sale (POS) systems and other kinds of information processing.
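Here is a minimal sketch of that idea for a point-of-sale scenario, using Python's built-in sqlite3 module. The products and sales tables and their sample rows are hypothetical. The product_id column in sales is the element that points to a row in products, and a SQL join follows that pointer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE products (
    product_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    unit_price REAL NOT NULL
);
CREATE TABLE sales (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES products(product_id),
    quantity   INTEGER NOT NULL
);
INSERT INTO products VALUES (1, 'Espresso', 3.0), (2, 'Bagel', 2.5);
INSERT INTO sales VALUES (10, 1, 2), (11, 2, 4), (12, 1, 1);
""")

# Each sale stores only the product's ID; the name and price are looked up via the join.
rows = conn.execute("""
    SELECT s.sale_id, p.name, s.quantity * p.unit_price AS line_total
    FROM sales AS s
    JOIN products AS p ON p.product_id = s.product_id
    ORDER BY s.sale_id
""").fetchall()

for row in rows:
    print(row)
```

Because the price lives in one place, changing it in products is enough. No sales rows need to be rewritten.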
2. Dimensional data modeling
The dimensional data modeling structure is optimized for analytical queries and data warehousing. While a relational structure emphasizes efficient storage, dimensional models deliberately increase redundancy to make retrieving information for reporting purposes easier.
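One common way to organize a dimensional model is a star schema: a central fact table of measurements surrounded by descriptive dimension tables. The sketch below assumes hypothetical table names (fact_sales, dim_product, dim_date) and sample data. Notice the intentional redundancy: the month is stored next to the full date, and the category text is repeated on every product row, so a report needs only simple joins.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, iso_date TEXT, month TEXT);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    quantity    INTEGER,
    revenue     REAL
);
INSERT INTO dim_product VALUES
    (1, 'Espresso', 'Drinks'), (2, 'Latte', 'Drinks'), (3, 'Bagel', 'Food');
INSERT INTO dim_date VALUES
    (20240105, '2024-01-05', '2024-01'), (20240210, '2024-02-10', '2024-02');
INSERT INTO fact_sales VALUES
    (1, 20240105, 2, 6.0), (2, 20240105, 1, 4.5),
    (3, 20240105, 4, 10.0), (1, 20240210, 3, 9.0);
""")

# A typical reporting question: revenue by month and product category.
report = conn.execute("""
    SELECT d.month, p.category, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date    AS d ON d.date_key = f.date_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()

for row in report:
    print(row)
```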
3. Entity-relationship (ER) data modeling
This model shows the relationships between different entities in a graphical format: boxes of different shapes and sizes represent the various “entities,” and lines represent the “relationships” between them. An entity can be anything: data, a concept, an object, and so on. A relationship is an association or dependency between entities.
Once the ER model is finished, it’s used to create a relational database where each table represents a type of entity, each row represents one instance of that entity, and each field in the row holds one of its attributes.
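The step from an ER model to relational tables can be sketched in a few lines of Python. This is a simplified illustration, not a full mapping algorithm: the Entity and Relationship classes, the customer/order example, and the rule that the first attribute is the identifier are all assumptions, and only one-to-many relationships are handled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    name: str
    attributes: List[str]  # assumption: the first attribute is the identifier


@dataclass
class Relationship:
    one: str   # entity on the "one" side
    many: str  # entity on the "many" side


def to_tables(entities, relationships):
    """Map each entity to a table and each one-to-many relationship to a
    foreign-key column on the 'many' side."""
    tables = {e.name: list(e.attributes) for e in entities}
    identifiers = {e.name: e.attributes[0] for e in entities}
    for rel in relationships:
        tables[rel.many].append(identifiers[rel.one])
    return tables


entities = [
    Entity("customer", ["customer_id", "name"]),
    Entity("order", ["order_id", "placed_at"]),
]
# "A customer places many orders" is the line drawn between the two boxes.
relationships = [Relationship(one="customer", many="order")]

tables = to_tables(entities, relationships)
for table, columns in tables.items():
    print(table, columns)
```

The line between the two boxes in the diagram becomes the customer_id column in the order table.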
How does data modeling impact data analytics?
Data modeling and data analytics are two peas in a pod. If you want your data analytics program to pack a punch, you need quality data models.
As we mentioned earlier, the data modeling process is a forcing function that brings everyone to the table. It forces key business stakeholders to collaboratively outline their goals and to think critically about how data will help them meet those goals.
From an efficiency perspective, data models also optimize your analytics performance so that things keep running smoothly as your data volume and complexity increase.
Here’s the TL;DR: If your organization wants to get the most out of its data analytics and BI efforts, it needs well-modeled data.
Data modeling will help you develop an analytics program that’s functional, valuable, and accurate, and you’ll be one step closer to solving the data adoption problem. Modeled data also equals higher quality data, enhanced data governance, improved cross-functional collaboration, and lower overall costs.
Now that you understand what data modeling is and why it’s an important part of your overall data analytics and business intelligence program, it’s time to get tactical. Stay tuned for our next post, where you’ll learn how to design your company’s foundational data models.