Conceptual, Logical, Physical - the Truth(*) About Data Models
(*)my personal, extremely opinionated view, which just seems to make a lot of practical sense
Well, here we go - we’re finally broaching the subject of data modeling on this Substack. It was always inevitable: data modeling is my Thing, and has been for at least the last dozen or so years. This is going to be a longer long-form piece, but that is to be expected when it comes to data models!
There’s nothing new in data modeling. It’s been around since, what, the 1970s? The reason why it’s now a big topic of discussion on data-LinkedIn and various conferences is that for a few years, we as an industry forgot about data modeling, and now we’re starting to realize we need to bring it back.
I’m not going to give a deep history lesson here; others have done it far better than I could (you can, for example, go read Joe Reis’s excellent Practical Data Modeling for more detailed discussions of this and other related topics). Suffice it to say, the big data craze killed all design thinking in data, then we realized no one could find or understand anything anymore, and then later LLMs appeared and started requiring context around data. And suddenly, we needed data models again!
Instead of a history lesson, what I want to do is to present my extremely simple way of thinking about the levels of data models. See, a huge problem in the industry is that we lack context not just about the data we use, but also about the very terms we use to describe our work. “Data model” has come to mean all kinds of different things to different people. [1]
One trigger for writing this post right now was a recent discussion about data architectures (and more specifically, the Medallion architecture pattern) by Chris Gambill, Ramona C Truta, and Shane Gibson. You can see the session here on YouTube. During the discussion, there was some talk about conceptual modeling patterns, and Ramona rightly pointed out that Shane was in fact not talking about conceptual modeling at all when he was listing various methods like Data Vault and Dimensional modeling. This is true! It’s a common mix-up - and when even Shane, a globally respected expert and a good mate of mine, gets this mixed up, there’s a need for some clarifying content. [2]
So I’ll set it right! Right here, right now, the one and only Truth you’ll ever need to hear about Conceptual, Logical, and Physical data models.
What is a “data model” in the first place?
Alright, I guess since I called out dbt Labs for terminological crimes in the footnote above, I have to start here.
You’ll find a million definitions for “data model” online. Joe has a good article about the various definitions in his Substack, and he ends up providing the following definition:
A data model is a structured representation that organizes and standardizes data to enable and guide human and machine behavior, inform decision-making, and facilitate actions.
That’s very good and comprehensive, and in my opinion one of the best overall definitions I’ve seen on the matter.
But there’s still a question of “map vs. territory”, or if you’re more artistically inclined, it’s the whole thing about Magritte’s The Treachery of Images. That is to say, are we talking about the artefact that is being modeled, or the model itself?
Without going too deeply into the implications of Magritte, Plato’s cave, Kant’s das Ding an sich, and the whole Wittgenstein spiel on data and analytics work [3], I’ll just jump straight into how I think [4] about this.
For me, data modeling is a design activity [5]. The outcome of that activity, i.e. the data model, is a design. It’s a design for something that might or might not have been created yet. That is to say, a data model (the design) can exist without any technological artefacts.
The reverse, however, is not true! If you have a data solution at hand, made of whatever technical components, you will have a design - i.e. you have a data model.
“But wait!” you say now. “I’ve been vibecoding my reporting solutions all day and I’ve not once wasted time on drawing boxes and lines, it’s all delivery all the time baby!”
No, you still have a data model: implicitly, in your head. You just didn’t do conscious and deliberate design, and you didn’t document your design decisions - but you did design all the same (or maybe the LLM assistant did it for you). Everyone has a data model, but some of us have a deliberately designed one. And even better, some of us have our designs documented for future (re)use and reference.
So, all in all, for me a data model is the design more than the artefact that is being designed [6]. This is important to understand, because I’ve seen many times in this business that for many of the more technically-oriented people among us, it’s the other way around - they will point at a bunch of tables or other technical objects and call that a “data model”.
This confusion is, I believe, largely caused by the fact that many of these technically-oriented people only see and know data modeling at the physical level, which is the closest to the actual implementation. But that’s just one level - and in my opinion, the least valuable and interesting one!
Three levels of data modeling
Now that we got the philosophical part out of the way, we can start slicing and dicing data modeling into sub-categories.
What follows is how I see things. It’s not necessarily a very fancy or theoretically robust approach. You’ll find different categorizations with different numbers of levels or different names for them online, and that’s fine. I’ll cover some of the most common arguments and questions in a FAQ below. But I’ve seen over many years that the way I do this is both a) relatively easy for people to understand and b) practically beneficial. It just makes sense, which is good enough for me.
With that disclaimer, let’s dive in.
Conceptual models: understanding the business
Conceptual is the highest level in terms of abstraction. It’s also closest to the business - in fact, it has absolutely nothing to do with any technologies or databases or frankly anything related to IT.
I would actually consider conceptual modeling to be more of a business analysis activity rather than a technical design activity.
The only thing we want to achieve with a conceptual model is the following:
Understand and document the business entities that exist in the real life, and map how they are connected to each other.
There’s nothing else to it.
We can formulate this in various ways. My dear friend and mentor in data modeling [7], Alec Sharp, defines this level of modeling as being
a description of the business in terms of the things it needs to know about.
See that? A description of the business, not of a database. The key thing in understanding conceptual modeling is that it’s about modeling the business; our scope is a slice of reality. We may choose that slice of reality according to a business process, a particular problem space, or even the business area that is covered by a specific system, but at the conceptual level we are never modeling a system directly.
There are different methods for performing the actual modeling part, or as I like to think, for fishing out the information we need from the business. You could use something like BEAM (by Lawrence Corr) or ELM (by Hans Hultgren & Remco Broekmans) or various other facilitation methods or templates. But the end result doesn’t change: you need to get entities and their relationships.
In short, the following statements define conceptual modeling for me:
It only has business entities and their relations. [8]
Business entities are named after actual things that exist in the real world (e.g. “Customer”, “Order”, “Product”, “Delivery”). [9]
Relations are named after the verbs that describe the association between the entities (e.g. Customer “makes” an Order). [10]
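To make this concrete, a conceptual model can be captured as nothing more than a set of entities and a list of verb-named relationships between them. Here’s a minimal sketch in Python - the entity and relationship names are the illustrative ones from above, not from any real model:

```python
# A conceptual model boiled down to its essentials:
# business entities, and verb-named relations connecting them.
entities = {"Customer", "Order", "Product", "Delivery"}

# Each relation: (subject entity, verb phrase, object entity).
relationships = [
    ("Customer", "makes", "Order"),
    ("Order", "contains", "Product"),
    ("Order", "is fulfilled by", "Delivery"),
]

def validate(entities, relationships):
    """Check that every relation connects two declared entities."""
    for subject, verb, obj in relationships:
        if subject not in entities or obj not in entities:
            raise ValueError(f"Unknown entity in: {subject} {verb} {obj}")
    return True

validate(entities, relationships)  # a well-formed conceptual model
```

Notice there’s not a data type or table in sight - the whole model is readable as business sentences (“Customer makes Order”).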
From this follows an interesting and extremely valuable benefit, once you know how to utilize it: conceptual models are infinitely reusable. If your business doesn’t change, the conceptual model will stay valid over time.
Another important thing to keep in mind for the future (yes, there will obviously be more writing on this later) is that the conceptual model is where we really document semantics. A properly made conceptual model connects business entities to each other - this results in the basic elements of an ontology being captured. But more on that another time!
Logical modeling: optimizing for the use case
Whereas the scope of the conceptual model was a slice of reality, the scope of a logical model is a use case. We have some end result in mind, for which we create a suitable design. The process of turning the general conceptual model into an optimal structure for a specific use case is logical modeling.
The key thing about logical modeling is that its shape matters. When we do logical modeling, we decide what is the “target shape” we want to aim for. Different shapes are good for different kinds of use cases.
And what are these shapes? Well, these are the various modeling methods you hear so much about!
Dimensional, Data Vault, One Big Table, Activity Schema, 3rd Normal Form…
These are all different options you have for a logical model. No single shape/method works for every use case; you have to know, as Joe Reis says, Mixed Modeling Arts. Say, you’re designing a solution for BI dashboards. A dimensional model is a good choice for that - so that’s the target shape you’ll go for. But for some AI/ML model feature store, perhaps you’ll go for an OBT shape.
Whatever the target shape, the logical model should be derived from the conceptual model. From one conceptual model, you can create an endless number of logical models of all kinds of shapes - but you need to be able to map them back to your concepts, otherwise you’ll lose the plot on what your data means.
If you’re doing a dimensional model, you need to be able to point at the conceptual model and say “I’m going to make a fact out of that entity, and dimensions out of those entities”. Perhaps the mapping is more complex in some cases, but it must exist nevertheless. This is the only way to maintain semantic mapping between the business entities and your solution design at the logical level.
Practically, you’ll also document attributes and keys (primary key / foreign key). In effect, the level of detail increases as you move from the conceptual model to the logical model.
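The fact-and-dimensions derivation above can be sketched as data, too. Here’s a hedged example in Python of a dimensional logical model where every logical object keeps an explicit pointer back to its conceptual entity - the table and column names are my illustrative inventions, not a prescribed standard:

```python
# Conceptual entities we're deriving from (see the conceptual model above).
conceptual_entities = ["Customer", "Order", "Product", "Delivery"]

# Design decision for this use case: Order becomes the fact,
# the other entities become dimensions. Each logical object records
# which conceptual entity it was derived from ("source_concept").
logical_model = {
    "fact_order": {
        "source_concept": "Order",
        "keys": ["customer_key", "product_key", "delivery_key"],
        "measures": ["order_amount", "quantity"],
    },
    "dim_customer": {"source_concept": "Customer", "keys": ["customer_key"]},
    "dim_product":  {"source_concept": "Product",  "keys": ["product_key"]},
    "dim_delivery": {"source_concept": "Delivery", "keys": ["delivery_key"]},
}

# The semantic mapping must never be lost: every logical object
# traces back to a conceptual entity.
assert all(obj["source_concept"] in conceptual_entities
           for obj in logical_model.values())
```

The point is not the dictionary syntax - it’s that the mapping back to the concepts is a first-class part of the design, not something you reconstruct later from memory.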

Physical modeling: optimizing for the technology
A well-made logical model gives you an implementation instruction: “we need this data to be structured like this, so that it’s suitable for our use case”. Something that the logical model doesn’t take into account, however, is technology-specific optimization.
The physical level is the one that most people think about when they think about “data models”. This is because it’s the most technical of the levels, and it’s closest to the actual implementation - so close, in fact, that some people see no difference between The Treachery of Images and the actual pipe they’re smoking!
But in fact, given how good our data engineering tools are nowadays at automating stuff, perhaps there doesn’t need to be a huge difference at this level. If you have a good logical model that defines all the objects (tables), their attributes (including keys), and the relationships between them, and if you’re able to map this logical model of the target solution and its attributes to your actual data sources, then perhaps you can just click “generate” and a physical model pops out!
Mapping of sources to targets is an important design question. Serge Gershkovich (whom I also respect greatly) writes about this as “transformational modeling”. Personally, I am more inclined to wrap this into the physical model, but it is good to emphasize that the mapping is important.
Sometimes it’s also necessary to add some extra technical detail into the physical-level design, of course. Things such as constraints, partitions, and indexes should be designed at this point. These are dependent on the chosen technology: you can implement a single logical model in two different technologies, where technology-specific optimization necessitates two different physical models.
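That “one logical model, two physical models” idea can be sketched in a few lines. Below, a single logical table definition is rendered into two different physical designs - one engine partitions, the other clusters. The dialect names and DDL are deliberately generic and illustrative, not exact syntax for any real warehouse:

```python
# One logical table definition, technology-agnostic.
logical_table = {
    "name": "fact_order",
    "columns": [("order_key", "INTEGER"), ("customer_key", "INTEGER"),
                ("order_amount", "DECIMAL"), ("order_date", "DATE")],
    "layout_column": "order_date",  # a physical concern, recorded as a hint
}

def render_ddl(table, dialect):
    """Render the same logical table for a specific (hypothetical) engine."""
    cols = ",\n  ".join(f"{name} {dtype}" for name, dtype in table["columns"])
    ddl = f"CREATE TABLE {table['name']} (\n  {cols}\n)"
    if dialect == "warehouse_a":
        # One engine might want declarative partitioning...
        ddl += f"\nPARTITION BY {table['layout_column']}"
    elif dialect == "warehouse_b":
        # ...another might prefer clustering on the same column.
        ddl += f"\nCLUSTER BY {table['layout_column']}"
    return ddl + ";"

print(render_ddl(logical_table, "warehouse_a"))
print(render_ddl(logical_table, "warehouse_b"))
```

Same logical design, two physical outputs - which is exactly why the physical level is the most automatable of the three.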
But all this means that the physical model doesn’t necessarily need to be a separate diagram. In fact, if you have a good conceptual model and a good logical model, I see little reason to create and maintain a separate physical model diagram. Perhaps it’s all auto-generated and versioned as code? Or perhaps, like in my old days as a Data Warehouse developer [11], the physical model is a sprawling spreadsheet with all the technical information as well as mappings and business logic written into it, from which you can then auto-generate the actual ETL code.
My argument has been for a while now that if you have the conceptual and logical modeling levels properly done, and if you have good enough tooling that is capable of automating stuff, a separate human-made physical model might not be necessary at all. Yet, because of a general lack of data modeling knowledge and the tech focus in our industry, quite often physical modeling is the only kind of modeling that gets done. Which is sad and really inefficient! But I’m hopeful that this is changing now, as the value of good design & semantic linking of data is more widely understood.
The big picture
All in all, the big picture for data modeling is this:
You do conceptual modeling to understand the business information within a slice of reality.
You do logical modeling to design a suitable structure from that business information for a particular use case.
You do physical modeling to design an implementation of that logical structure in a particular technology.

How you do each step can differ: all kinds of methods and notations and tools exist. Not everything has to be a diagram (though diagrams are a pretty good way to represent things being related to other things). If you do diagrams, your notation of choice can be UML or Barker’s or Chen or something else - your notation choice doesn’t change the fact that you’re working on one of these three levels. And remember, the various modeling methods like Data Vault and Dimensional - they are logical level methods. They don’t affect the work you do on the conceptual level.
That’s it - that’s my way of thinking on data models. You can now ponder that, and read the FAQ below (perhaps, if I get a lot of comments, I’ll edit that part later). We’ll return to data modeling many times on this Substack!
FAQ - why my truth might not be yours
Okay yeah so obviously this was never going to be “the” Objective Truth, whatever that is. People have lots of opinions on data modeling! I’m adding this small FAQ here pre-emptively: while I absolutely appreciate and desire comments and discussion, there are also some things I can already anticipate coming my way, and it’s better to cover those right away I think!
Very nice, but what you describe as “conceptual modeling” is in fact conceptual data modeling / concept modeling / business term modeling…
Just… no. Sorry! I know some people, far cleverer than I, have defined theoretical constructs that more accurately describe the differences between business terms and data entities and stuff like that. But frankly, I don’t find those distinctions practically very useful! I call the top-level stuff “conceptual modeling” because that’s how I learned it, and I define it like this because that’s what seems to work in practice. You do you!
You didn’t mention anything about my favourite method which is XYZ, why?
Isn’t this post long enough already?
What about unstructured data or streaming?
Excellent question, thank you! I’ll try to write more about this later. But the important thing is that in my thinking, the conceptual level is still the same, regardless of the type of data. Differences will appear in logical/physical levels, but then again those are use case-specific in any case, so that’s to be expected! The big picture still stands.
What about Medallion architecture?
Medallion is to data models what a car park is to cars, and besides, I just wrote a ton about how it all starts at a completely tech-agnostic conceptual level!
This sounds nice in theory but I don’t have time for this.
I find it peculiar that some people in the data industry are so ready to entirely dismiss design from their workflows. Who builds a house without a blueprint??
You mentioned semantics - where do knowledge graphs come in?
That is a JUICY topic which I hope to cover later. Suffice to say, the knowledge management world is very much connected!
If you managed to read this far, thank you! This is a topic close to my heart. All comments and discussion are very much welcome!
Until next time - cheerio!
[1] And various vendors have made this even more difficult: for all the love I have for the tool they’ve built, I will never forgive dbt Labs for calling what is practically just a SQL query a “data model”. Shame on you! SHAME!
[2] I know Shane quite well, and Ramona and I have also interacted a lot, and I have huge respect for both of them (I don’t know Chris at all yet, but he seems super smart too!). Ramona in fact mentioned me at this point in the video, which was also a major trigger for this piece - I have to say something now that I was called out!
[3] I mean I would really love to go into all of this “what exists, what we experience, and how we can talk about it” stuff, but first of all that would be a whole other wall of text, and secondly for the reader it would probably feel like a lot of pseudo-intellectual, umm, self-pleasure as it were (which tbh it would be).
[4] Because hey, it’s my Substack, so it’s perfectly fine for me to mention myself in the same sentence as Plato, Kant, and Wittgenstein!
[5] But data modeling is not ONLY design - there’s another point to be made about using data models not just for designing data solutions but for other purposes as well, and this will be a topic for this Substack later!
[6] So in effect, we’re talking about The Treachery of Images and not the pipe! Or, we’re talking about the map and not the territory. Choose your own metaphor to follow, I just happen to love Magritte.
[7] Alec has also taught me the art of craft cocktails, for which I’m equally grateful (though my liver might not be).
[8] It is perfectly possible to add some more detail to the conceptual model, such as a few important attributes. Usually I choose not to add them, but adding attributes doesn’t change the fact that the entities and relations form the core of the model.
[9] The business entities in a conceptual model should be part of your Business Glossary. Definitions should be written for every single entity!
[10] Again, you can have more or less detail about the relations. I like to add cardinality, some people don’t. Just do the verbs at least!!!
[11] “Data Engineer” as a title didn’t exist back then, even though we did pretty much the same stuff (but with worse technology) as DEs do now.