This month Mickey Stuewe hosts the T-SQL Tuesday and she has a great topic. Data Modeling is something few people seem to do, especially developers, and it often can go wrong. That’s the topic, and I have a short story.
T-SQL Tuesday is a monthly blog party, on the second Tuesday of the month. The rules are on Mickey’s blog, and you can read them there, but essentially you write a post every month.
The Aging Model
I once worked at company that shall rename nameless. We had a system built before my time which had a hierarchical set of entities. I can’t disclose the exact industry, but imagine that we had a list of items like this:
- Manufacturers of products
- Distributors of products
- Warehouse holding products
- Vendors selling products.
In this structure, we have a strict linkage where each item below the next is contained in the item above it. In other words, a manufacturer works with specific distributors and only those distributors. Distributors don’t work with other manufacturers.
I know this sounds contrived, and it is for a supply chain. However not for the industry in which I worked. So imagine we’re 100 years ago when power was more concentrated with supply chains.
This resulted in tables like this:
CREATE TABLE Manufacturers ( manufacturerid INT IDENTITY(1, 1) CONSTRAINT manufacturer_PK PRIMARY KEY ( manufacturerid ) , manufacturername VARCHAR(200) , primarycontactid INT -- ... ); GO CREATE TABLE Distributors ( distributorid INT IDENTITY(1, 1) CONSTRAINT distributor_PK PRIMARY KEY ( distributorid ) , manufacturerid INT CONSTRAINT Distributor_Manufacturer_FK FOREIGN KEY REFERENCES dbo.Manufacturers ( manufacturerid ) , manufacturername VARCHAR(200) , PrimarySalesPersonid INT -- ... ); GO CREATE TABLE Warehouses ( warehouseid INT IDENTITY(1, 1) CONSTRAINT warehouse_PK PRIMARY KEY ( distributorid ) , distributorid INT CONSTRAINT Warehouse_Distributor_FK FOREIGN KEY REFERENCES dbo.Distributors ( distributorid ) , warehouse VARCHAR(200) , regionid INT -- ... ); GO
Each of these links to the item above it. This means that I might have a Manufacturer table like this:
manufacturerid manufacturername …
2 Big Product Corp
With warehouses linked as shown.
warehouseid manufacturerid warehousename …
———– ————– ————-
1 1 Denver Central
2 1 Kansas City East
3 2 Santa Fe
4 1 Billings Depot
This would mean that I used distributors that worked with a warehouse, and their data would be.
distributorid warehouseid distributorname …
————- ———– ————-
1 1 Denver City
2 1 Denver South
3 1 Aurora
4 2 Kansas City Distributors
5 3 New Mexico Wholesale
If I wanted to get a list of the distributors that carried a manufacturer’s products, I’d have to join through the warehouse table.
SELECT manufacturerid , ... FROM dbo.Manufacturers AS m INNER JOIN dbo.Distributors AS d ON d.manufacturerid = m.manufacturerid INNER JOIN dbo.Warehouses AS w ON w.distributorid = d.distributorid ...
Not a big deal, but we had 5 levels of this appear over time. Which means that queries that might need a higher level had to cross tables in between to join data. These also weren’t narrow tables, with a decent amount of meta data for each entity. Indexing helped, but certainly we needed better rollups of data, and performance suffered as the amount of legacy data grew.
What’s more, over time, we learned that business changes. A warehouse might start to work with multiple distributors, and our model couldn’t cope.
Eventually we embarked upon a project that set of link tables between entities, so that we had just IDs in a table that showed linkages where appropriate. This was ongoing when I left, but it was a major disruption and problem for the company.
I’ve tried to avoid embedding hard relationships based on business contracts in my models. Certainly some FKs make sense, like order details linked to a specific order, but many other relationships aren’t as tightly bound, and it is possible that they will change. It doesn’t cost me a lot in modeling to treat suspect relationships as many:many, and if application development does this from the start, it’s an easy architecture to incorporate in their development.
Data modeling is always a struggle as we often work with incomplete information because an organization will morph over time. It is, however, worth taking some time to think about the possible ways in which things might change in the future and allow for some flexibility in your model where possible.