Relational By Default

I’ve read a few pieces from Vladimir Khorikov, and I tend to like the thoughtful way in which he approaches building software. When I saw a link to his piece on SQL versus NoSQL databases, I was intrigued to see what his thoughts would be. It’s a good read, though it had relatively few comments or debates posted at the time I read it. I was hoping for more, and I encourage you to add your thoughts.

The main contention in the piece is that the simpler development techniques we can use with NoSQL databases don’t remove the need for data quality or data management features. They just require those features to be implemented by the developer: specifically, the need to deal with multiple schema versions over time, or the requirement that we enforce parent-child relationships.

Perhaps that’s not too bad. After all, if we evolve schemas and the code can easily deal with the changes, that’s good. However, I think the point about using relational storage when you can, and adding in other stores where they make sense, is sound. This is what I really believe as well, since we often need to query and combine data, which is something relational stores do very well and very efficiently.

Ultimately I’m sure we will continue to see arguments about relational and NoSQL models being better or worse for particular problems. I actually welcome the arguments, because there are certainly domains of problems where one system works better than the other, and it’s good to develop some clarity about those particular problems. However, there are also likely many situations where multiple platforms can work, and in those cases, we should use what works well for our team.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (1.9MB) podcast or subscribe to the feed at iTunes and LibSyn.

Poor Data Modeling – T-SQL Tuesday #72

This month Mickey Stuewe hosts T-SQL Tuesday, and she has a great topic. Data modeling is something few people seem to do, especially developers, and it often goes wrong. That’s the topic, and I have a short story.

T-SQL Tuesday is a monthly blog party, on the second Tuesday of the month. The rules are on Mickey’s blog, and you can read them there, but essentially you write a post every month.

Or later.

The Aging Model

I once worked at a company that shall remain nameless. We had a system, built before my time, which had a hierarchical set of entities. I can’t disclose the exact industry, but imagine that we had a list of items like this:

  • Manufacturers of products
  • Distributors of products
  • Warehouses holding products
  • Vendors selling products

In this structure, we have a strict linkage where each item is contained by the item above it. In other words, a manufacturer works with specific distributors and only those distributors. Distributors don’t work with other manufacturers.

I know this sounds contrived, and it is for a modern supply chain. However, it wasn’t for the industry in which I worked. So imagine we’re 100 years in the past, when power was more concentrated in supply chains.

This resulted in tables like this:

CREATE TABLE Manufacturers
(
  manufacturerid INT IDENTITY(1, 1)
    CONSTRAINT manufacturer_PK PRIMARY KEY ( manufacturerid )
, manufacturername VARCHAR(200)
, primarycontactid INT
-- ...
);
GO
CREATE TABLE Distributors
(
  distributorid INT IDENTITY(1, 1)
    CONSTRAINT distributor_PK PRIMARY KEY ( distributorid )
, manufacturerid INT
    CONSTRAINT Distributor_Manufacturer_FK
      FOREIGN KEY REFERENCES dbo.Manufacturers ( manufacturerid )
, distributorname VARCHAR(200)
, PrimarySalesPersonid INT
-- ...
);
GO
CREATE TABLE Warehouses
(
  warehouseid INT IDENTITY(1, 1)
    CONSTRAINT warehouse_PK PRIMARY KEY ( warehouseid )
, distributorid INT
    CONSTRAINT Warehouse_Distributor_FK
      FOREIGN KEY REFERENCES dbo.Distributors ( distributorid )
, warehousename VARCHAR(200)
, regionid INT
-- ...
);
GO


Each of these tables links to the one above it. This means that I might have a Manufacturers table like this:

manufacturerid  manufacturername   …
--------------  ----------------
1               Acme
2               Big Product Corp

With distributors linked to their manufacturers as shown:

distributorid  manufacturerid  distributorname           …
-------------  --------------  ------------------------
1              1               Denver City
2              1               Denver South
3              1               Aurora
4              1               Kansas City Distributors
5              2               New Mexico Wholesale

Each distributor, in turn, supplied one or more warehouses, whose data would be:

warehouseid  distributorid  warehousename     …
-----------  -------------  ----------------
1            1              Denver Central
2            4              Kansas City East
3            5              Santa Fe
4            2              Billings Depot

If I wanted to get a list of the warehouses that held a manufacturer’s products, I’d have to join through the Distributors table.

SELECT m.manufacturerid
, ...
FROM dbo.Manufacturers AS m
  INNER JOIN dbo.Distributors AS d
    ON d.manufacturerid = m.manufacturerid
  INNER JOIN dbo.Warehouses AS w
    ON w.distributorid = d.distributorid
...

Not a big deal, but we had five levels of this appear over time, which meant that queries needing a higher level had to cross the tables in between to join data. These also weren’t narrow tables; each entity carried a decent amount of metadata. Indexing helped, but we certainly needed better rollups of data, and performance suffered as the amount of legacy data grew.

What’s more, over time, we learned that business changes. A warehouse might start to work with multiple distributors, and our model couldn’t cope.

Eventually we embarked upon a project that set up link tables between entities, so that we had just IDs in a table showing linkages where appropriate. This was ongoing when I left, but it was a major disruption and problem for the company.

Since then, I’ve tried to avoid embedding hard relationships based on business contracts in my models. Certainly some FKs make sense, like order details linked to a specific order, but many other relationships aren’t as tightly bound, and it is possible that they will change. It doesn’t cost me a lot in modeling to treat suspect relationships as many-to-many, and if application development does this from the start, it’s an easy architecture to incorporate.
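As a sketch of that link-table approach, the distributor-to-warehouse relationship could be modeled as many-to-many like this (the table and constraint names here are my own invention, not from the original system):

```sql
-- Hypothetical link table: a warehouse can work with any number of
-- distributors and vice versa, so new pairings are just inserts
-- rather than schema changes.
CREATE TABLE DistributorWarehouses
(
  distributorid INT NOT NULL
    CONSTRAINT DistributorWarehouses_Distributor_FK
      FOREIGN KEY REFERENCES dbo.Distributors ( distributorid )
, warehouseid INT NOT NULL
    CONSTRAINT DistributorWarehouses_Warehouse_FK
      FOREIGN KEY REFERENCES dbo.Warehouses ( warehouseid )
, CONSTRAINT DistributorWarehouses_PK
    PRIMARY KEY ( distributorid, warehouseid )
);
GO
```

The compound primary key keeps each pairing unique, and the FKs still protect referential integrity without hard-coding a one-parent hierarchy.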

Data modeling is always a struggle, as we often work with incomplete information and organizations morph over time. It is, however, worth taking some time to think about the possible ways in which things might change in the future and allow for some flexibility in your model where possible.

Positive or Negative

While watching a presentation recently on a SQL Server topic, I was surprised by a design technique that the speaker used. I hadn’t ever thought of the issue, but it caught my eye and made me rethink the way I look at design, and even data.

In this case, the speaker was building a table and looking to make a row active or inactive. This person chose to use a column name of “inactive,” which surprised me. I’ve always used an “active” column if I am trying to mark a row in a binary fashion.

That got me thinking. I wonder if there is some impact to designing entities either way, but I wanted to ask this question:

Are you positive or negative? Would you use an active flag or an inactive flag?

This could be extended to any other type of choice, but do you look to flag an action as happening or not happening? As readable or not readable? As usable or ignored? I’m not sure it matters, but I’d certainly be curious if anyone has a reason to choose one or the other. Is there some meaning conveyed? Let us know this week.
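For concreteness, the two designs in question might look like this (a sketch; the tables are invented for illustration):

```sql
-- Positive flag: rows are in use unless marked otherwise.
CREATE TABLE Customers_Positive
(
  customerid INT IDENTITY(1, 1) PRIMARY KEY
, customername VARCHAR(200)
, active BIT NOT NULL DEFAULT 1   -- 1 = row is live
);
GO

-- Negative flag: rows are live unless marked inactive.
CREATE TABLE Customers_Negative
(
  customerid INT IDENTITY(1, 1) PRIMARY KEY
, customername VARCHAR(200)
, inactive BIT NOT NULL DEFAULT 0 -- 1 = row is retired
);
GO
```

Either way the default covers the common case; one practical difference is that the positive form lets the usual filter read naturally as WHERE active = 1.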

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (1.9MB) podcast or subscribe to the feed at iTunes and LibSyn.

Logging Tables

I was chatting in one of our DLM webinars and someone asked a question about logging data. I mentioned a few things, but this individual followed up and wanted more information. Specifically, how do you build a logging table?

It’s an open-ended question, but it’s perfect for a poll. How do you construct a table for capturing lots of information that you may or may not query later, and that you might only query in a very ad hoc fashion when something has gone wrong? Usually that’s the only time I look through logs: when I’m investigating some type of issue. With that in mind, this week the question is:

How do you design a logging table?

I’m looking for two things here: the DDL and the rationale for the design. What is good about this design? How does this particular structure (with indexing) allow me to find answers to the questions I might need to ask?

I realize this is very open-ended, but the concept of logging is also open-ended, used in different ways for different purposes. If you have had problems with logging structures, let us know what design techniques to avoid as well.
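As one possible starting point (purely a sketch of my own, not a design from the webinar), a simple logging table might look like:

```sql
-- Hypothetical logging table: the clustered key on the identity keeps
-- inserts cheap, and the nonclustered index on logdate supports the
-- ad hoc "what happened around 2 AM?" investigations.
CREATE TABLE AppLog
(
  logid BIGINT IDENTITY(1, 1)
    CONSTRAINT AppLog_PK PRIMARY KEY CLUSTERED ( logid )
, logdate DATETIME2(3) NOT NULL DEFAULT SYSDATETIME()
, severity TINYINT NOT NULL       -- e.g. 1 = info, 2 = warning, 3 = error
, source VARCHAR(128) NOT NULL    -- module or procedure that wrote the row
, message NVARCHAR(MAX)
);
GO
CREATE NONCLUSTERED INDEX AppLog_logdate_IX ON AppLog ( logdate );
GO
```

The rationale: writes dominate a logging workload, so the structure favors append-only inserts, with just enough indexing to find rows by time when something goes wrong.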

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (1.7MB) podcast or subscribe to the feed at iTunes and LibSyn.

ER Modeling

I saw someone recently note that an Entity-Relationship model is important for your database development project. That UML is important, and the model should be actively maintained. I agree.

However, I’ve usually not seen this done. Typically when I’ve seen an ER diagram in a shop, it resides on a wall, printed out on multiple sheets of paper. Often there are as many handwritten additions on the pages as there are marks from a printer. The diagram is also usually out of date, even allowing for handwritten notes.

Maintaining a diagram is hard. Deciding to actually spend time working on one, instead of just talking through a few entities and building them, may be harder. This week, I wanted to ask if you’re up to the challenge in your organization.

How many of you maintain an ER model for your database?

Do you have an ER diagram you’ve built, whether it’s in ErWin, Visio, or Paint? Is there a diagram that’s actively maintained and that everyone can use for designing changes to the database(s) in use? Do people actually consult it in design meetings? Let us know this week.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (1.5MB) podcast or subscribe to the feed at iTunes and LibSyn.

No More Foreign Keys?

I struggle with removing FKs as a general idea. It’s presented in a piece from a developer, and it appears to me that if you do this, you are merely moving the problem, and potentially causing other issues. Apart from Grant’s response on performance, which is very true, there are other challenges with microservices and separate databases for each service.
Let’s say we are building something trivial, like a simple e-commerce site. I could have a separate database for products, one for orders, one for shipping actions, one for customers, etc. However, as I scale, I can no longer use intra-database queries, FKs, joins, or other techniques to verify information for an order. Now I’m sending messages. At small scale, these are trivial. At large scale, my network traffic and messaging grow vastly and potentially become a bottleneck.
Also, since there is a “products” database, an “orders” database, and a “customers” database, I still can’t necessarily scale these microservices out beyond their own machines. Perhaps I can scale them up, each on as large a machine as I can buy now for a single Oracle/SQL Server/MySQL/etc. box, but I still have a scaling issue. I also now have a new messaging problem to manage and architect. If I lose one database, how gracefully can my application degrade, or does it start to fail with unforeseen interactions? Do I create more frustration when customers cannot place an order because we can’t find the customer database, or because the site is down?
Certainly there are domains of problems that would work fine here. Spotify is probably a good example. There may be e-commerce systems that allow this, perhaps being willing to notify customers after some time (hopefully minutes, perhaps hours) that a product is out of stock because the message from the orders database didn’t reach the application. There are certainly ways to copy and replicate some data, assuming that “fresh enough” works for your application, but that’s a business decision more than a coding decision. It also means you have a new load on your system to move data. I also think that the more you work with singleton rows and single objects in an application context, the less you need RDBMS capabilities.
Ultimately I think FKs and strong RDBMS platforms work very well in many situations. They work less well in others, and your domain may qualify for those areas where NoSQL, a multi-database design, or another architecture works. We should certainly investigate and write about where things work and don’t work, but I wouldn’t assume the RDBMS, with its PKs and FKs, isn’t a valid, well-thought-out, and incredibly useful architecture.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (3.5MB) podcast or subscribe to the feed at iTunes and LibSyn.

Making Guesses

I was reading a bit about the design and planning stage for a new database that will support some random application. The piece isn’t important, but I did notice one thing. The guidelines are generic, encompassing most of the things we might want to think about: structure of data, constraints, volumes of data and rate of growth, and more. It’s a good list, and one many of you have probably considered at some point if you’ve designed a system.

However, I’ve rarely been able to actually do more than guess at many of the answers. Even much of the data I can write down is often little more than a guess. Even when I know the DRI constraints, I hesitate to document them separately from the code itself. Mostly because I find designs change early and often, and the documentation is rarely able to keep up with code. The same thing applies with the administrative type decisions I make.

I’m also nervous about documenting guesses. I find that far too often any answer I give takes on some level of truth with the developers, often being considered an immutable fact. The assumption that an early guess will match reality later seems to blind everyone and limits how well we can adapt to changing situations.

As an example, if I tell a SAN admin that we think we need 5GB of space, but we find that we need 20GB in a short period of time, I meet resistance. A debate or argument ensues, we search for evidence that we need to change, and a very inefficient process follows. The same thing occurs when developers, security admins, and especially managers hold too tightly to estimates and guesses.

Certainly some guidelines are needed, but when I’m not sure, I tend to give wide ranges, just to prevent too many expectations from taking hold in people’s minds. We should know that our plans will rarely survive production deployments and workloads, but somehow we forget that when looking at the guesses we make early on.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.4MB) podcast or subscribe to the feed at iTunes and LibSyn.

Defining Foreign Keys at Table Create Time

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.

How many of you can define a foreign key when you create a table? Probably a few of you, but I bet most of you are like me and don’t necessarily know the syntax. I have often defined these later, which is fine, as long as they get defined.

However, I knew I needed a specific key when I was creating a table and couldn’t remember the syntax, so I had to search and learn how. I used Google and saw a few links to MSDN, but those tend to be overly documentation-heavy. One of the links was to SQL Authority, run by Pinal Dave. He does a great job of simplifying things (and he’s a friend), so I followed that link. I could see the syntax and tested it in minutes.

It’s easy to create a Primary Key in CREATE TABLE, and I wrote about that for one of my first SQLNewBlogger posts. The Foreign Key is similar, but not quite as simple.

Imagine that I have a parent table:

CREATE TABLE orders
(
  orderid INT IDENTITY(1, 1)
    CONSTRAINT Orders_PK PRIMARY KEY ( orderid )
, orderdate DATETIME
, complete BIT
);
GO

I now want to create a child table and link the orderid in the child to the parent. I can do it like this:

CREATE TABLE OrderLines
(
  orderlineid INT IDENTITY(1, 1)
    CONSTRAINT OrderLines_PK PRIMARY KEY ( orderlineid )
, orderid INT
    CONSTRAINT orderlines_order_fk
      FOREIGN KEY REFERENCES orders ( orderid )
, qty INT
);
GO

Note that I define the constraint inline, just as I did for the parent. However, this one is an FK, and it "references" another table. In this case, I list the orders table and put the key column in parentheses.

Quick and easy: build your FKs inline when you know about them in advance.
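If the table already exists, the same constraint can be added after the fact. This sketch assumes the OrderLines table above was created without the FK:

```sql
-- Add the foreign key to an existing table instead of at create time.
ALTER TABLE OrderLines
  ADD CONSTRAINT orderlines_order_fk
    FOREIGN KEY ( orderid ) REFERENCES orders ( orderid );
GO
```

The behavior is the same either way; ALTER TABLE just checks any existing rows against the parent before the constraint is enabled.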


While trying to remember how to create an FK, I ran a search and chose the reference below to start. A matter of seconds had me seeing the syntax and writing the code.

Putting this together took less than ten minutes.


Creating Primary Key and Foreign Key Constraints –

Microservices for Databases

I ran into a talented developer last year who was talking about microservices. It’s an interesting concept, one that Netflix has adopted with success. I was excited about the possibilities of using microservices until this guy said that everything could be a microservice and the day of the RDBMS was over.

That was silly, and I’ll admit I struggled to remain polite in the discussion. Eventually I had to walk away, because the idea of no RDBMS for any application is as ludicrous as the concept of using an RDBMS for every single system dealing with data. It’s frustrating to talk with someone who views our industry as so black and white. There are many ways to solve any problem, and many problems can be handled by a variety of techniques.

However I am intrigued by microservices. It’s an area that I want to continue to research, as I suspect that the idea of small, loosely coupled applications, working in a service-oriented architecture, is a great way to scale systems.

From scratch.

I’m not sure that many of the monolithic, large applications we have in banking, in retail, in supply chain, in a variety of industries are worth rewriting to use SOA concepts. The return just isn’t there, as many of these systems can be served with bigger, faster hardware as they upgrade.

Microservices are interesting, and I suspect we’ll see more of them in the future. I also think that SOA, using messaging platforms like Service Broker, is a very robust way to build applications. I hope that more developers out there learn about SOA and find ways to start building systems that work well together, but aren’t so highly dependent on each other that changes are difficult or stressful for everyone.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.3MB) podcast or subscribe to the feed at iTunes and LibSyn.

Microservices and Databases

I ran across a post on microservices recently and was intrigued. I always like the idea of loosely coupled, independent items in software. However, the idea of microservices causes issues for databases. Here’s the relevant guidance in the piece from Netflix: create a separate data store for each microservice.

I tend to think of microservices as very small, tightly bound components of software. If that’s the case, how can I separate each item into its own database? Do I want dozens of databases?

I got into an argument with a developer a few months ago about this. The developer was sure that having separate databases (or stores) for each component wasn’t an issue. For things like banking and similar transfers, I’m not sure a microservice can be separated safely and still ensure integrity.

This is a tough concept for architecture, and I need to read more and understand how this could work in practice. I suspect that in places where things could be separate, you could easily use schemas in a database to separate out microservices from each other. If each service includes its own connection information, then using schemas would work and still allow for scale out to other databases if needed.
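As a sketch of that schema-based separation (all names here are invented for illustration), each service could own its own schema within one database:

```sql
-- One schema per microservice: each service's objects and permissions
-- stay isolated, but a later move to a separate database would only
-- change the service's connection information.
CREATE SCHEMA ordersvc;
GO
CREATE SCHEMA customersvc;
GO

CREATE TABLE ordersvc.Orders
(
  orderid INT IDENTITY(1, 1) PRIMARY KEY
, orderdate DATETIME2(0) NOT NULL
);
GO
CREATE TABLE customersvc.Customers
(
  customerid INT IDENTITY(1, 1) PRIMARY KEY
, customername VARCHAR(200) NOT NULL
);
GO

-- Each service would connect as its own user, granted rights only
-- on its schema, e.g.:
-- GRANT SELECT, INSERT, UPDATE, DELETE ON SCHEMA::ordersvc TO ordersvc_user;
```

The schema boundary gives each service its own namespace and security scope while keeping cross-service reporting possible until a true split is needed.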

This is an area that certainly fits larger-scale applications like Netflix and Spotify, but for many of us, our applications seem to be more tightly tied together.

Or are they?

Microservices look much like Service-Oriented Architecture (SOA), which I think fits many applications. I’m not sure how similar the concepts really are, but this is certainly an area I’d like to learn more about.