Email or Username

Quite a few of us will work with application databases where we need to identify a particular user for working with the various parts of the application. I’m not talking about the authentication and verification of identity, but just having a token to identify a user once they’ve been authenticated. I don’t want to talk security here since most of you aren’t great at building these systems and should just implement an authentication scheme that’s been vetted (OAuth, Live, etc).

However, once you get a way of authenticating a user, how do you identify them in the database? Many people store a username, but there are issues with these. They aren’t consistent across systems, a user might find theirs taken in one application, and then needs a second one for the next application. This leads to the complexity of people needing a way to retrieve their username, which can be a whole separate set of application programming pain. Often the default is to move back to an email associated with the username.

However what does someone do when they don’t have access to that email? Therein lies the issue with using email as an authentication mechanism. I use my email for many things, most of them related to this site. However, if I were to ever leave this job, then I wouldn’t be able to access some sites if I couldn’t remember the login information. Perhaps I shouldn’t be using this email, but there are some services that are related to both my career as a DBA/developer as well as my position here at SQLServerCentral.

Plenty of people have a separate email for personal use, but not necessarily many users of a generic application. There are plenty of people using their work email for registration at sites, assuming they’ll always have access to the application through a username of some sort. You might even change your email, deciding at some point that isn’t the one you want to give out for your kid’s school registration. In that case, if you’ve used that email at a variety of places, then what can you do?

PASS recently changed to using email instead of username, which is fine. I think more sites use email, but in our case, as DBAs and developers, are emails less likely to change? Should we be sure that our PASS email is always a non-corporate email so that we can keep membership as we change employers? I’d think username is more stable for people in technology, and it can be fun. I love my way0utwest moniker, and try to ensure I reserve it on many applications I may want to use.

There’s no good answer here, and there are issues with using either one as an identifier, but what do you think? Are there compelling reasons to choose one over the other? No matter which one you use, I think the need to have multiple ways for users to gain access to their account is important.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 3.5MB) podcast or subscribe to the feed at iTunes and LibSyn.

Have You Designed a Database from Scratch?

One of the things that I try to regularly promote is the #SQLNewBlogger project from Ed Leighton-Dick (and #SQLNewBlogger hashtag on Twitter). In keep a column open scanning for the tag and try to respond to those that post. I also write a series of posts under that tag, mainly beginner type posts, that showcase things I think are good, basic topics that I’d want a DBA or developer to know about SQL Server. These are exactly the types of posts that you write for your next resume (or next hiring manager).

I saw recently someone begin their blogging journey with a new post about designing a database. What struck me was this person with a lot of experience noted they had never designed a database from scratch. I’m sure that most of you have worked with a database, and have even modified various objects inside of a schema. Certainly you’ve changed the code of stored procedures or functions. I’m even sure many of you have modified tables, adding columns where you need them. After all, that’s a favorite first technique for many people that need a place to store data.

Probably the majority of you have built tables, at least temporary tables. I’m sure you spent a few minutes thinking about the columns, datatypes, etc. I’d also hope that most of you have added indexes and constraints to tables. Those are certainly skills that more people should better understand, especially the developers that work on software powered  by SQL Server. Above all, please learn how to index OLTP tables.

However, have you designed a database from scratch? Not a standalone database, but a database the actually has an API or application built against it. Have you tried to use the database and found places where your modeling was lacking? Did others complain about your design?

I know that lots of school coursework requires people to design parts of a database, but what about a full, complete database. One that fulfills the requirements to actually manage a set of data well? I’ve done a few, though admittedly, with input and help from other DBAs and developers. I am better for that, and I think that a good design requires some interaction and input from others.

With that in mind, whether you’ve built one or now, what about tackling this as a project for your own blog? Certainly there are some good problem sets out there, but I’d actually give you a new one if you’d like. The SQL Saturday site for PASS has a database behind it. How would you design a database, given the requirements that you need to support that site, and actually generate out the XML feed for the Guidebook app? Maybe that would be a fun, multi-part series that looks at your decisions for the various entities required. You’d learn something and get lots of feedback.

If you’re like to publish a series here at SQLServerCentral, let us know. We’d love to have some design series on databases that back applications, especially if you have a side project with an application.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 3.2MB) podcast or subscribe to the feed at iTunes and LibSyn.

The Clustered Index is not the Primary Key

I was reading through a list of links for Database Weekly and ran across this script from Pinal Dave, looking for tables where the clustered index isn’t the PK. It struck me that this is one of those facts I consider to be so simple, yet I constantly see people confusing. If you click the Primary Key icon in the SSMS/VS designers, or you specify a PK like this:

    SomeUniqueVal INT PRIMARY KEY

What will happen is that a clustered index is created on this field by default. It’s not the the PK must be clustered, but that SQL Server does this if you don’t tell it otherwise. Tables should have primary keys, and while you can debate that, most knowledegable SQL Server people I know want a PK on tables. There are exceptions, but if you can’t name them now, use a PK.

However the PK isn’t a clustered index. They are separate concepts. The PK can be clustered or non-clustered, and what you choose it up to you. However, I like Kimberly Tripp’s advice. Choose the clustering key separately from the PK, keep it unique, fixed, and narrow.. If they’re the same, fine, but don’t try to make them the same. Choose what works well for your particular table, which means thinking a bit.

You get one clustering key, and it’s worth spending five minutes debating the choice with a DBA or developer, or even post a note at SQLServerCentral. Changing the choice isn’t hard, but it can interrupt your clients’ work on your database, so try to make good design choices early, without blindly accepting defaults. It’s worth a few minutes of your time to make a good choice.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.0MB) podcast or subscribe to the feed at iTunes and LibSyn. feed

Naming Is Hard

I ran across Aaron Bertrand’s naming post recently, which I liked overall. Your team needs to agree how things are named, and be consistent, but I agree that some of the rules I’ve seen put down by people aren’t practical. However I also know that naming things becomes hard, especially over time as your systems evolve and the scale grows.

I’ve run into some strange naming patterns in the last few years that struck me as odd. I grew up in the Hampton Roads area of Virginia, and with a parent selling real estate, I traveled all over the area, often reading a map to navigate. I can rarely remember seeing the same names used over and over, despite the fact this was a relatively populous area.

However when I came to Colorado, I found things to be different. Either the people in charge of naming weren’t creative, or they were very, very lazy. In some of the developments I entered, I’d find the same name used many times on adjacent streets. You can find Wolfe St, Wolfe Pl, Wolfe Ct, and more all next to each other. While a computer can easily differentiate these sets of characters, this can easily create confusion in humans.

On a note closer to our livelihood, I have seen some strange databases objects. The tables in a JD Edwards system were coded with a short alpha sequence and then a 4 or 5 digit number. Always fun to remember what “FS3401” means. However I saw one recently that stunned me. I ran across a SQL Server database that was quite large, not in data size but in the number of objects. Apparently a developer or DBA at some point decided it wasn’t worth trying to name tables to match some entity and used GUIDs are names instead. I don’t know about many of you, but comprehending which GUID stores which data might have me resigning from that position relatively quickly.

We won’t all necessarily agree on how we should best name objects and entities, and that’s fine. However I would implore you to at least consider the humans that will have to work with your database. Use some creativity to build names that are easy to differentiate and pronounce for the future DBAs and developers that need to enhance and query your system.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 3.0MB) podcast or subscribe to the feed at iTunes and LibSyn.

Relational By Default

I’ve read a few pieces from Vladimir Khorikov and I tend to like the thoughtful way in which he approaches building software. When I saw a link to his SQL v NoSQL databases, I was intrigued to see what his thoughts would be. It’s a good read, though with relatively few comments or debates posted at the time I read the article. I was hoping for more and I encourage you to add your thoughts.

The main contentions in the piece are that any of the simpler development techniques that we can use with NoSQL databases don’t remove the need for implementing data quality or management features. They just require those features to be implemented by the developer. Specifically the need to deal with multiple schema versions over time, or the requirement we enforce parent child relationships.

Perhaps that’s not too bad. After all, if we evolve schemas and the code can easily deal with the changes, that’s good. However I think the point about having relational storage when you can, and adding in other stores makes sense. This is what I really believe as well, since we often need to query and combine data, which is something relational stores do very well and very efficiently.

Ultimately I’m sure we will continue to see arguments about relational and NoSQL models being better or worse for particular problems. I actually welcome the arguments  because there are certainly domains of problems where one system works better than the other, and it’s good to develop some clarity about those particular problems. However there are also likely many situations where multiple platforms can work, and in those cases, we should use what works well for our team.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 1.9MB) podcast or subscribe to the feed at iTunes and LibSyn.

Poor Data Modeling – T-SQL Tuesday #72

tsqltuesdayThis month Mickey Stuewe hosts the T-SQL Tuesday and she has a great topic. Data Modeling is something few people seem to do, especially developers, and it often can go wrong. That’s the topic, and I have a short story.

T-SQL Tuesday is a monthly blog party, on the second Tuesday of the month. The rules are on Mickey’s blog, and you can read them there, but essentially you write a post every month.

Or later.

The Aging Model

I once worked at company that shall rename nameless. We had a system built before my time which had a hierarchical set of entities. I can’t disclose the exact industry, but imagine that we had a list of items like this:

  • Manufacturers of products
  • Distributors of products
  • Warehouse holding products
  • Vendors selling products.

In this structure, we have a strict linkage where each item below the next is contained in the item above it. In other words, a manufacturer works with specific distributors and only those distributors. Distributors don’t work with other manufacturers.

I know this sounds contrived, and it is for a supply chain. However not for the industry in which I worked. So imagine we’re 100 years ago when power was more concentrated with supply chains.

This resulted in tables like this:

CREATE TABLE Manufacturers ( manufacturerid INT IDENTITY(1, 1) CONSTRAINT manufacturer_PK PRIMARY KEY ( manufacturerid ) , manufacturername VARCHAR(200) , primarycontactid INT -- ... ); GO CREATE TABLE Distributors ( distributorid INT IDENTITY(1, 1) CONSTRAINT distributor_PK PRIMARY KEY ( distributorid ) , manufacturerid INT CONSTRAINT Distributor_Manufacturer_FK FOREIGN KEY REFERENCES dbo.Manufacturers ( manufacturerid ) , manufacturername VARCHAR(200) , PrimarySalesPersonid INT -- ... ); GO CREATE TABLE Warehouses ( warehouseid INT IDENTITY(1, 1) CONSTRAINT warehouse_PK PRIMARY KEY ( distributorid ) , distributorid INT CONSTRAINT Warehouse_Distributor_FK FOREIGN KEY REFERENCES dbo.Distributors ( distributorid ) , warehouse VARCHAR(200) , regionid INT -- ... ); GO


Each of these links to the item above it. This means that I might have a Manufacturer  table like this:

manufacturerid  manufacturername   …

————–  —————-  

1               Acme

2               Big Product Corp

With warehouses linked as shown.

warehouseid  manufacturerid warehousename   …

———–  ————– ————-  

1            1              Denver Central

2            1              Kansas City East

3            2              Santa Fe

4            1              Billings Depot

This would mean that I used distributors that worked with a warehouse, and their data would be.

distributorid warehouseid distributorname  …

————- ———– ————-  

1            1            Denver City

2            1            Denver South

3            1            Aurora

4            2            Kansas City Distributors

5            3            New Mexico Wholesale

If I wanted to get a list of the distributors that carried a manufacturer’s products, I’d have to join through the warehouse table.

SELECT manufacturerid , ... FROM dbo.Manufacturers AS m INNER JOIN dbo.Distributors AS d ON d.manufacturerid = m.manufacturerid INNER JOIN dbo.Warehouses AS w ON w.distributorid = d.distributorid ...

Not a big deal, but we had 5 levels of this appear over time. Which means that queries that might need a higher level had to cross tables in between to join data. These also weren’t narrow tables, with a decent amount of meta data for each entity. Indexing helped, but certainly we needed better rollups of data, and performance suffered as the amount of legacy data grew.

What’s more, over time, we learned that business changes. A warehouse might start to work with multiple distributors, and our model couldn’t cope.

Eventually we embarked upon a project that set of link tables between entities, so that we had just IDs in a table that showed linkages where appropriate. This was ongoing when I left, but it was a major disruption and problem for the company.

I’ve tried to avoid embedding hard relationships based on business contracts in my models. Certainly some FKs make sense, like order details linked to a specific order, but many other relationships aren’t as tightly bound, and it is possible that they will change. It doesn’t cost me a lot in modeling to treat suspect relationships as many:many, and if application development does this from the start, it’s an easy architecture to incorporate in their development.

Data modeling is always a struggle as we often work with incomplete information because an organization will morph over time. It is, however, worth taking some time to think about the possible ways in which things might change in the future and allow for some flexibility in your model where possible.

Positive or Negative

While watching a presentation recently on a SQL Server topic, I was surprised by a design technique that the speaker used. I hadn’t ever thought of the issue, but it caught my eye and made me rethink the way I look at design, and even data.

In this case, the speaker was building a table and looking to make a row active or inactive. This person chose to use a column name of “inactive,” which surprised me. I’ve always used an “active” column if I am trying to mark a row in a binary fashion.

That got me thinking. I wonder if there is some impact to designing entities either way, but I wanted to ask this question:

Are you positive or negative. Would you use an active flag or an inactive flag?

This could be extended to any other type of choice, but do you look to flag an action as happening, or not happening. As readable or not readable? As usable or ignored? I’m not sure it matters, but I’d certainly be curious if anyone has any reason why you might choose one or the other. Is there some meaning conveyed? Let us know this week.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 1.9MB) podcast or subscribe to the feed at iTunes and LibSyn.

Logging Tables

I was chatting in one of our DLM webinars and someone asked a question about logging data. I mentioned a few things, but this individual followed up and wanted more information. Specifically, how do you build a logging table?

It’s an open ended question, but it’s perfect for a poll. How do you construct a table that you use for capturing lots of information that you may or may not query later, but you might only query in a very ad hoc fashion when something has gone wrong. Usually that’s the only time I look through logs, when I’m investigating some type of issue. With that in mind, this week the question is:

How do you design a logging table?

I’m looking for two things here: the DDL and the rational for the design. What is good about this design? How does this particular structure (with indexing) allow me to find answers to the questions I might need to ask.

I realize this is very open ended, but the concept of logging is also open-ended, used different ways for different purposes. If you have problems with logging structures, let us know what design techniques to avoid as well.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 1.7MB) podcast or subscribe to the feed at iTunes and LibSyn.

ER Modeling

I saw someone recently note that an Entity-Relationship model is important for your database development project. That UML is important, and the model should be actively maintained. I agree.

However, I’ve usually not seen this done. Typically when I’ve seen an ER diagram in a shop, it resides on a wall, printed out on multiple sheets of paper. Often there are as many handwritten additions on the pages as there are marks from a printer. The diagram is also usually out of date, even allowing for handwritten notes.

Maintaining a diagram is hard. Deciding to actually spend time working on one instead of just talking about a few entities and building them, may be harder. This week, I wanted to ask you if you’re up to the challenge in your organization.

How many of you maintain an ER model for your database?

Do you have an ER diagram you’ve built? Whether it’s in ErWin, Visio, or Paint. Is there a diagram that’s actively maintained and that everyone can use for designing changes to the database(s) in use? Do people actually consult this and use it in design meetings? Let us know this week.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 1.5MB) podcast or subscribe to the feed at iTunes and LibSyn.

No More Foreign Keys?

I struggle with removing FKs as a general idea. It’s presented in a piece from a developer. It appears to me that if you do this, you are merely moving the problem, and potentially causing other issues. Apart from Grant’s response on performance, which is very true, there are other challenges with microservices and separate databases for each service.
Let’s say we are doing something trivial, like a simple e-commerce site. I could have a separate database for products, one for orders, one for shipping actions, one for customers, etc. However as I scale, I’m not using intra-database queries or FKs/joins/other techniques to verify information for an order. Now I’m sending messages. At small scale, these are trivial. At large scale, my network traffic and messaging is vastly growing and potentially a bottleneck.
Also, since there is a “products” database, and an “orders” database and a “customers” database, I still can’t necessarily scale out these microservices beyond their own machine. Perhaps I can scale them higher in terms of each being as large a machine as I can have now with a single Oracle/SQL Server/MySQL/etc. box, but I’m still have a scaling issue. I also now have a new messaging problem I need to manage and architect. If I lose one database, how gracefully can my application degrade or does it start to fail with unforeseen interactions? Do I create more frustration when customers cannot place an order because we can’t find the customer db or because the site is down?
Certainly there are domains of problems that would work fine here. Spotify is probably a good example of this. There may be e-commerce systems that allow this, perhaps being willing to notify customers after some time (hopefully minutes, perhaps hours) that a product is out of stock because the message from the orders database didn’t get to the application. There are certainly ways to copy and replicate some  data, assuming that “fresh enough” works for your application. However that’s a business decision more than a coding decision. This also means that you also have a new load on your system to move data. I also think the more you work with singleton rows and single objects in an application context, the less you need RDBMS capabilities.
Ultimately I think FKs and strong RDBMS systems work very well in many situations. They work less well in many situations and your domain may qualify for those areas where a NoSQL, a multi-db, or other architecture works. We should certainly investigate and write about where things work and don’t work, but I wouldn’t assume the RDBMS with it’s PKs and FKs isn’t a valid, well thought out and incredibly useful architecture.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 3.5MB) podcast or subscribe to the feed at iTunes and LibSyn. feed