Relational By Default

I’ve read a few pieces from Vladimir Khorikov and I tend to like the thoughtful way in which he approaches building software. When I saw a link to his SQL v NoSQL databases, I was intrigued to see what his thoughts would be. It’s a good read, though with relatively few comments or debates posted at the time I read the article. I was hoping for more and I encourage you to add your thoughts.

The main contentions in the piece are that any of the simpler development techniques that we can use with NoSQL databases don’t remove the need for implementing data quality or management features. They just require those features to be implemented by the developer. Specifically the need to deal with multiple schema versions over time, or the requirement we enforce parent child relationships.

Perhaps that’s not too bad. After all, if we evolve schemas and the code can easily deal with the changes, that’s good. However I think the point about having relational storage when you can, and adding in other stores makes sense. This is what I really believe as well, since we often need to query and combine data, which is something relational stores do very well and very efficiently.

Ultimately I’m sure we will continue to see arguments about relational and NoSQL models being better or worse for particular problems. I actually welcome the arguments  because there are certainly domains of problems where one system works better than the other, and it’s good to develop some clarity about those particular problems. However there are also likely many situations where multiple platforms can work, and in those cases, we should use what works well for our team.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 1.9MB) podcast or subscribe to the feed at iTunes and LibSyn.

NoSQL Inside SQL Server

There has been quite a debate the last few years between using NoSQL and SQL Server as data stores for different applications. We’ve seen a great deal of interest from developers in document stores as the data structures are closer to what a C# or Java developer sees in their code. Graph databases are becoming popular for certain types of data as well, since their query performance can be extremely powerful in some cases. There are also the scalability aspects of NoSQL systems touted as a better way to build Internet based applications.

However NoSQL databases aren’t the ultimate data store. There are pros and cons to using them, and while it might be easier to being development on one for an application developer, I would argue it’s just as easy to get caught mis-using a graph database or a document database as it is to poorly program a relational database.
This week, in advance of the SQL Server 2016 release, I caught a post from the SQL Server storage engine team showing how JSON storage inside of a table can provide some NoSQL-like benefits to applications. It’s a good idea, but like many NoSQL solutions, you need to spend a bit of time examining your particular requirements. If you need to update your JSON often, especially in large batches, this might not prove to be a good idea, especially if you need consistency.
I think ultimately that everything does relate to the CAP theorem, precisely because data has to exist somewhere, and anytime you need to duplicate it, move it, aggregate it, or anything to relieve some other issue, then there isn’t any magic taking place. You’re going to have to do work to move the data. Some platforms may make that easier, but I bet there are other drawbacks that you will have to consider.
Ultimately I think you need to pick the platform that best suits your environments. That’s not only the application requirements, but also the developers’ skills, the operational staff’s ability to manage the platform, and the maturity of the platform. Those are tough questions, and while I’m glad people experiment with new systems, I do hope that they think carefully about the positives and negatives, and more importantly, share their experiences with the rest of our industry.

Steve Jones

Pre-built or Ad Hoc

One of the advantages of NoSQL databases id that the schema and organization of data is very flexible. The various types of databases usually allow the schema or organization of data to vary across entries. I hesitate to call them rows, but essentially each time you add data to a store, you can alter the format of the data inserted.

For relational database professionals, this seems to be a recipe for disaster, with entirely too much data being captured in an un-organized fashion. At some point a user will want this data to be returned in a report format, which almost always seems to be rows and column related data, even behind the scenes of the incredible visualizations that appear in modern dashboards.

I had someone recently note that their users don’t want to write ad hoc queries or try and discern the meaning of varying structures of information. They want pre-built structures they can count on and use reliably to answer questions. I suspect many users don’t want to decode the meaning of structures that change, despite the fact that so many users want to reformat and change the shape of data in Excel. Those of you that have to re-import some of these spreadsheets know just how unstructured a set of rows and columns can become.

I really think that it is important that structures of data be decided upon and ordered in a known way so that users can easily understand the meaning behind the data. However we are gathering more and more data in new ways, from new sources, and we don’t have consistent ways of recording that information. That will continue in the future, and I do think that learning how to access new sources, like Hadoop, and present that data back to users in a familiar format will become a way to show you are a valuable resource for your organization.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.3MB) podcast or subscribe to the feed at iTunes and LibSyn. feed

You Better Learn to Work at Scale

Early in my career, I worked on a large Novell Netware installation. We bought a server that was about 1/4 the size of a modern rack, which contained a number of blade slots each holding some type of hardware: CPUs, disk adapters, network adapters, and even disk drives. The disk slots held ~30MB drives, which were large for that time. Both in capacity, and physical size. Each of these monsters weighed a pound or more, and felt like it was built to withstand a fall from a table top.

That’s one reason that I’m amazed by the 64GB storage in my phone and stunned by the 500GB mSata card I recently bought. The storage capacities we have available to us today, in small companies and startups, are truly stunning to us as data professionals, but not to the people that have the space available. It seems on a regular basis people easily fill up the space they have and clamor for more. Whether that’s valuable data or not, I do know that much of that data still requires management and takes effort from us as data professionals.

We’re not going to get more help, so as our databases grow, we need to find ways to manage more data, of disparate forms. Not just rows in tables, but Excel files, binary files of all sorts, XML extracts, Office documents, Hadoop hives and who knows what else. We need to be better at working with varying patterns of data, and perhaps files, at scale in order to continue to efficiently manage our workloads.

Just like system administrators that are starting to deal with hundreds or thousands of virtual instances, we have to be able to write code that helps us manage data in large quantities. I like PowerShell for more and more tasks, but without a doubt, writing efficient T-SQL that works with sets, learning SSIS, package design patterns, and more will become the skills that differentiate the valuable and capable DBAs from those that just keep the lights on.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.5MB) podcast or subscribe to the feed at iTunes and LibSyn. feed

No Handwaving Away the DBA

There’s a great quote I read, at the end of this article. It says: “…if you think that switching to NoSQL will just let you hand-wave away all of the challenges of running a database, you are terribly misguided.” The context is that all too often people looking to move away from some of the hassles of working with RDBMS platforms, which includes working with the DBA, haven’t completely thought through the issues.

I do think NoSQL has a place in the world. There are domains of problems that I’m sure Riak, MongoDB, and others, solve in a more efficient way than SQL Server, Oracle, MySQL, and other relational systems. I’m not sure what they are, and to some extent, I haven’t seen good guidance on where particular platforms excel. Most of the articles and pieces on choosing NoSQL seem to be trying to sell me “why a particular platform can replace my other one”, and telling me to add in things like transactions, but not explaining the drawbacks.

However in all platforms, we often forget that there are really two frames of reference that matter. We need quick ways to work with data, insert it, update it, query it, etc. This is the development frame of reference, and it often dominates discussions of platforms. For good reasons, as developers are expensive, but that’s only part of the system. We also need to consider the operational portion of managing data and applications. When I have those needs to rebuild indexes in relational platforms, or the requirement to periodically merge/remove old versions of documents, or even manage clustered, horizontally scaled resources, we need operational maturity.

In some sense the DevOps movement is built around merging these two frames of reference into the minds of all those involved. I hope that movement continues to grow and mature, and we learn that developers and operational staff are both necessary, and both need to function in a symbiotic, harmonious fashion.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.6MB) podcast or subscribe to the feed at iTunes and LibSyn. feed

The Voice of the DBA podcast features music by Everyday Jones. No relation, but I stumbled on to them and really like the music. Support this great duo at

Moving from NoSQL to an RDBMS

I saw this link about a company that moved (Goodbye, CouchDB) from a NoSQL database (CouchDB) to a relational one (MySQL) because of some problems they experienced. Sauce Labs provides testing services in the cloud for developers using the Selenium framework.

The first link is a blog post that talks about some of the things they initially liked about CouchDB and then the problems they experienced. Their move to MySQL showed better performance and stability over the CouchDB they were using. It’s an interesting read, not too long, and it seems well thought out.

I think that this isn’t a knock on NoSQL databases, and I do think there are problem sets they are well suited to solve. I don’t know that I think because your developers don’t like SQL is a good reason, but there are problems with scale and size that are better handled with some NoSQL solutions. I’m an RDBMS person, but I don’t expect everyone to be that way.

However this does say a few things about NoSQL maturity. Many of these products have gotten popular quickly, with some success stories, but that doesn’t meant they are necessarily ready for your application unless you have staff that have lots of experience with the tools. These are young platforms, they have problems inside the code, especially if you deviate from the way the authors used the platform, and you may have issues.

Experience matters. Just as many people have issues with SQL Server scaling when they don’t know how to build a good data model or write efficient queries, I suspect that picking a NoSQL database because it’s cool, or seems easy to implement, or it worked well at some other company. I would almost always suggest that you stick with platforms that your staff is very experienced with, since they are likely going to have issues at scale on every platform, and their experience can make a difference.

Note that this assumes you have good people. Just because you have a DBA with 10 years of experience doesn’t mean he or she is necessarily good. They could easily have 1 year of experience 10 times. You want people that have worked on a variety of systems in different areas.

NoSQL is Not the Answer

The NoSQL Logo

There’s a lot of talk and press around NoSQL databases, which are a class of systems that are different from relational databases (RDBMS). Most of us are familiar with the competitive RDBMS’s to SQL Server; systems like Oracle, DB/2, MySQL, PostgreSQL, and a few more. NoSQL systems are less well known, going by names like Cassandra, MongoDB, and the best known system used by Google, BigTable.

It seems there are a lot of ranting and raving about which system is better, and even some comparisons that look at performance in different ways that we use systems. I found a SQL Server v MongoDB comparison recently. It seems to show that SQL Server lags way behind MongoDB in terms of simple insert, update, and select operations. I’m not sure it’s the best real world test, but it shows some interesting results.

Those comparisons  are important, but ultimately we do a lot more than simply insert rows and select from basic small sets of data in many of our applications. I see many systems making use of complex queries with aggregations, multi-table joins, and complex sorting operations. These might work well in a NoSQL system, but they also might not. Who knows how well complex business transactions and reports will perform on these platforms. Someone will have to try and see if these systems can handle those needs.

NoSQL is worth examining further and learning about. I would not be surprised to see some of the features, such as easier scale-out, coming to SQL Server in the next few years. However I’m certainly not ready to view NoSQL systems as any type of evolution of the RDBMS. I think it’s much more likely that NoSQL systems solve a certain domain of problems better than a row-oriented RDBMS, much like column-oriented databases seem to be much more efficient at working with some types of data warehouses.

Steve Jones

BTW, if you want to listen to a little NoSQL humor, check out this video. It’s Not Safe For Work (NSFW), so be sure that you have a properly configured “cone of silence” so as not to offend others.

The Voice of the DBA Podcasts

NoSQL Basics

Last year there was quite a bit of press devoted to the NOSQL movement, emphasizing the scalability and power of this class of databases. They have not been well known, but have been used to power some of the very well known Internet companies out there. Companies like Google, Facebook, and others are using NoSQL databases in places, achieving the scalability that many RDBMS’s have struggled to achieve without enormous costs.

I ran across an article that talks about some of the basics of what is classified as a NOSQL database. It’s written more for managers, and it does a good job of warning them that NOSQL is not the “answer” to all your performance problems. It’s a piece of technology that fits in some places and in some situations.

Should you care about NOSQL? I think you should, if for no other reason than some manager will ask you about it at some point. Or a developer will want to build an application using a NOSQL database. A flat “no” from the data group is counter-productive,  and is not appropriate. If you are going to rationally argue against a new technology, you should have some basic understanding of how it works, what problems it solves, and engage in a logical debate.

There are lots of good NOSQL articles out there, and perhaps the first thing you should learn is that there are different types of technologies for solving different problems. Cassandra is not interchangeable with MongoDB, and Voldemort is different from the other two. None of them is a direct replacement for SQL Server (or another RDBMS), but they do work in different areas. If you’ve never seen the CAP triangle, take a look at it now.

Many of you are data professionals, and SQL Server specialists, and that’s fine. That doesn’t mean you shouldn’t learn a little something about other technologies, and how they might compare and contrast with SQL Server. Here’s a nice basic article from SQL Server Master to get you started.

Steve Jones

(originally published at


Plan for Capacity

In many applications, it seems that performance often falls off a cliff at some point because no one is planning for capacity increases. Everyone talks about the potential load on the system when it’s first being built. However since we often over-buy hardware, we rarely monitor the load until something goes wrong.

I was reminded up this with a write-up from Simon Sabin recently that examined a lesson on capacity planning from FourSquare. Apparently FourSquare had some capacity issues with their MongoDB server. Despite the realization that they were growing and the addition of a second node to their database, they didn’t learn the lesson and didn’t add a third or fourth node until they were over capacity.

Most of us don’t experience this type of hyper-growth in our systems. The majority of applications I’ve seen have undergone a quick growth in load at some point, but then leveled off to a steady, or slowly growing load on the system. However that doesn’t necessarily change the planning needed for growth. It just means that you need to grow your system at different times than if you were doubling in size every few months.

A solid monitoring plan, and buy-in from management that you will periodically need to upgrade your servers, is necessary. Typically you will add storage over time, perhaps RAM, and potentially new CPUs or servers. You might not be able to exactly predict when you will have performance issues, but you can plan for issues. You can proactively let management know that space is running low, or that you may run into issues if data sets continue to grow in size.

Funding is often hard to get, but by preparing people in advance that your database will need improvements over time, you are more likely to be able to respond quickly when there are performance issues.

Steve Jones


I found a comparison of MongoDB and SQL Server recently, from a C# MVP and a person that works with SQL Server regularly. It’s an interesting read, and on the surface, you might think that MongoDB is much better than SQL Server in terms of raw performance.

That certainly might be the case, especially for large data loads. MongoDB has the ability to scale out widely, and that always can help with performance. Even in single server environments, I have no doubt that MondDB has less overhead, and likely requires less resources, so it can be faster.

However it also lacks some of the robustness and security that comes with an ACID compliant systems. It’s entirely possible that you could lose work you think you have committed in a MongoDB system. Not likely, but possible.

These are, however, very different products. SQL Server is designed to handle some domains, and some loads very well. MongoDB has other purposes and ideas in it’s design. They aren’t interchangeable in some loads, but for many simpler systems, like web site back ends, they might either work fine.

I like SQL Server, and I think it offers a very rich, and well rounded environment on which to store and work with your data. I’m not sure how MondoDB would fair in some situations, especially those that involve lots of calculations and summarizing of data sets, but it does seem to be a very robust database system for managing documents and scaling widely across servers.

It will be interesting to see in the future of NoSQL systems, like MongoDB, start to become more prevalent in those places where ACID compliance is not critical.