How Much Eventual Consistency Do You Have?

There’s a never ending debate about NoSQL v RDBMS systems that seems to polarize the people that prefer one technology over the other. In fact, it seems that every time I talk to someone that dislikes NoSQL technology in general, the topic of eventual consistency comes up. The idea that not all data in our system might be up to date seems to be one of the concepts that scares many DBAs most of all.

I was chatting with some people recently about complex SQL Server configurations, and the topic of replication came up. While this is a technology with so much potential, it seems that replication has been a bit neglected by Microsoft and is both amazing and brittle in its implementation. However, if you think about it, replication results in data that isn’t consistent across the systems.

I’m sure many would argue that this isn’t an issue, but how many of you have businesses that make decisions or have processes built on data in replicated databases? I’m sure plenty of you, and most of the time, the data is consistent enough for use by our clients.

There are plenty of ways in which we implement data movement across our databases that results in potentially inconsistent results for our clients. In fact, I’ve had no shortage of discussions with clients that can’t understand why two reports run minutes apart show different results.

Today I’m curious how many of you have systems that your businesses depend on where the data is eventually consistent because of some technology that moves information from one database to another. Perhaps you might even share some of the tricks you use to ensure that delays or problems in your transfer process are detected and fixed before your clients realize just how inconsistent their data might be.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.8MB) podcast or subscribe to the feed at iTunes and LibSyn.

NoSQL Complaints

I think that NoSQL databases are a really interesting set of technologies, however they aren’t necessarily a panacea for many problems, including scalability. There are likely some problem domains where a particular NoSQL technology might work better than an RDBMS, and perhaps some situations where many types of databases would work well.

I ran across a piece that looked at some of the complaints that companies have with NoSQL technologies. It’s a generic piece, perhaps too generic, since a graph database is not like a document database. However it does talk about the issues that companies see, such as performing analytic queries. While a row-based RDBMS isn’t ideal for these queries, often the platforms work very well for a wide variety of reporting requirements.

I do think some of the problems people have noted have to do with the maturity of the systems. There are a lack of references for how to run a high uptime NoSQL database, not many data models to review that may or may not work well, and relatively few administrative techniques to maintain these databases. The problem isn’t that NoSQL databases can’t meet your needs, or run quickly, or provide many 9s of uptime, but rather many enterprise professionals might not know how to find the information to accomplish these goals. That’s something easily solved across time.

Ultimately I think the NoSQL world has a lot of maturing to do, and we, as professionals and media, need to start to differentiate among the various flavors of databases. I am glad to start seeing articles and blogs refer to graph databases when they cover a product like Neo4j rather than writing NoSQL. Slowly I am seeing MongoDB referred to more often as a document database, not a NoSQL database. It’s good to start to separate technologies and not confuse one with another.

One thing in the piece I did notice is that it’s likely that we will see more platforms and products incorporate both RDBMS and non-RDBMS technologies together. We see that in SQL Server already with the addition of columnstore technologies in the last few versions and Polybase to SQL Server 2016. I expect we’ll also start to see some of the capabilities of DocumentDB added to SQL Server, along with, perhaps, graph based technologies in the future.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.7MB) podcast or subscribe to the feed at iTunes and LibSyn.

Relational By Default

I’ve read a few pieces from Vladimir Khorikov and I tend to like the thoughtful way in which he approaches building software. When I saw a link to his SQL v NoSQL databases, I was intrigued to see what his thoughts would be. It’s a good read, though with relatively few comments or debates posted at the time I read the article. I was hoping for more and I encourage you to add your thoughts.

The main contentions in the piece are that any of the simpler development techniques that we can use with NoSQL databases don’t remove the need for implementing data quality or management features. They just require those features to be implemented by the developer. Specifically the need to deal with multiple schema versions over time, or the requirement we enforce parent child relationships.

Perhaps that’s not too bad. After all, if we evolve schemas and the code can easily deal with the changes, that’s good. However I think the point about having relational storage when you can, and adding in other stores makes sense. This is what I really believe as well, since we often need to query and combine data, which is something relational stores do very well and very efficiently.

Ultimately I’m sure we will continue to see arguments about relational and NoSQL models being better or worse for particular problems. I actually welcome the arguments  because there are certainly domains of problems where one system works better than the other, and it’s good to develop some clarity about those particular problems. However there are also likely many situations where multiple platforms can work, and in those cases, we should use what works well for our team.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 1.9MB) podcast or subscribe to the feed at iTunes and LibSyn.

NoSQL Inside SQL Server

There has been quite a debate the last few years between using NoSQL and SQL Server as data stores for different applications. We’ve seen a great deal of interest from developers in document stores as the data structures are closer to what a C# or Java developer sees in their code. Graph databases are becoming popular for certain types of data as well, since their query performance can be extremely powerful in some cases. There are also the scalability aspects of NoSQL systems touted as a better way to build Internet based applications.

However NoSQL databases aren’t the ultimate data store. There are pros and cons to using them, and while it might be easier to being development on one for an application developer, I would argue it’s just as easy to get caught mis-using a graph database or a document database as it is to poorly program a relational database.
This week, in advance of the SQL Server 2016 release, I caught a post from the SQL Server storage engine team showing how JSON storage inside of a table can provide some NoSQL-like benefits to applications. It’s a good idea, but like many NoSQL solutions, you need to spend a bit of time examining your particular requirements. If you need to update your JSON often, especially in large batches, this might not prove to be a good idea, especially if you need consistency.
I think ultimately that everything does relate to the CAP theorem, precisely because data has to exist somewhere, and anytime you need to duplicate it, move it, aggregate it, or anything to relieve some other issue, then there isn’t any magic taking place. You’re going to have to do work to move the data. Some platforms may make that easier, but I bet there are other drawbacks that you will have to consider.
Ultimately I think you need to pick the platform that best suits your environments. That’s not only the application requirements, but also the developers’ skills, the operational staff’s ability to manage the platform, and the maturity of the platform. Those are tough questions, and while I’m glad people experiment with new systems, I do hope that they think carefully about the positives and negatives, and more importantly, share their experiences with the rest of our industry.

Steve Jones

Pre-built or Ad Hoc

One of the advantages of NoSQL databases id that the schema and organization of data is very flexible. The various types of databases usually allow the schema or organization of data to vary across entries. I hesitate to call them rows, but essentially each time you add data to a store, you can alter the format of the data inserted.

For relational database professionals, this seems to be a recipe for disaster, with entirely too much data being captured in an un-organized fashion. At some point a user will want this data to be returned in a report format, which almost always seems to be rows and column related data, even behind the scenes of the incredible visualizations that appear in modern dashboards.

I had someone recently note that their users don’t want to write ad hoc queries or try and discern the meaning of varying structures of information. They want pre-built structures they can count on and use reliably to answer questions. I suspect many users don’t want to decode the meaning of structures that change, despite the fact that so many users want to reformat and change the shape of data in Excel. Those of you that have to re-import some of these spreadsheets know just how unstructured a set of rows and columns can become.

I really think that it is important that structures of data be decided upon and ordered in a known way so that users can easily understand the meaning behind the data. However we are gathering more and more data in new ways, from new sources, and we don’t have consistent ways of recording that information. That will continue in the future, and I do think that learning how to access new sources, like Hadoop, and present that data back to users in a familiar format will become a way to show you are a valuable resource for your organization.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.3MB) podcast or subscribe to the feed at iTunes and LibSyn. feed

You Better Learn to Work at Scale

Early in my career, I worked on a large Novell Netware installation. We bought a server that was about 1/4 the size of a modern rack, which contained a number of blade slots each holding some type of hardware: CPUs, disk adapters, network adapters, and even disk drives. The disk slots held ~30MB drives, which were large for that time. Both in capacity, and physical size. Each of these monsters weighed a pound or more, and felt like it was built to withstand a fall from a table top.

That’s one reason that I’m amazed by the 64GB storage in my phone and stunned by the 500GB mSata card I recently bought. The storage capacities we have available to us today, in small companies and startups, are truly stunning to us as data professionals, but not to the people that have the space available. It seems on a regular basis people easily fill up the space they have and clamor for more. Whether that’s valuable data or not, I do know that much of that data still requires management and takes effort from us as data professionals.

We’re not going to get more help, so as our databases grow, we need to find ways to manage more data, of disparate forms. Not just rows in tables, but Excel files, binary files of all sorts, XML extracts, Office documents, Hadoop hives and who knows what else. We need to be better at working with varying patterns of data, and perhaps files, at scale in order to continue to efficiently manage our workloads.

Just like system administrators that are starting to deal with hundreds or thousands of virtual instances, we have to be able to write code that helps us manage data in large quantities. I like PowerShell for more and more tasks, but without a doubt, writing efficient T-SQL that works with sets, learning SSIS, package design patterns, and more will become the skills that differentiate the valuable and capable DBAs from those that just keep the lights on.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.5MB) podcast or subscribe to the feed at iTunes and LibSyn. feed

No Handwaving Away the DBA

There’s a great quote I read, at the end of this article. It says: “…if you think that switching to NoSQL will just let you hand-wave away all of the challenges of running a database, you are terribly misguided.” The context is that all too often people looking to move away from some of the hassles of working with RDBMS platforms, which includes working with the DBA, haven’t completely thought through the issues.

I do think NoSQL has a place in the world. There are domains of problems that I’m sure Riak, MongoDB, and others, solve in a more efficient way than SQL Server, Oracle, MySQL, and other relational systems. I’m not sure what they are, and to some extent, I haven’t seen good guidance on where particular platforms excel. Most of the articles and pieces on choosing NoSQL seem to be trying to sell me “why a particular platform can replace my other one”, and telling me to add in things like transactions, but not explaining the drawbacks.

However in all platforms, we often forget that there are really two frames of reference that matter. We need quick ways to work with data, insert it, update it, query it, etc. This is the development frame of reference, and it often dominates discussions of platforms. For good reasons, as developers are expensive, but that’s only part of the system. We also need to consider the operational portion of managing data and applications. When I have those needs to rebuild indexes in relational platforms, or the requirement to periodically merge/remove old versions of documents, or even manage clustered, horizontally scaled resources, we need operational maturity.

In some sense the DevOps movement is built around merging these two frames of reference into the minds of all those involved. I hope that movement continues to grow and mature, and we learn that developers and operational staff are both necessary, and both need to function in a symbiotic, harmonious fashion.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.6MB) podcast or subscribe to the feed at iTunes and LibSyn. feed

The Voice of the DBA podcast features music by Everyday Jones. No relation, but I stumbled on to them and really like the music. Support this great duo at

Moving from NoSQL to an RDBMS

I saw this link about a company that moved (Goodbye, CouchDB) from a NoSQL database (CouchDB) to a relational one (MySQL) because of some problems they experienced. Sauce Labs provides testing services in the cloud for developers using the Selenium framework.

The first link is a blog post that talks about some of the things they initially liked about CouchDB and then the problems they experienced. Their move to MySQL showed better performance and stability over the CouchDB they were using. It’s an interesting read, not too long, and it seems well thought out.

I think that this isn’t a knock on NoSQL databases, and I do think there are problem sets they are well suited to solve. I don’t know that I think because your developers don’t like SQL is a good reason, but there are problems with scale and size that are better handled with some NoSQL solutions. I’m an RDBMS person, but I don’t expect everyone to be that way.

However this does say a few things about NoSQL maturity. Many of these products have gotten popular quickly, with some success stories, but that doesn’t meant they are necessarily ready for your application unless you have staff that have lots of experience with the tools. These are young platforms, they have problems inside the code, especially if you deviate from the way the authors used the platform, and you may have issues.

Experience matters. Just as many people have issues with SQL Server scaling when they don’t know how to build a good data model or write efficient queries, I suspect that picking a NoSQL database because it’s cool, or seems easy to implement, or it worked well at some other company. I would almost always suggest that you stick with platforms that your staff is very experienced with, since they are likely going to have issues at scale on every platform, and their experience can make a difference.

Note that this assumes you have good people. Just because you have a DBA with 10 years of experience doesn’t mean he or she is necessarily good. They could easily have 1 year of experience 10 times. You want people that have worked on a variety of systems in different areas.

NoSQL is Not the Answer

The NoSQL Logo

There’s a lot of talk and press around NoSQL databases, which are a class of systems that are different from relational databases (RDBMS). Most of us are familiar with the competitive RDBMS’s to SQL Server; systems like Oracle, DB/2, MySQL, PostgreSQL, and a few more. NoSQL systems are less well known, going by names like Cassandra, MongoDB, and the best known system used by Google, BigTable.

It seems there are a lot of ranting and raving about which system is better, and even some comparisons that look at performance in different ways that we use systems. I found a SQL Server v MongoDB comparison recently. It seems to show that SQL Server lags way behind MongoDB in terms of simple insert, update, and select operations. I’m not sure it’s the best real world test, but it shows some interesting results.

Those comparisons  are important, but ultimately we do a lot more than simply insert rows and select from basic small sets of data in many of our applications. I see many systems making use of complex queries with aggregations, multi-table joins, and complex sorting operations. These might work well in a NoSQL system, but they also might not. Who knows how well complex business transactions and reports will perform on these platforms. Someone will have to try and see if these systems can handle those needs.

NoSQL is worth examining further and learning about. I would not be surprised to see some of the features, such as easier scale-out, coming to SQL Server in the next few years. However I’m certainly not ready to view NoSQL systems as any type of evolution of the RDBMS. I think it’s much more likely that NoSQL systems solve a certain domain of problems better than a row-oriented RDBMS, much like column-oriented databases seem to be much more efficient at working with some types of data warehouses.

Steve Jones

BTW, if you want to listen to a little NoSQL humor, check out this video. It’s Not Safe For Work (NSFW), so be sure that you have a properly configured “cone of silence” so as not to offend others.

The Voice of the DBA Podcasts

NoSQL Basics

Last year there was quite a bit of press devoted to the NOSQL movement, emphasizing the scalability and power of this class of databases. They have not been well known, but have been used to power some of the very well known Internet companies out there. Companies like Google, Facebook, and others are using NoSQL databases in places, achieving the scalability that many RDBMS’s have struggled to achieve without enormous costs.

I ran across an article that talks about some of the basics of what is classified as a NOSQL database. It’s written more for managers, and it does a good job of warning them that NOSQL is not the “answer” to all your performance problems. It’s a piece of technology that fits in some places and in some situations.

Should you care about NOSQL? I think you should, if for no other reason than some manager will ask you about it at some point. Or a developer will want to build an application using a NOSQL database. A flat “no” from the data group is counter-productive,  and is not appropriate. If you are going to rationally argue against a new technology, you should have some basic understanding of how it works, what problems it solves, and engage in a logical debate.

There are lots of good NOSQL articles out there, and perhaps the first thing you should learn is that there are different types of technologies for solving different problems. Cassandra is not interchangeable with MongoDB, and Voldemort is different from the other two. None of them is a direct replacement for SQL Server (or another RDBMS), but they do work in different areas. If you’ve never seen the CAP triangle, take a look at it now.

Many of you are data professionals, and SQL Server specialists, and that’s fine. That doesn’t mean you shouldn’t learn a little something about other technologies, and how they might compare and contrast with SQL Server. Here’s a nice basic article from SQL Server Master to get you started.

Steve Jones

(originally published at