An Out-of-Date CAP

I was reading this piece from Martin Kleppmann asking us to stop using the CAP theorem to talk about distributed systems. It’s of interest to me because we often see the theorem applied to databases, and I’ve frequently seen database platforms described as fitting along one of the sides of the CAP triangle.

However, the complaint makes for good, although tough, reading. As I read through it, a number of the concepts, tradeoffs, and concerns are similar to what I see discussed in Azure systems, or any “cloud-like” distributed systems. We expect consistency and availability, as well as tolerance of partitions as we scale. However, it seems that we are always trading off consistency (what Kleppmann notes is really linearizability) against availability in the data world. We simply can’t guarantee both of these things.

Or can we? At some point, perfection isn’t required, and we can tolerate some level of inconsistency in the results clients might receive. I’m sure many of you have dealt with this in the past, and perhaps even gone so far as to add the report execution time to outputs to reduce client complaints. If two sets of data are compiled at different times, most clients tend to understand some discrepancies.
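As a trivial illustration, the report itself can carry its compilation time. A minimal sketch in T-SQL, with hypothetical table and column names:

```sql
-- Stamp the output with when it was built, so clients comparing two
-- reports can account for data compiled at different times.
-- dbo.Orders and dbo.Regions are hypothetical names.
SELECT  r.RegionName,
        SUM(o.OrderTotal) AS TotalSales,
        SYSDATETIME()     AS ReportCompiledAt
FROM    dbo.Orders AS o
        INNER JOIN dbo.Regions AS r
            ON r.RegionID = o.RegionID
GROUP BY r.RegionName;
```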

Certainly many of us are starting to consider database platforms that might not work like SQL Server. In those cases, availability and scale are often touted as the reasons to abandon an RDBMS. On one hand, the lack of linearizability across nodes is often tolerable, and many of our businesses aren’t dramatically affected by it. On the other hand, if you can tolerate some delays between nodes, then perhaps SQL Server can work for you with multiple nodes in a replicated or AlwaysOn scenario.
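As a sketch of that tradeoff in SQL Server terms, an AlwaysOn replica can run with asynchronous commit, trading strict consistency on the secondary for lower commit latency on the primary. The availability group and replica names here are hypothetical:

```sql
-- Asynchronous commit means the primary doesn't wait for the secondary
-- to harden the log, so reads on Node2 may briefly lag the primary.
ALTER AVAILABILITY GROUP [AG1]
MODIFY REPLICA ON N'Node2'
WITH (AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT);
```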

Distributed systems are hard, and your decision certainly shouldn’t be a simple one when it comes to choosing an architecture. However, it also helps to think deeply about the problem in terms of not only the CAP theorem, but also practical measures such as latency and real-world failures.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (4.6MB) podcast or subscribe to the feed at iTunes and LibSyn.

Serious Hacking

There’s a piece that calls the US Office of Personnel Management (OPM) data breach the biggest government hack ever. It might be, but give us a few months and I’m sure something will be bigger. After all, we constantly acquire more data, so the next breach is likely to contain more information. I’m also not sure most of us are actually getting much better at security.

There were a few notes about this that would apply to every company I’ve worked in, such as the OPM not having a comprehensive list of devices and databases. I’m not sure any company does, and having worked with people that run SCOM-type systems, I know it’s a hard problem to solve. This doesn’t even cover the problem of Excel, Access, and, soon, PowerBI data being scattered across systems.

However, there was one problem I think we could fundamentally improve in most companies. The article noted that OPM didn’t have control over how its systems were configured, meaning an attacker could reconfigure things. Far, far too many companies allow too large a group of people to deploy changes to servers. Even when larger companies limit rights for developers, I’ve too often seen operations staff log in and allow developers to change systems to get them working.

As an industry, we really need to solidify and build better systems for ensuring the security of our hardware and software, and for preventing, or at least detecting, unauthorized changes. Certainly there will always be social engineering and other techniques that bypass security, but we should be able to prevent malicious changes to systems with solid architectures from our vendors and FOSS developers. We should also decide upon best practices, and be sure our staff learn, understand, and follow them.
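Detection, at least, doesn’t require much on the database side. Here’s a minimal sketch of logging every schema change in a database with a DDL trigger; the table and trigger names are hypothetical:

```sql
-- A log table for schema changes, capturing who, when, and what.
CREATE TABLE dbo.DDLChangeLog
(
    LogID     INT IDENTITY(1,1) PRIMARY KEY,
    EventTime DATETIME2 NOT NULL DEFAULT SYSDATETIME(),
    LoginName SYSNAME   NOT NULL DEFAULT ORIGINAL_LOGIN(),
    EventData XML       NOT NULL
);
GO

-- Fires for every database-level DDL event and records the full statement.
CREATE TRIGGER trg_LogDDLChanges
ON DATABASE
FOR DDL_DATABASE_LEVEL_EVENTS
AS
    INSERT INTO dbo.DDLChangeLog (EventData)
    VALUES (EVENTDATA());
GO
```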

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.4MB) podcast or subscribe to the feed at iTunes and LibSyn.

The New DBA

What does DBA mean? For technologists, it has traditionally meant Database Administrator. However, it’s also been a synonym for a database developer, or even anyone with elevated privileges on a database system. The term has been expanded to cover people designing databases, and it usually includes the BI developers or analysts that work with database-related technologies.

In short, it’s been a vague term that means different things to different people. However, as I look forward, I have another interpretation. The people functioning as DBAs are the Data Bridges to Applications. We take the data and find ways to ensure it’s useful to applications. We make sure it’s available. We connect systems together. We do the dirty work of cleaning data. We secure the bridge.

We really are the people who need to answer any question about the database. Throughout my career, when a company has decided to use DBAs, I have found the DBA is usually the catch-all person who needs to be able to do whatever is asked of them that might tangentially relate to a database.

While there have always been DBAs that tightly controlled their systems and acted as impediments to change, I think those positions are disappearing fast. More and more companies, if they even bother to hire DBAs, want those individuals to ensure data is useful. Whether that means administering, developing, configuring, or anything else related to data, and whether the data is in an RDBMS, Access, Excel, the cloud, NoSQL, or anywhere else, we need to be able to provide some solution that gets data to an application.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.1MB) podcast or subscribe to the feed at iTunes and LibSyn.

The Rename Game

As I’ve been talking about database development, Continuous Integration, and Continuous Deployment, I keep hearing complaints and concerns about the problems that occur when we make database changes. In particular, I often hear about breaking changes made to databases that require some manual scripting work.

It seems that the most common breaking change is adding a NOT NULL column to a table with existing data, but certainly there are other problematic issues. The other common change is renaming objects. In the past, I’ve rarely renamed anything once I’ve deployed tables to production, but I’ve run into quite a few developers who note they regularly change names. This week I’m wondering how common this is for most of the data professionals out there.
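As an aside on the NOT NULL case: it fails outright when rows exist, unless you supply a default for them. A sketch with hypothetical names:

```sql
-- Fails if dbo.Customers already has rows (no value for existing rows):
-- ALTER TABLE dbo.Customers ADD Region VARCHAR(50) NOT NULL;

-- Succeeds: the default constraint supplies a value for existing rows.
ALTER TABLE dbo.Customers
ADD Region VARCHAR(50) NOT NULL
    CONSTRAINT DF_Customers_Region DEFAULT ('Unknown') WITH VALUES;
```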

How often do you rename objects?

It could be tables, views, columns, constraints, whatever. If you rename anything in your database, I’m curious whether it’s a regular event. If it causes you issues with your software development process, let us know. I’d also be curious how you manage tracking and deploying the code from development through to production when you are making these types of breaking changes. Are things still scripted and able to be automated, or do you force manual steps?
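One technique I’ve seen for softening a rename, sketched here with hypothetical names, is to leave a synonym behind so existing code keeps working while callers are migrated:

```sql
-- Rename the table, then alias the old name to the new one.
EXEC sp_rename 'dbo.CustomerAddresses', 'CustomerLocations';
GO
-- Existing queries against dbo.CustomerAddresses continue to resolve.
CREATE SYNONYM dbo.CustomerAddresses FOR dbo.CustomerLocations;
GO
```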

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (1.9MB) podcast or subscribe to the feed at iTunes and LibSyn.

When Should We Be Cleaning Data?

I was reading Grant Fritchey’s update from the Business Analyst Conference and noticed this quote: “There were lots of sessions on the need to clean data, which implies that we need to do a better job at data collection in order to support this wild new world.” Grant was talking about the fact that many of the sessions seemed to imply that it’s mostly the processes around data that cause issues with data, not the people.

However, is that really what we should do? Do we want to do more and more cleansing at the point where people enter data? I’m not so sure that’s the case. The more I deal with various applications and forms, the fewer required fields and values I want to see. Certainly there are pieces of data that are necessary for action, but I often find there are lots of additional fields that analysts want, but that are more of a burden than a necessity.

Most of us as data professionals design tables to handle the needs of an application. We put fields in tables that we expect to fill with data, based on a specification. However, the real world is messy, and the data we want to collect and store isn’t always available. The question is, do we force failures in the application, or do we handle missing data?

I don’t want to eliminate DRI, or go to an EAV model for lots of data. However, I do think we need to allow defaults and work with developers to accept data in our systems that might not be complete now, but perhaps will be in the future. We should work with analysts to help them build reports that can handle defaults or missing fields. We can work with developers to allow applications to request updates to data later, and then design ETL that can efficiently fill in the updated information.
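A small sketch of that approach, with hypothetical names: a default for data that isn’t available at entry time, and a report query that tolerates the gap rather than failing.

```sql
-- New rows without an industry get a recognizable placeholder.
ALTER TABLE dbo.Leads
ADD CONSTRAINT DF_Leads_Industry DEFAULT ('Unknown') FOR Industry;

-- The report groups incomplete rows explicitly instead of erroring
-- or silently dropping them.
SELECT  ISNULL(Industry, 'Unknown') AS Industry,
        COUNT(*)                    AS LeadCount
FROM    dbo.Leads
GROUP BY ISNULL(Industry, 'Unknown');
```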

Applications and software need to be flexible enough to work with problematic data. We have the ability, as data professionals, to help our clients still find meaning in data that might not be as complete as we, or they, would prefer. There is still valuable information in the data they have.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.5MB) podcast or subscribe to the feed at iTunes and LibSyn.

QA for Data

Here’s a typical DBA scenario. You’re at your desk, working on something, when you get a trouble ticket, phone call, or tap on your shoulder. There’s a problem in production and an order, or some other important piece of data, is incorrect. Can you fix it?

Many of us are happy to oblige. Often we know what needs to be changed and can write the INSERT/UPDATE/DELETE statement to correct the issue. Even if we have to dig in further, we can usually find the data in a table and determine what should be changed.

However, what do we do then?

If you are asked to make data changes to production, do you QA the script? Do you save the scripts in a VCS or a log? Does anyone else examine the data or code? Do you scan the script yourself, have SQL Server parse it, or bother to look for triggers that might fire? Do you bother to put the script in a transaction with error handling?

I have to say that in most of my career, I didn’t bother with any of that. I’d examine a script, perhaps have another DBA or developer look at it, but most of the time I’d run it in production if I thought it was ready. I did often wrap the code in a transaction, with a query to check the results, and an immediate rollback to prevent blocking. However, if I thought the results looked good, I’d rerun the query without the transaction. Most of the time that worked well. A few triggers and fat fingers caused issues that I had to fix again, sometimes after quite a bit of work to correct my mistakes.
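That check-first pattern looks something like this, with hypothetical names:

```sql
-- Make the change inside a transaction so nothing is permanent yet.
BEGIN TRANSACTION;

UPDATE dbo.Orders
SET    OrderStatus = 'Shipped'
WHERE  OrderID = 12345;

-- Check exactly which rows changed and what they look like now.
SELECT OrderID, OrderStatus
FROM   dbo.Orders
WHERE  OrderID = 12345;

-- Roll back immediately so locks aren't held while you review the output.
ROLLBACK TRANSACTION;

-- If the check looked good, rerun the UPDATE on its own (or COMMIT
-- instead of rolling back, if you trust the check).
```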

It’s hard to prevent data change errors, especially if it’s something we do often, or we work in a small environment with limited staff. Ultimately, we might not even care because if we update the data incorrectly, we can run another update to fix the new issue. That’s not necessarily what most managers would want to hear, but it’s reality.

The one thing I have insisted upon in my environments is logging any queries run: a simple cut and paste with a timestamp. That way, if things were done incorrectly, at least I know what was done.
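A minimal sketch of that habit, formalized slightly into a table (names hypothetical):

```sql
-- One row per ad hoc production change: when, who, why, and the script.
CREATE TABLE dbo.ProductionChangeLog
(
    ChangeID   INT IDENTITY(1,1) PRIMARY KEY,
    RunAt      DATETIME2     NOT NULL DEFAULT SYSDATETIME(),
    RunBy      SYSNAME       NOT NULL DEFAULT ORIGINAL_LOGIN(),
    TicketRef  VARCHAR(50)       NULL,  -- trouble ticket, if there was one
    ScriptText NVARCHAR(MAX) NOT NULL
);

-- Paste in the exact statement that was run.
INSERT INTO dbo.ProductionChangeLog (TicketRef, ScriptText)
VALUES ('HD-1042',
        N'UPDATE dbo.Orders SET OrderStatus = ''Shipped'' WHERE OrderID = 12345;');
```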

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.6MB) podcast or subscribe to the feed at iTunes and LibSyn.

Deploy the Database First

One of the patterns I’ve seen in some environments is people trying to deploy changes rapidly to their database-backed applications. That’s not news, but what is interesting is that some of them are staging the deployment of the database changes first. Not as in deploying database changes at 8:00pm and then application changes at 8:30pm. These people deploy the database changes on Monday, and their application changes follow on Tuesday, Wednesday, or even a month later.

It’s an interesting concept, though I think it requires a lot of forethought in your designs, as well as very tight coding from your front ends that won’t be disturbed by extra columns in a result set. That’s not easy to do, but it’s certainly possible, and it can even be useful if you deploy feature flags extensively in your application.

As we become more dependent on databases for our applications, and as our customers expect systems to be running constantly, I think it behooves us to find ways to alter and enhance our applications without downtime. While there are patterns to keep applications running when the database goes down, I expect the reality is that we need to find ways to keep the database up as we alter it, which for me means making non-breaking changes.
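As a sketch of what a non-breaking change can look like (hypothetical names, and assuming the application reads through a view): the new column goes in as NULLable, so nothing that exists today notices it.

```sql
-- Monday: deploy the schema change. Existing INSERTs that don't mention
-- the new column keep working because it allows NULLs.
ALTER TABLE dbo.Customers ADD PreferredLanguage VARCHAR(10) NULL;
GO

-- The application reads through a view with an explicit column list, so
-- the extra column never disturbs its result sets. Add PreferredLanguage
-- here later, once the application is ready to consume it.
ALTER VIEW dbo.vCustomers
AS
SELECT CustomerID, CustomerName, Email
FROM   dbo.Customers;
GO
```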

I think it’s possible in many cases to upgrade a database over time by carefully planning your schema changes and accounting for those changes in your front-end architecture. However, as I mentioned, it’s not easy, and your developers do need to commit to very stable and careful programming practices.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (1.9MB) podcast or subscribe to the feed at iTunes and LibSyn.

What’s Coming True?

It’s the start of the new year, and the first day of work for many of us. As I start to work this year, I see that much of my work is based on that planning we did at the end of last year. We made predictions for our business, set goals, and today begin to execute on things. However, we are assuming our predictions are somewhat accurate in order to achieve success. What if our predictions aren’t correct?

There’s no shortage of prognosticators out there, and I found a number of predictions about IT, the Cloud, and Microsoft. I have no idea if any of these will actually come true, but I wanted to ask you this week:

Which of these predictions will come true in 2015?

  • Increased automation and fewer staff
  • More BYOD acceptance and support
  • More telecommuting
  • Containerization of software in the cloud (or the data center)
  • More hybrid applications using the cloud
  • IoT growth – more sensors, more data for you
  • Windows 10 will be a hit
  • SQL Server 2015 will come out
  • Windows Phone will become competitive
  • You’ll get hacked at your company
  • You will encrypt your databases

The IT trends listed are fairly general, and all of them are really underway now, so I’m not sure there’s much of a prediction there. For you individually, will you see more automation and fewer people? More BI and cloud usage? Are any of the Microsoft predictions likely to come true? Do any of these apply to your environment?

This isn’t an exhaustive list, and certainly there are lots of people expecting 3D printing and more mobile technologies to emerge. If you think any of these will, or won’t, come true, take a vote, and we’ll rerun this thread at the end of the year. If you have other predictions, let us know as well.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.1MB) podcast or subscribe to the feed at iTunes and LibSyn.

2014: The Review

I tried to keep a running list of headlines from 2014, and as I look back at them, I find a few things standing out. The first item is that we got a new version of SQL Server. In April, SQL Server 2014 was released, a bit over two years after SQL Server 2012. There was a lot of interest and excitement around Hekaton, the In-Memory OLTP technology, but the reality of its limitations intruded, and it seems relatively few people have been willing to upgrade for this technology.

In fact, I might argue that apart from Hekaton, this wasn’t necessarily worthy of a full release. PowerBI, BPE, the cardinality estimator changes, some Azure improvements, and AlwaysOn changes were all included, but this felt like a bit of a mishmash of features. We didn’t see many of the technologies from previous versions (Service Broker, Contained Databases, SSRS, SSIS, etc.) enhanced or improved. With the additional costs for core licensing introduced in SQL Server 2012, it still seems that many companies are trying to continue to use SQL Server 2008 R2 and below to handle their workloads where possible.

This isn’t to say that the product hasn’t improved quite a bit. It’s just that the value received for the increased licensing costs is becoming lower. That concerns me a bit as other platforms mature at lower price points. We’ll see what this means as we move forward.

It does seem that 2014 was the year of the data breach. We had Target, Yahoo Mail, Home Depot, Kmart, Sony, and more; I know there were plenty of others, but these were the top ones I tracked in 2014. I expect more to occur in 2015, and I would not be surprised to see more attacks against smaller companies as the techniques and tools used by hackers spread. I wouldn’t be surprised to find hackers practicing on smaller targets, like the companies you and I work for. Security will become more important, so learn more, set up auditing, and continue to improve your monitoring.

This year we also saw SQL Server really evolving for data professionals. We’ve had Hadoop use grow quite a bit, and a continued emphasis from Microsoft on the PowerPivot/Power Query/Tabular technologies. The press for BI technologies from both Microsoft and PASS, almost one and the same now, seems to be regular and consistent. I’m not sure if this push will become commonplace for most data professionals, but I do know quite a few BI consultants who are very busy. We will see how much adoption increases, but if more organizations don’t start using these technologies, it won’t be for a lack of trying.

We had lots of events in 2014, over 20 for me, and I expect to see more opportunities, in more places, for people to learn about SQL Server. More SQL Saturdays, more smaller conferences, and of course, plenty of big conferences (DevConnections/DevIntersection/PASS Summit) to choose from. If you want an event near you in 2015, think about organizing one. It doesn’t have to be a ton of work if you can get 3-4 people to help, and it seems there is no shortage of speakers willing to teach people about the platform. Send a note if you’re interested, and if you move quickly, maybe I’ll come.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (4.2MB) podcast or subscribe to the feed at iTunes and LibSyn.

How Often Does Production Drift?

One of the new products that Red Gate Software is working on is SQL Lighthouse, which is designed to track and alert you on “database drift”: changes to your schema that happen outside your deployment process. I hadn’t really heard the concept in the past, though I’ve found there are a few people using the term to describe changes made to production that aren’t in line with the development project that created (and updates) the database.

Vendors that sell products backed by a SQL Server database, like Microsoft Dynamics, often consider database drift to be a violation of their EULA. When a customer changes the database, even by just adding indexes, it creates support costs for the vendor, though arguably those costs are lower than if no indexes were ever added by enterprising DBAs.

However, this also happens in databases built by companies. It seems there’s no shortage of “quick fixes” in production, in addition to the “improvements” that DBAs and others make to their live databases in order to solve some short-term issue. I know there’s a need for this, and I’ve even seen recommendations to ensure production schemas differ from development ones. However, I haven’t had many issues with this in the past, and I’ve tried to prevent any drift.

Maybe that’s because I’m a controlling DBA, or maybe because I’ve been lucky. In either case, while I’ve had to implement hot fixes and changes in production, it’s been rare. Most of the time I’ve been able to backfill these changes in development (and test) environments, or delay the fix until it could be tested.

However, I’m sure there’s a need for a way to manage drift, and to detect it as the number of databases you manage grows, so I’m excited by SQL Lighthouse and hope it helps DBAs avoid those late-night phone calls by making them aware of what’s changing in their systems.
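In the meantime, a rough do-it-yourself sketch: the catalog views can list objects modified since your last sanctioned deployment. The date here is hypothetical and would come from your release records:

```sql
DECLARE @LastDeployment DATETIME2 = '2015-01-05';  -- hypothetical release date

-- Any user object modified since then is a candidate for drift.
SELECT  s.name AS SchemaName,
        o.name AS ObjectName,
        o.type_desc,
        o.modify_date
FROM    sys.objects AS o
        INNER JOIN sys.schemas AS s
            ON s.schema_id = o.schema_id
WHERE   o.is_ms_shipped = 0
  AND   o.modify_date > @LastDeployment
ORDER BY o.modify_date DESC;
```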

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.2MB) podcast or subscribe to the feed at iTunes and LibSyn.