The Biggest Data Breach (For Now)

I keep reading the words “the largest data breach in history” in a variety of stories. In fact, given the tremendous growth of data acquisition, I’m guessing that this headline will continue to repeat itself over and over. I think I’m getting to the point where I’d rather just see a story say that xxx million customers were affected. At least then I’d be able to easily put some scale to the loss of data.

What’s interesting in this case involving JPMorgan is that there are indictments being handed down to at least two men who somehow participated in hacks that copied over 100 million people’s data. JPMorgan admits 76 million households and 7 million small businesses were compromised, which isn’t quite 100 million, but perhaps there’s something I’m missing. However, the data wasn’t just sold; rather, the hackers used the information to market stocks to the compromised individuals. That’s an interesting level of sophistication, and a scary one.

Can you imagine criminals using the information intelligently, not selling the data directly but making secondary use of it? Perhaps they will engage in social engineering, bundling the information with other data to perform some other attack on individuals. It’s entirely possible that we will see more sophisticated uses in the future as criminals work to evade the fraud detection systems that have been put in place.

I have no doubt that bigger data breaches are coming. Perhaps we could reduce their impact and frequency with better security frameworks and development practices, but I’m not sure any company out there will place a higher priority on security than on ease of access and speed of development. I do continue to hope that market forces will drive companies to build better detection and protection mechanisms, and that our vendors will build better security into all their platforms.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.4MB) podcast or subscribe to the feed at iTunes and LibSyn.

Cloud Security Issues

Bruce Schneier wrote a three-part series (part 1, part 2, part 3) on cloud computing recently, part of a debate at The Economist. It’s a general look at the cloud from a few perspectives, and I think the thoughts are interesting. Whether they apply to you, or to what extent, you’ll have to decide.

The first part asks if companies should use cloud services. I love the answer at the beginning: “Yes. No. Yes. Maybe. Yes. Okay, it’s complicated”.

The decision is complicated and it’s not a binary decision. You may choose to use a cloud service like Dropbox to share video files, but not move any of your databases or Excel spreadsheets to the cloud. Your company might choose to outsource email, but keep all sales, finance, and inventory applications in house, or vice versa.

I think as each of us debates the decision, we’ll be driven by data. Not only by cost data from each side, and not only by a risk or security analysis, but by the actual data we are talking about. We have to consider the risk of losing a particular set of data through cloud provider incompetence or disclosure to third parties (successful hacks or government intrusion). As data professionals, I’d like to think we’ll be intimately involved in the discussions and arguments about the reliability, security, performance, and control we need over our data.

Is source code too valuable to trust outside the company? Is it worth managing email yourself? Is a service providing CRM a better choice? There are no easy answers here. I’ve said more than a few times that I would never bother setting up or managing an email server again. However, as I think about it, that might not be true. If I worked for a law firm, could I trust anyone outside my company to prevent a breach of client confidentiality? The implications are unknown here, and I wonder if a custom Gmail or Office 365 solution is even defensible.

More and more, I think any debate with regard to cloud computing has to begin with “it depends” and dive deeply from there into the potential risks and rewards.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.4MB) podcast or subscribe to the feed at iTunes and LibSyn.

Normalizing Performance

This week I ran across a post from Glenn Berry on hardware performance that I thought framed the concept of performance value well. In the post, Glenn talks about the latest TPC-E benchmark and the SQL Server performance values that vendors release. Glenn mentions that he always tries to divide out the numbers to provide better comparisons for the various hardware choices.

That makes a lot of sense to me, as I think very few of us could afford the top-of-the-line systems on which vendors run the benchmarks. Most of us try to compare the results in some way and then make our own decisions for our smaller systems. I don’t know many people who run 36-core machines, but I do know lots who need to decide which 4- or 8-core systems they should choose.

The idea of normalizing performance to smaller loads is something we do often. We need to do this, because we often can’t get the same size, scale, or specifications in our test systems as we have in production. As much as we’d like to have them, resources are limited, and we do need to have some way of extrapolating the results in our development and test systems forward to production.

Glenn has a way of doing this for CPUs, and while you might not agree with his method, at least he has an organized way of doing things and lets empirical results provide feedback on whether it works well. You should do the same thing, whether you’re trying to gauge disk or T-SQL speed. Develop a hypothesis (or read about how others do so) for measuring your performance on a lesser system, and then on your primary system. Take the time to run some tests the same way, even if it’s single-query performance on a production system while it’s live.
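
As a concrete example, something along these lines gives you a repeatable, single-query timing you can run the same way on a test box and on production (a minimal sketch; the table, column, and filter value are placeholders for a query you actually care about):

  -- A minimal sketch: time the same query, the same way, on each system and compare.
  -- dbo.SomeTable, SomeColumn, and the filter value are placeholders.
  SET STATISTICS TIME ON;
  SELECT COUNT(*)
  FROM dbo.SomeTable
  WHERE SomeColumn = 42;   -- run it a few times and ignore the first, cold-cache execution
  SET STATISTICS TIME OFF;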

You won’t get perfect results and scalability, but you’ll develop a few metrics that allow you to determine if your decisions in development will have severe negative impacts. There still might be problems, but you should eliminate more and more of these over time.

Better Presentations–Hide those Windows

This is part of a series of tips for speakers to make your presentations better.

I wanted to give some specific SQL Server presentation items that have bothered me recently. These aren’t big things, but they do cause problems for attendees, and that might mean the difference between someone learning what you are presenting and getting lost because they can’t easily see what you’re showing them.


How does this look?


It’s bad. Imagine if you were 15 feet back from the presenter, which is how this looks on a screen. I can barely see the code.

If you look at the Object Explorer, there’s a little item in there. I’ve highlighted it below.


There’s also one on the Properties window.


In fact, my SQL Test window at the bottom, most SSMS add-ins, and Visual Studio windows have them.

Click them. They’ll hide the windows, like so.


This is a much cleaner view of things.

“But, Steve,” you’ll say, “I need those windows.” I get it; I need them, too. They’re still on the side of your screen, and you can pop them open. They’ll stay open while you work in them and disappear when you don’t.

Gone when I don’t need it.


Here when I do:


It’s a quick tip, and it’s easy to learn. Once you practice with hiding and using windows, I’m sure you’ll find that you work more efficiently all the time, not just when on stage.


I’ve always wondered about this. When I create a stored procedure I do this:

CREATE PROCEDURE MyProc
  @param1 int
AS
   -- add code here

As is often the case, I realize that I’ve made some mistake and need to change the code later. So I’ll do this:

ALTER PROCEDURE MyProc
  @param1 int
AS
   -- add better code here

In both cases, I’ve repeated lots of the code that I used the first time, though hopefully fewer of the bugs. If I create a function or view, I do something similar. However, when I build a table, I do this:

CREATE TABLE MyTable
 ( MyInt int );

If I decide that’s not enough data storage, and it’s likely not, I would do this:

ALTER TABLE MyTable
  ADD MyChar varchar(50);

We’re used to this, but why do we do this? Why not this?

ALTER TABLE MyTable
 ( MyInt int
 , MyChar varchar(50)
 );

It’s almost as though DDL mixes the idea of code submission with architectural scaffolding. It’s inconsistent, and it’s the big reason why we can’t use comments in our table code like this:

ALTER TABLE MyTable
 ( MyInt int  -- integer to store a pointer to this row, requires unique index for integrity
 , MyChar varchar(50) -- random value of some data I need to store for this example
 );

I don’t have any hopes that things will change, but it does make me wonder why SQL, which is often simple and highly versatile with a few consistent structures, would create this strange inconsistency.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.4MB) podcast or subscribe to the feed at iTunes and Mevio.

The Voice of the DBA podcast features music by Everyday Jones. No relation, but I stumbled on to them and really like the music. Support this great duo at

Two Steps Ahead


Are you thinking ahead? Using the data from your systems to be proactive?

Exceptional DBAs do more than respond to events and issues in their environments. In many cases, I think they even go beyond using metrics that detect problematic activity on their systems before users notify them. I think the best DBAs will actually mine the information they have about their systems to anticipate problems in advance.

In the past I’ve had monitoring systems that would respond to issues, and I had alerts set up to notify me of unusual events, like unexpected data growth. What I had started to do before I became a manager was write system checks that anticipated future problems and gave me as much lead time as possible to prepare for them. An example of this was a set of queries I wrote that calculated data growth for all databases on an instance and then used that to calculate how many days would elapse before I ran out of space on the data drives.
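
A minimal sketch along those lines (not the original script) is to trend the size of recent full backups in msdb and turn that into MB per day, which you can then compare against the free space left on your data drives:

  -- A minimal sketch: estimate daily growth per database from recent full backup sizes.
  -- backup_size is only a rough proxy for data size, but it trends well for growing databases.
  WITH history AS
  (
      SELECT  database_name,
              backup_start_date,
              backup_size / 1048576.0 AS size_mb,
              ROW_NUMBER() OVER (PARTITION BY database_name ORDER BY backup_start_date ASC)  AS first_rn,
              ROW_NUMBER() OVER (PARTITION BY database_name ORDER BY backup_start_date DESC) AS last_rn
      FROM    msdb.dbo.backupset
      WHERE   type = 'D'   -- full backups only
        AND   backup_start_date > DATEADD(DAY, -30, GETDATE())
  )
  SELECT  f.database_name,
          (l.size_mb - f.size_mb)
            / NULLIF(DATEDIFF(DAY, f.backup_start_date, l.backup_start_date), 0) AS mb_per_day
  FROM    history AS f
  JOIN    history AS l
    ON    l.database_name = f.database_name
   AND    l.last_rn = 1
  WHERE   f.first_rn = 1
  ORDER BY mb_per_day DESC;

Divide the free megabytes on each drive by the MB-per-day figure for the databases on that drive and you have a rough number of days before you run out of space.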

You can write similar queries to look for other trends. Tracking the execution times of often-run queries, or of those queries that are important to the application, can allow a DBA to find potential issues. If the execution times are growing, the DBA can anticipate a problem occurring in the near future and begin taking action: rewriting, tuning, changing indexing, or some other measure. A broad spectrum of queries taking longer might be an indication that hardware needs to be upgraded. There’s even a site devoted to metrics.
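
A minimal sketch of that kind of check, pulled from the plan cache rather than from a proper baseline, might look like the query below; snapshot it on a schedule and compare the averages over time:

  -- A minimal sketch: average elapsed time for the most frequently executed cached statements.
  -- total_elapsed_time is in microseconds, so divide by 1000 for milliseconds.
  -- Note: these numbers reset when plans leave the cache or the instance restarts.
  SELECT TOP (20)
          qs.execution_count,
          qs.total_elapsed_time / qs.execution_count / 1000.0 AS avg_elapsed_ms,
          SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
              ((CASE qs.statement_end_offset
                    WHEN -1 THEN DATALENGTH(st.text)
                    ELSE qs.statement_end_offset
                END - qs.statement_start_offset) / 2) + 1) AS statement_text
  FROM    sys.dm_exec_query_stats AS qs
  CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
  ORDER BY qs.execution_count DESC;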

Instrumentation is important in understanding, analyzing, and predicting system performance. More and more tools are being released to gather detailed metrics on .NET code, in browsers, and more, but there is a wealth of information inside SQL Server on the performance of the platform. A little work can help you track and monitor the performance of your system and proactively maintain performance before your users complain.

Steve Jones

The Voice of the DBA Podcasts

We publish three versions of the podcast each day for you to enjoy.

Manage By Delegation

PowerShell might be a great skill to have if you need to manage lots of instances.

More and more SQL Server instances are being deployed all the time. In fact, with the ease with which we can build a new virtual machine (VM) through snapshotting and cloning, it seems that many administrators are finding that the number of servers for which they are responsible might be doubling or tripling.

Even moving to the cloud doesn’t completely remove the need for some administration of your data and databases, though it does require you to rework the type of administration that you perform. I foresee more hybrid solutions over time, which will require DBAs to not only manage data, but help analyze the financial impacts of moving data (and analysis) to, or back from, the cloud.

In SQL Server 2008 we had the chance to begin managing our servers through a set of declared rules with Policy-Based Management (PBM). I haven’t seen that feature take off, and it seems relatively few people are using PBM to manage their servers. I think it’s a great platform for ensuring that your instances conform to certain rules, though I think a bit of creativity is needed to make the system work well for you.

PowerShell is becoming integrated into all Microsoft products. Virtually everything in SQL Server, perhaps even everything by now, can be managed through PowerShell scripts that access the SMO objects. I hear various people say that PowerShell is a critical skill for DBAs of the future. I’m not sure about that, but I do think it will be used more and more if you need to perform repeated actions on multiple servers. Whether you use it now or not, it doesn’t hurt to learn how it works and what it can do for you.

It just might be the tool to make your job easier as you get more and more instances to manage, something that seems to happen more and more.

Steve Jones

The Voice of the DBA Podcasts

We publish three versions of the podcast each day for you to enjoy.

Wasting Time

Wasting time is wasting the talent you have.

I ran across this infographic on wasting time at work. From the title I was expecting it to point out the ways in which people avoid work, and perhaps it does, but it’s really geared toward showing us the framework and structure that many people have for work. I agree with the first two sections of the page, which show that email and meetings waste a lot of time. They certainly have in many of my jobs, especially when they were used as “catch-all” techniques for including everyone who is remotely relevant to an issue.

Interruptions at work are hard to quantify as a problem. They definitely can be, but stopping by someone’s office to ask a question or take a quick break can be a way to recharge yourself between long periods of concentration. The issues come into play when the other person is in the middle of focusing on a task and you force a context switch on them. Recovering from the interruption can take time, time that’s often wasted as a person tries to remember exactly what they were focusing on.

Meetings certainly interrupt the day, but since they are often planned in advance, you can be mentally prepared for the break. It makes me wonder if there wouldn’t be some benefit to scheduling some “open time” in your day where you plan on taking a break. Others that needed a minute of your time would know to come find you at that time.

Time is very valuable, one of the most valuable resources we have at work. Management should be aware of this and work to limit interruptions, whether from email, meetings, or anything else that prevents work from getting done. The fewer meetings and emails you require of your developers, the more time they have to work on the tasks they are being paid to complete.

Steve Jones

Jacks of all trades

I'm a jack of all trades at the ranch, but not in software.

Does your environment look like the one at Instagram? I’d bet there are a few of you who have an application or two that contains as many servers, components, pieces, and parts, all held together with the proverbial duct tape and baling twine. I think I’ve had a few environments that were close to this complicated, but in general I try to avoid this kind of mishmash of technologies, tools, and platforms.

When I look at the SQLServerCentral architecture, I see something much smaller that still scales nicely on a single (clustered) database server and a web server. If I needed more performance, I’d hope I could do something closer to the setup at Stack Overflow, with better development and fewer parts in my architecture, rather than adding the type of complexity that powers Instagram.

That’s not to say that one environment is better than the other. I know that the staff at Instagram is learning a lot about integrating disparate systems, building new tools that can better manage their environment. As I read through the list, I’m not sure it’s a lot different than some of the environments I’ve worked in. If I listed all the pieces of software I’ve used in some applications, it might be just as complex, though it didn’t feel that way at the time.

I tend to prefer a simple environment, using as few pieces of software as necessary, but using the pieces that are appropriate for the job. Rather than build a complex XML processor, or fumble with XML in T-SQL, if I could buy a module that handled that function, I’d be happy to do so.

Working in technology is about choices: making the build-or-buy decision over and over, and making the best decisions we can.

Steve Jones

The Voice of the DBA Podcasts

Brent Ozar Blitz at SQL Server Connections

The first session of the day for me was Brent Ozar’s Blitz talk on how to quickly get information about a new server. I have seen Brent’s video on this and read some of his blog posts, but I wanted to see him go over the script in a session.

Brent is funny and has a way of interacting with the audience that makes his talks enjoyable. In this one, he walks you through the script, taking you through the list of items that he finds important to check when he first takes over a server. You can get the script from his site, or search for Brent and Blitz in your favorite engine.

It’s important to go over certain things on a new server, and I would agree with what Brent checks. He looks for backups, DBCC execution, jobs, privileged accounts, and more. He has scripts that query the system databases to find this information. It’s a good practice to use these types of scripts so that you have self-documenting information from your instance, and it keeps that information from being out of date.
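
As one example in the same spirit (this is not sp_Blitz itself, just the kind of check it automates), a query like this flags databases with no recent full backup:

  -- A minimal sketch: databases with no full backup in the last seven days.
  SELECT  d.name,
          MAX(b.backup_finish_date) AS last_full_backup
  FROM    sys.databases AS d
  LEFT JOIN msdb.dbo.backupset AS b
         ON b.database_name = d.name
        AND b.type = 'D'
  WHERE   d.name <> 'tempdb'
  GROUP BY d.name
  HAVING  MAX(b.backup_finish_date) IS NULL
       OR MAX(b.backup_finish_date) < DATEADD(DAY, -7, GETDATE());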

I learned a few things in this session. One was to check the backup history in msdb, since it can slow down your backup process over time. The msdb tables that track backups aren’t well indexed and can fill up over time; 30-60 days should be enough history to keep (there’s a one-line cleanup sketch after this paragraph). Another was to check for encryption of your databases. I typically don’t work with Enterprise Edition, so I don’t run into TDE. However, it’s good to know about this and prep your keys for a DR situation; there’s nothing worse than trying to restore and not having those keys. Also, once you encrypt a database, tempdb is always encrypted, even if you later remove the individual database’s encryption. That’s good to know. It might not be a big deal, but it is extra overhead.
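
Trimming that msdb backup history is a one-liner; a minimal sketch, assuming 60 days is enough for your environment, would be:

  -- A minimal sketch: purge backup and restore history older than 60 days from msdb.
  DECLARE @oldest datetime = DATEADD(DAY, -60, GETDATE());
  EXEC msdb.dbo.sp_delete_backuphistory @oldest_date = @oldest;

You could schedule that in an Agent job so the history never builds up again.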

I also found a nice check for objects in master or model, which isn’t a recommended practice and is something you might not think to look for or be aware of. One thing I was not aware of is a query to check for Enterprise Edition features being used in a database, since you cannot restore these databases to other editions. Actually, you can start the restore, but at the end the process will throw an error and fail.
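
The check is presumably built around sys.dm_db_persisted_sku_features; a minimal version, assuming SQL Server 2008 or later, is simply:

  -- A minimal sketch: list edition-specific features (compression, partitioning, etc.)
  -- in use in the current database. Any rows here mean the restore to a lower edition will fail.
  SELECT  feature_name,
          feature_id
  FROM    sys.dm_db_persisted_sku_features;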

I would definitely recommend this session if you find Brent speaking at an event near you.