Finding Problem Code

The performance of SQL Server seems to be one of the main concerns for most developers and DBAs. We are under constant pressure to squeeze more performance out of our applications and reduce the time it takes for users to interact with systems. Reports can be especially problematic, as users don’t want to wait for results, but data entry tasks certainly can’t be impacted either.

Whether we can actually make changes to the system, or even have the time to try, might be irrelevant. I’m sure there are plenty of databases that some of us don’t even bother to tune because we’re busy elsewhere, likely writing more code for the additional features or reports that have been requested. As a result, we may pile up lots of code that isn’t necessarily run often. However, when that code is executed, we’ll receive no shortage of criticism if it doesn’t perform well.

This week I wanted to know how you might go about finding problematic code when there isn’t a specific complaint. If you do have time to proactively tune your system, what techniques do you use to examine it? I suspect the more advanced people will have answers, and I hope they share them, as there is no shortage of readers who may be struggling to improve the impression their database server makes.

What’s the best way to find the SQL statements you need to tune?

It’s a simple question, but imagine that you know there are complaints, yet don’t know the best way to focus your time. You, as a new or accidental DBA, want to improve the system, but where do you start? What code is the problem? Is it the report that your boss complains about, or are there other queries that slow down the report’s execution?
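If you don’t know where to begin, one common starting point is to ask the plan cache which statements have been the most expensive since the instance started (or since the cache was last cleared). The sketch below assumes nothing about your schema; it pulls the top cached statements by total CPU from sys.dm_exec_query_stats, and you can just as easily order by logical reads, duration, or execution count.

  -- Top 10 cached statements by total CPU time
  SELECT TOP (10)
         qs.total_worker_time / 1000 AS total_cpu_ms,
         qs.execution_count,
         qs.total_logical_reads,
         SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
             ((CASE qs.statement_end_offset
                    WHEN -1 THEN DATALENGTH(st.text)
                    ELSE qs.statement_end_offset
               END - qs.statement_start_offset) / 2) + 1) AS statement_text
  FROM sys.dm_exec_query_stats AS qs
  CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
  ORDER BY qs.total_worker_time DESC;

Keep in mind this only covers plans still in the cache, so it’s a rough first pass rather than a complete answer to the question.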

Share the way you’d tackle a system when you don’t have a specific query in mind to tune.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.2MB) podcast or subscribe to the feed at iTunes and LibSyn.

Searching for Plans

Today’s editorial was originally published on Sept 13, 2010. It is being re-run as Steve is away at training.

I was re-reading Conor Cunningham’s “Does Join Order Matter?” recently and found a quote in the post that stood out. As an aside, read the whole post. It’s fantastic.

“…SQL Server’s QP is actually not set up to give you the best plan.  Instead, it is set up to give you a good enough plan quickly (where good enough is very close to “best”).”

I get that, and it makes sense. Why spend 4 hours optimizing a query that runs in 2 minutes to get it down to 1:50 (Conor’s example) when the optimizer can spend 1 second and have it execute in those two minutes? If the optimizer always hunted for the absolute best plan, I’d be very angry as a customer, especially if this happened with any frequency.

Or would I?

I’m not suggesting that we ought to optimize every query, or that I’d prefer the optimizer didn’t try to normalize queries, or that I want to see the internal tree and try to figure out if I can rewrite the query. I might, however, want to do that for some queries. There are queries for which I might want to find the “best” plan, either because they take a long time to run or because they run often. I could then apply that plan with something like a USE PLAN hint.
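As a rough illustration only (the table, query, and plan below are hypothetical placeholders, not anything from Conor’s post), the workflow would be to capture the XML plan on a scratch server and then pin it to the same query with USE PLAN:

  -- On the scratch server: capture the plan as XML
  SET SHOWPLAN_XML ON;
  GO
  SELECT OrderID, OrderDate
  FROM dbo.Orders               -- hypothetical table
  WHERE CustomerID = 42;
  GO
  SET SHOWPLAN_XML OFF;
  GO

  -- Back on the busy server: pin the captured plan to the query.
  -- The N'...' literal must contain the complete ShowPlanXML document,
  -- or the hint will be rejected at compile time.
  SELECT OrderID, OrderDate
  FROM dbo.Orders
  WHERE CustomerID = 42
  OPTION (USE PLAN N'<ShowPlanXML ...captured plan XML goes here... >');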

Computers are cheap these days, as is disk. If I were somehow able to copy my database onto an instance of SQL Server on a cheap PC, wouldn’t it be cool if I could set that optimizer loose on my query and let it try ALL possible plans? I’d be happy if it spent those 4 hours and gave me back an optimal plan that I could carry back over to my expensive, busy server, and shave 10 seconds off a query that I run 10,000 times a day.

I could see some value in a “test” instance of SQL Server that just looked for optimal plans. Now if I could just get Microsoft to build it.

Steve Jones

Checking Baselines with SQL Monitor

We’ve got a demo of SQL Monitor available that takes data from the actual SQLServerCentral systems and presents it publicly. The data is slightly delayed, but it’s the real data and it shows the performance of our system.

I actually use this at times when the site isn’t responding well. It’s quicker for me to hit the demo than to log into the VPN and check the real instance.

I was reading a report from a customer recently, and they mentioned they use the baseline feature to track the performance of their system when they make changes, using this data to decide whether they think the changes were positive or negative.

Note: they rarely roll back, instead using this data to determine if they need to dig into more specifics and then create a new deployment (roll forward) that can address issues.


There has been a lot written about baselines, but essentially these are a way of capturing what is “normal” for your system. If current performance deviates from normal, you might have cause for concern.

Of course, this is a coarse way of looking at a system, and you have to apply some of your own knowledge to determine whether you have real issues or whether some transient workload is altering performance during your sample period.

You can see a demo of baselines online and play with the ranges. It looks similar to the image below, in which I can see the last 24 hours of CPU time on my system and compare that with the 24 hours before that.


Using this type of diagram, I can quickly visualize issues that might be occurring without resorting to complex analysis.

This doesn’t give me the exact problem. I’d still need to dig into the code being run, look at execution plans and other detailed data (much of which I can get from SQL Monitor as well), but this does let me make an initial judgment on where to focus my time.

I would highly encourage you to capture, keep, and update baselines on your systems. At the very least, you should have an idea of what normal performance looks like on your critical systems.
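If you don’t have a monitoring tool, you can still capture a crude baseline yourself. The sketch below (the dbo.PerfBaseline table name is hypothetical) snapshots a couple of counters from sys.dm_os_performance_counters on a schedule; comparing today’s values against the same window last week gives you a rough sense of normal.

  CREATE TABLE dbo.PerfBaseline
  (
      capture_time  DATETIME2(0)  NOT NULL DEFAULT SYSDATETIME(),
      counter_name  NVARCHAR(128) NOT NULL,
      instance_name NVARCHAR(128) NOT NULL,
      cntr_value    BIGINT        NOT NULL
  );
  GO

  -- Run from an Agent job every few minutes. Note that counters such as
  -- Batch Requests/sec are cumulative, so a real baseline works from the
  -- delta between two snapshots.
  INSERT INTO dbo.PerfBaseline (counter_name, instance_name, cntr_value)
  SELECT RTRIM(counter_name), RTRIM(instance_name), cntr_value
  FROM sys.dm_os_performance_counters
  WHERE counter_name IN (N'Batch Requests/sec', N'Page life expectancy');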

Normalizing Performance

This week I ran across a post from Glenn Berry on hardware performance that I thought framed the concept of performance value well. In the post, Glenn talks about the latest TPC-E benchmark and the SQL Server performance values that vendors release. Glenn mentions that he always tries to divide out the numbers to provide better comparisons for the various hardware choices.

That makes a lot of sense to me, as I think very few of us would be able to afford the top-of-the-line systems on which vendors run the benchmarks. Most of us try to compare the results in some way and then make our own decisions for our smaller systems. I don’t know many people that run 36-core machines, but I do know lots that need to decide which 4- or 8-core systems they should choose.

The idea of normalizing performance to smaller loads is something we do often. We need to do this, because we often can’t get the same size, scale, or specifications in our test systems as we have in production. As much as we’d like to have them, resources are limited, and we do need to have some way of extrapolating the results in our development and test systems forward to production.

Glenn has a way of doing this for CPUs, and while you might not agree with his method, at least he has an organized way of doing things, letting empirical results provide feedback on whether it works well. You should do the same, whether you’re trying to gauge disk or T-SQL speed. Develop a hypothesis (or read about how others do so) for measuring performance on a lesser system and then on your primary system. Take the time to run some tests the same way in both places, even if it’s single-query performance on a production system while it’s live.
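For single-query measurements, even something as simple as the following, run the same way on both the lesser system and the primary one, gives you a consistent pair of numbers to compare (the query here is just a placeholder):

  SET STATISTICS TIME ON;
  SET STATISTICS IO ON;

  SELECT COUNT(*)
  FROM sys.objects;   -- substitute the query you are measuring

  SET STATISTICS TIME OFF;
  SET STATISTICS IO OFF;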

You won’t get perfect results and scalability, but you’ll develop a few metrics that allow you to determine if your decisions in development will have severe negative impacts. There still might be problems, but you should eliminate more and more of these over time.

The Cardinality Estimator in SQL Server 2014 – Going Forward and Backward

I saw a talk from Joe Sack on the cardinality estimator (CE) in SQL Server 2014 and found it very interesting. To be fair, some of the "how it works" isn’t something I care much about, but I did like Joe sharing some of the places where you might find problems with your queries and how the cardinality estimator might affect you. The talk is worth seeing if you get the chance.

However, one of the really interesting things, and an item I appreciate Microsoft building, is a switch to turn off the new CE. Actually, it’s not turned off; rather, you can set the behavior to pre-SQL Server 2014 (essentially 2005-2012) or to SQL Server 2014. What’s even better is that you can set this in a number of ways.

Setting Database Behavior for all Queries

Turning on the new CE is as simple as setting the compatibility level to 120. This will turn on the new CE for your queries in this database.

The flip side is setting your compatibility level to something below 120 (110, 100, etc.), in which case your queries will use the old CE behavior in the query optimizer.
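For example (the database name here is hypothetical):

  -- Use the new 2014 CE for all queries in this database
  ALTER DATABASE SalesDB SET COMPATIBILITY_LEVEL = 120;

  -- Or keep the legacy CE by staying at a lower level
  ALTER DATABASE SalesDB SET COMPATIBILITY_LEVEL = 110;

  -- Check the current setting
  SELECT name, compatibility_level
  FROM sys.databases
  WHERE name = N'SalesDB';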

Setting Behavior for Queries

You can also test specific queries with either the new or the old CE. The QUERYTRACEON option can be used with these two trace flags; a sketch follows below.

  • 9481 – Uses the 2012 (pre-2014) CE for a query. This is used when the database is at the SQL Server 2014 compatibility level (120).
  • 2312 – Uses the new 2014 CE when the database, or the server defaults, are set to use the 2012 CE.

This is documented in KB 2801413 from Microsoft.
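A quick sketch of both directions (the table and predicate are hypothetical; note that QUERYTRACEON requires sysadmin-level permissions):

  -- Database at compatibility level 120: force this query back to the old CE
  SELECT OrderID, OrderDate
  FROM dbo.Orders
  WHERE CustomerID = 42
  OPTION (QUERYTRACEON 9481);

  -- Database at a lower compatibility level: try the new 2014 CE for this query
  SELECT OrderID, OrderDate
  FROM dbo.Orders
  WHERE CustomerID = 42
  OPTION (QUERYTRACEON 2312);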

Setting Behavior at the Server Level

There is also a trace flag you can use at server startup that globally sets the CE behavior. Set 9481 at startup, and your SQL Server 2014 databases will use the old CE by default. Joe documented this on his blog.
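In practice that means either adding -T9481 as a startup parameter in SQL Server Configuration Manager, or, for the running instance (until the next restart), enabling the flag globally:

  -- Applies to the whole instance until restart
  DBCC TRACEON (9481, -1);

  -- Verify which global trace flags are active
  DBCC TRACESTATUS (-1);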


I think this is great because, if you are concerned about workloads being affected by the CE changes, you can set the old CE as the default and then test on your real production server by executing specific queries with the new CE and examining their query plans.

Alternatively, if you upgrade and find problems, you can duplicate the old CE by using a query hint and see if the query performs better.

I’d like to see this upgrade/downgrade granularity in more features that can potentially affect performance, and I would say this is a fantastic architectural win for the SQL Server development team.

Performance Tuning Through Panic

We’ve all been there at one time or another: panicked by a performance issue. I once upgraded a FoxPro for DOS application to a VFP front end with a SQL Server back end. We had a much more powerful server, generations ahead of the previous file server share. When the first users logged in and searches ran slower, I was in a panic. My team worried about the size of our inventory data in memory and quickly upgraded our 4GB, 1996-era server to 8GB of RAM, hoping for improvement. We also hurriedly split out queries and separated some data in an effort at quick tuning. It wasn’t much more than a knee-jerk reaction, but we had users, and management, yelling at us on an hourly basis.

I’m sure many of you have had performance problems that resulted in some panic during your career. I’ve had more since then, and will probably have more in the future. Many of us do, and many of us perform knee-jerk performance tuning. However, you can, and should, avoid tuning in a panic, as Paul Randal points out. Doing so usually doesn’t fix the problem, and when it reoccurs, it’s more embarrassing, with people less likely to listen to your next great idea.

Tuning a system is a science, but it’s a complex one. Without experience, without knowledge of all the complex interactions, and without a deep understanding of how all the parts fit together, it often looks like a hopeless task. Who can keep all that information in their head and relate all the various components of the application? The network, the OS, the SAN, the clients, all can cause problems that appear to be in the database. As Paul points out, failing to understand the actual root cause can lead you to make changes that have no effect on the underlying issue. You’ll often realize this when the problem re-occurs a short time later.

I think one of the best reasons to hire consultants is that they see so many problems that they can recognize patterns in performance that many of us don’t see. What they do appears to be an art because they can seemingly understand what information is relevant and what isn’t. They walk down a list of potential issues, gathering evidence in a logical way, but one that can appear random to someone with much less knowledge and experience.

All of us can learn more; we can read about problems others have had and learn how they solved them. However, we often face the same reaction that many consultants do: no one wants to implement the advice. Until clients and vendors are willing to tune their code, many problems will only be masked by hardware.

For a time. Then they’ll reappear.

Steve Jones


Database Queries and Scalability

I ran across a post with a very provocative title (thanks, Brent Ozar Unlimited). It’s an interesting read from a developer, talking about how so many website designs aren’t built for scalability because they rely too heavily on a set of database queries placed on every page. If you expect a database query to retrieve or store a user ID, or other data, on every page, then you are limited by the scalability of your database back end. There’s no doubt there, but the author has ideas that can help you overcome these issues.

One of the problems is that systems read and write account information (or registrations) to the database directly. That’s definitely a scale issue, and the author suggests generating the client information on the web server and storing it in a more robust and scalable back end. That makes sense, but it comes with the caveat that “As long as you eventually write the data to the db later on, you’re good to go.” Good point, but now you need a replication process of some sort (or messaging) that gets the data to your database. This also ignores some of the synchronization issues across multiple servers, but it’s not a bad idea.

The idea of avoiding joins by using a cached set of data to iterate through and pull data from other tables may work in some cases, but I’m not sure this is the most efficient way of grabbing data from a database, ensuring it’s the latest data, and not impacting concurrency. This technique seems to throw away the power of an RDBMS, which can use joins to retrieve data effectively and efficiently. I suspect replicated copies of slowly changing data are a better way of scaling your system than avoiding joins.

Overall I don’t love the article, and depending on which platforms and technologies you’re using, it might be better, or worse, advice. I would say that building better systems involves considering alternative techniques, allowing for failures, and using new techniques like messaging, caching, asynchronous processes and more. As we try to improve the quality of our software, we should be talking about, writing about, and trying out new techniques, learning what can work better in our environments.

Steve Jones

Video and Audio versions

Today’s podcast features music by Everyday Jones. No relation, but I stumbled onto them and really like the music. Support this great duo.

Follow Steve Jones on Twitter to find links and database related items and announcements.

Achieve Better Performance

One of the hottest topics at almost every SQL Server event I attend is performance. The sessions that talk about improving performance through various means are well attended, and that tells me a couple of things: one, this is an important topic, and two, people need to learn more about it.

This past week I noticed a lot of performance-related topics appearing in blogs and headlines. I’m not sure why there were so many this week, but it’s worth pointing out that this is a good week to beef up your skills in this area. I saw a little bit of something for everyone, no matter what your job.

Database administrators, who often deal with the end product from developers and can make only limited changes, should make sure they understand statistics, which can clue them in to problems in a number of situations. You ought to be able to troubleshoot common issues, like blocking or problems in tempdb. If you deal with merge replication, you ought to understand how to handle and resolve conflicts. No matter who you are, you need to understand the impact of IO on your server, and that means ensuring you can measure the load. Measuring CPU can be tricky, but there’s an interesting idea from the SQL Server team this week.

The developers, who can make changes early in the process of software development, really should learn to write better SQL. That means studying the various ways people solve problems and practicing using some of them. Do you know the issues with scalar UDFs? Do you understand temporary objects? Can you PIVOT data? There are so many writings on T-SQL that you can’t keep up, but you should be getting better over time if you write T-SQL code in your job. Developers also should understand the basics of indexing and deliver a few basic indexes along with their software. That’s the least you can do.
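On that last point, even a single well-chosen nonclustered index shipped with a feature can go a long way. A minimal sketch (the table and column names are hypothetical) covering the predicate a new screen searches on:

  CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
      ON dbo.Orders (CustomerID, OrderDate)
      INCLUDE (TotalDue);   -- covers the column the screen displays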

This week we started to see various blogs on SQL Server 2014, especially the Hekaton feature. While I’m sure the next version of SQL Server will perform better and speed up some of your workloads, it can’t cover for fundamental mistakes, at least not completely. Do your career, and your current and future employers, a favor and work on your performance skills this week.

What Do You Want to Know About I/O?

Hope to see you at the conference.

There’s a lot I want to know. What effect does an SSD have if I use it for a log file? For one of my filegroups with indexes? What about tempdb? How do you dig in and prove latency to the storage group from within SQL Server? Will the query optimizer take advantage of information about drive performance? There are lots of questions I have, and I’m hoping to get answers. I’m hoping to attend the “Ask Anything I/O” panel at SQL Intersection in a few weeks. I’m leaving the event Wednesday night, so unless there’s someone I’m trying to catch in a last-minute meeting, I’ll be sitting in the audience.
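On the latency question, one starting point from inside SQL Server is the per-file stall counters; a sketch like this turns them into average read and write latencies you can take to the storage group (the numbers are cumulative since startup):

  SELECT DB_NAME(vfs.database_id) AS database_name,
         mf.physical_name,
         vfs.num_of_reads,
         CASE WHEN vfs.num_of_reads = 0 THEN 0
              ELSE vfs.io_stall_read_ms / vfs.num_of_reads END AS avg_read_latency_ms,
         vfs.num_of_writes,
         CASE WHEN vfs.num_of_writes = 0 THEN 0
              ELSE vfs.io_stall_write_ms / vfs.num_of_writes END AS avg_write_latency_ms
  FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
  JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id = vfs.file_id
  ORDER BY avg_read_latency_ms DESC;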

I am guessing that Brent Ozar, Kevin Farlee, and Mat Young will have lots of answers for me and others. I’m also guessing most of the questions will be met with references that are already out there, and I hope the presenters will publish a list of questions and answers for attendees. That’s fine, as there are lots of times when I’m looking for information that has been published but that I can’t find in the ocean of Google results. Attending a session given by experts can be a way to shortcut some of the research and searching needed to find where the answers are located.

That’s one of the reasons to attend a conference: getting quick answers from experts, or learning how to better target your research. All the speakers at SQL Intersection are well-known, longtime, expert SQL Server users who are friendly. They’re more than willing to answer questions or help guide you to solve a problem or learn something new.

The other great thing about conferences? You get to hear some great questions from other attendees. We all attack problems differently, and work in very diverse environments. There are many times I’ve heard how another person has implemented SQL Server and it’s intrigued, or even inspired, me to try something similar with my instances.

If you can find the funding, SQL Intersection is a great event to attend, with an impressive list of speakers, each hand picked by Kimberly Tripp and Paul Randal of SQLskills. They provide amazing training in their Immersion events, and are trying to bring even more options to you with SQL Intersection. You can even walk away with something tangible in addition to all the knowledge you’ll gain: a Surface Tablet.

Steve Jones

The Voice of the DBA Podcasts

We publish three versions of the podcast each day for you to enjoy.

The Default Fillfactor for an Index

I went down a rabbit hole on transaction logs recently. I started with Paul Randal’s post over at the SQL Sentry blog on trimming the transaction log, then went to his video on log analysis. I also glanced at the posts on index cleaning and what index stats don’t tell you. What started out as a quick “what can I do to help transaction logs perform better” became a few hours of reading, executing code, and thinking.

However, the one that caught my eye was Paul’s post on choosing an index fill factor. I’ve seen various notes on the fact that fill factor can matter for performance and maintenance, but I haven’t often seen someone give good, concrete rules on what you should choose. In a nutshell, here’s Paul’s advice:

  1. Don’t set a system wide fill factor with sp_configure
  2. Start with 70 for specific indexes that seem to experience lots of fragmentation

I like this advice. It’s simple and easy to start using, although the caveat to #2 is that you need to monitor and perhaps adjust the fill factor (up or down from 70), and possibly change your maintenance schedule. I might lean towards leaving my maintenance alone, especially with a script like the SQL Fool Index Defrag Script running, and playing with the fill factor to ensure I minimize page splits.
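Applying #2 to a specific index is a one-line rebuild (the index and table names here are hypothetical); after that, it’s a matter of watching fragmentation and page splits and nudging the value up or down:

  ALTER INDEX IX_Orders_CustomerID_OrderDate
      ON dbo.Orders
      REBUILD WITH (FILLFACTOR = 70);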

There’s also the trade-off of requiring more space for your index (and maintenance) if you move to 70 from 100.

I do think that changing the system-wide level is a bad idea. If you aren’t sure what your system-wide fill factor is set to, here’s a post on checking it.
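The quick check is a one-liner against sys.configurations; a value_in_use of 0 (or 100) means pages are packed full by default:

  SELECT name, value, value_in_use
  FROM sys.configurations
  WHERE name = N'fill factor (%)';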