Visual Studio Subscriptions

Many of us who work with SQL Server do so almost exclusively through SQL Server Management Studio (SSMS). I find that many people really do the majority of their jobs with SSMS, Outlook, and a web browser. Even back in 2003, when I was a full-time DBA, I probably spent the majority of my time in those three applications.
However, I also see more and more people using Visual Studio and other tools to accomplish their jobs. The growth of new tools like PowerShell, the expansion of our work into BI areas, and more mean that many people now use tools besides SSMS to work with SQL Server data.
This past week there was an announcement that MSDN subscriptions were changing. At most of my jobs, I’ve had an MSDN subscription available to me. In fact, some of you might remember the large binders of CDs (and later DVDs) that arrived on a regular basis and contained copies of all Microsoft software. However, many of you out there haven’t had MSDN available, or you’ve struggled to justify the yearly $1,000+ cost, even though you want to work on your careers and practice with Microsoft software.
At first I saw the yearly cost at $799, which is a pretty large investment. However, as I looked to the side, I saw a monthly subscription, with no large commitment, available for $45. That’s not an extremely low cost for much of the world, but it’s very reasonable in the US. It’s also a great way to build a setup that allows you to work with a variety of Microsoft technologies at an affordable cost. What’s more, you can stop paying at any time, or start again at any time.
I know that it can be a struggle to invest in your own career; it’s probably more difficult to find the time than the money. However, this is a good way to get access to the various development and server tools for a period of time if you want to tackle a project or force yourself to learn a new skill.
I’m glad that Microsoft has moved to a subscription model for MSDN. I expect this subscription to grow as small companies take advantage of a small investment, one that scales linearly with new hires, to provide their employees with tools. I can only hope that many other vendors adopt this same model and allow us to rent our tools, and upgrade them, for a very reasonable cost. I just hope they all let us back up and save our settings in case we interrupt our subscriptions for a period of time.
Steve Jones

NoSQL Inside SQL Server

There has been quite a debate over the last few years between using NoSQL and SQL Server as data stores for different applications. We’ve seen a great deal of interest from developers in document stores, as the data structures are closer to what a C# or Java developer sees in their code. Graph databases are becoming popular for certain types of data as well, since their query performance can be extremely good in some cases. The scalability of NoSQL systems is also touted as a better way to build Internet-based applications.

However, NoSQL databases aren’t the ultimate data store. There are pros and cons to using them, and while it might be easier for an application developer to begin development on one, I would argue it’s just as easy to get caught misusing a graph database or a document database as it is to poorly program a relational database.
This week, in advance of the SQL Server 2016 release, I caught a post from the SQL Server storage engine team showing how JSON storage inside of a table can provide some NoSQL-like benefits to applications. It’s a good idea, but like many NoSQL solutions, you need to spend a bit of time examining your particular requirements. If you need to update your JSON often, especially in large batches, this might not prove to be a good idea, particularly if you need consistency.
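As a rough sketch of the pattern (the table, column, and property names here are my own, not from the post), SQL Server 2016 stores JSON as plain nvarchar text, validates it with ISJSON, and lets you promote frequently queried properties to indexed computed columns:

    -- Store documents as validated JSON text (hypothetical schema)
    CREATE TABLE dbo.ProductCatalog
    (
        ProductID  int IDENTITY(1,1) PRIMARY KEY,
        Attributes nvarchar(max) NOT NULL
            CONSTRAINT CK_ProductCatalog_Json CHECK (ISJSON(Attributes) = 1)
    );

    -- Promote a hot property to a computed column and index it
    ALTER TABLE dbo.ProductCatalog
        ADD Color AS JSON_VALUE(Attributes, '$.color');
    CREATE INDEX IX_ProductCatalog_Color ON dbo.ProductCatalog (Color);

    -- Query the documents relationally
    SELECT ProductID, Attributes
    FROM dbo.ProductCatalog
    WHERE JSON_VALUE(Attributes, '$.color') = 'red';

Reads like that can perform well with the computed column in place, but every update still rewrites the whole JSON string, which is exactly where the batch-update caveat above comes in.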
I think everything ultimately does relate back to the CAP theorem, precisely because data has to exist somewhere. Any time you need to duplicate it, move it, or aggregate it to relieve some other issue, there isn’t any magic taking place: you’re going to have to do work to move the data. Some platforms may make that easier, but I bet there are other drawbacks that you will have to consider.
Ultimately, I think you need to pick the platform that best suits your environment. That’s not only the application requirements, but also the developers’ skills, the operational staff’s ability to manage the platform, and the maturity of the platform. Those are tough questions, and while I’m glad people experiment with new systems, I do hope that they think carefully about the positives and negatives and, more importantly, share their experiences with the rest of our industry.

Steve Jones

Gauges and Alarms

I was looking through the news this week and found a piece by Michael J. Swart that I really enjoyed. It talks about different types of metrics that he has used when monitoring and alerting on his database servers. Michael talks about the difference between alarms and gauges, which are usually based on the time scale in which a DBA or administrator needs to react to a situation. Alarms are created when something grows quickly in the short term and causes an issue. Gauges tend to track longer-term items that eventually cause a problem.

I’ve had a similar view in the past. However, I look at these types of metrics from the standpoint of how I tackle my job as a DBA. I also try to think about these issues as a developer, ensuring that telemetry helps track issues that develop over time, and that acute issues cause a graceful degradation (or exit), not an unexpected one.

When I’ve needed to manage systems, I ensure that I have alarms set up. I need these for the situations where something goes wrong in a short period of time. However, I never want alarms to fire because of a lack of visibility into the system over time. I want to eliminate as many alarms as possible with better long-term monitoring that looks for indicators that the system might have issues. I’ve found that many (not all) of the alarms I receive could have been prevented if I’d been paying close attention.

In other words, I’m a proactive DBA, looking at gauges (using Michael’s terms), not a reactive one counting on alarms.

I can’t predict everything, but I find that looking back at data is helpful. If I find a trend that could have alerted me, I’ll evolve my monitoring system to track those potential issues in the future. I’ve written my own custom metrics in the past, which track data, either PerfMon-type counters or business data, and then let me know when the growth, decline, or change is unexpected. In this way, I find I can often proactively prevent issues. If I can use my data to “predict” an issue my boss doesn’t believe will happen, I’ve also found my budget for resources may increase a bit, allowing me to manage systems even better in the future.
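As a minimal sketch of that kind of home-grown gauge (the table, names, and thresholds are hypothetical, not from Michael’s post), you might record one value per metric per day and flag any sample that deviates sharply from its trailing average:

    -- One row per metric per day (hypothetical schema)
    CREATE TABLE dbo.MetricHistory
    (
        MetricName  sysname       NOT NULL,
        SampleDate  date          NOT NULL,
        MetricValue decimal(18,2) NOT NULL,
        CONSTRAINT PK_MetricHistory PRIMARY KEY (MetricName, SampleDate)
    );

    -- Flag today's samples more than 25% above their trailing
    -- 30-day average (an illustrative threshold)
    SELECT m.MetricName, m.MetricValue, t.Avg30
    FROM dbo.MetricHistory AS m
    CROSS APPLY
    (
        SELECT AVG(h.MetricValue) AS Avg30
        FROM dbo.MetricHistory AS h
        WHERE h.MetricName = m.MetricName
          AND h.SampleDate >= DATEADD(DAY, -30, m.SampleDate)
          AND h.SampleDate <  m.SampleDate
    ) AS t
    WHERE m.SampleDate = CAST(GETDATE() AS date)
      AND m.MetricValue > t.Avg30 * 1.25;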

Steve Jones

The Challenges of Space

It wasn’t that long ago in my career when a survey showed the vast majority of SQL Server databases were < 10GB in size. These days I’m sure the average has risen, as there are some very, very large SQL Server databases. I know a few people managing PB-sized systems, which certainly will move the average up a bit. It’s likely, however, that the majority of databases are still very small, powering smaller applications with < 100GB.

However, almost every database continues to grow, and as systems grow, new challenges come into play. While we can still store a 100GB backup file on a thumb drive or SD card, it’s still a significant amount of data to write, which can take more time than our clients would like. Archivals, backups, and restores all tend to come from non-flash media, so larger systems certainly mean longer RTOs if we have issues, even if we’re just restoring a copy of a database to extract some deleted information.

However, there can be more fundamental problems. I ran across a post from the CSS Engineers at Microsoft that talks about exhausting NTFS resources as more and more writes occur on a filesystem. I’d never heard of this, or expected it to be an issue, but apparently it’s occurring more often. I suspect even the NTFS developers hadn’t considered the scales at which we now stress the file system daily. What is interesting is that this can occur with DBCC operations, database snapshots, and even bcp operations. There are workarounds, and I’m sure that Microsoft has people working hard to increase the capacities of ReFS, which I expect will take over from NTFS, just as NTFS has largely displaced FAT.

As we grow to larger and larger data sizes, with multi-terabytes on laptops and petabytes on servers, I think we’ll need to adapt a number of our techniques and operations. We might end up needing different types of backups (anyone considering lookup table backups, or daily backups of a set of rows, like orders?), as well as the possibility of continuing to spread the load of our servers across multiple drives. I just hope that Microsoft recognizes this and looks to ensure mount points work smoothly across all of their technologies and features. At some point, 26 drive letters won’t be enough for the majority of us.
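On that row-subset idea, here’s a minimal sketch of a daily orders export using bcp (the server, database, table, dates, and path are all invented for illustration):

    rem Export one day of orders to a native-format file
    bcp "SELECT * FROM SalesDB.dbo.Orders WHERE OrderDate >= '20160601' AND OrderDate < '20160602'" queryout D:\Export\Orders_20160601.dat -S MyServer -T -n

A file like that can be reloaded with bcp in or BULK INSERT far faster than restoring an entire multi-terabyte database just to recover one day of one table.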

Steve Jones


Monitoring

How often have you been caught off guard by some issue with one of your databases? I would guess most people have received a phone call, email, or trouble ticket alerting them to the fact that something has gone wrong. A few of you might even have had an irate customer or manager wander over to your desk and loudly inform you of an issue. Those are not great times, and most of us would prefer to find out about problems before others bring them to our attention.

This past week was the 66th T-SQL Tuesday blog party. If you haven’t heard of it, I would encourage you to look for the announcement on Twitter each month and participate. It’s a great chance to share some knowledge and showcase your own experiences or viewpoint on a particular topic. I’d actually encourage all of you to go back and write posts about the past topics at some point. It’s a great way to challenge yourself to document your own experiences in a concrete way.

The theme this month was monitoring (thanks to Catherine Wilhelmsen), and this is exactly the way that you build awareness in your environment. Monitoring allows you to ensure that you, as the DBA, learn about issues, or potential issues, first. Monitoring doesn’t solve your problems; that’s up to you to do, but it can help you proactively detect issues early and perhaps mitigate them before users notice them.

The blog entries this month cover all sorts of issues that you might think about watching for: the 825 errors in your log, metadata about your backups, your jobs, and more. There’s a variety of posts covering all sorts of home-grown systems, as well as noting that third-party products work well. You have a huge variety of choices, from building your own tools and processes to purchasing something like SQL Monitor from Red Gate. No matter your budget, resources, time, or desire, I’d highly recommend that you think about ensuring you are monitoring your systems in a way that lets you know how things are working.
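As one small example of the metadata angle (the 24-hour threshold is just illustrative), the backup history in msdb can tell you which databases are missing a recent full backup:

    -- Databases with no full backup in the last 24 hours
    SELECT d.name AS DatabaseName,
           MAX(b.backup_finish_date) AS LastFullBackup
    FROM sys.databases AS d
    LEFT JOIN msdb.dbo.backupset AS b
           ON b.database_name = d.name
          AND b.type = 'D'  -- D = full database backup
    WHERE d.name <> 'tempdb'
    GROUP BY d.name
    HAVING MAX(b.backup_finish_date) IS NULL
        OR MAX(b.backup_finish_date) < DATEADD(HOUR, -24, GETDATE());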

What to do when they’re working poorly is up to you. Hopefully the information we publish at SQLServerCentral, and the links we send out every week from Database Weekly, can help you learn to be better at your job.

Steve Jones

The Devil’s in the Details

This week I was scanning through a number of SQL Server posts (in between working on April Fools’ jokes) and a couple of them caught my eye. They dealt with simple subjects, but subjects whose details are important. I’ve often found people have performance issues in SQL Server because of simple misunderstandings of how the system works, or because not enough weight is given to the impact of small details.
Kimberly Tripp (of SQLskills) wrote a basic post on parameters, variables, and literals, which you would think most programmers would understand. However, I realize more and more that lots of people who write code taught themselves. That’s amazing, but it does mean that people have many different holes in their knowledge. What I might consider basic or common knowledge might be something another person has never been exposed to. In any case, Kimberly does a fantastic job of laying out the differences between these three concepts, how they work side by side, and how the choice can dramatically impact the performance of your code. Read this one.
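To make the distinction concrete (dbo.Sales and CustomerID are hypothetical names, not from Kimberly’s post), the optimizer treats these three forms very differently:

    -- A literal: the optimizer uses the statistics histogram
    -- for this exact value
    SELECT * FROM dbo.Sales WHERE CustomerID = 42;

    -- A variable in an ad hoc batch: the value isn't sniffed at
    -- compile time, so the estimate comes from average density
    DECLARE @cust int = 42;
    SELECT * FROM dbo.Sales WHERE CustomerID = @cust;

    -- A parameter: sniffed on first compile, with that plan reused
    -- for later executions, even with very different values
    EXEC sp_executesql
         N'SELECT * FROM dbo.Sales WHERE CustomerID = @cust',
         N'@cust int',
         @cust = 42;

If customer 42 is rare but customer 100 owns half the table, each of these forms can land you on a different plan, which is the heart of the post.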
The other post this week was a fairly simple one on data conversions by Rob Sheldon. Again, precedence and conversions are things I dealt with when writing C code, and you do not want to get them wrong there. However, many people might not have the same background and wouldn’t understand why a conversion between varchar and nvarchar isn’t a basic, simple conversion with little cost, or why the type of conversion might produce far different results than you expect. Since we often test with one set of data and our applications work with a (much) wider set of data, this one is a must-read as well.
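Here’s a small sketch of the cost side (the table and column are invented, and exact behavior depends on collation): nvarchar has higher data type precedence than varchar, so comparing a varchar column to an nvarchar literal converts the column, and a query that could have used an index seek may scan instead:

    -- LastName is varchar; N'Jones' is nvarchar, which has higher
    -- precedence, so the column is implicitly converted
    SELECT CustomerID FROM dbo.Customers WHERE LastName = N'Jones';

    -- Matching the column's type keeps the predicate sargable
    SELECT CustomerID FROM dbo.Customers WHERE LastName = 'Jones';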
There are so many things to consider when writing code that it takes constant, regular effort to improve your skills in this business. You won’t learn everything, and you may not learn something every day, but pick things up and practice with them. Learn to build better code, one technique at a time, and cement that knowledge with actual implementation in your work.
You’ve got at least two new things to look at this week, so pick one and try to ensure your code next week considers these details.
Steve Jones

Which Problem Are You Solving?

One of the problems I’ve seen for years in computing is people seeing only the forest or only the trees. Neither view is inherently good or bad, but far too often I’ve run across people who see just one of them, meaning they examine problems from a high-level perspective or from a very low-level one. While either can help you solve your issues, there will be a problem, or a time, when you need to examine the situation from the other perspective. Knowing when to look at a higher or lower level can be invaluable in troubleshooting.

This past week I ran across a great post from Grant Fritchey called Understand the True Source of Problems. In it, Grant talks about some strange DBA “best practices” that were implemented to solve problems but aren’t necessarily the best of anything. It’s an interesting list, and you might read it for entertainment, though I hope none of you are following these ideas at your place of employment.

However, the thing I liked about this post is that Grant opens with an old doctor joke, which is still funny to me, but which also captures the approach many people take to solving problems: they just avoid the issue and never solve it. That can work in the short term, but I’ve seen this build up over time to the point where people are afraid to make any changes to a system or code for fear of causing some unknown issue.

I don’t think that every issue needs to be solved to the root cause. There are plenty of times something goes sideways on one of my computers and a reboot fixes the issue. I’m fine with that, but if it occurs more than two or three times, I do need to solve the problem. I need to be able to dig deeper to diagnose what resources are failing, what performance issue exists, or maybe what line of code is broken. Perhaps I need to step back and examine whether some other part of my system, like the disk or network, is causing an issue that appears to be isolated to code. In other words, I need to be able to look at the wider environment (the forest) or dive into the particulars of a process (the trees).

Knowing how to strike this balance is a bit of an art, and it’s what really makes the best problem solvers experts. I can’t teach you that part of computing, but I can teach you this. When you don’t think you’ve completely solved the issue, stop for a minute and take a wider or narrower view. Or maybe both. Look for what you might not have thought of. If you’re like me, there are probably plenty of possibilities that come to mind when you stop and re-examine the issue. Then work your way through those items, discarding the ones that aren’t relevant. Sometimes opening your mind a bit is the best way to find the exact cause of an issue.

Steve Jones

Data Lakes

I heard a new phrase this week: the data lake. It comes from a Radar Podcast episode where Edd Dumbill talks about data as an asset that should be exposed everywhere in an organization. There’s a blog post on the subject as well.

The idea is somewhat centered on Hadoop, but it could apply elsewhere. Data comes into an organization and flows into one large lake, where it’s stored alongside all the other data. Applications are just ways of accessing the data in the lake, but all the data really lives in one large Hadoop store of information. In some sense, this isn’t far from the “single view of the truth” that I’ve seen plenty of organizations attempt. In the relational world, this means all data moves from OLTP systems to a large data warehouse, then on to smaller data marts (really warehouse subsets), and is accessed from there.

It’s a good idea in theory, but in practice, trying to move data between users with any velocity, with all the copying, cleaning, transforming, and more going on, doesn’t work. OLTP systems are needed because there are transactional actions that must be completed quickly and accurately. Moving this data to other systems becomes harder as data volumes and the sheer number of clients (whether users or other systems) increase. The idea that we can keep all of our processes working quickly enough that users won’t get frustrated is likely a dream. The more value a set of data holds, the more users will access it. The more accesses, the slower it often becomes, which starts a cycle of smaller subsets of data and applications that subsist on those small data puddles.

Excel is probably the most common example of a data puddle in your organization: a set of data, perhaps out of date, but useful enough to make decisions from. It’s infinitely flexible, and convenient enough that updates, changes, and more spawn more and more puddles, where the information never gets transferred back to the large lake of a database, whether that’s an RDBMS, a Hadoop cluster, or something else.

I think the idea of a large data lake is great, but in a practical sense, much of an organization’s data will never live in the lake. If it does, it will most likely be data that’s been superseded by information in a puddle somewhere on an employee’s laptop, tablet, or personal cloud.

Steve Jones

Normalizing Performance

This week I ran across a post from Glenn Berry on hardware performance that I thought framed the concept of performance value well. In the post, Glenn talks about the latest TPC-E benchmark and the SQL Server performance numbers that vendors release. He mentions that he always tries to divide out the numbers to provide better comparisons among the various hardware choices.

That makes a lot of sense to me, as I think very few of us would be able to afford the top-of-the-line systems on which vendors run the benchmarks. Most of us try to compare the results in some way and then make our own decisions for our smaller systems. I don’t know many people who run 36-core machines, but I do know lots who need to decide which 4- or 8-core systems they should choose.

The idea of normalizing performance to smaller loads is something we do often. We need to do this, because we often can’t get the same size, scale, or specifications in our test systems as we have in production. As much as we’d like to have them, resources are limited, and we do need to have some way of extrapolating the results in our development and test systems forward to production.

Glenn has a way of doing this for CPUs, and while you might not agree with his method, at least he has an organized way of doing things, letting empirical results provide feedback on whether it works well. You should do the same thing, whether you’re trying to gauge disk or T-SQL speed. Develop a hypothesis (or read about how others do so) for measuring your performance on a lesser system, and then on your primary one. Take the time to run some tests the same way, even if it’s single-query performance on a production system while it’s live.
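The core of the normalization idea is simple division; here’s a sketch with made-up numbers (these are not Glenn’s figures or actual TPC-E results):

    -- Divide a benchmark score by core count to compare per-core value
    SELECT SystemName, tpsE, Cores,
           CAST(tpsE / Cores AS decimal(10, 2)) AS tpsEPerCore
    FROM (VALUES ('Big 36-core server',  4900.00, 36),
                 ('Small 8-core server', 1250.00,  8)
         ) AS v (SystemName, tpsE, Cores)
    ORDER BY tpsEPerCore DESC;

On numbers like these, the smaller box actually delivers more throughput per core, which is the kind of comparison that matters when you’re choosing a 4- or 8-core server rather than the benchmark monster.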

You won’t get perfect results or scalability, but you’ll develop a few metrics that allow you to determine whether your decisions in development will have severe negative impacts in production. There still might be problems, but you should eliminate more and more of these over time.

Steve Jones

Saving Hard Dollars

This week I ran across an analytics piece that talks about how Big Data and analysis can do more than help marketing; they can also save hard dollars for your company. Companies are using sensors and technology for mundane tasks, but tasks that can save them substantial, hard dollars.

You may have heard of UPS and FedEx optimizing their routes, and tracking the actions of drivers. Those companies (and others) have learned that a little programming work to analyze and create instructions for their workers can reduce wasted time and fuel, which ultimately means less wasted money. Other companies have reduced their heating or cooling costs with more data and small investments.

I don’t know how many additional cost reductions can take place in most companies, but there are likely changes none of us have thought of that could be easily implemented with technology. It seems like a no-brainer that technology can build better routes, but early on it wasn’t considered an option, because it didn’t replace a formal manual process; it wasn’t worth paying someone to design routes for a truck each day.

As computer power increases, and more data becomes available, I predict that we’ll find many ways to analyze that data and change the way we run businesses. Many of the ideas will be unexpected, and some may be counter-intuitive, but those of us working with data will have a front-row seat to the ways in which data analysis can transform our world.

Steve Jones