Paying for Data

Would you recommend your organization pay a ransom to get the key to decrypt data that’s become encrypted on your database systems? What if the data were encrypted in backups going back two weeks?

It’s an interesting question, and one I expect we will have to deal with at some point. This week I saw an announcement that another organization, this time a university, paid a ransom to recover data. Before you react with “of course not,” think about this quote from the university’s Vice-President:

“We are a research institution,” she was quoted as saying. “We are conducting world class research daily and we don’t know what we don’t know in terms of who’s been impacted and the last thing we want to do is lose someone’s life’s work.”

That’s a powerful statement, and certainly one that I could see salespeople, accountants, and others making as an appeal to upper management. What’s interesting in the piece is that the university acknowledges that using the decryption keys is a complex endeavor, requiring IT to get systems operational again. I hope that, at the least, they also made new backups of the encrypted data before using the keys. Who knows what sort of results you’ll get from criminals who encrypted your data and then offered to provide reversal instructions and keys.

I expect ransomware efforts to become more prevalent over the next few years, especially as people have success. Even if fewer people pay ransoms, I bet plenty of people will look to disrupt organizations with these attacks. Perhaps even a competitor will look to introduce ransomware into your company? Maybe by offering a job to an individual who leaves such a surprise behind as they get ready to move to a new employer.

There are all sorts of issues here, and potential attack vectors. The fact that ransomware (or other malicious software) could be introduced and remain idle, or even transparently encrypt and decrypt systems until a date passes and the keys delete themselves, is horrifying. At the very least, I think this brings to mind the idea that we need multiple versions of backups, so mirroring a file system to another location, or to some cloud service like Azure, Dropbox, etc., isn’t enough. Perhaps we should also have checks in place that look for changes. After all, can you imagine a rogue process implementing TDE and letting it run for a month before deleting the certificates from your system and truncating all the tables a week later?
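
As a rough illustration of the kind of check I mean, a scheduled job could run something like the sketch below against the catalog views and flag any database that has quietly become encrypted. The views (sys.databases and sys.dm_database_encryption_keys) are real, but the idea of comparing against a known-good list, and what you do when an unexpected row appears, is entirely up to you.

    -- Sketch: flag databases where TDE has appeared unexpectedly.
    -- Compare the results against a known-good list from a scheduled job.
    SELECT  d.name,
            d.is_encrypted,
            k.encryption_state,          -- 3 = encrypted
            k.key_algorithm,
            k.create_date AS key_created
    FROM    sys.databases AS d
            LEFT JOIN sys.dm_database_encryption_keys AS k
                   ON k.database_id = d.database_id
    WHERE   d.is_encrypted = 1
       OR   k.database_id IS NOT NULL;
    -- Any row you didn't expect (tempdb will appear once any database uses TDE)
    -- should raise an alert through whatever notification process you already use.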

Steve Jones

Install Cumulative Updates

For years I haven’t recommended installing Cumulative Updates (CUs) as they are released. The main reason is that Microsoft has had language associated with their CUs saying customers should not install a particular CU unless they are experiencing the specific issues it corrects. Each CU has (in the past) also noted that you should otherwise stick with the latest Service Pack. That disclaimer has been enough for me to be concerned about CUs in a general sense, despite the insistence that CUs were tested as well as SPs. My thought was that if the testing were the same, that disclaimer wouldn’t exist.
Well, things have changed. The latest CUs have this language in the section where the KB says Microsoft recommends installing CUs as they are released:
  • SQL Server CUs are certified to the same levels as Service Packs, and should be installed at the same level of confidence.
  • Historical data shows that a significant number of support cases involve an issue that has already been addressed in a released CU.
  • CUs may contain added value over and above hotfixes. This includes supportability, manageability, and reliability updates.
That’s good news, though as Kendra Little notes, you still need to test. Bugs will still exist in patches, and really in all software, so it’s important that you test in your own environment. That means you need a test plan you can run easily, preferably an automated one. If nothing else, this is a good reason to use tSQLt and have tests written for your system. At least you can verify important queries and systems are working. Kendra has a good list, so read her post.
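If you haven’t used tSQLt, a test is just a stored procedure inside a test class, which makes a pre- and post-patch smoke test easy to automate. A minimal sketch follows; dbo.Orders and the expected values are hypothetical, while tSQLt.NewTestClass, tSQLt.FakeTable, tSQLt.AssertEquals, and tSQLt.Run are the framework’s own procedures.

    -- A minimal tSQLt check you could run before and after applying a CU.
    EXEC tSQLt.NewTestClass 'PatchChecks';
    GO
    CREATE PROCEDURE PatchChecks.[test important query still returns a row]
    AS
    BEGIN
        -- Arrange: isolate the (hypothetical) table and load known data.
        EXEC tSQLt.FakeTable 'dbo.Orders';
        INSERT INTO dbo.Orders (OrderID, Total) VALUES (1, 100.00);

        -- Act
        DECLARE @actual INT = (SELECT COUNT(*) FROM dbo.Orders WHERE Total > 0);

        -- Assert
        EXEC tSQLt.AssertEquals @Expected = 1, @Actual = @actual;
    END;
    GO
    EXEC tSQLt.Run 'PatchChecks';
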
While I think the quality of CUs is up, and they are probably as safe as most of the Windows patches we get every month (which are often installed no matter our individual feelings), I’d still be wary. If you can’t test, if you’re busy, or if this is a month when you can’t afford for things to fail, then don’t install the CU. Installing without testing is like throwing your own developers’ code into production unchecked. Make sure you know what is being changed, and look for obvious problems. No one will be too upset about an obscure issue, but if your scheduled jobs start failing, you’ll dramatically reduce the confidence people have in you.
I am still wary of all patches. They’re disruptions, both to your routine and potentially to availability as well. Make sure you test, but if you have the time, I’d say keeping up with patches is worth doing. Microsoft is constantly fixing issues, and you want to take advantage of their hard work, if you can verify the patches don’t degrade your system.
Steve Jones

Power BI Week

This past week was T-SQL Tuesday #75, with the topic of Power BI. There were some interesting posts, and we’ve included a few in this week’s newsletter. This was also the week when the winner of the Power BI report contest was announced, and it was a good entry, the one I actually thought was the best. If you haven’t seen it, take a few minutes to go look at the entry and play with the data. It’s interactive, so be sure to see how well the Denver Nuggets have competed against Mr. Curry.

I first looked at Power BI last summer, watching a presentation and being intrigued by the tool’s ability to easily get data from web pages. I then talked with a friend who works with Power BI regularly, and they showed me how to loop and gather information from many pages, easily assembling the data into an interactive dashboard. I was very impressed, and I’m looking forward to doing a bit more work on the platform myself.

Quite a few people have been concerned that the development of tools like Power BI means there is less of a need for IT staff. I find myself thinking quite the opposite. There will always be data quality issues with something like Power BI, and there is a certain level of programming required to build more complex reports. Not to mention that the refresh and update process is slow, which can make the tool feel cumbersome.

However, users will find that some reports are more useful, or more important, and need some help from technology professionals. Whether that’s developing more DAX formulas and measures to enrich displays or just providing stable, clean data sources that consolidate data from various places, there will be plenty of work for IT staff that support companies using Power BI.

To me, Power BI is the type of tool that empowers users and lets them experiment a bit on their own. Once they determine what they really want, then they can engage me to build a more robust, reliable, and useful tool that helps the business.

Steve Jones

Adios, IE

The first browser I used on the Internet to find information was the text-based lynx browser. This was quite an improvement (at the time) over Archie and Gopher. When Netscape released their Navigator browser for Windows, I abandoned those old tools, moving on to a new way to access information across the world.

However, I primarily worked in a Microsoft world as my career grew, and that meant using Internet Explorer, or IE. In fact, I remember thinking all the ActiveX extensions and the bundling with Windows installations made the browser convenient. It was even nice to program against, since it dealt with poorly formed HTML fairly well. However, it had plenty of issues and was a never-ending source of calls from customers with stability and security problems.

In fact, sometime around Windows XP, I moved to Firefox and never looked back. I only had IE installed on my systems because Microsoft included it with the OS. The last 5 or 6 years, I’ve run Chrome and Firefox together, only using IE when a few specific sites required it. Even then, I’d give an inward groan each time I clicked on the familiar “e” icon.

However times have changed. This month, on January 12, Microsoft ends support for all versions of IE prior to 11. While IE will still be run for many years by some people, just as Windows ME and NT 4.0 still exist in places, Microsoft won’t provide support or patches, and we can only hope that people switch to another browser quickly.

I haven’t been thrilled with the Edge browser in Windows 10, as it hasn’t worked on some sites I use regularly. That seems strange to me, but alas, the idea of a Microsoft browser not working the same as other browsers is nothing new. I can only hope that Microsoft fixes the issues in Edge rather than having web developers include code to handle the finicky nature of Microsoft’s browser technology.

The death of IE shouldn’t matter much to us as data professionals, but I wonder how many of us still have applications using web browser controls based on IE technology. I suspect quite a few, and it’s entirely possible that we’ll continue to deal with poor rendering of data from our databases through those web controls for many years to come.

Steve Jones

Security Decisions

How many of you have written code that results in a security issue or data breach? It’s possible many of you have, but are unaware of the issues. I’m sure a few of you have been hacked and have had to clean up data or code, but often it’s not your code. Maybe it’s something you inherited. Maybe it’s code your team wrote together. In any case, the feedback loop between your action (writing code) and the result (a security incident) isn’t a tight one.

I ran across a post from Bruce Schneier on how people learn about cybersecurity. The piece links to a research paper, and it’s an interesting read. It turns out the researchers see most non-experts learning from news, friends, and websites, often with biases towards actions that have had immediate negative consequences, but not necessarily those that are more serious.

That has me wondering about us, as supposed expert, or expert-ish, developers and DBAs. How many of us get security training, or have that training updated? How many of us learn from friends or websites and re-use that knowledge over and over in our work, without necessarily understanding, or ensuring, that we are building strong security into our systems? I suspect many of us just try to get by, writing the minimal level of security that “works” for us, not really understanding how there might be flaws or holes in our systems.
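
To pick one concrete example of security that “works” but leaks: dynamic SQL built by string concatenation runs happily right up until someone passes a hostile value. This is only a sketch of the safer habit, using sp_executesql with a parameter; the table and column names are made up.

    -- Risky habit: the search value is concatenated straight into the statement,
    -- so a crafted value can rewrite the query.
    -- SET @sql = N'SELECT * FROM dbo.Customers WHERE Name = ''' + @search + N'''';
    -- EXEC (@sql);

    -- Safer habit: the value travels as a parameter and can never change the statement.
    DECLARE @search NVARCHAR(100) = N'O''Brien';
    DECLARE @sql NVARCHAR(MAX) =
        N'SELECT CustomerID, Name FROM dbo.Customers WHERE Name = @name;';
    EXEC sys.sp_executesql @sql, N'@name NVARCHAR(100)', @name = @search;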

Our code, our configurations, and our systems have a far wider impact than just ourselves. In some sense, I think a fundamental flaw of information technology is that security practices and architectures are not built into our platforms and applications from the start. While convenience may get the boss off our back, or allow greater profit for our companies, it’s not helping our industry, or even our companies, in the long term.

I just wish I had some idea on how to change things.

Steve Jones

Visual Studio Subscriptions

Many of us who work with SQL Server do so exclusively through SQL Server Management Studio (SSMS). I find so many people really do the majority of their jobs with SSMS, Outlook, and a web browser. Even back in 2003, when I was a full-time DBA, I probably spent the majority of my time in those three applications.
However, I also see more and more people using Visual Studio and other tools to accomplish their jobs. The growth of new tools like PowerShell, the expansion of our work into BI areas, and more all mean that people are increasingly using tools besides SSMS to work with SQL Server data.
This past week there was an announcement that MSDN subscriptions were changing. At most of my jobs, I’ve had an MSDN subscription available to me. In fact, some of you might remember the large binders of CDs (and later DVDs) that arrived on a regular basis and contained copies of all Microsoft software. However, many of you haven’t had MSDN available to you, or you’ve struggled to justify the yearly $1,000+ cost, even though you want to work on your career and practice with Microsoft software.
At first I saw the yearly cost of MSDN at $799, which is a pretty large investment. However as I looked to the side, I saw a monthly subscription, no large commitment, available for $45. That’s not an extremely low cost for much of the world, but it’s very reasonable in the US. It’s also a great way to build a setup that allows you to work with a variety of Microsoft technologies at an affordable cost. What’s more, you can stop paying at any time. Or start again at any time.
I know that it can be a struggle to invest in your own career, probably more difficult to find time than money. However this is a good way to get access to the various development and server tools for a period of time if you want to tackle a project or force yourself to learn a new skill.
I’m glad that Microsoft has moved to a subscription model for MSDN. I expect to see these subscriptions grow as small companies use an investment that scales linearly with new hires to provide their employees with tools. I can only hope that many other vendors adopt this same model and allow us to rent our tools, and upgrade, for a very reasonable cost. I just hope they all let us back up and save our settings in case we interrupt our subscription for a period of time.
Steve Jones

NoSQL Inside SQL Server

There has been quite a debate over the last few years between using NoSQL and SQL Server as data stores for different applications. We’ve seen a great deal of interest from developers in document stores, as the data structures are closer to what a C# or Java developer sees in their code. Graph databases are becoming popular for certain types of data as well, since their query performance can be extremely good in some cases. There are also the scalability aspects of NoSQL systems, touted as a better way to build Internet-based applications.

However, NoSQL databases aren’t the ultimate data store. There are pros and cons to using them, and while it might be easier for an application developer to begin development on one, I would argue it’s just as easy to get caught misusing a graph database or a document database as it is to poorly program a relational database.
This week, in advance of the SQL Server 2016 release, I caught a post from the SQL Server storage engine team showing how JSON storage inside a table can provide some NoSQL-like benefits to applications. It’s a good idea, but like many NoSQL solutions, you need to spend a bit of time examining your particular requirements. If you need to update your JSON often, especially in large batches, this might not prove to be a good idea, especially if you need consistency.
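The pattern, roughly, is to store the document in an NVARCHAR(MAX) column, guard it with ISJSON, and promote the properties you query most often to computed columns that can be indexed. Here’s a sketch along those lines; the table and property names are mine, but ISJSON and JSON_VALUE are the SQL Server 2016 functions.

    -- Store JSON documents in an ordinary table (SQL Server 2016 or later).
    CREATE TABLE dbo.ProductDocs
    (
        ProductID INT IDENTITY(1,1) PRIMARY KEY,
        Doc       NVARCHAR(MAX) NOT NULL
            CONSTRAINT CK_ProductDocs_IsJson CHECK (ISJSON(Doc) = 1),
        -- Promote a frequently queried property to a computed column...
        Category  AS CAST(JSON_VALUE(Doc, '$.category') AS NVARCHAR(100))
    );
    -- ...so it can be indexed like any other column.
    CREATE INDEX IX_ProductDocs_Category ON dbo.ProductDocs (Category);

    INSERT INTO dbo.ProductDocs (Doc)
    VALUES (N'{"name":"Widget","category":"Hardware","price":9.99}');

    SELECT ProductID, JSON_VALUE(Doc, '$.name') AS ProductName
    FROM   dbo.ProductDocs
    WHERE  Category = 'Hardware';
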
I think ultimately everything does come back to the CAP theorem, precisely because data has to exist somewhere, and any time you need to duplicate it, move it, or aggregate it to relieve some other issue, there isn’t any magic taking place. You’re going to have to do work to move the data. Some platforms may make that easier, but I bet there are other drawbacks you will have to consider.
Ultimately I think you need to pick the platform that best suits your environments. That’s not only the application requirements, but also the developers’ skills, the operational staff’s ability to manage the platform, and the maturity of the platform. Those are tough questions, and while I’m glad people experiment with new systems, I do hope that they think carefully about the positives and negatives, and more importantly, share their experiences with the rest of our industry.

Steve Jones

Gauges and Alarms

I was looking through the news this week and found a piece by Michael J. Swart that I really enjoyed. It talks about the different types of metrics he has used when monitoring and alerting on his database servers. Michael talks about the difference between alarms and gauges, which comes down to the time scale in which a DBA or administrator needs to react to a situation. Alarms are created when something grows in the short term and causes an issue. Gauges tend to be longer-term items that eventually cause a problem.

I’ve had a similar view in the past. However, I look at these types of metrics from the standpoint of how I tackle my job as a DBA. I also try to think about these issues as a developer, ensuring that telemetry helps track issues that occur over time, and that acute issues cause a graceful degradation (or exit) rather than an unexpected one.

When I’ve needed to manage systems, I ensure that I have alarms set up. I need these for the situations where something goes wrong in a short period of time. However, I never want alarms to be a stand-in for visibility into the system over time. I want to eliminate as many alarms as possible with better long-term monitoring that looks for indicators that the system might have issues. I’ve found that many (not all) of the alarms I receive could have been prevented if I’d been paying close attention.

In other words, I’m a proactive DBA, looking at gauges (to use Michael’s terms), not a reactive one counting on alarms.

I can’t predict everything, but I find that looking back at data is helpful. If I find a trend that could have alerted me, I’ll evolve my monitoring system to try to track those potential issues in the future. I’ve written my own custom metrics in the past, which track data, either PerfMon-type counters or business data, and then let me know when the growth, decline, or change is unexpected. In this way, I find I can often proactively prevent issues. If I can use my data to “predict” an issue my boss doesn’t believe in, I’ve also found my budget for resources may increase a bit, allowing me to manage systems even better in the future.
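
As an illustration of the kind of custom metric I mean, here’s a sketch: a small table that records one business number per day, and a query that flags today’s value when it drifts too far from the trailing average. The table, the seven-day window, and the 20 percent threshold are all arbitrary choices of mine, and dbo.Orders is hypothetical.

    -- A home-grown metric: capture one number per day from a scheduled job.
    CREATE TABLE dbo.MetricHistory
    (
        CaptureDate DATE          NOT NULL,
        MetricName  SYSNAME       NOT NULL,
        MetricValue DECIMAL(18,2) NOT NULL,
        CONSTRAINT PK_MetricHistory PRIMARY KEY (MetricName, CaptureDate)
    );

    INSERT INTO dbo.MetricHistory (CaptureDate, MetricName, MetricValue)
    SELECT CAST(GETDATE() AS DATE), N'DailyOrderCount', COUNT(*)
    FROM   dbo.Orders
    WHERE  OrderDate >= CAST(GETDATE() AS DATE);

    -- Flag today's value if it is more than 20% away from the 7-day average.
    WITH recent AS
    (
        SELECT AVG(MetricValue) AS AvgValue
        FROM   dbo.MetricHistory
        WHERE  MetricName  = N'DailyOrderCount'
          AND  CaptureDate >= DATEADD(DAY, -7, CAST(GETDATE() AS DATE))
          AND  CaptureDate <  CAST(GETDATE() AS DATE)
    )
    SELECT m.CaptureDate, m.MetricValue, r.AvgValue
    FROM   dbo.MetricHistory AS m
           CROSS JOIN recent AS r
    WHERE  m.CaptureDate = CAST(GETDATE() AS DATE)
      AND  m.MetricName  = N'DailyOrderCount'
      AND  ABS(m.MetricValue - r.AvgValue) > 0.2 * r.AvgValue;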

Steve Jones

The Challenges of Space

It wasn’t that long ago in my career that a survey showed the vast majority of SQL Server databases were < 10GB in size. These days I’m sure the average has risen, as there are some very, very large SQL Server databases. I know a few people managing PB-sized systems, which certainly moves the average up a bit. It’s likely, however, that the majority of databases are still very small, powering smaller applications with < 100GB.

However, almost every database continues to grow, and as systems grow, new challenges come into play. While we can still store a 100GB backup file on a thumb drive or SD card, it’s a significant amount of data to write, which can take more time than our clients would like. Archives, backups, and restores all tend to come from non-flash media, so larger systems certainly mean longer RTOs when we have issues, even if we’re just restoring a copy of a database to extract some deleted information.

However, there can be more fundamental problems. I ran across a post from the CSS engineers at Microsoft that talks about exhausting NTFS resources as more and more writes occur on a file system. I’d never heard of this, or expected it to be an issue, but apparently it’s occurring more often. I suspect even the NTFS developers hadn’t considered the scale at which we now stress the file system daily. What is interesting is that this can occur with DBCC operations, database snapshots, and even bcp operations. There are workarounds, and I’m sure Microsoft has people working hard to increase the capacities of ReFS, which I expect will take over from NTFS, just as NTFS has largely displaced FAT.

As we grow to larger and larger data sizes, with multiple terabytes on laptops and petabytes on servers, I think we’ll need to adapt a number of our techniques and operations in the future. We might end up needing different types of backups (anyone considering lookup table backups, or daily backups of a set of rows, like orders?), as well as continuing to spread the load of our servers across multiple drives. I just hope that Microsoft recognizes this and looks to ensure mount points work smoothly across all of their technologies and features. At some point, 26 drive letters won’t be enough for the majority of us.
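
For what a “daily backup of a set of rows” might look like, here’s a sketch that simply copies one day’s worth of rows into a dated archive table that can then be backed up (or exported) on its own. The Archive schema, dbo.Orders, and the hard-coded table name are all hypothetical; in practice you’d build the name dynamically from a job.

    -- Sketch: set aside one day's rows separately from the full database backup.
    DECLARE @yesterday DATE = DATEADD(DAY, -1, CAST(GETDATE() AS DATE));

    SELECT *
    INTO   Archive.Orders_20160115      -- hypothetical; generate the name per day
    FROM   dbo.Orders
    WHERE  OrderDate >= @yesterday
      AND  OrderDate <  DATEADD(DAY, 1, @yesterday);

    -- If the archive tables live on their own filegroup, that filegroup can be
    -- backed up independently, keeping the routine backup small.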

Steve Jones

Awareness

How often have you been caught off guard by some issue with one of your databases? I would guess most people have received a phone call, email, or trouble ticket that alerts them to the fact that something has gone wrong. A few of you might have even had an irate customer or manager wander over to your desk and loudly inform you of an issue. Those are not great times, and most of us would prefer to find out about problems before others bring them to our attention.

This past week was the 66th T-SQL Tuesday blog party. If you haven’t heard of it, I would encourage you to look each month for the announcement on Twitter and participate. It’s a great chance to share some knowledge, and showcase your own experiences or viewpoint on a particular topic. I’d actually encourage all of you to go back and write posts about the past topics at some point. It’s a great way to challenge yourself to document your own experiences in a concrete way.

The theme this month was monitoring (thanks to Catherine Wilhelmsen b | t), and this is exactly how you build awareness of your environment. Monitoring allows you to ensure that you, as the DBA, learn about issues, or potential issues, first. Monitoring doesn’t solve your problems; that’s up to you, but it can help you proactively detect issues early and perhaps mitigate them before users notice them.

The blog entries this month cover all sorts of issues you might think about watching for: the 825 errors in your log, metadata about your backups, jobs, and more. There’s a variety of posts covering all sorts of home-grown systems, as well as noting that third-party products work well. You have a huge range of choices, from building your own tools and processes to purchasing something like SQL Monitor from Red Gate. However, no matter your budget, resources, time, or desire, I’d highly recommend that you really think about ensuring you are monitoring your systems in a way that lets you know how things are working.
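
If the 825 alert is the one item you take away, it’s only a few lines to set up. This sketch uses the standard msdb procedures; the operator name is a placeholder, and it assumes Database Mail and that operator are already configured.

    -- Sketch: alert on error 825 (read-retry), the quiet hint that I/O is going bad.
    EXEC msdb.dbo.sp_add_alert
         @name = N'Error 825 - Read-Retry Required',
         @message_id = 825,
         @severity = 0,
         @include_event_description_in = 1;

    EXEC msdb.dbo.sp_add_notification
         @alert_name = N'Error 825 - Read-Retry Required',
         @operator_name = N'DBA Team',     -- placeholder operator
         @notification_method = 1;         -- 1 = email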

What to do when they’re working poorly, that’s up to you. Hopefully the information we publish at SQLServerCentral, and the links we send out every week from Database Weekly, can help you learn to be better at your job.