Monitoring and Alerting

Monitoring your systems is important. It’s not just me who thinks so; plenty of experienced DBAs and developers know the value of monitoring. Heck, most people have learned to build some sort of metric collection into their software. Azure makes it easy to instrument your application and gather lots of data on how well things are working. Perhaps it’s too easy: you can gather so much data that you end up paying for it, or can’t find the time to analyze it. High-performing software development shops use monitoring in their Continuous Integration (CI) and Continuous Delivery (CD) pipelines to better understand the health of their code and the speed of their workflow, in addition to instrumenting the actual application.

For those of us who need to ensure our database servers are running well, we not only need monitoring, but also alerting. I ran across a couple of articles with thoughts on monitoring and the difference between monitoring and alerting. While I don’t completely agree with all the items in the second piece, I do think it’s important that you get alerting working well.

I’ve had more than my share of unactionable, or even unnecessary, alerts in my career. These days I’ve learned to better classify the items that matter to me. Most of the time I find myself downgrading alerts, because very few are actually mission critical. Far too often I’ve worried about 100% CPU, or slow log writes, or even zero sales in an hour, or some other metric that “seems” critical. However, since few of these conditions stop business from flowing, I’ve learned to lower their priority, or to remove them as alerts entirely and let monitoring track the values. I do need to watch the monitoring and fix issues, but I don’t need to get up at 3am.

The other thing I’ve worked to do is automate responses to problems. If there are ways a computer can respond, let it. Don’t get a human involved if the system can manage itself. Certainly the automated solutions don’t always work, but build in some escalation that only alerts a human after the system has exhausted its own responses. After all, we don’t want to exhaust humans if we don’t need to.
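To make that concrete, here’s a minimal sketch of the pattern using SQL Server Agent; the operator, job, and threshold below are hypothetical. The alert’s automated response is to run a job, repeats are throttled, and a human only gets involved if the job itself decides its fixes didn’t work.

    -- Minimal sketch; the job, operator, and threshold are hypothetical.
    USE msdb;
    GO

    EXEC dbo.sp_add_operator
        @name = N'DBA On Call',
        @enabled = 1,
        @email_address = N'dba-oncall@example.com';

    EXEC dbo.sp_add_alert
        @name = N'tempdb log nearly full',
        @performance_condition = N'SQLServer:Databases|Percent Log Used|tempdb|>|90',
        @job_name = N'Clear tempdb pressure',   -- the automated response runs first
        @delay_between_responses = 900;         -- and the alert won't re-fire for 15 minutes

    -- The last step of the hypothetical 'Clear tempdb pressure' job (not shown)
    -- would call sp_notify_operator to page 'DBA On Call' only if the cleanup failed.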

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.4MB) podcast or subscribe to the feed at iTunes and LibSyn.

Email or Username

Quite a few of us will work with application databases where we need to identify a particular user to the various parts of the application. I’m not talking about the authentication and verification of identity, just having a token to identify a user once they’ve been authenticated. I don’t want to talk security here, since most of you aren’t great at building these systems and should just implement an authentication scheme that’s been vetted (OAuth, Live, etc.).

However, once you have a way of authenticating a user, how do you identify them in the database? Many people store a username, but there are issues with those. They aren’t consistent across systems: a user might find theirs taken in one application and then need a second one for the next application. That leads to the complexity of people needing a way to retrieve their username, which can be a whole separate set of application programming pain. Often the default is to fall back to an email address associated with the username.

However, what does someone do when they don’t have access to that email? Therein lies the issue with using an email address as the identifier. I use my sqlservercentral.com email for many things, most of them related to this site. However, if I were ever to leave this job, I wouldn’t be able to access some sites if I couldn’t remember the login information. Perhaps I shouldn’t be using this email, but there are services that relate to both my career as a DBA/developer and my position here at SQLServerCentral.

Plenty of people have a separate email address for personal use, but that’s not necessarily true of the many users of a generic application. Plenty of people register at sites with their work email, assuming they’ll always have access to the application through a username of some sort. You might even change your email, deciding at some point that coolteenager@hotmail.com isn’t the address you want to give out for your kid’s school registration. In that case, if you’ve used that email at a variety of places, then what can you do?

PASS recently changed to using email instead of username, which is fine. I think more sites use email, but in our case, as DBAs and developers, are emails really less likely to change? Should we be sure that our PASS email is always a non-corporate address, so that we can keep our membership as we change employers? I’d think a username is more stable for people in technology, and it can be fun. I love my way0utwest moniker, and I try to ensure I reserve it on many applications I may want to use.

There’s no good answer here, and there are issues with using either one as an identifier, but what do you think? Are there compelling reasons to choose one over the other? No matter which one you use, I think it’s important to give users multiple ways to regain access to their account.
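For what it’s worth, here’s a minimal sketch (table and column names are hypothetical) of the shape I lean toward: key everything internally on a surrogate ID, and treat both the username and the email address as changeable, uniquely constrained attributes that can each be used to find the account.

    -- Minimal sketch; names are hypothetical.
    CREATE TABLE dbo.AppUser
    (
        UserID       INT IDENTITY(1, 1) NOT NULL
            CONSTRAINT PK_AppUser PRIMARY KEY,       -- stable internal identifier
        UserName     NVARCHAR(100) NOT NULL
            CONSTRAINT UQ_AppUser_UserName UNIQUE,   -- can change without breaking anything
        EmailAddress NVARCHAR(320) NOT NULL
            CONSTRAINT UQ_AppUser_Email UNIQUE,      -- can also change; used for recovery
        CreatedOn    DATETIME2(0) NOT NULL
            CONSTRAINT DF_AppUser_CreatedOn DEFAULT (SYSUTCDATETIME())
    );
    -- Every other table references UserID, so renaming a user or swapping out
    -- coolteenager@hotmail.com for something respectable never touches their data.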

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 3.5MB) podcast or subscribe to the feed at iTunes and LibSyn.

Democratic Data Science

I know quite a few people that are jumping on the data science bandwagon, brushing up on statistics, learning R, and hoping to land one of those lucrative data scientist positions. I wish you good luck, and some of you will succeed, but I wouldn’t count on there being lots and lots of data science positions in the future. The work is hard, and it will take a lot of learning and practice to become good. Plus, perhaps we won’t need as many data scientists as some are predicting.

There seems to be a lot of work being done with intelligent agents and “bots” that can perform some lower-level analysis of data and interact with humans. However, there is also work being done on more capable software that might analyze data and find patterns by applying different algorithms and analyses, making it easier for business analysts to comb through the mountains of data. After all, the growth of IoT, and of more metrics and measurements being made in all kinds of applications, means that we can’t just throw bodies at problems. We need better ways to work with large amounts of data.

This work isn’t really aimed at eliminating data scientists or data analysts. Rather, it’s aimed at freeing up their time to do the creative, thoughtful work of examining data. The grunt work of playing with algorithms and testing them against some measure of success is handled by software. I think this is a mantra that will be repeated over and over in the next couple of decades in computing: grunt work is automated, thoughtful work is done by (well-paid) humans. If you’re not doing the latter, you may not have a job doing the former.

I do think there is still plenty of opportunity for data professionals. After all, we need to manage all that data: find ways to ensure it’s available and can be queried, that it’s reasonably clean, and certainly that we have some idea of what it means to our business. I’m not sure software agents will easily be able to set those things up, though certainly they’ll do the work of applying our rules to large sets of data.

There will still be plenty of data science roles, and certainly humans need to be watching the software to be sure it’s working well. There will always be tuning efforts, and perhaps always reports that users need help building. However, as software becomes more sophisticated, there will certainly be less grunt work in our industry, just as many other industries have seen lower-level, or at least lower mid-level, positions reduced in number. We should be prepared to prove our value and ensure our skills are useful to our organizations.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 3.0MB) podcast or subscribe to the feed at iTunes and LibSyn.

DevOps and Security

DevOps is a buzzword these days, and like many of the hyped concepts written about, it has a lot of meanings. There is the idea of releasing software more often, using automation, having various groups talk to each other, and more, all of which we might see as common sense. However, other DevOps ideas, such as releasing more often (with potentially less review), being willing to break applications and fix them quickly, and having developers able to release code to live, production environments, don’t seem like ideas that would enhance security for most organizations.

However, that’s not necessarily the case. Security and DevOps practices aren’t mutually exclusive. I ran across a piece written from the security perspective, looking at some of the ideas in DevOps that can actually enhance security.

Writing more code, especially around configuration and infrastructure, allows versioning, auditing, and more, which can ensure we have fewer misconfigured systems. Adding some Desired State Configuration (DSC), with automated testing of this code, could ensure that the changes made don’t open up security holes. Or, at least, it allows us to determine who made a change and when an issue appeared. These things are important for understanding security risk.

There are also the ideas of measurement, metrics, and feedback, which are important for ensuring security. After all, anomalous behavior should be investigated, as it could be a sign of intrusion. For databases, this is especially important given the large number of clients that connect to our systems. Adding DevOps-style monitoring can allow us to determine whether a workload is normal, or perhaps a sign of intentional, accidental, or malicious data query activity.
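As a rough illustration (not any particular product’s approach), even a scheduled snapshot of who is connected, from where, and with what tool gives you a baseline to compare against when something looks off:

    -- Minimal sketch: summarize current user sessions so the counts can be
    -- stored and compared against a known-normal baseline.
    SELECT
        s.login_name,
        s.host_name,
        s.program_name,
        COUNT(*) AS session_count
    FROM sys.dm_exec_sessions AS s
    WHERE s.is_user_process = 1
    GROUP BY s.login_name, s.host_name, s.program_name
    ORDER BY session_count DESC;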

I enjoyed the piece, and I’d recommend you read it. Plus, whenever I see “snowflake” in an article, I think of Grant and want to read further to see how someone else has used the same analogy he does.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.2MB) podcast or subscribe to the feed at iTunes and LibSyn.

Have You Designed a Database from Scratch?

One of the things that I try to regularly promote is the #SQLNewBlogger project from Ed Leighton-Dick (and the #SQLNewBlogger hashtag on Twitter). I keep a column open scanning for the tag and try to respond to those that post. I also write a series of posts under that tag, mainly beginner-type posts, showcasing things I think are good, basic topics that I’d want a DBA or developer to know about SQL Server. These are exactly the types of posts that you’d write for your next resume (or next hiring manager).

I recently saw someone begin their blogging journey with a new post about designing a database. What struck me was that this person, despite a lot of experience, noted they had never designed a database from scratch. I’m sure that most of you have worked with a database and have even modified various objects inside a schema. Certainly you’ve changed the code of stored procedures or functions. I’m sure many of you have even modified tables, adding columns where you need them. After all, that’s a favorite first technique for many people who need a place to store data.

Probably the majority of you have built tables, at least temporary tables. I’m sure you spent a few minutes thinking about the columns, datatypes, etc. I’d also hope that most of you have added indexes and constraints to tables. Those are certainly skills that more people should understand better, especially the developers who work on software powered by SQL Server. Above all, please learn how to index OLTP tables.
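If you want a feel for the level of design I mean, here’s a minimal sketch with hypothetical names: keys, a foreign key, a sanity-check constraint, and a nonclustered index that supports the common OLTP lookup.

    -- Minimal sketch; names are hypothetical.
    CREATE TABLE dbo.Customer
    (
        CustomerID   INT IDENTITY(1, 1) NOT NULL CONSTRAINT PK_Customer PRIMARY KEY,
        CustomerName NVARCHAR(200) NOT NULL
    );

    CREATE TABLE dbo.SalesOrder
    (
        SalesOrderID INT IDENTITY(1, 1) NOT NULL CONSTRAINT PK_SalesOrder PRIMARY KEY,
        CustomerID   INT NOT NULL
            CONSTRAINT FK_SalesOrder_Customer REFERENCES dbo.Customer (CustomerID),
        OrderDate    DATETIME2(0) NOT NULL,
        OrderTotal   DECIMAL(19, 4) NOT NULL
            CONSTRAINT CK_SalesOrder_Total CHECK (OrderTotal >= 0)
    );

    -- Support the most common OLTP query: "orders for this customer, newest first".
    CREATE NONCLUSTERED INDEX IX_SalesOrder_Customer_OrderDate
        ON dbo.SalesOrder (CustomerID, OrderDate DESC);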

However, have you designed a database from scratch? Not a standalone database, but a database that actually has an API or application built against it. Have you tried to use the database and found places where your modeling was lacking? Did others complain about your design?

I know that lots of school coursework requires people to design parts of a database, but what about a full, complete database, one that fulfills the requirements to actually manage a set of data well? I’ve done a few, though admittedly with input and help from other DBAs and developers. I am better for that, and I think that a good design requires some interaction and input from others.

With that in mind, whether you’ve built one or not, what about tackling this as a project for your own blog? Certainly there are some good problem sets out there, but I’d actually give you a new one if you’d like. The SQL Saturday site for PASS has a database behind it. How would you design a database, given that you need to support that site and actually generate the XML feed for the Guidebook app? Maybe that would be a fun, multi-part series that looks at your decisions for the various entities required. You’d learn something and get lots of feedback.

If you’d like to publish a series here at SQLServerCentral, let us know. We’d love to have some design series on databases that back applications, especially if you have a side project with an application.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 3.2MB) podcast or subscribe to the feed at iTunes and LibSyn.

Why Use R Services?

I ran across a blog post from the very talented Joe Sack recently, who I hope makes a few more minutes for me the next time we’re in the same town together 😉.

The post is a look at some of the customers that are using R Services in SQL Server 2016. As you might expect, there are highlights from customers showing dramatic improvements in performance: going from 200 valuations/hour to 12,000/minute, and taking a process from 2 days to 1 hour. I’m sure not all operations improved that much, but I bet most had some improvement. R Services is a big improvement in the way data is analyzed with the R language.
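If you haven’t seen it, the pattern is a single call from T-SQL. Here’s a minimal sketch; the table is hypothetical, and the instance needs “external scripts enabled” turned on first.

    -- Minimal sketch; dbo.SalesHistory is a hypothetical table.
    EXEC sp_execute_external_script
        @language = N'R',
        @script = N'
            model <- lm(Amount ~ Quantity, data = InputDataSet)
            OutputDataSet <- data.frame(coef(model))',
        @input_data_1 = N'SELECT Quantity, Amount FROM dbo.SalesHistory;'
    WITH RESULT SETS ((coefficient FLOAT));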

What I really like, however, is that the piece includes some of the gotchas customers experienced, with links on how one might go about fixing the issues. There are also hints on visualizations and performance tuning options. I like looks at technology that include details on what works well and what doesn’t. This is a good overview that I hope gets more customers interested in using R Services.

But I hope we also see deeper pieces that give technical guidance on specific scenarios. Which models scale well and which don’t? Which options in some analysis improved (or hurt) performance? There are plenty of questions that are best answered with specific examples rather than general advice. With all the customers and data Microsoft gathers, especially from companies that might use R Services in Azure, I’d expect that we could get some useful, detailed examples of how this technology (and others) actually works in the real world. I’d like to see this guidance, not just for R Services, but for other technologies as well. Certainly providing some sample data sets and code that really performs well (or tuning options) is what most of us want to see.

Of course, I’d like to see more of these stories and details at SQLServerCentral as well, so if you are solving problems and want to publish something, drop us a note. We’d love articles from the real world, whether on SQL Server 2016, or any prior version. The more reference problems and solutions we have, the more people learn to code better.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 3.1MB) podcast or subscribe to the feed at iTunes and LibSyn.

A Beautiful Google Mind

I read a quote a few years ago that really resonated with me. The quote is attributed to Jeffrey Hammerbacher, an early Facebook employee who left to found Cloudera, a data analytics company. The quote went like this:

“The best minds of my generation are thinking about how to make people click ads. That sucks.”

That dovetailed with something I’d heard in the mid-2000s, when someone lamented that the best technical minds in computer science weren’t working to further our industry, or even just to produce amazing software. Many of them had left to work inside financial institutions, manipulating our stock markets and money systems to make millions.

I can’t blame anyone for choosing to engage in a business that will reward them. That’s the beauty of capitalism. I think you can go too far, but I’d hope that most people, while pursuing their success, would look beyond just money, at least at some point. I would like to believe that most of us find there are things at least as important as, if not more important than, money.

Larry Page founded Google, and he is one of the richest people in the world. I ran across an article on his beautiful mind, which looks at the ways in which Mr. Page is trying to change the world. Perhaps he has an eye on profit, but my impression is that he also wants to make life better for more and more people around the globe. His focus happens to be the ways technology might help us, and I’m hoping he succeeds, much as I admire how Bill and Melinda Gates are trying to improve health around the world. Those aren’t my passions, but I’m glad someone is driven by them.

Many of us aren’t financially secure for life or able to fund whatever endeavor comes to mind. However, we can make a difference in the world. We can be examples that show there is something beyond making money at work. We can also volunteer or donate some of our time to help others. We can all be a Larry Page in some way, impacting the world in a small way to make it better for everyone.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.6MB) podcast or subscribe to the feed at iTunes and LibSyn.

Updates During Outages

Slack went down on June 10, 2016. I noticed, which is funny, since I don’t necessarily use the tool every day. I tend to still communicate with email, but more and more people like using Slack at Redgate, so I do pop over there. It’s also a good way to drop a quick note and perhaps get a quick reply. In this case I’d asked a group to do something and hadn’t heard back. Since my request didn’t generate a ticket, I didn’t want to send another email, which could result in more delays if someone isn’t processing email. However, that’s not important.

What was interesting is my Slack web page didn’t connect, and when I pinged @slackhq on Twitter, they directed me to status.slack.com. That’s where they post updates. That site was also down, which somewhat defeats the purpose of having a separate location for updates.

I’ve experienced this a few times, where someone (sometimes me) has built an update or notification mechanism that is dependent in some way on the actual service we’re updating. Often this has been because the same HTTP server is being used, but sometimes it’s because the same database instance is used to allow non-technical people to post updates. In all those cases, at some point the update mechanism has broken.

I’ve learned to separate my update broadcast mechanism from the production server. We’ve done this in a few ways: including a simple text file in web applications, serving a static page from a separate web server, and keeping a separate physical host that can be moved to the proper IP address in the event that our firewall or load balancers don’t work. The key, I’ve learned, is separation. Have a separate resource that can manage a simple message back to users. Perhaps even a small database that can respond to queries with a “we’re down” reply.
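As a sketch of that last idea (names are hypothetical), a tiny status table on a completely separate, minimal instance is enough for a status page or application to check before it touches the real system:

    -- Minimal sketch; lives on a separate host/instance from production.
    CREATE TABLE dbo.SystemStatus
    (
        SystemName  NVARCHAR(100) NOT NULL CONSTRAINT PK_SystemStatus PRIMARY KEY,
        IsAvailable BIT NOT NULL,
        StatusNote  NVARCHAR(400) NOT NULL,
        UpdatedOn   DATETIME2(0) NOT NULL
            CONSTRAINT DF_SystemStatus_UpdatedOn DEFAULT (SYSUTCDATETIME())
    );

    -- During an outage, posting an update is a single statement anyone can run:
    UPDATE dbo.SystemStatus
    SET IsAvailable = 0,
        StatusNote  = N'We''re down. The team is working on it; updates every 30 minutes.',
        UpdatedOn   = SYSUTCDATETIME()
    WHERE SystemName = N'MainApp';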

Downtime is never good for users, and rarely are people pleased with being unable to access their system, but good communication goes a long way to soothing the hurt feelings. Most of us accept that systems go down and problems occur. What we’d like is a short note (and updates) that let us know something is being done.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 3.2MB) podcast or subscribe to the feed at iTunes and LibSyn.

Triple Check Your Restores

I used to work at a nuclear power plant. Not really in the plant, but as a network administrator in the building next door to the plant. That was probably a good thing, since I struggled to get to work on time and everyone going into the plant had to go through a metal detector like those at most airports. My tenure might have been shorter if I had been late to my desk every day.

However, there was one thing that got drilled into me by everyone I knew who worked closely with the power generation side of the business: everything was (at least) triple redundant. Not only did they want a backup for a component or a system, they wanted a backup for the backup. There was a limit to the paranoia, but in critical places, where radiation was concerned, we had multiple backups. One notable, and impressive, area was the power for the control rooms and pumps. In addition to batteries for short-term power loss, there was a large diesel generator for each of our two reactors, plus a third that could take the load if either of the first two failed. Those were impressive engines, each about the size of a very large moving truck, jump-started by a few dozen large canisters of compressed air that could spin the crankshaft in a split second.

This week there was a report that the database for the US Air Force Automated Case Tracking System had crashed. Apparently the database became corrupted, which happens. However, the surprising part of this story is that the company managing the system reported they didn’t have backups and had lost data going back to 2004. They are looking to see if there are copies in other places, which I assume might mean exports, old backups, or something else, but the reports make this seem like a completely unacceptable situation. I assume this is an RGE (resume-generating event) for a few people, perhaps all of the staff working on the system.

I was reminded of my time at the nuclear plant because we had a similar situation. We didn’t lose any data, but we found a backup system hadn’t been working for months. In those days we had a tape drive system that automatically rotated 7 tapes. I think this would last us about 4 or 5 days, so it was a once-a-week job for one administrator to pull the used tapes and replace them with new ones. We had a system where tapes were used 4 or 5 times before being discarded, and our rotation had a tape being used every 3-4 months. However, the person managing the system rarely restored anything.

One day we decided to run a test. I think this was just my boss giving us some busy work to keep us occupied but in a useful way. When we went to read a tape, it was blank. Assuming this was just a mix-up, we grabbed one of the tapes from the previous day and tried it.

Blank.

At this point, my coworker turned a bit red and started to stress. He was in his 40s, with a family and mortgage. I was in my early 20s and had no responsibility here, but I could appreciate his concern. We frantically loaded tape after tape, even looking at the oldest tapes we’d just received from our off-site provider. None were readable, and most were blank. We nervously reported this to our boss, who had us request a sample of tapes from off-site storage going back over 6 months.

Eventually we realized that we hadn’t had any backups for about 4-5 months. The tape drive had stopped working properly, hadn’t reported errors, but dutifully kept retrieving files and rotating tapes each week, unable to properly write any data. No databases, no email, no system was being backed up.

A rush order was placed with our computer supplier that first day to get us two working tape drives; we manually loaded tapes into them each day and checked them the next morning. Eventually we replaced the drive in our tape loader and instituted random weekly restores to be sure we had working backups. I’m not sure if the plant manager or upper IT management was ever told, but I’m glad we never had to deal with a hard drive crash during that period.

Backups are something we all need to perform. I note this as the #1 thing a new DBA or sysadmin should set up on their systems. However, backups are only good if you can read them and actually restore data. I’ve made it a point to regularly practice restores as a DBA, randomly restoring backups with diffs, logs, or to a point in time. Not only do I test the backup, but I test my skills. I’ve also tried to keep an automated process around that restores all production systems to another server, to test the restore as well as to run DBCC CHECKDB. Corruption can live in databases for a long time. It flows through backups, at least in SQL Server, and that is something to keep in mind.
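A stripped-down version of that nightly test, with hypothetical paths and names, is only a few statements; the point is that it actually restores the backup and then checks the result:

    -- Minimal sketch; paths and logical file names are hypothetical.
    RESTORE VERIFYONLY
        FROM DISK = N'\\backupshare\Prod\Sales_FULL.bak';   -- quick readability check only

    RESTORE DATABASE Sales_RestoreTest
        FROM DISK = N'\\backupshare\Prod\Sales_FULL.bak'
        WITH MOVE N'Sales'     TO N'D:\RestoreTest\Sales.mdf',
             MOVE N'Sales_log' TO N'D:\RestoreTest\Sales_log.ldf',
             REPLACE, STATS = 10;

    -- The real test: can every page be read, and is the database consistent?
    DBCC CHECKDB (Sales_RestoreTest) WITH NO_INFOMSGS, ALL_ERRORMSGS;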

I’d suggest you ensure that your backup plan is actually working by performing a few restores. Build an automated process, but also run some manual restores periodically. You want to be sure that you can really recover data in the event of an emergency.

Steve Jones

SQL Server 2016 is Just Faster

I’ve enjoyed the series of posts from Microsoft on the ways in which SQL Server 2016 was improved and is faster. Some of these posts show improvement without any change to your code. Some showcase features that might require change but will bring about improvements, and after all, wouldn’t you rewrite some code if you knew you could get a performance gain? Most of us would, if the changes are fairly simple. I know some changes are hard, but that’s our job as developers: make systems run better, even if it takes work to change our code.

Some posts are just fascinating. The one on log stamping highlights something I had no idea about; I wouldn’t have thought that stamping 0xC0 is better than 0x00, but I’m glad someone dives deep into hardware changes like this. I know many of us don’t have more than 8 CPUs running our databases, but we might at some point. Having a better way of ensuring soft NUMA works well automatically is good. Even if you don’t care how this works from a practical standpoint, it’s an interesting look at how Microsoft is keeping up with hardware advances, where a single CPU these days can encapsulate more than 8 cores.
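If you’re curious what the engine decided on your own hardware, a couple of quick queries show it; this assumes SQL Server 2016, where the soft-NUMA columns were added to sys.dm_os_sys_info.

    -- How automatic soft-NUMA was configured (SQL Server 2016+).
    SELECT softnuma_configuration_desc, cpu_count
    FROM sys.dm_os_sys_info;

    -- The nodes the engine actually built, and the schedulers on each.
    SELECT node_id, memory_node_id, online_scheduler_count, active_worker_count
    FROM sys.dm_os_nodes
    WHERE node_state_desc = N'ONLINE';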

SQL Server 2016 is not only the best version to date, but it’s also the most tested and evaluated. Apart from all the evaluations and write-ups performed on the CTP 2.x and 3.x builds and the RCs, much of the code has also been running in Azure, where metrics have been gathered and evaluated for quite some time. There’s at least one feature I’ve been hearing about for over 3 years, since well before SQL Server 2014 was released, at which time it was already running in Azure. In all that time, there has been a lot of tuning and tweaking to ensure the code is running as efficiently and effectively as possible for this release.

Note that I’m not saying that the product is without bugs. While some people think there’s no need to wait for SP1 (I agree), I also think you should test thoroughly for your situation. It is entirely possible that there will be some bug that affects your code. And you need to test with actual systems and code. Also, be aware that there are documentation updates being made constantly, as we find that bugs also exist in BOL.

While SQL Server 2016 is faster than previous releases, there are quite a few new features that do require code changes. However, if you can take advantage of those features, I think it’s well worth the development effort and the upgrade cost. Let me know what you think.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( MB) podcast or subscribe to the feed at iTunes and LibSyn.