It’s T-SQL Tuesday time again, and this month Mickey Stuewe hosts the party. It’s an interesting topic, and one that I think is important, and will be more important, as we capture, store, analyze, and depend on more and more data.
The topic this month is auditing, and you can read the invitation on Mickey’s blog. I think it’s amazing no one has asked us to write on this in the past, and I expect we’ll see some interesting stories that people will share.
This is a monthly blog party and all you have to do to participate is write a post on the 2nd Tuesday of the month. The topic comes out about a week before, and you can schedule something to go out on the appropriate day. Be sure you leave a comment on the host blog or a pingback.
I’ve never been bound by regulatory requirements for strict auditing. When I worked in industries that had bounds, they weren’t really updated for computer systems, and lots of our practices slipped by because no one thought to look. A few times that was scary.
However I’ve implemented auditing in various systems, in various ways. The most common way to do this in the past was with triggers, though I had a situation where I had to build a unique solution that would capture changes being made to systems.
I worked for a small company one time and we had a few critical systems our clients regularly accessed remotely for various functions. This was in the era of client server systems, and we delivered software to our clients that connected to our servers. We had lots of buggy software, and when we released new features, I’d cringe as I knew we would have calls and complaints for a few weeks as we patched things.
I ran a team of 3 DBAs, and we had 5 or 6 developers working for another manager, including a few remote developers. We had agreed to limit changes to known times, and document the packages being deployed so that we were aware of changes and could better troubleshoot issues. Our environment was complex enough with multiple processes communicating with each other and SQL Server without adding the issues of unknown changes. We communicated things to our development staff, and though they grumbled everyone agreed.
One day I came in to find that a process that had worked the day before had failed. We were struggling to get it working, and eventually discovered that a schema had been changed. This was in response to another enhancement, but it caused an import process to change. We hadn’t scheduled a change, which must have occurred overnight. No one admitted to making a change, and after a long day, I had to drop it.
But not completely.
I went back and set up a server side trace to run and capture information for object changes. I let this run constantly, rolling over the files periodically. It was a bit of a load on our system, but I felt it would be short term and it was important enough to understand what was happening.
A few days went back and we were having an issue with another part of the system. Clients were screaming at my boss, who was coming to the development area to complain to DBAs and developers alike. As we tried to reproduce the problem on test systems, suddenly it started working.
Relieved, but concerned, I decided to check my trace. Sure enough, I found a change to an object logged, by none other than that craft “sa”. Digging in further, we discovered that a piece of middleware had a timestamp of a few minutes before the system started working.
It can be hard to track down who is using an “sa” account, which is why it is not recommended as an account to use by individuals. Cross referencing some information from the network team, we discovered our remote developers were slinging code changes on the system whenever they felt like it, using sa and “Administration” on Windows.
They patched systems at times, but often these were patches to fix issues in poorly written software they’d deployed in the first place. They were also patching patches when thing didn’t work. Whether this was in reducing the time of problems or not was difficult to determine, but it certainly meant a lot of wasted time when we tried to troubleshoot issues with no idea what had changed. We also found lots of their changes weren’t being added to version control.
My boss agreed this was an issue, and it allowed me to change all system administrator passwords, as well as revoke all rights to production systems from developers. Eventually we achieved more stability in our systems than the company had ever found. I’d attribute this to preventing cowboy coding changes to systems, which seem to cause as many issues as they fix.
Auditing is something I’ve depended on, especially for myself. As I’ve worked with more and more systems concurrently, I find that it’s easy to forget about changes made in a hurry, or forget which change occurred on which systems. Logging and auditing allows me to retrace my own steps, as well as those of others.