
QA for Data

Here’s a typical DBA scenario. You’re at your desk, working on something, when you get a trouble ticket, phone call, or tap on your shoulder. There’s a problem in production and an order, or some other important piece of data, is incorrect. Can you fix it?

Many of us are happy to oblige. Often we know what needs to be changed and can write the INSERT/UPDATE/DELETE statement to correct the issue. Even if we have to dig in further, we can usually find the data in a table and determine what should be changed.

However, what do we do then?

If you are asked to make data changes to production, do you QA the script? Do you save the scripts in a VCS or a log? Does anyone else examine the data or code? Do you scan the script yourself, have SQL Server parse it, or check for triggers that might fire? Do you bother to put the script in a transaction with error handling?

I have to say that for most of my career, I didn’t bother with any of that. I’d examine a script, perhaps have another DBA or developer look at it, but most of the time I’d run it in production if I thought it was ready. I did often wrap the code in a transaction, with a query to check the results and an immediate rollback to prevent blocking. However, if I thought the results looked good, I’d rerun the query without the transaction. Most of the time that worked well. A few triggers and fat fingers caused issues that I had to fix again, sometimes after quite a bit of work to correct my mistakes.
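For anyone who wants to copy the habit, here’s a minimal sketch of that wrap, check, and roll back pattern. The table and column names (dbo.Orders, OrderStatus, OrderID) are hypothetical stand-ins for whatever data needs fixing.

-- Minimal sketch: wrap the fix, check the result, roll back immediately.
-- dbo.Orders, OrderStatus, and OrderID are made-up names for this example.
BEGIN TRANSACTION;

UPDATE dbo.Orders
   SET OrderStatus = 'Shipped'
 WHERE OrderID = 12345;

-- Check how many rows were touched and what they look like now.
SELECT OrderID, OrderStatus
  FROM dbo.Orders
 WHERE OrderID = 12345;

-- Roll back right away so nothing stays locked while the results are reviewed.
ROLLBACK TRANSACTION;

-- If the results look right, rerun the UPDATE and COMMIT instead of rolling back.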

It’s hard to prevent data change errors, especially if it’s something we do often, or we work in a small environment with limited staff. Ultimately, we might not even care because if we update the data incorrectly, we can run another update to fix the new issue. That’s not necessarily what most managers would want to hear, but it’s reality.

The one thing I have insisted upon in my environments is logging any queries that are run: a simple cut and paste with a timestamp. That way, if things were done incorrectly, at least I know what was done.
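If you’d rather keep that log in the server than in a text file, a rough sketch might look like this. The table name (dbo.DBAChangeLog) and the ticket reference are made up, not a prescribed standard.

-- A made-up example of a simple DBA change log.
CREATE TABLE dbo.DBAChangeLog
(
    LogID      int IDENTITY(1, 1) PRIMARY KEY,
    RunDate    datetime2     NOT NULL DEFAULT SYSDATETIME(),
    RunBy      sysname       NOT NULL DEFAULT SUSER_SNAME(),
    TicketRef  varchar(50)   NULL,
    ScriptText nvarchar(max) NOT NULL
);
GO

-- Paste the exact script that was run, along with the ticket it was run for.
INSERT INTO dbo.DBAChangeLog (TicketRef, ScriptText)
VALUES ('HD-1234', N'UPDATE dbo.Orders SET OrderStatus = ''Shipped'' WHERE OrderID = 12345;');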

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.6MB) podcast or subscribe to the feed at iTunes and LibSyn.

SQL Prompt – Killing the Detail Box

I love SQL Prompt and use it constantly. When I don’t have it, my code writing process slows to spurts and stutters, and it’s painful.

However one of the defaults for SQL Prompt is to pop up details about objects when I’m typing, as shown below:

[Image: prompt2 – the object definition pop-up that appears while typing]

I usually want the object name to appear, but I hate seeing the definitions. That’s rarely useful to me.

I can turn that off with a quick setting change. If I choose SQL Prompt from the menu and select Options, I’ll get a dialog with a number of sections on the left. The one that pops up first should be "Behavior", and there’s a checkbox in it for "Display Object definitions" (see the image below).

[Image: prompt3 – the SQL Prompt Options dialog showing the Behavior section]

If I uncheck this, then I still get the list of objects, but no definition. The view in this image is a much easier way for me to write T-SQL code.

[Image: prompt4 – the suggestion list with object definitions turned off]

You can see this noted in the SQL Prompt docs.

Have you tried DLM Dashboard?

DLM Dashboard is a new product from Redgate Software that’s free. Free as in beer, which should be attractive to many of you. The product is in Beta, but it’s fairly stable and has worked well for me. It’s still maturing, and we’re certainly looking for feedback on what works and what doesn’t.

What Does DLM Dashboard Do?

There are a couple of things DLM Dashboard does that I think many of you are interested in. Both of these promote DevOps collaboration, and I’d like to see these concepts take hold in most organizations.

  1. Track changes to Production
  2. See changes coming through Development

There’s more to the dashboard, but essentially it works by tracking the version of the schema in all environments. If anything changes, then you get an alert on the dashboard (and email notification if you need it).

That’s it. The versions are whatever you call them. You can set up versions as numbers (v4.2, 4.21, 4.22, 5.3, etc.) or by deployment dates (Mar 3, 2015; Apr 6, 2015; etc.). Whatever works for you.

The Dashboard lets you know if it detects a change and the version is incorrect, or if it detects a known version that’s been deployed to a new environment.

I’ll describe it more in other posts, but for now, check it out. It’s free, and it’s worth a few minutes of your time to try.

Download DLM Dashboard

T-SQL Tuesday #65 – Learning Computed Columns for XML

It’s T-SQL Tuesday time again, and this month the invite comes from Mike Donnelly (@sqlmd). The topic is Teach Something New and it’s a good one.

I try to learn something all the time, and this post is about something I ran into recently with a submission from a user that I needed to check. I had to dig in a bit and see how XML worked in computed columns and I decided to write about it for this month’s entry.

T-SQL Tuesday is a monthly event on the second Tuesday of each month. There’s a new invite each month, and you need to watch Twitter and the #tsql2sday tag to find the next one. I keep a running list of past topics, and feel free to write on any of them and post something.

Learn Something New

I haven’t done a lot of work with computed columns or XML in my career. At least not in SQL Server. I do remember using VB to traverse XML documents and pull out nodes, but that was years ago.

Recently I had to verify how XML computed columns work, and that required me to dig a bit into the rules for both. This post looks at the way I learned how to return part of an XML column from a table.

Imagine that we have a table storing some XML data inside it. For my demo, I have a table storing profile information for users. I’ve actually seen a table like this, grabbing information for a website and then shredding it later if there are changes needed in the database. Here’s the setup of the table with a row of data inserted.

CREATE TABLE UserProfile
(
  UserID INT IDENTITY(1,1)
, profile XML
);
GO

INSERT INTO dbo.UserProfile ( profile )
VALUES ( N'<user>
  <username>way0utwest</username>
  <history>
    <pageurl>/default</pageurl>
    <pageurl>/scripts/</pageurl>
    <pageurl>/scripts/id/2433</pageurl>
  </history>
  <points>
    <forums>32432</forums>
    <qotd>1123</qotd>
  </points>
</user>' );

If I want the username from this table, I can use this query:

SELECT profile.value('(/user/username)[1]', 'varchar(max)') FROM dbo.UserProfile AS up;

That will return this data:

[Image: xmlshred_a – the query result showing the username value]

However, suppose I don’t want to use that code in all my queries in the application. Certainly it’s not bad for stored procedures, but I don’t want to repeat that over and over either.

Instead I want to use a computed column. I’d think that we could do something like this:

ALTER TABLE dbo.UserProfile ADD Username AS (profile.value('(/user/username)[1]', 'varchar(max)'));

However that doesn’t work. I get this error:

Msg 435, Level 16, State 16, Line 1

Xml data type methods are not supported in computed column definitions. Create a scalar user-defined function to wrap the method invocation. The error occurred at column "Username2", table "UserProfile", in the ALTER TABLE statement.

I haven’t seen this documented in Books Online, but it’s a restriction. However, there is a specific document on using XML in computed columns, which gives a workaround. The way to do this is to use a user-defined function. I need to build a function with my query:

CREATE FUNCTION UserProfile_XML_UserName(@xml xml)
RETURNS varchar(200)
AS
BEGIN
  RETURN @xml.value('(/user/username)[1]', 'varchar(max)')
END;
GO

Now I can use my function in the computed column:

ALTER TABLE dbo.UserProfile ADD Username AS (dbo.UserProfile_XML_UserName(profile));

Now when I query the table, I have a new column with the username:

SELECT TOP 10 * FROM dbo.UserProfile AS up;

Here are my results:

[Image: xmlshred_b – the query results with the new Username column]

That’s what I learned in the last month in working with T-SQL. This came from a user submission, and it required me to dig into BOL and find out how I could make this work.

References

Here are the references I used to learn about this:

The Opportunistic Hacker

You’re sitting in Starbucks, working. You need to use the restroom, so you ask the hip, cool-looking person coding next to you, with whom you’ve chatted for the last hour about C# and SQL, to watch your laptop. You leave and come back to an empty table.

This was a hacker, coding away at their side job to pay the bills in between causing mayhem. They walked away with your laptop, not knowing if you work for a bank or a soy milk manufacturer. The hacker won’t know if you have a copy of the production database on your laptop or simply a few .ASPX files that are useless without a VPN connection to your office. Hopefully they can’t access other machines from yours.

This is a crime of opportunity, and it happens regularly. Criminals steal laptops from coffee shops, airports, cars, and more. Often just because they can. Some may just wipe the device and sell it, but others may spend some time poking around to see if there is any value in the data on the machine. Or on a connected machine.

Many of us have far too much data on our machines, with many, many saved passwords and connections. We have copies of code, databases, or backups, and losing our machine to a random criminal might be just as damaging to our organization as losing it to a hacker that targets us.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 1.9MB) podcast or subscribe to the feed at iTunes and LibSyn.

Pre-built or Ad Hoc

One of the advantages of NoSQL databases is that the schema and organization of data are very flexible. The various types of databases usually allow the schema or organization of data to vary across entries. I hesitate to call them rows, but essentially each time you add data to a store, you can alter the format of the data inserted.
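As a rough illustration from a relational seat, here are two entries in a hypothetical store that share almost nothing in shape. The dbo.Events table and the documents themselves are invented for the example, with the JSON kept as plain text.

-- A made-up example: two "entries" with different shapes in the same store.
CREATE TABLE dbo.Events
(
    EventID  int IDENTITY(1, 1) PRIMARY KEY,
    EventDoc nvarchar(max) NOT NULL
);
GO

-- The second document adds fields the first never had.
INSERT INTO dbo.Events ( EventDoc )
VALUES
( N'{ "type": "pageview", "url": "/scripts/", "user": "way0utwest" }' ),
( N'{ "type": "purchase", "user": "way0utwest", "items": [ { "sku": "A1", "qty": 2 } ], "coupon": "SPRING15" }' );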

For relational database professionals, this seems to be a recipe for disaster, with entirely too much data being captured in an unorganized fashion. At some point a user will want this data returned in a report format, which almost always means rows and columns, even behind the scenes of the incredible visualizations that appear in modern dashboards.

I had someone recently note that their users don’t want to write ad hoc queries or try to discern the meaning of varying structures of information. They want pre-built structures they can count on and use reliably to answer questions. I suspect many users don’t want to decode the meaning of structures that change, despite the fact that so many of them want to reformat and change the shape of data in Excel. Those of you that have to re-import some of these spreadsheets know just how unstructured a set of rows and columns can become.

I really think that it is important that structures of data be decided upon and ordered in a known way so that users can easily understand the meaning behind the data. However we are gathering more and more data in new ways, from new sources, and we don’t have consistent ways of recording that information. That will continue in the future, and I do think that learning how to access new sources, like Hadoop, and present that data back to users in a familiar format will become a way to show you are a valuable resource for your organization.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.3MB) podcast or subscribe to the feed at iTunes and LibSyn.

Hardware Issues

My backup drive failed last night. I was struggling with a few VM issues, and when I finally resolved them, I went to back up the VM in case I had more problems. I plugged the external 2.5” SSD into my laptop and got nothing. No response, no new drive in Explorer.

That’s not what I want to see, though I’m glad it was a backup drive, and not one I needed for presenting. Not much I could do at night, on the road, but in the morning I resolved to get a new one.

I did have a California geek moment, heading to Fry’s for the first time to replace my drive. I’ve read about, and heard about, the store for years. I’ve passed their stores in Northern California, but never gone in.

[Photo: Apr 10, 8:33 AM]

It was a bit of a mistake, since I was late getting back to our Red Gate event. I got a little entranced with the various displays and options in there. It’s like a Best Buy++, combined with a Radio Shack and more. Microcenter in Denver is similar, but Fry’s has more.

I decided to go mSATA rather than a 2.5” drive, mostly for space and weight. It’s amazing to me how small things have gotten. I got the Samsung 840EVO 500GB drive and a small case.

[Photo: Apr 10, 11:35 AM]

I carry a screwdriver, but the case came with a tiny one that I used to mount the mSATA drive. How small is this? Small.

[Photo: Apr 10, 2:08 PM]

The picture above shows my hand, a 16.9 oz water bottle, my Arc mouse, and the 500GB mSata drive in an enclosure.

500GB!

I could probably carry 4TB worth of these in one hand. At $249 for the drive and $20 for the case, it’s amazing. I’ve got a nice backup drive for multiple copies of my VMs, which is very handy and cost effective.

If you haven’t tried mSATA, take a look. I’m not sure I’d look at any other formats for portable storage.

April Blogging Challenge 2 – Primary Key in CREATE TABLE

The April Blogger Challenge is from Ed Leighton-Dick and aimed at new bloggers, but anyone is welcome. I’m trying to motivate and cheer people on.

Designating a Primary Key at Design Time

In the first post on this subject, I noted that I often add a primary key after the fact, using ALTER TABLE. However I’ve been wanting to build the habit of adding the PK at design time, inline with the CREATE TABLE statement, so I decided to look up the syntax and start practicing.

The inline code adds a CONSTRAINT clause to the table definition. I can do this in two ways. The most common is to add, after I’ve built all the columns, the CONSTRAINT keyword, a name (pkUsers), the PRIMARY KEY keyword with the optional CLUSTERED/NONCLUSTERED designation, and finally the column(s) in parentheses.

Here’s my new code:

CREATE TABLE Users
(
  MyID int IDENTITY(1, 1)
, firstname varchar(250)
, lastname varchar(250)
, gender char(1)
, postalcode varchar(12)
, contactphone varchar(12)
  CONSTRAINT pkUsers PRIMARY KEY CLUSTERED (MyID)
);
GO

The better way, IMHO, is to do this inline if you have one column. I know this isn’t consistent, but I can easily see the constraint this way.

CREATE TABLE Users
(
  MyID int IDENTITY(1, 1) CONSTRAINT pkUsers PRIMARY KEY CLUSTERED (MyID)
, firstname varchar(250)
, lastname varchar(250)
, gender char(1)
, postalcode varchar(12)
, contactphone varchar(12)
);
GO

This is more difficult to see if you have multiple columns, but you can do this:

USE sandbox;
GO
CREATE TABLE Users
(
  MyID int IDENTITY(1, 1)
, firstname varchar(250)
, lastname varchar(250)
  CONSTRAINT pkUsers PRIMARY KEY CLUSTERED (firstname, lastname)
, gender char(1)
, postalcode varchar(12)
, contactphone varchar(12)
);
GO
DROP TABLE dbo.Users;
GO

The PK is now simple to find, it’s named correctly, and it’s easy to read.

References

A few places I used

Quick and Easy Blogging

This post occurred to me while I was writing my first post. Actually, this idea occurred to me first, but since the ALTER method is what I usually do, I covered it in post 1 and wrote that first.

Changing to this post required using the same MSDN article, dropping the table, rewriting the DDL code, and testing it. About 5 minutes for this one because it was based on a previous post.

Look for the other posts in the April challenge.

State v Migrations

Most people work in evolutionary databases. By that I mean a database where you have some schema and, over time, you are altering that schema. You might be adding columns to tables or views, changing stored procedure or function code, or something else, but you aren’t really rebuilding the database from scratch on a regular basis. Certainly some people sell software and a database that’s always being rebuilt and upgraded from state X to state Y, but that’s a much more complex issue.

For those of us who work with these evolving databases, we really have two choices in how to handle upgrades. We can store each change to the database as a script and ensure we run the correct scripts in the correct order (discarding those that aren’t needed) when we deploy changes. I’ve done that before, and it can work, but this approach requires that I have good control of production to prevent changes from being made in that environment that aren’t also made in development. This is the approach advocated by Paul Stovell, of Octopus Deploy.
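A rough sketch of the bookkeeping behind that approach might look like the following. The dbo.SchemaChangeLog table, the script name, and the dbo.Users column change are all made up, and real migration tools each handle this in their own way.

-- A made-up example of tracking which migration scripts have been applied.
CREATE TABLE dbo.SchemaChangeLog
(
    ScriptName  varchar(200) NOT NULL PRIMARY KEY,
    AppliedDate datetime2    NOT NULL DEFAULT SYSDATETIME()
);
GO

-- Each change script checks the log before running, so reruns are harmless.
IF NOT EXISTS (SELECT 1 FROM dbo.SchemaChangeLog WHERE ScriptName = '0042_add_middlename.sql')
BEGIN
    ALTER TABLE dbo.Users ADD middlename varchar(250) NULL;

    INSERT INTO dbo.SchemaChangeLog (ScriptName)
    VALUES ('0042_add_middlename.sql');
END;
GO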

The other approach is to look at the state of development at some point in time, compare that to production (with a tool like SQL Compare), and then generate a script that makes the changes needed. This is how lots of people deploy their changes today, though it isn’t without its own issues: NOT NULL columns, renames, and more can cause problems. There are ways around these issues, but they require some work.

Ultimately the problem of smoothly deploying changes to databases requires a bit of discipline from the DBAs and developers. Tools can help, and they certainly can reduce the work involved, but good habits and a consistent process are important to ensure that changes are made smoothly. The one thing that helps you find problems with your process and code is testing, which is something I’d recommend you implement no matter what method you choose for deploying your changes.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.4MB) podcast or subscribe to the feed at iTunes and LibSyn.

Release Wednesday

I’m glad that someone else is adopting Release Wednesdays. I was surprised to see the post from Red Gate, though I shouldn’t have been. I knew we were releasing regularly and some groups were aiming for weekly releases. There was easily a 1 in 7 chance of them choosing Wednesdays, which has a special meaning for me.

I worked at a startup company where we released every Wednesday. We worked in one week increments and developed an application across a couple years that evolved and morphed to meet differing business needs in the education market. We went from working with digital sales, to including physical products, to subscriptions. Our customers went from individuals to corporations, to departments in corporations, and all along the way we managed front end and database changes in one (or sometimes two) week increments.

We didn’t practice Agile or Scrum or anything formal. We worked as closely as 10 people could, coordinating and communicating to get features implemented every week. We started releasing on Wednesday nights, with the lead developer and myself staying late at the office (usually with another developer or two for company) for an hour. That was reduced to phone calls where we deployed from home (and fixed problems) in real time. Eventually we built such a well-engineered process, with thorough testing, that our deployments were completed in minutes, with a few chat messages to verify we’d both finished our tasks.

We did that for over 18 months with only one or two rollbacks, and without the all night frantic coding sessions to fix issues that I’d experienced elsewhere. It wasn’t simple, but it wasn’t hard. It required preparation, planning, a strong culture of professionalism, and support from our CTO.

I see the same types of culture at Red Gate, and I’m not surprised by the rapid releases, nor the success we’re having in evolving, updating, and patching our products over time. I’m excited to see how much quicker we can move and how we will continue to build better software, faster. 

One Wednesday at a time.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.7MB) podcast or subscribe to the feed at iTunes and LibSyn.
