
Better Test Data Domains with SQL Data Generator

I’ve been working more with SQL Data Generator (SDG) because it solves some problems that many software developers have with test data. Often each developer needs to create their own set of test data, and these are the common approaches:

  • Use a backup of production, perhaps an old one.
  • Randomly insert a few values each developer comes up with.
  • Use random data from SDG, with a new project each time.
  • Load a known data set from production or test systems.

While small amounts of random data work well, I think the data should reflect (at least somewhat) the types of data in production. Totally random strings don’t work, but similar words, structures, etc. make sense.

However, having a consistent set of data for each developer is a great idea. SDG can generate data sets consistently, but to make them meaningful, you might want control over the types of data inserted.

Using Real Words

I talked about the difference between random words and real words. However, what if you want to include specific types of items?

Let’s evolve some data. If I have a varchar(500) column, SDG defaults to this:

[A-Z0-9]*

Which gives me this:

(screenshot: the random strings SDG generates)

That’s not great if I want to examine specific rows and determine whether they’re being returned by a query. This is just too random and hard to verify.

However, I have options. For example, I could use the “Insert File List” item. This gives me a list of files that could be helpful. In this case, let’s choose “Color”.

(screenshot: SQL Data Generator - Fun_Article_Titles_Words.sqlgen)

I see this in the RegEx box.

($"Color.txt")

Now my test data looks like this:

(screenshot: test data generated from Color.txt)

What’s in “Color.txt”? Let’s see. The file is under the Data Generator 3 folder, in a Config location. I see lots of XML and text files.

(screenshot: the Config folder contents)

If I open Color.txt, I see what I expect.

(screenshot: the contents of Color.txt)

Now, let’s experiment. Let’s create a file of SQL Server terms. I’ll put values in like this:

(screenshot: the new file’s contents)

I needed to change permissions on the Config folder to allow saving, but I put it there. I also had to close and re-open SDG to pick up the new file.

Now I’ll add that to the RegEx box.

(screenshot: SQL Data Generator - Fun_Article_Titles_Files.sqlgen, with the new file in the RegEx box)

That’s cool. What if I made a file with a number of random words in it, like a small dictionary? I could do this. I created dictionary_small.txt.

(screenshot: the contents of dictionary_small.txt)

Now I include that file a number of times in the expression.
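The expression is just the file reference repeated. Something along these lines is the idea, though the exact expression depends on how you build it in the RegEx box:

($"dictionary_small.txt") ($"dictionary_small.txt") ($"dictionary_small.txt") ($"dictionary_small.txt")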

(screenshot: the generated descriptions)

Those are some great descriptions. I’m sure I could have fun with this in other ways as well. Let’s create some good and bad data for a cleaning operation.

(screenshot: a mix of good and bad values for a cleansing test)

The ability to include data from flat files is a great option in SDG. It lets you put together a data set that developers can actually use, understand, and count on for loading up new databases or tables, especially when creating quick code branches to test something out.

Software Estimates

We really stink at estimating the amount of time we’ll spend building software. However, I’m not sure we’re much worse off than many other professions. If you ask a doctor how long it takes to get better, do they know? They do for some small, often-seen, well-documented issues, but in many cases they don’t really know. Can lawyers determine beforehand how long it will take to settle a case?

It seems the more accurate estimates typically come from places where the problem is fairly simple and very well known. That’s probably true in software as well. If I asked how long it would take to write a new blog site, I bet many developers could come up with a close estimate.

The idea of telling a computer to do X seems more bounded and contained than the work of other professions, but it’s really not. This is often because developers don’t get a complete statement of all the requirements from clients. However, that isn’t really any different from many professions. How many of you know exactly how you want a house built, a legal matter settled, or the specifics of medical treatments and their effects?

I read an interesting piece on the idea that developers should just make small estimates, make small changes, and move forward. They should be willing to undo or redo their efforts as needed. That’s one of the benefits of Version Control Systems (VCS), and great developers take advantage of the possibilities.

While there are certainly problems with moving too fast, there should be plenty of reviews, tests, and other steps in the process that keep the quality of code high. Letting developers push forward, experimenting and building sections of code quickly, can keep your development effort moving, as long as they’re willing to undo and redo their work as they get closer to the end product.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.1 MB) podcast or subscribe to the feed at iTunes and LibSyn.

Upgrading a SQL Azure Database to v12

Another post for me that is simple and hopefully serves as an example for people trying to get started blogging as #SQLNewBloggers.

I was looking to test some of the Azure v12 capabilities, but I needed to upgrade a database. Here’s a quick look at how.

In the old portal, you’ll see your databases.

(screenshot: the SQL Databases list in the old Azure portal)

I love that there’s no information on database version. I’m sure the Azure people love this, but I don’t. Very, very annoying.

If I pick one of these, and go to the “Quick Glance” items on the lower right, I’ll see the “update to v12” message.

(screenshot: Quick Glance showing the “update to v12” item)

However, in the new portal, it’s different. At first glance it looks the same, but zoom in.

(screenshot: the SQL databases list in the new portal)

Now you can see the icons are different. I couldn’t read the 12 at first, but I did notice the color difference.

(screenshot: the database icons showing the version)

Now I can go into the properties of the database and see the version. Mine said v2 here, which was strange. Anyway, pick that.

(screenshot: the database properties showing the version)

You’ll get a new blade, which will show you the upgrade benefits. Select “Upgrade this server”.

(screenshot: the Latest SQL database update blade)

Now you get a blade to upgrade the server.

You must type the server name, not the database name, to upgrade. Note that all databases on this server will be upgraded, so be aware of that. However, it’s not the full server name.

(screenshot: entering the server name to confirm the upgrade)

It’s just the host, and a tooltip will prompt you if you mess that up.

(screenshot: the tooltip for an incorrect server name)

Once that’s done, the upgrade will start on the blade. You can close this.

(screenshot: the upgrade starting on the blade)

The upgrade will also appear in the main blade for the database. These are scheduled, not real time.

(screenshot: the scheduled upgrade shown on the database blade)

However, within about 3-4 minutes, mine changed to “upgrading” from the schedule notice.

(screenshot: the status changed to upgrading)

This was a small database, with just one table in it. However, the complete upgrade took a while. The database is available and usable while this is happening, but it does take time to complete. I think mine took about 15 minutes before it was done, but I wasn’t watching the entire time, so I could be wrong.

Once it’s done, the database shows the new version.

(screenshot: the database showing v12)
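If you’d rather not squint at portal icons, you can also check from T-SQL. Something like this, run against the database itself, shows what the server reports:

SELECT @@VERSION;
-- A v12 database reports something like 'Microsoft SQL Azure (RTM) - 12.0...',
-- while the older version reports 11.0.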

Easy enough, and if you need to upgrade your systems, here’s a simple way. If you have lots of them, I’d use PoSh.

SQLNewBlogger

This post didn’t take that long to write. I had the database ready, and I took a few screenshots, then started the upgrade. I wrote most of this while waiting for the process to complete, then went on to other work and came back for the final screenshot.

Easy enough, and you should practice this and document it. You’ll have some idea of how to use Azure if someone asks you to later, along with some notes for yourself.

References

None

No More Foreign Keys?

I struggle with removing FKs as a general idea. It’s presented in a piece from a developer. It appears to me that if you do this, you are merely moving the problem, and potentially causing other issues. Apart from Grant’s response on performance, which is very true, there are other challenges with microservices and separate databases for each service.

Let’s say we are doing something trivial, like a simple e-commerce site. I could have a separate database for products, one for orders, one for shipping actions, one for customers, etc. However, as I scale, I’m not using intra-database queries or FKs/joins/other techniques to verify information for an order. Now I’m sending messages. At small scale, these are trivial. At large scale, my network traffic and messaging grow vastly and potentially become a bottleneck.

Also, since there is a “products” database, an “orders” database, and a “customers” database, I still can’t necessarily scale out these microservices beyond their own machine. Perhaps I can scale them higher in terms of each being as large a machine as I can have now with a single Oracle/SQL Server/MySQL/etc. box, but I still have a scaling issue. I also now have a new messaging problem I need to manage and architect. If I lose one database, how gracefully can my application degrade, or does it start to fail with unforeseen interactions? Do I create more frustration when customers cannot place an order because we can’t find the customer db or because the site is down?

Certainly there are domains of problems that would work fine here. Spotify is probably a good example of this. There may be e-commerce systems that allow this, perhaps being willing to notify customers after some time (hopefully minutes, perhaps hours) that a product is out of stock because the message from the orders database didn’t get to the application. There are certainly ways to copy and replicate some data, assuming that “fresh enough” works for your application. However, that’s a business decision more than a coding decision. It also means you have a new load on your system to move data. I also think the more you work with singleton rows and single objects in an application context, the less you need RDBMS capabilities.

Ultimately I think FKs and strong RDBMS systems work very well in many situations. They work less well in others, and your domain may qualify for those areas where a NoSQL, multi-db, or other architecture works. We should certainly investigate and write about where things work and don’t work, but I wouldn’t assume the RDBMS, with its PKs and FKs, isn’t a valid, well-thought-out, and incredibly useful architecture.
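As a reminder of what gets traded away, here is a minimal sketch (hypothetical tables, not from any particular system) of the check an FK gives you for free inside a single database, and which a message-based, multi-database design has to re-implement:

CREATE TABLE dbo.Customers
(
    CustomerID INT NOT NULL PRIMARY KEY,
    Name VARCHAR(100) NOT NULL
);

CREATE TABLE dbo.Orders
(
    OrderID INT NOT NULL PRIMARY KEY,
    CustomerID INT NOT NULL
        CONSTRAINT FK_Orders_Customers REFERENCES dbo.Customers (CustomerID),
    OrderDate DATETIME2 NOT NULL
);

-- The insert fails immediately if the customer does not exist;
-- no cross-service message or reconciliation job is required.
INSERT dbo.Orders (OrderID, CustomerID, OrderDate)
VALUES (1, 42, SYSDATETIME());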

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (3.5 MB) podcast or subscribe to the feed at iTunes and LibSyn.

Adjusting Model

In July the T-SQL Tuesday theme was defaults. There were lots of posts about various items, but I found one by Kevin Kline interesting. Kevin noted that the settings for model are not necessarily what many of us would want in production.

Is that really the case? This week I wanted to take a poll and see how many of you actually create different defaults for your new databases by changing model from what Microsoft ships.

Do you change the model database?

Certainly there are some people who have create scripts with various settings for their databases. Some people prefer to customize the setup each time, based on the purpose of the database. Still others have scripts that alter the database to change settings after someone has installed the instance.
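For instance, a post-install step that adjusts model might look something like this. This is only a sketch; the recovery model and file sizes are placeholders, not recommendations:

-- Settings changed here are inherited by every database created afterwards
ALTER DATABASE model SET RECOVERY SIMPLE;
ALTER DATABASE model MODIFY FILE (NAME = modeldev, SIZE = 256MB, FILEGROWTH = 128MB);
ALTER DATABASE model MODIFY FILE (NAME = modellog, SIZE = 128MB, FILEGROWTH = 64MB);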

There are any number of ways to ensure you have good settings for your production (or other) databases. Let us know this week if you have a step to change the model database when a new instance is installed.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (1.7 MB) podcast or subscribe to the feed at iTunes and LibSyn.

tSQLt in Azure SQL Database

I was excited to hear about the v12 Azure databases supporting CLR assemblies, mainly because I’ve been doing testing work with tSQLt and wanted to run tests in an Azure database.

I upgraded a SQL Azure database to v12 and connected in SSMS. My first step was to open the tSQLt installation file, tSQLt.class.sql.

(screenshot: tSQLt.class.sql open in SSMS)

I had the file open, connected to Azure. This was the easy part. Now I needed to actually send the batch across and compile the code:

(screenshot: the script ready to execute against the Azure database)

My next step was to execute it. However, that didn’t work as well as I expected.

(screenshot: the errors from the first execution)

There are multiple errors here, but it’s possible that one error causes the others. I pinged Sebastian Meine, the creator of tSQLt, about the EXTERNAL_ACCESS permission, and he noted there was only one method that needs it.

So I decided to make a change. First, a search.

(screenshot: searching tSQLt.class.sql)

That got me to the assembly installation.

(screenshot: the assembly installation statement)

I decided to try to change this to something that’s contained inside the database. Since Azure is a bit of a black box, I thought SAFE was the way to go.

(screenshot: the permission set changed to SAFE)
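The change itself is essentially one keyword in the CREATE ASSEMBLY statement. A sketch of the edit, abridged here (the real statement carries the assembly’s full hex literal, and the assembly name may vary by tSQLt version):

-- As shipped (hex literal elided):
-- CREATE ASSEMBLY [tSQLtCLR] FROM 0x... WITH PERMISSION_SET = EXTERNAL_ACCESS;
-- Changed so the assembly runs entirely inside the database:
-- CREATE ASSEMBLY [tSQLtCLR] FROM 0x... WITH PERMISSION_SET = SAFE;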

With that change made, I compiled the entire file again. This was the only change I made.

(screenshot: the full script compiling successfully)

That’s cool, but does it work? I connected in Object Explorer and then opened SQL Test. I saw my database, and I could create a test.

(screenshot: SQL Test showing the Azure database)

However, I got an error on the connection and creation of a procedure.

(screenshot: the error when creating a test procedure)

Even from Object Explorer, I refreshed the procedures, but got this:

(screenshot: the error refreshing procedures in SSMS)

This appears to be a SQL Server 2014 RTM error. Actually I was on a CU, but not SP1. It is supposed to be corrected in SP1.

However the procedure was created, and I could alter it.

(screenshot: altering the test procedure)

More importantly, I can execute it.

(screenshot: the test executing successfully)

If I change the code.

(screenshot: the modified test code)

Now it fails.

(screenshot: the test failing)
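If you haven’t used tSQLt before, a minimal test and run look something like this (the class and test names here are made up for illustration, not taken from my screenshots):

EXEC tSQLt.NewTestClass 'MyTests';
GO

CREATE PROCEDURE MyTests.[test something trivially true]
AS
BEGIN
    -- Deliberately trivial assertion; change the expected value to watch it fail
    EXEC tSQLt.AssertEquals @Expected = 1, @Actual = 1;
END;
GO

EXEC tSQLt.Run 'MyTests.[test something trivially true]';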

It appears that tSQLt can now work in v12 databases in Azure, so start adding those unit tests to your database projects.

A Clock With Benefits

One of the difficulties developers face, I think, is that so few people understand what actually goes into producing software. The end result a user sees can mask the effort and complexity of the underlying code. A person might see one application that required months of effort and not understand how it’s different from a similar (in their view) application that was prototyped and completed in a week.

I’ve seen a few efforts to try and explain how code is written, and I think the huge piece by Business Week was a really good explanation for the layman. It tries to simplify concepts, and does so in a humorous way. I suspect that this piece might actually help our clients and customers gain some appreciation for the difficulty of producing a basic application.

However I think we need some additional articles like this to help explain our database world. While there are some good, basic pieces on what a database is, we don’t necessarily help anyone understand the complexity of assembling disparate data, especially at scale.

I don’t think end users would care much about why database performance can be an issue, but I do think that some well written, easy to read, enjoyable pieces on the issues of writing poor database code might give developers some appreciation for why indexes matter, what RBAR does to performance, and what solutions might help their code shine.

No one wants software written poorly, and no one wants deadlines missed. However the world of software development is complex and the more that we can help our clients understand this, the less time we waste on explanations.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.3 MB) podcast or subscribe to the feed at iTunes and LibSyn.

The Log Chain

I ran across a question recently from someone who received a message similar to the one shown in this image:

(screenshot: the log restore error message)

In this case, the log chain is broken. I’ve received this message, usually from a script that is automatically restoring all the logs in a folder. I’ve also had clients call me with this message, saying their log shipping system is broken.

What has happened is that you (or someone) restored the logs out of order. Each log backup forms part of a sequence that we call the log chain. The restore process requires that each log backup be restored in the same sequence in which it was created.

In the error message above, SQL Server is letting you know that the Log Sequence Number (LSN) contained in the file from which a restore was attempted doesn’t match the last LSN in the restored database.
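If you are not sure which log backups exist or what order they belong in, msdb on the instance that took the backups records the chain. A query along these lines (the database name is illustrative) lists the log backups and their LSN ranges:

SELECT database_name, backup_start_date, first_lsn, last_lsn
FROM msdb.dbo.backupset
WHERE database_name = N'Sales'
  AND type = 'L' -- log backups only
ORDER BY backup_start_date;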

Fixing This

The fix is easy; restore the correct files in the correct order. In this case, I knew I’d taken 3 log backups, and I was restoring in the incorrect order. I’d restored log backup 1, and was now trying to restore log backup 3.

If I restored log backup 2 (using WITH NORECOVERY), the restore would work. I could then restore log backup 3 and eventually bring the database online.
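In T-SQL, the correct sequence looks something like this (the database and file names are illustrative):

RESTORE DATABASE Sales FROM DISK = N'C:\Backups\Sales_full.bak' WITH NORECOVERY;
RESTORE LOG Sales FROM DISK = N'C:\Backups\Sales_log_1.trn' WITH NORECOVERY;
RESTORE LOG Sales FROM DISK = N'C:\Backups\Sales_log_2.trn' WITH NORECOVERY;
RESTORE LOG Sales FROM DISK = N'C:\Backups\Sales_log_3.trn' WITH RECOVERY; -- brings the database online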

If you can’t find the correct log backup, your database is still usable. Just run RESTORE DATABASE WITH RECOVERY, and you can access your data. However, the data will only be correct to the point in time of the last successful restore.

That could be a serious problem if you are missing a log backup early in your restore sequence. It could be a career-limiting problem.

Always be sure you keep every log backup taken since the earliest full backup you keep handy for restores, and protect those files. Losing one could be a major problem for your organization.

Surfing

How many of you have gotten up before dawn, driven to the coast, gotten on a long piece of fiberglass and paddled into the ocean before turning around to ride back to shore? It’s a great way to start your day, and I’ve been lucky enough to do this many times in my life. I look forward to getting back out there someday in the future.

Surfing is an interesting sport, because you don’t really get to pick your path. You catch a wave, and begin riding it in, altering your course based on the speed of the water, the height and structure of the wave, and even the way the current is flowing. While you move in the direction you want, your choices are limited.

This is unlike many other moving sports. While snowboarding you can stop or slow down. In many wheeled activities you can go in almost any direction. Much of the time you have more control over your motion, and can even take a break.

However waves are unique. They’re few and far between. If you want to slow down, you end up really abandoning that journey, one you’ll never get again. If you paddle back, you’re hoping you’ll get the chance for another fun ride that day. That isn’t always the case.

I saw someone post a note that working in technology is like surfing waves. As we learn and grow, we often feel that we can’t slow down or change direction very much. If we’re a SQL Server DBA, we’re riding that wave. Trying to slow our career or move to another technology often means abandoning some of what has worked well for us. We really follow the wave we’ve chosen as often as possible, usually resistant to change.

You can change, however, and if you want to do so, make some plans and look for a new wave. It will take time and require some investment, but it’s possible. Always remember life is short, and we want to work in order to enjoy our lives with friends and family. We don’t want to get stuck working in ways that prevent that.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.5 MB) podcast or subscribe to the feed at iTunes and LibSyn.

Pick the Right Database in Azure

Another post for me that is simple and hopefully serves as an example for people trying to get started blogging as #SQLNewBloggers.

Another quick post, because this is something that’s annoyed me regularly. Hopefully writing this will help me make fewer mistakes.

Let’s suppose you have an Azure SQL Database. You go to connect from SSMS, as you do every day.

(screenshot: connecting from SSMS)

You log in, and you’re connected. But your database, by default, is master. This probably happens to you when connecting to many on-premises SQL instances. What do you do?

(screenshot: connected to the master database)

Most of us click the drop down, which you can do; even in Azure, you’ll see all your databases.

(screenshot: the database drop down in SSMS)

You can also enter a USE statement (which I do, as I like the keyboard). However, when you run that, you have an issue.

(screenshot: the USE statement failing against Azure)
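Here is roughly what happens, with an illustrative database name:

USE SalesDB;
-- In Azure SQL Database this fails with an error along the lines of:
-- Msg 40508: USE statement is not supported to switch between databases.
-- Use a new connection to connect to a different database.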

These databases, while they may be grouped on the same server for communications, do not necessarily reside on the same instance. Therefore, you can never assume you can change from one to the other with a USE statement. In fact, you need to actually change your connection.

The way to do that is to click “Options” on the connection dialog and then choose the second tab, Connection Properties.

(screenshot: the Connection Properties tab)

Here you can specify the database to connect to. Once you do that, everything is great.

SQLNewBlogger

This took about 5 minutes to write. I changed my connection back to master, reshot the screens as I reconnected, and things worked.

You can do this. Show your next manager that you’re constantly working on your SQL Server skills.

References

No references here. This is something I should know well enough.
