Creating a Database Snapshot

Another post for me that is simple and hopefully serves as an example for people trying to get started blogging as #SQLNewBloggers.

I’ve rarely dealt with database snapshots, outside of lab experimentation. They didn’t exist when I did most of my DBA work, and since then we haven’t seen the need for them at SQLServerCentral, though I may suggest we add them to our deployment process since we can quickly roll back if something breaks.

However, I created one recently for a test and realized that I’d forgotten the syntax. Unlike the quick "create database xx" syntax I often use, with a snapshot I need to be more specific.

The big item that I must specify is the location of the snapshot file. This is the file that gets written to by the copy-on-write process, which ensures that the original state of the pages, as they existed when the snapshot was created, remains available.

You do need to give the database a logical name as well, which can be anything, but the reference below has suggestions. I’d say that this is bad:

CREATE DATABASE sandbox_snapshot
ON (NAME = MySnap
, FILENAME = 'h:\SQL Server\MySnap.ss'
)
AS SNAPSHOT OF Sandbox

But this is better:

CREATE DATABASE sandbox_snapshot_20150122_1345
ON (NAME = SandBox_Snap_20150122
, FILENAME = 'h:\SQL Server\MySnap.ss'
)
AS SNAPSHOT OF Sandbox

Because the snapshot is based on an existing database at a particular point in time, it’s useful to include the time the snapshot was created in the name, if possible. While you can get this from metadata, if you have people that look to multiple snapshots for information, it can be handy to know at a glance when each is from.
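If you do need to pull this from metadata, a quick check works. This is a minimal sketch; sys.databases flags snapshots by populating source_database_id:

-- List snapshots, the database each was taken from, and when it was created
SELECT snap.name AS snapshot_name
     , DB_NAME(snap.source_database_id) AS source_database
     , snap.create_date
FROM sys.databases AS snap
WHERE snap.source_database_id IS NOT NULL;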

However, if you are providing this on a schedule, like daily, for people to report from, you might need to keep the same name every day. Think about this, as you cannot rename a snapshot once it’s created.
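And since the reason I’d add snapshots to a deployment process is the quick rollback, it’s worth remembering that reverting is a single statement. Here’s a rough sketch using the names from the example above; note that any other snapshots of the source database have to be dropped before you can revert:

-- Revert Sandbox to the state captured when the snapshot was created
RESTORE DATABASE Sandbox
FROM DATABASE_SNAPSHOT = 'sandbox_snapshot_20150122_1345';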

SQLNewBlogger

When I realized I had to look up the syntax, I took a few notes and captured code, which meant I combined this writing (< 10 minutes) with other work I was doing.

You should do the same. When you tackle something new, take screenshots, save code, and drop it in a OneNote/EverNote/etc notebook for your weekly blog writing.

Reference

The references I used:

Crafting Your Resume

Your resume or CV is often the first glance that a hiring manager gets at your career. Even if you’ve been recommended by a friend or current employee, a manager often requires some summary of who you are, as a few paragraphs on a screen that they can study.

I have my own advice, but this post from a manager at StackOverflow covers quite a few of the same things I recommend. I certainly agree with the first section, writing for humans. Over and over again I hear from people that make hiring decisions that they spend 30-60 seconds on a resume to get an impression.

One minute. You should set a timer for one minute and let someone read your resume. Then take it away and ask the person for their impressions of what you know. In fact, maybe that’s a great icebreaker at a user group meeting or SQL Saturday. Find someone that hires others, or is an experienced person in your industry, and ask them to do just that.

We learn a lot from experimenting and seeing what works well and what doesn’t. Many of us solve problems in code and realize later that we could rewrite things more efficiently. Why not do that with our resumes? We certainly can control how we present ourselves: be interesting, and, more importantly, don’t waste the reader’s time.

You get one minute, or less, to make a good impression, so spend some time crafting your resume. Control your brand and ensure that you let people know who you are. Do your best to communicate the skills you have, the things you do well, and the ways in which you are a good employee. Most of us will change jobs at some point, so why not be prepared to present yourself in the best possible way by working on your resume now?

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.4MB) podcast or subscribe to the feed at iTunes and LibSyn.

Push Button Deployments

I worked in an Agile team in 2000 and we deployed changes to our web platform every Wednesday for over 18 months. That was about the length of time I was involved before the small startup failed, but not for lack of technology. In fact, I managed a few weeks extra pay by helping sell and deploy our IP to another company.

Deploying regularly and often, to meet business needs and get features into the hands of customers, is important. For most businesses, this is something the CTO and CEO want. It can work, and there’s a case study with Nmbrs, a Redgate customer that has used DLM, which includes this quote:

“The productivity of the team has improved dramatically. We were doing one database update a week, which took almost a full day. Now it takes ten minutes. We can spend the extra time we’ve gained developing new features.”

That’s been my experience with tooling and automation. If you get things set up, as I did with my startup, then you gain confidence in deployments, are willing to make them more often, and get these small chunks of software to customers on a regular basis. It’s not just customers that benefit; we found we could make some fundamental changes in databases and applications that smoothed later development without affecting customers.

I know I’m paid by Redgate to talk about DLM, but the concept, the idea, the processes really do work well. They’ll help you get features to customers quicker (and feedback to developers faster) as well as give you confidence that you can release, if you find an issue with your application.

Advent of Code–Day 2

I’ve continued working along, and while I found Day 2 to be straightforward in Python and PowerShell, I thought it was tricky in SQL. I decided this one was worth a post, since I had to futz around a few times to solve it, and I managed a slightly different approach than some others.

If you haven’t solved it, then try. Come back here later and compare solutions, but give it a try first.

 

Solution coming below, so don’t scroll if you don’t need the solution.

 

 

 

 

 

But first,

 

 

 

 

 

Missteps

I had a misstep in this one. I loaded the entire list of packages, one line per package, as separate rows in a single-column table. My challenge to myself was not to use ETL work, or updates, to break this apart. I wanted a simple solution, since I didn’t want to take up extra space in the database.

As a result, I wanted a single query against a single string column that had the package size stored as one value, ‘2x3x4’ as an example.
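To set the stage, here’s roughly what that table looked like. This is just a sketch; the table and column names match the queries below, but the column type and sample values are stand-ins for the real puzzle input:

-- One row per package, holding the raw dimension string
CREATE TABLE day2_wrappingpresents
( dimensions varchar(20)
);

INSERT day2_wrappingpresents (dimensions)
VALUES ('2x3x4'), ('1x1x10'), ('22x3x1');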

My first attempt used the Moden splitter function, which seemed to work well. I got three rows for each package. I then used a window function to grab that data, order it by the sizes, and then start performing calculations. When I didn’t get the right result, I started digging in.

One of the first things I saw was that I had multiple packages with the same sizes. So I had two 22x3x1 packages, and when I used a partition based on the dimensions, I had calculation problems. That’s because the window partition doesn’t know that three rows are one package and three are another.

I could have fixed this with some other value to capture the package, maybe a row_number even, but I decided not to go down this route.
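For what it’s worth, a sketch of that other route would be to tag each package with a surrogate number before splitting, and then partition the window functions by that number instead of by the dimension string:

-- Hypothetical fix: give every package its own key so duplicate
-- dimension strings (like two 22x3x1 packages) don't share a partition
with ctePackages as
(
select pkg = row_number() over (order by (select null))
     , dimensions
from day2_wrappingpresents
)
select pkg, dimensions
from ctePackages;
-- ...then split each string and PARTITION BY pkg rather than dimensions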

 

 

 

 

My Solution

I decided to break this down, and I used a series of CTEs to do this. I haven’t gone back to optimize things, or combine CTEs, which is possible, but instead left the CTEs as I wrote them to solve parts of the puzzle. Multiple CTEs are easy, and they help examine the problem in pieces.

My first step was to parse the string. I don’t love this solution, as it is limited to a three-dimensional package, but it does seem to be the easiest way to break down the dimensions of the package. My query looks to find the string positions for:

  • end of the first dimension
  • start of the second dimension
  • start of the third dimension.

This gives me the simple query:

with cteSplit (d, el, sw, sh)
as
(
select
   dimensions
, endlength = charindex('x', dimensions) - 1
, startwidth = charindex('x', substring(dimensions, charindex('x', dimensions),20)) + charindex('x', dimensions)
, startheight = len(dimensions) - charindex('x', reverse(dimensions)) + 2
from day2_wrappingpresents d
)

Once I had these values, a little math gives me the length, width, and height.

, cteDimensions
as
(select
   d
   , l = cast(substring(d, 1, el) as int)
   , w = cast(substring(d, sw, sh-sw-1) as int)
   , h = cast(substring(d, sh, len(d)) as int)
from cteSplit d
)

Now I’m in business. These two queries were fairly simple, despite all the nested functions. I’ve got integers with the dimensions of each package.

Now the tricky part. I want these ordered. They’re columns, not rows, and I can’t put an ORDER BY in the CTE, so I need to use some comparisons.

, cteOrder
as
( select
   d
, small = case
            when l <= w and l <= h then l
            when w <= l and w <= h then w
            when h <= l and h <= w then h
        end
, middle = case
            when (l >= w and l <= h) or (l <= w and l >= h) then l
            when (w >= l and w <= h) or (w <= l and w >= h) then w
            when (h >= l and h <= w) or (h <= l and h >= w) then h
        end
, large = case
            when l >= w and l >= h then l
            when w >= l and w >= h then w
            when h >= l and h >= w then h
        end
  from cteDimensions
)

Not the prettiest code, and perhaps there are better ways to determine this, but this passed all my tests, and seemed to work.

I could have put the next part in the final query, but I decided to make this a separate CTE to easily read the math. I know some people don’t like lots of CTEs, but in this case, I think they make the query very readable. I should look back at this in six months and see what I think.

, cteFinal
as
(
select
  d
  , area = (2 * small * middle) +
           (2 * small * large) +
           (2 * middle * large)
  , slack = (small * middle)
from cteOrder
)

Now I use a final outer query to sum things up.

select
sum(area + slack)
from cteFinal

The other thing I noticed here is that when I needed to solve the second part, I only had to change the math in the cteFinal to get the new values. It took longer to re-read the second part than to change the code and solve it.
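If memory serves, the second part swaps the paper calculation for ribbon, so the change really is confined to that last CTE. A sketch of what mine looked like, using the same column names:

, cteFinal
as
(
select
  d
  , ribbon = (2 * small) + (2 * middle)   -- smallest perimeter around the package
  , bow = (small * middle * large)        -- the bow uses the volume
from cteOrder
)
select
sum(ribbon + bow)
from cteFinal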

I looked over how Wayne Sheffield and Andy Warren solved this in T-SQL, and I thought their approaches were interesting. I didn’t want to PIVOT or UNPIVOT anywhere, nor did I look at performance here. This runs so quickly, I’m not sure it matters, though I wonder: if we were calculating across 1 million rows, would one approach be better?

I may look, but for now, I’ll leave that to someone else.

Disabling SQL Server Network Protocols

Another post for me that is simple and hopefully serves as an example for people trying to get started blogging as #SQLNewBloggers.

I ran across a question on network protocols recently, which is something I rarely deal with. Often the default setup for SQL Server is fine, but there are certainly times you should add or remove network connectivity according to your environment.

Here’s a short post on turning off (or on) a network protocol for SQL Server.

What’s Enabled?

The easiest way to verify what’s enabled is to use the SQL Server Configuration Manager. You’ll need administrative permissions on the host to run this, but it’s easy to find.

[screenshot: Start menu]

Once you open it, typically you’ll have a list of the items that can be configured.

[screenshot]

We want the SQL Server Network Configuration, which is the server level configuration for this host. The Client configurations are for when the host is being used as a client to connect to a SQL Server.

[screenshot]

As you can see here, I have Shared Memory and TCP/IP enabled for this instance, but Named Pipes disabled.

Disabling a Protocol

As you might expect, this is easy. I right-click on a protocol, and I can change the status. In this case, I’ll disable Shared Memory.

[screenshot]

Once I do that, the protocol is disabled in the configuration, but not yet on the running instance. I’ll get this message.

[screenshot]

I need to restart the server. Once that’s done, no one will be able to use Shared Memory on the host.

I can fix this easily enough by enabling the protocol again:

[screenshot]

Of course, I need to restart my instance again.

Checking the Log

When SQL Server starts, quite a bit of configuration information is written into the log. This is useful for troubleshooting in many cases. One of the things you’ll find is the network listeners, as shown here.

[screenshot: Log File Viewer - JollyGreenGiant_SQL2016]

This is usually after the database startup information, so if you look, you can see I have some local pipes and some TCP/IP settings here.
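If you’d rather not dig through the log, you can also check what protocol current connections are actually using. A quick query against sys.dm_exec_connections shows the transport for each session:

-- net_transport reports Shared memory, Named pipe, or TCP per connection
SELECT session_id
     , net_transport
     , auth_scheme
FROM sys.dm_exec_connections;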

SQLNewBlogger

After reading a question, this was less than 10 minutes to write, including making the screenshots. However, I’ve done this before. If this were your first time, it might take you longer to research and write, but I bet most DBAs could do this in 30-45 minutes.

Chips and Margaritas at SQL Saturday #461

It was a few years back that my wife and I planned a trip to the Austin City Limits music festival to see Stevie Wonder. This was a bucket list item for us, and we thoroughly enjoyed a long weekend in the city. I’ve been there for SQL in the City as well, and each time I’ve gone, I’ve enjoyed a lunch at one of the Mexican restaurants south of the river. My kids always joke that my wife and I will fill up on chips and margaritas at Mexican restaurants, and it’s somewhat true. Often dinner becomes lunch the next day.

It’s just two weeks to SQL Saturday #461 in Austin, and I’m looking forward to going back. In fact, I’m going to make it a point to at least go get chips and a drink at the same place. I can’t remember the name, but I know how to get there, so I’ll be fine.

However, the main event is the free, one-day SQL Saturday itself. I’ll be there delivering my Branding for a Dream Job presentation, but there are tons of other great talks. From AlwaysOn to Power BI to Azure to Writing Faster Queries, you’ll have a great day of learning on all sorts of SQL Server topics.

If you’re anywhere near Austin, register today and come join us for a fun day that kicks off my SQL Saturday 2016 tour.

Security Convenience

I wrote a question of the day recently that seemed to catch many people. The question had to do with how users are mapped when a login isn’t specified in the statement. The behavior is to auto-match an existing login with the same name. About 60% of the people answering the next day got it right, but a third missed it, expecting an error to be thrown.
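As a concrete illustration, this is the pattern the question was getting at: CREATE USER without a FOR LOGIN clause. The names here are made up, just a sketch:

-- A login exists at the server level
CREATE LOGIN WebUser WITH PASSWORD = 'ChangeMe!2016';
GO
USE Sandbox;
GO
-- No FOR LOGIN clause: the new user is quietly mapped to the
-- existing login with the same name
CREATE USER WebUser;
GO
-- The explicit, unambiguous version would be:
-- CREATE USER WebUser FOR LOGIN WebUser;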

One of the commenters was surprised that more people didn’t know this. I’d hope people knew this, though to be fair, I bet lots of people manage security through SSMS or a GUI and never write security code. I know I did for years early on. However, I really think the third of the people who got the behavior wrong are actually right about how SQL Server security should work.

We do not want ambiguity when we configure security. We should be sure that rights granted (or removed) are exactly those that we expect. A strong security system should not tolerate any unexpected behaviors.

Security should require specificity.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (1.6MB) podcast or subscribe to the feed at iTunes and LibSyn.

Tracking Professional Athletes

I used to read about technology to allow computers to track a single person’s movements from video footage. We’ve seen this shown in Hollywood movies, where casinos can take a picture of a person and backtrack all their previous movements throughout the day. I have no idea if this is possible, but if it is, it’s a scary proposition.

However, in specialized, controlled circumstances, we can track people very well. The NBA (with SportVU) is tracking all players’ movements and gathering more data that can be analyzed to better evaluate players’ performances. I expect that in the next year or two, players will start to review this analysis and learn how to better adapt to the situations on the court.

The NFL is also starting to use sensors to track their players, but with different goals. They are monitoring workloads, trying to ensure the health and peak performance of players. It remains to be seen how else they might use this data, but in the linked article, there are perhaps more implications for us as data professionals.

Can you imagine more tracking data, for any movable object, becoming available? People, machinery, who knows what else will be tracked, and what else we might need to analyze. I thought spatial data had tremendous possibilities when I first saw it introduced in SQL Server 2008. However, across the last decade I think we’ve barely scratched the surface of what location data might mean for applications, using it mostly for mapping locations and routes. If this data does become useful, then learning to aggregate, trace, and analyze location might just become a valuable, if not core, skill set for the DBA and developer of the future. If you want to get started, maybe run through this article on your system and see what you think.
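If you want a tiny taste of what working with location data looks like in T-SQL, here’s a minimal sketch using the geography type. The coordinates are rough approximations for Denver and Austin:

-- Two points as (latitude, longitude, SRID)
DECLARE @denver geography = geography::Point(39.74, -104.99, 4326);
DECLARE @austin geography = geography::Point(30.27, -97.74, 4326);

-- STDistance returns meters for the geography type
SELECT @denver.STDistance(@austin) / 1000.0 AS distance_km;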

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.4MB) podcast or subscribe to the feed at iTunes and LibSyn.

Rebuilding a Heap–Don’t Do It

I saw someone mention recently that you can run a REBUILD on a heap table. I hadn’t realized that, but when I looked in BOL, I saw that indeed you can run this on a heap. That’s interesting, in that you can now move the pages in a heap around to remove fragmentation. At first glance that sounds good, but I wasn’t sure.  A little more searching was required.
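For reference, the syntax that surprised me is just a table-level rebuild. A minimal sketch with a made-up table name:

-- Rebuilds the heap's pages, removing fragmentation...
ALTER TABLE dbo.MyHeapTable REBUILD;
-- ...but, as explained below, every non-clustered index on the table
-- gets rebuilt as well, because the row locations change.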

The authoritative source for me on many things like this is SQLskills, and sure enough, Paul Randal has a myth piece on this. Paul has a great explanation, but basically, if you rebuild the heap structure, you are creating extra work, in that every non-clustered index also has to be rebuilt. Why? The pointers back to the heap pages, which are physical locations, will change.

What about adding a clustered index and dropping it? Nooooooo, and again, I learned something new. This causes two rebuilds of the non-clustered indexes as they are rebuilt with the cluster addition and then rebuilt when the table changes back to a heap (to get the heap locations). That’s crazy, and certainly not what we want.

The short answer here is that you want a clustered index, for the maintenance reasons here, as well as others. If you don’t have a fundamental reason not to create a clustered index, just add one.

Just do it.

The Ongoing Costs of Building Software

I’ve almost never heard of a developer backing down from solving a problem quickly. In fact, I think most developers assume they can solve many problems quickly and whip up useful software. I know I’ve felt this way numerous times in the past. Need a little monitoring? I can knock that out this week.

However there are costs to maintaining software over time. While many of us can build scripts and small programs that solve problems, we also learn that those little pieces of software need some care and attention over time. That’s well illustrated in the piece titled If You Build It, They Will Complain.

I had a similar path to that of the author. I built a utility, then another, then another for various departments and people inside my company. I knocked out useful software that others used. And then I got caught up in enhancing, repairing, and otherwise attending to said software. It was an eye-opening experience to realize that often software is never done, unlike the experience I’d had in school.

I saw a very experienced SQL Server professional note a few years ago that they would never write monitoring software again. It’s not that this person couldn’t, or that they wouldn’t do a good job, it’s that they didn’t want to maintain the software, and there are plenty of good packages that do most of the job needed. This person could fill in missing functionality in small areas, while spending their time on more valuable tasks.

That’s how I think most of us should look at software. We should look to reuse and take advantage of software that’s written to accomplish tasks, evaluating costs against our time. It’s far too easy to discount an employee’s time; by tackling things on our own, we can easily spend more than the cost of a piece of software someone else has already written.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.2MB) podcast or subscribe to the feed at iTunes and LibSyn.