
The ALS Ice Bucket Challenge

A few friends challenged me yesterday, including Grant Fritchey and Aaron Bertrand. Last night, when I got to the Denver SQL Server User Group meeting, Todd Kleinhans was ready to take the challenge and asked if I’d participate.

I was speaking, but happy to do it afterwards. Here we are:

I tagged Allen White, Erin Stellato, and Jes Borland.

Production Subsets

Continuous delivery recommends developers never use production data. It’s too big, too cumbersome, and slows the process too much. Developers should have enough data to determine if their solutions work as they build them. Testing should have enough to do some tuning, but unless you plan on full performance/load tests (which you should), then you don’t need the full set of production data.

It’s an interesting idea, and overall I agree. A subset of data, hundreds of rows, can usually tell you if you’re writing code that works if you profile the code and look for inefficiencies. Note that profiling code doesn’t mean use Profiler. It means examining the resource used by your code in terms of CPU, I/O, memory, etc. There are tools to help you, and at some point in your development process, you should be using them.
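For example, one lightweight way to start profiling in T-SQL is to turn on the STATISTICS IO and STATISTICS TIME session options, which report logical reads and CPU time in the Messages tab of SSMS. A minimal sketch (dbo.Orders and its columns are hypothetical stand-ins for whatever query you’re examining):

-- Report I/O and CPU usage for the statements that follow.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT OrderID, OrderDate
  FROM dbo.Orders
 WHERE OrderDate >= '2014-01-01';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;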

However, it can be time-consuming and cumbersome to build small development data sets. There are lots of choices in how you might do this, and I thought this would make an interesting poll. For those of you who deal with development, whether that’s T-SQL, .NET, or something else, what do you think?

Should we have a subset of production data, a custom data set, or perhaps deal with complete production data?

Some of this depends on the size of your production data and, I hope, its contents. I would not want any PII, PCI, medical, or similar data in any development area. However, if that’s not a concern, then what do you prefer?

Whether you have a custom data set or a subset of production, it can be cumbersome to keep it up to date. Your data may evolve over time, and there’s overhead in maintaining the scripts that produce the data you need. Perhaps that’s the cost of writing good software, but I’m curious how you feel.
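As a rough sketch of what such a script might look like (all of the object names here are hypothetical), a simple refresh job could pull a recent slice of rows from production into the development copy:

-- Rebuild a small development data set from a recent slice of production.
-- ProdDB, DevDB, and the Orders columns are hypothetical names.
TRUNCATE TABLE DevDB.dbo.Orders;

INSERT INTO DevDB.dbo.Orders (OrderID, CustomerID, OrderDate, Amount)
SELECT TOP (1000)
       OrderID, CustomerID, OrderDate, Amount
  FROM ProdDB.dbo.Orders
 ORDER BY OrderDate DESC;  -- keep the most recent rows for realistic values

Of course, any columns containing sensitive data would need to be masked or replaced on the way over, which is exactly the kind of maintenance overhead I’m talking about.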

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.2MB) podcast or subscribe to the feed at iTunes and LibSyn.

The Voice of the DBA podcast features music by Everyday Jones. No relation, but I stumbled on to them and really like the music. Support this great duo at www.everydayjones.com.

Bronze Age Development

I was watching a presentation on testing recently where the speaker noted that on one project he’d spent twice as much time testing as coding. That sounds like an outrageous amount of time, but my concern was tempered when he said the release to production produced no bugs. I’m sure some bugs surfaced later, but I have often seen that most of the bugs, especially the incredibly annoying ones, are typically discovered quickly.

I was reminded of that presentation when I saw this quote: “…the result was a two-year development process in which only about four months would be spent writing new code. Twice as long would be spent fixing that code.”

That’s a quote on the development of Visual Studio a few years back. I wonder if the “twice as long fixing” time would have been reduced with better testing efforts earlier in development. It’s hard to know since all evidence on the value of testing is based on disparate projects with different teams working at different levels of experience, but I’ve run into a few people that think more testing reduces overall development time.

The consultant who gave the presentation believes strongly in testing, not only at the application level, but also at the database level. This person has tried different levels of testing on different projects, and found that building and writing tests throughout development results in many fewer issues at release. Perhaps more telling is that when the person has performed less testing in later projects (because the clients declined to pay for it), there were more bugs in production.

I don’t know if the total time spent on building software is less with testing occurring early than with allowing clients and customers to test and report bugs. Certainly some of that might depend on how many bugs you fix and how many bugs people must cope with, but I do know that the fewer issues people find with your software, the more excited they are to ask you to write more code in the future.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.2MB) podcast or subscribe to the feed at iTunes and LibSyn.

The Voice of the DBA podcast features music by Everyday Jones. No relation, but I stumbled on to them and really like the music. Support this great duo at www.everydayjones.com.

Two in Two Days

It’s a busy week for me. I’ve got quite a few articles to review, feedback to write for Stairway Series authors, review and changes of some PowerPoint decks for later this month, and two User Group presentations.

This is on top of a busy first week of school in the household. I feel like I’m playing catch up all week.

Boulder

I’ll be at the Boulder SQL Server User Group tonight. My presentation will be on Unstructured Data in SQL Server, looking at Filestream and Filetable and how they can be set up and used.

Fortunately I’ve done this before, and a little practice this week was enough to get me ready.

Hopefully I’ll see a few of you there as I haven’t been to Boulder in over a year.

Denver SQL

The Denver SQL Server User Group usually asks me to do a presentation or two each year, and this time I have a new one. They get to be my guinea pigs for the first delivery of this talk.

Get Testing with tsqlt, a preview of a talk I’ll be doing at SQL in the City, is on the agenda. I’ve been going over this one a few times this week, so hopefully it goes smoothly.

Updating tsqlt

I was looking to write a new test with the tsqlt framework recently. I wanted to isolate a stored procedure’s logic and planned on using the FakeFunction procedure available in tsqlt.

I wrote my test, using a template from the Pluralsight course on tsqlt and the documentation. I tried to execute the test and got a “tsqlt.fakefunction does not exist” error.

I was slightly confused at first, but checking my list of functions and stored procedures showed that I didn’t have the FakeFunction procedure available. It’s a relatively recent addition to tsqlt, so I needed an update.
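If you’re not sure which version of the framework you have installed, tSQLt includes an Info function that reports it:

-- Check the installed tSQLt version before deciding whether to upgrade.
SELECT Version
  FROM tSQLt.Info();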

After downloading the framework (a zip file), I opened it up to find this:

[Image: contents of the tSQLt zip file]

There are a number of files in there, but tsqlt.class.sql is the important one. I double-clicked it from the zip and it opened in SSMS.

[Image: tsqlt.class.sql open in SSMS]

It’s a standard T-SQL script, albeit a long one. I executed it and it ran fine. My framework was updated to the latest version and I now had the function I needed.

[Image: the updated tSQLt framework]

Of course, I used my test to ensure this worked as expected and I was pleased to see it work well.
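As a sketch of the pattern (every object name below is hypothetical; only the tSQLt calls are real), a FakeFunction-based test looks something like this:

EXEC tSQLt.NewTestClass 'PricingTests';
GO

-- A fake with the same signature as the (hypothetical) real dbo.GetTaxRate, returning a fixed value.
CREATE FUNCTION PricingTests.Fake_GetTaxRate ()
RETURNS decimal(4, 2)
AS
BEGIN
    RETURN 0.10;
END;
GO

CREATE PROCEDURE PricingTests.[test CalcOrderTotal applies the tax rate]
AS
BEGIN
    -- Isolate the procedure's own logic by swapping out its dependency.
    EXEC tSQLt.FakeFunction 'dbo.GetTaxRate', 'PricingTests.Fake_GetTaxRate';

    DECLARE @expected decimal(10, 2) = 110.00;
    DECLARE @actual decimal(10, 2);

    EXEC dbo.CalcOrderTotal @OrderID = 42, @Total = @actual OUTPUT;

    EXEC tSQLt.AssertEquals @expected, @actual;
END;
GO

EXEC tSQLt.Run 'PricingTests';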

Yet Another Attack Vector

There’s a new movie that just came out in August. It looks funny, and I’m planning on going to see Let’s Be Cops. I know it’s a movie, not real life, but it concerns me, even with a ruling on warrantless searches of digital devices. I’m sure you think an arrest wouldn’t breach your digital security, but how much of a stretch is it for someone to impersonate a police officer (it happens), pull over an executive or engineer, and “search” their cell phone? It’s especially possible if most of us assume that the police have the right to look in our devices (they don’t).

What concerns me is that this is another attack vector into our lives, and potentially, into our companies and organizations. We store more and more information, and access, in our digital devices. We use VPNs, and even authentication tokens, but we often store those on our devices because we can’t memorize everything. If someone has control of one of our devices, they potentially have access to anything we do.

How hard would it be for someone to access our mail, or some resource through our work VPN? How quickly could determined attackers perform some malicious activity, or worse, copy information that we’d never be aware was lost? It’s not likely, and perhaps it’s far-fetched, but it seems criminals are becoming more and more creative all the time.

I worry about our data, but more importantly, I worry about the rights and privacy of our digital information. I hope we update our expectations and rights to meet the challenges of our digital future.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.2MB) podcast or subscribe to the feed at iTunes and LibSyn.

The Voice of the DBA podcast features music by Everyday Jones. No relation, but I stumbled on to them and really like the music. Support this great duo at www.everydayjones.com.

SQL in the City 2014

It’s official: we have a couple of 2014 SQL in the City events scheduled, covering two countries this fall. Both events are packed into a short time (less than two weeks), so Grant and I have quite a bit of travel ahead of us.
 
Once again we’ve divided up the sessions into administration for the DBAs and development topics for the developers, but feel free to cross from one to the other. Our theme is “Ship often, ship safe” and we hope to show you how to build better software, faster. Our goal is to help you get your enhancements and patches into production so your customers can make better use of their applications.
 
London is first, on October 24 at the Grange St. Paul’s Hotel. The agenda is set, with Grant and me delivering a few sessions and a host of Red Gate developers, along with some Friends of Red Gate, presenting on a variety of topics. We also have labs you can drop into during the day to gain practical knowledge on how to solve some of your SQL Server problems.
 
A little over a week later, on November 3, 2014, SQL in the City returns to Seattle with an all-day event at McCaw Hall at Seattle Center, the Monday before the PASS Summit. We have a similar agenda, though with a few different speakers.
 
We do hope you’ll join us at one of the events and get a day of training on SQL Server, the Red Gate way. We’ll also have a short happy hour afterwards, and Grant and I would love to share a toast with you. Feel free to stop and chat with either of us at any time we’re not presenting.

Artist or Scientist

Which are you, an artist or a scientist? If you automate, you’re the latter. If you are a scientist, you can go on vacation. You can be more productive. People can count on you. You get things done quickly, consistently, and reliably. Everyone knows what to expect when you’re done with a task. They can expect things to be completed a certain way.

If you manually run installation programs, click through GUIs to configure options from memory, and customize each system you work on, you’re an artist. Artists build works of art, each of them unique. I know some incredibly talented artists working in technology, people who duplicate their work over and over extremely consistently. However, at some point they’ll make a mistake, and then I’ll never know what state the system or code is in.

While we need artists to push boundaries and experiment with new techniques, we don’t want them managing production systems or writing production code. I want production code to use well-known and proven techniques, best practices, good error handling, application of standards, logging, and more. I want production systems to be stable, not with a lack of change, but with a lack of issues. I need scientists that produce work that can be counted on.

Don’t build works of art. In development you must be an artist at times, but when you solve problems, ensure that the code contains best practices (secure coding and error handling among them), and make sure that your team understands and can reproduce the code later. In production, ensure you learn automation (PoSh, scripting, templates) and can build, or rebuild, your systems quickly and consistently. Deploy your builds to QA and development so that all the environments are the same.

Become more of a scientist and not only will people depend on you, they’ll be less worried when you go on vacation because there will be fewer surprises for the person covering your work.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.5MB) podcast or subscribe to the feed at iTunes and LibSyn.

The Voice of the DBA podcast features music by Everyday Jones. No relation, but I stumbled on to them and really like the music. Support this great duo at www.everydayjones.com.

Disconnecting Auditing

We know security is becoming more and more of a topic for IT professionals. As we rely more heavily on our computer systems, we have to be sure that the information contained in them is secure. We know that we can’t necessarily anticipate and protect the applications from every attack, but we can usually detect and respond to incidents. To do that, we need good auditing of all the events that occur.

The problem, in my mind, is that our auditing efforts and implementations are too tightly tied to the administration of our systems. The auditing features must be configured by administrators, who are also often tasked with the review of the auditing data and logs. This is a fundamental problem, as it’s entirely possible that an administrator or privileged user might be just the person who violates security practices. With their rights inside the system, it’s likely that the same person perpetrating the malicious activity would be able to easily cover up or remove any evidence of the incident.
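SQL Server’s own Server Audit feature is a good example of that coupling. The same administrative rights that create and enable an audit can also pause, alter, or drop it. A minimal sketch (the audit name and file path are just examples):

-- An administrator sets up an audit of security-sensitive events...
CREATE SERVER AUDIT AdminActivityAudit
    TO FILE (FILEPATH = 'D:\Audits\');
GO

CREATE SERVER AUDIT SPECIFICATION AdminActivitySpec
    FOR SERVER AUDIT AdminActivityAudit
    ADD (SERVER_ROLE_MEMBER_CHANGE_GROUP),
    ADD (DATABASE_OBJECT_CHANGE_GROUP)
    WITH (STATE = ON);
GO

ALTER SERVER AUDIT AdminActivityAudit WITH (STATE = ON);
GO

-- ...and the same rights let that administrator quietly switch it off again.
ALTER SERVER AUDIT AdminActivityAudit WITH (STATE = OFF);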

I think that auditing is fundamentally implemented poorly. Auditing features in software, including SQL Server, should be separated out from administration, perhaps even configured and enabled by a separate user or account than the person who administers the system. I would anticipate that a person in the finance or accounting departments at most companies might be responsible for managing the audit data. Even if they were unsure of the meaning of the data, having control over the information would prevent problems with the auditing data being compromised. I could even see auditing services being offered by third parties that interpret or review the data for companies without a dedicated security department.

I doubt we’ll see a disconnect anytime soon, but I do think that the value of auditing is drastically reduced when we don’t have a strong separation of rights, responsibility, and capabilities between auditing and administration.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.5MB) podcast or subscribe to the feed at iTunes and LibSyn.

The Voice of the DBA podcast features music by Everyday Jones. No relation, but I stumbled on to them and really like the music. Support this great duo at www.everydayjones.com.

Playing with Pivot

I’ve been working on some skills, trying to grow my T-SQL, and started to mess with the PIVOT operator. This is a T-SQL construct that helps you turn row data into column data. It’s part of the SELECT query and appears in the FROM clause. There is an article on MSDN to help you understand it.

Note: Performance of PIVOT can be challenging. Jeff Moden has a great piece on Cross Tabs and PIVOT. It is worth a read to understand the limitations and issues with PIVOT.

Setup

I’ve got some sample data I’ve put together. In this case, I have a set of data from the 2013 NFL season, with the scores for the Denver Broncos.

CREATE TABLE Scores
(
Team varchar(3)
, Opponent varchar(3)
, gamedate datetime
, TeamScore int
, Oppscore int 
);
go

INSERT INTO scores (team, opponent, gamedate, teamscore, oppscore)
values
    ( 'DEN', 'BAL', '2013-9-5', 49, 27)
,   ( 'DEN', 'NYG', '2013-9-15', 41, 23)
,   ( 'DEN', 'OAK', '2013-9-23', 37, 21)
,   ( 'DEN', 'PHI', '2013-9-29', 52, 20)
,   ( 'DEN', 'DAL', '2013-10-6', 51, 48)
,   ( 'DEN', 'JAX', '2013-10-13', 35, 19)
,   ( 'DEN', 'IND', '2013-10-20', 33, 39)
,   ( 'DEN', 'WAS', '2013-10-27', 45, 21)
,   ( 'DEN', 'SD', '2013-11-10', 28, 20)
,   ( 'DEN', 'KC', '2013-11-17', 27, 17)
,   ( 'DEN', 'NE', '2013-11-24', 31, 34)
,   ( 'DEN', 'KC', '2013-12-1', 35, 28)
,   ( 'DEN', 'TEN', '2013-12-8', 51, 28)
,   ( 'DEN', 'SD', '2013-12-12', 20, 27)
,   ( 'DEN', 'HOU', '2013-12-22', 37, 13)
,   ( 'DEN', 'OAK', '2013-12-29', 34, 14)
;
go

The Problem

I want to get an average score for the Broncos for each of their opponents. They had 13 opponents in 16 games, with 3 teams played twice. However, I want to see the data sideways, and I’m only going to show the scores for the opponents with multiple games. In this case, I know the teams are “KC”, “OAK”, and “SD”.

In other words, I want to see:

Team     KC    OAK   SD

DEN      X     Y     Z

I want the real averages for the games in question. I could include the other teams, but let’s leave this alone.

The Query

There are a couple of parts to the query. First, there’s the column list; in this case, it’s a SELECT *. We’ll fix this, as I dislike the asterisk, but let’s leave that for now. That means we have:

select
    *

Now, we need a FROM clause. The first part of the FROM clause will be my source data. This is a normal select (note that results is just a table alias for scores); in this case, we have:

select
          team
        , opponent
        , teamscore
        from scores results

We need to embed this in parentheses, so that gives us this:

select
    *
  from
    ( select
          team
        , opponent
        , teamscore
        from scores results
    ) as rawdata

So far this is a normal query. Now it’s time to PIVOT things. First we add a PIVOT clause to the end of the statement. The outline looks like this (with our query):

select * from ( select team , opponent , teamscore from scores results ) as rawdata

PIVOT

( aggregate(col) for var in ([col1], [col2], [col3])

) as pivotalias

The PIVOT clause includes one (and only one) aggregate. You choose the aggregate and a column. In my example, I’ll be averaging the scores for the teamscore column.

The next part of the clause is the pivot column, in which we are looking for data. This column must be in the select list for the first part of the query. Next we have the “IN” clause, in which we list the various values that we are moving from rows to columns.

In this case, I’m looking to move the divisional opponents to columns rather than rows. I’m pivoting on the Opponent column and looking for the AFC West opponents. Those teams are OAK, SD, and KC. That makes my query look like:

select * from ( select team , opponent , teamscore from scores results ) as rawdata

pivot

( avg(teamscore) for [Opponent] in ( [KC], [OAK], [SD] )

) as pivotresults;

This will give me these results:

team   KC   OAK  SD

DEN    31   35   24

I have the average scores for opponents as columns, not rows.

Technically, I should include DEN as an opponent so I can actually build a proper grid. If I do that, I get:

select
    *
  from
    ( select
          team
        , opponent
        , teamscore
        from
          scores results
    ) as rawdata
  pivot 
   ( avg(teamscore) for [Opponent] in ( [KC], [OAK], [SD], [DEN] ) 
   ) as pivotresults;

And these results.

team   KC   OAK  SD  DEN

DEN    31   35   24  NULL

Note that this gives me a NULL. I can handle that with an ISNULL, COALESCE, etc.
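For example, if a zero is acceptable in place of the NULL, I can replace the asterisk with an explicit column list and wrap each pivoted column:

select
    team
  , isnull([KC], 0)  as KC
  , isnull([OAK], 0) as OAK
  , isnull([SD], 0)  as SD
  , isnull([DEN], 0) as DEN
  from
    ( select
          team
        , opponent
        , teamscore
        from
          scores results
    ) as rawdata
  pivot
   ( avg(teamscore) for [Opponent] in ( [KC], [OAK], [SD], [DEN] )
   ) as pivotresults;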

If I add more data for the other teams, then I’ll get this:

INSERT INTO scores (team, opponent, gamedate, teamscore, oppscore)
values
    ( 'KC', 'DEN', '2013-9-5', 49, 27)
,   ( 'KC', 'SD', '2013-11-24', 38, 41)
,   ( 'OAK', 'DEN', '2013-9-23', 21, 37)
,   ( 'KC', 'DEN', '2013-9-29', 52, 20)
,   ( 'KC', 'SD', '2013-12-29', 24, 27)
,   ( 'KC', 'OAK', '2013-10-13', 24, 7)
,   ( 'KC', 'OAK', '2013-12-15', 56, 31)
,   ( 'OAK', 'DEN', '2013-10-27', 45, 21)
,   ( 'OAK', 'SD', '2013-10-6', 27, 17)
,   ( 'OAK', 'SD', '2013-10-27', 45, 21)
,   ( 'OAK', 'KC', '2013-10-13', 7, 24)
,   ( 'OAK', 'KC', '2013-12-15', 31, 56)
,   ( 'SD', 'DEN', '2013-11-10', 20, 28)
,   ( 'SD', 'DEN', '2013-11-10', 20, 28)
,   ( 'SD', 'OAK', '2013-10-6', 17, 27)
,   ( 'SD', 'OAK', '2013-11-10', 20, 28)
,   ( 'SD', 'KC', '2013-11-24', 41, 38)
,   ( 'SD', 'KC', '2013-12-29', 27, 24)
;
go

And these results from the same query:

[Image: pivot results with the additional teams]

That’s pretty cool, though I’d argue that the data isn’t valuable for many people unless you’re trying to bet or analyze the performance of teams against each other.

As I mentioned at the top, this isn’t necessarily the best way to move data. Performance can be an issue; however, there are people who find this to be an easier way to write code and understand what it’s doing.

I might argue that a cross tab can be just as easy to code and maintain, but you may feel differently.
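For comparison, here’s a sketch of the same averages written as a CASE-based cross tab against the sample data above:

select
    team
  , avg(case when opponent = 'KC'  then teamscore end) as KC
  , avg(case when opponent = 'OAK' then teamscore end) as OAK
  , avg(case when opponent = 'SD'  then teamscore end) as SD
  from scores
  group by team;

Since AVG ignores NULLs, each CASE expression only averages the scores against that opponent, which produces the same numbers as the PIVOT version.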
