The Digital Woes of Public Records

Researcher Chris Vickery discovered that 191 million voter records were available to the public. The information was in a database on the Internet, apparently a collection of voter records compiled from public sources in various US states. While no Social Security numbers were present, the database did include names, dates of birth, addresses, and voting scores.

That’s scary, though potentially not a problem in itself. A number of states publish voter data as public records. A few place restrictions on how that data can be used, but the fact that the data is available means it could be used maliciously, with overburdened authorities unlikely to prosecute anyone even if they’re caught.

This is one of those areas where our understanding and control of data hasn’t caught up to the digital age. It is one thing when public data is available to those who must physically search for it, or query for individual records one at a time. However, data can reveal much more, and can be used in new ways, when large volumes of it are available. The ability to access every voter’s name, address, and date of birth could well become a problem.

I suspect much of the data we took for granted in the past, thinking nothing of its visibility, will become a problem in the future. When someone can gather large amounts of data and store it cheaply, perhaps in something like a data lake, we may find that public data is problematic. When anyone can gather and combine lots of data from different sources, the amount of information that can be determined about individuals becomes quite scary. We’ve already seen anonymous data sets de-anonymized by merging in data from other sources.

I truly hope that we find ways to better protect and ensure privacy in the future, because all the capability and power that computing brings to data analysis truly has a dark side.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.4MB) podcast or subscribe to the feed at iTunes and LibSyn.

Creating a User Without a Login

Another simple post from me that hopefully serves as an example for people trying to get started blogging as #SQLNewBloggers.

This is one of those simple things that many people should be able to do in order to build better security into their database code. However, I’m sure many people have never built one. I ran into this recently and thought it would make a great SQLNewBlogger post.

Creating the User

It’s a bit anticlimactic, but creating a user in a database without a login is simple:

CREATE USER UpdateStatsUser
WITHOUT LOGIN WITH DEFAULT_SCHEMA = [dbo];
GO

This creates a new user in my database that is not associated with a login.
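A quick way to confirm this from T-SQL is to query the catalog views. This is just a sketch; the authentication_type_desc column assumes SQL Server 2012 or later.

SELECT name, type_desc, authentication_type_desc
FROM sys.database_principals
WHERE name = 'UpdateStatsUser';
-- A user created WITHOUT LOGIN should report authentication_type_desc = 'NONE'.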

I can assign this user permissions, like any other user.

GRANT CONTROL ON dbo.Authors TO UpdateStatsUser;

These permissions then work just as they would for any other user.

Why would you do this?

There’s a short summary from Pinal Dave that’s pretty good. Essentially, you want to give different permissions to a user without using something like an Application Role, which has a name and password that must be managed. Here, you can allow a user to execute a routine as another, more privileged user, without giving the original user additional permissions.
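As a sketch of how this might look (the procedure and the SomeLimitedUser grantee below are hypothetical, invented for illustration), you can wrap the privileged work in a module that runs as the loginless user, so callers only need EXECUTE on the procedure:

CREATE PROCEDURE dbo.UpdateAuthorStats
WITH EXECUTE AS 'UpdateStatsUser'   -- runs under the loginless user
AS
BEGIN
    -- Allowed because UpdateStatsUser was granted CONTROL on dbo.Authors above
    UPDATE STATISTICS dbo.Authors;
END
GO

-- Callers need only EXECUTE on the procedure, not ALTER or CONTROL on the table
GRANT EXECUTE ON dbo.UpdateAuthorStats TO SomeLimitedUser;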

SQLNewBlogger

This is a really simple post that took me longer to write than it took to create the user, about 10 minutes in all. I wouldn’t expect most of you to stop here. I’d want a post that shows you understand something about how this user can be used; show me an example of reading or writing a table with this impersonation in action.

References

CREATE USER – https://msdn.microsoft.com/en-us/library/ms173463.aspx

Culture and Performance

Most people in management seem to believe that culture matters in a company. I know some don’t, and I’ve worked for a few of those people, which is never an enjoyable experience. As the world shifts toward more and more knowledge work for people in technology, businesses seem to be realizing that the way their employees feel about the company can have a direct impact on the company’s bottom line.

There’s an article about culture and motivation in the Harvard Business Review that I think does a good job of looking at how well people perform when they have various motivations. The authors talk about the six reasons why people work, each of which can drive motivation in a different way. Some are positive motivators, some are negative, and it’s good to be aware of the differences.

This ties into culture through the way your organization is built: the culture that pervades the company can really determine how employees are motivated. More negative motivators result in worse performance, especially creative performance, from employees.

I don’t think that building a great team and getting the most from people is necessarily this simple. Different people respond differently to a culture, and the same person might even respond differently at different points in their employment. However, I do think you can adjust how each employee fits in through the work you assign, the support you give, and the demands you make of them.

The mark of a good manager is that they find ways to treat each employee differently, in a way that suits them best, while maintaining a core set of values and rules for the entire organization.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.0MB) podcast or subscribe to the feed at iTunes and LibSyn.

Who’s Got a Preteen SQL Server?

This year brings the end of extended support for SQL Server 2005. Microsoft has been reminding people they should upgrade, which may be important if your organization requires support for the software it runs. Personally, I think if you’re still running SQL Server 2005 and it works for you, why upgrade?

It’s been ten years since that version was released, and while it was a huge step forward with the SQL CLR, better mail, and more, it had plenty of issues. Since that time we’ve had four major releases, with a fifth coming this year. All of those versions have built on the base of SQL Server 2005’s completely rewritten codebase and have improved many features.

Recently I got a note from someone who has been upgrading their SQL Server 2005 systems over the last few years (along with SQL Server 2008 instances) and has made great progress. However, they still have a few SQL Server 2005 instances that they are racing to upgrade, and they were wondering how many others out there are facing similar struggles.

So this week I’m wondering: how many of you have instances growing up and getting close to double-digit ages? Are you working to upgrade SQL Server 2005 instances? Have you been thinking about those 2008 instances that are already out of mainstream support and will leave extended support in a few years?

Let us know how you think about your aging database software.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.0MB) podcast or subscribe to the feed at iTunes and LibSyn. 

The New Operating System

I often see pieces written about startup companies and the methods they use to develop software. Small companies often adapt quickly to new ideas and processes. However, many of us work in larger, more established corporations and don’t get the flexibility to change our processes. In fact, for many of us, the idea of developing software faster doesn’t even seem to make sense. Most of us in large organizations (private or public) have fundamentally different views of our clients and market.

However, I read an interesting piece on adaptive, more flexible, startup-type companies that suggests many of our larger organizations might want to rethink how they work. The piece looks at the way a company builds products or services as a kind of operating system. What’s most fascinating to me is that it examines how large, established companies have failed while smaller, younger companies that operate differently have succeeded.

In many ways, software is eating the world. As playing fields level in many industries because of both cheap computing power and complex software, I suspect more and more organizations will feel pressure to become more adaptive. I especially expect this as new, younger managers mature, bringing new ways of thinking. Managers who haven’t spent decades working with a factory mentality will see the power and advantages of trusting people, setting a vision, and then letting people build software that does the work.

However, there are plenty of organizations with a wide range of skills and of employee motivation. There are certainly employees who don’t want to innovate and just want to do the job they’ve been doing, and there will always be a need for some grunt-style work (though perhaps less in technology). We certainly need to adapt any system to ensure that employees who need more supervision get the help they need to transition to new ways of working.

I expect that we’ll see more public-sector groups adopt these techniques as well. There is increasing pressure to reduce costs, as well as the need to accomplish more and more, perhaps with smaller groups of people. In many cases this means that software should do more. I would hope that more public organizations adapt their processes to meet the demands they face. Those that have done so already stand out.

It’s going to be an interesting world over the next decade or two, and I really do think we will see more companies adapting the way they work toward embracing creativity and innovation rather than following rules by rote.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.8MB) podcast or subscribe to the feed at iTunes and LibSyn. 

Loading a Text File with a Line Feed

Loading text files is a skill that probably every DBA needs. I know that the import wizard is available, but there are times that you might want to automate this without using SSIS. In those cases, it’s nice to know how to load data from a programmatic standpoint.

I needed to do this recently with a text file that looked normal. When I opened it in my editor, it appeared to be an ordinary text file with one column of data.

I thought this would be easy to load, so I created a simple table:

CREATE TABLE MyTable ( mychar VARCHAR(100))

I then ran a simple BULK INSERT command.

BULK insert MyTable
     from 'C:\SampleFiles\input.txt'

And I received this:

Msg 4863, Level 16, State 1, Line 3
Bulk load data conversion error (truncation) for row 1, column 1 (mychar).

That’s not good. I suspected this was because of the format of the file, so I added a row terminator.

BULK insert MyTable
     from 'C:\SampleFiles\input.txt'
with ( ROWTERMINATOR = '\r')

That didn’t help. I suspected the terminators were still the problem, so I also tried the newline (\n) terminator, and then both together, but nothing worked.

Since I was worried about formatting, I decided to look at the file itself. My first choice here is XVI32, a hex editor, and when I opened the file, I could see that only a line feed (0x0A) was used as the row terminator.

However, I wasn’t sure how to get this in my code.

I tried using CHAR(10) as the terminator, but BULK INSERT won’t accept an expression there, so that didn’t work.

I could edit the code itself with XVI32, which seems odd, but let’s try it.

In the hex editor I replaced the backslash of the \r with the byte 0x0A and then deleted the r, leaving an actual line feed inside the string literal. Once I saved this and reloaded it into SSMS (choosing not to normalize line endings to CRLF), I could run it.

I suppose I could also enter the character with the ALT key and the number pad, though I couldn’t get that to work on my laptop. I need to try it on my desktop, but it’s not a great way to code; it’s easy to forget that such characters are embedded in the code.

I tried searching a bit and found that SQLDenis had a solution. He used dynamic SQL, but with a little formatting, the code is still easy to read, and this works fine.

DECLARE @cmd VARCHAR(8000)
SELECT @cmd = 'BULK insert Mytable
                from ''C:\SampleFiles\input.txt''
                with ( ROWTERMINATOR = ''' + Char(10) + ''')'
EXEC(@cmd)

I executed this and loaded my file just fine.
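As an aside, more recent versions of SQL Server (I believe SQL Server 2012 and later, but check the documentation for your version) also accept the terminator in hexadecimal notation, which avoids the dynamic SQL entirely:

BULK INSERT MyTable
FROM 'C:\SampleFiles\input.txt'
WITH ( ROWTERMINATOR = '0x0a' );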

It’s not often you might need to do this, but it’s a handy little trick for those files that might be formatted from other OSes.

Queues in Databases

I ran across this post, The Database as a Queue Anti-Pattern, and it struck a chord. I both agree and disagree with the premise, and certainly see this as another case of “it depends.” However, I am curious what others think.

I had just written about this for T-SQL Tuesday #74, talking about our use of the database for queuing emails as part of the site. It was a high-volume system that worked well for us. We were, of course, treating SQL Server as our hammer, and in 2002/2003 there weren’t many messaging systems available at a reasonable cost. These days Amazon SQS or Azure Service Bus are low-cost ways to move messages around, but are they better than a database?

In our case cost mattered, and knowledge and skill mattered, though certainly we could have learned how to work with a messaging system. However, we also needed to feed back status and progress information, and we wanted the ability to re-point emails to another process if one failed. For example, if SenderB fails after having retrieved an email, we can update the table and have SenderA pick up the row(s) that haven’t been processed.
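Our real tables were more involved, but a minimal sketch of the idea (the table and column names here are hypothetical, not our actual schema) looks something like this:

-- Each queued email records which sender claimed it and whether it was sent.
CREATE TABLE dbo.EmailQueue
( EmailID   INT IDENTITY PRIMARY KEY
, Recipient VARCHAR(200)
, Body      VARCHAR(MAX)
, ClaimedBy VARCHAR(20) NULL   -- e.g. 'SenderA', 'SenderB'
, SentDate  DATETIME    NULL
);

-- If SenderB dies after claiming rows, re-point its unsent work to SenderA.
UPDATE dbo.EmailQueue
   SET ClaimedBy = 'SenderA'
 WHERE ClaimedBy = 'SenderB'
   AND SentDate IS NULL;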

I do believe that messaging is an extremely powerful architectural technique for distributing work as well as for scaling to high workloads. I’d like to see more messaging implemented in applications, even those that aren’t Twitter scale. After all, if I exceed the power of a 4-core machine for my small application, it’s much easier to add another 4-core machine that can process messages than to upgrade to an 8-core machine and just push the scaling problem off to another day.

I’m curious what the rest of you think. Is the database a good platform for a queue?

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.3MB) podcast or subscribe to the feed at iTunes and LibSyn.

Maybe Security is Harder than Rocket Science

I was giving a talk last year about software development and made an off-hand remark that most of the developers out there shouldn’t be writing authentication schemes for their applications. My reasoning was that most people aren’t good at writing these systems, and there are plenty of good authentication schemes already written that you can incorporate into a system.

However, there are some bad ones as well. While I hope you don’t pick any of those, I also know that many of you might just build a poorly architected system because your focus isn’t on authentication; it’s on some other aspect of your application. I’m sure you know some of the good advice for building these systems, as well as the best ways to handle passwords, but do you follow it? Under deadlines? When you have pressure to focus on more important aspects of your system? Or do you implement anti-patterns because it’s easy?

The European Space Agency (ESA) is full of rocket scientists. Literally, as they send rockets and astronauts into orbit around the Earth. However, they were hacked recently, and the disclosures aren’t pretty. Not only was personal information released, but passwords were stored in plain text. What’s worse, 39% of the passwords were three characters long.

Three.

I’m sure many of the people working on ESA systems are smart individuals, and they may be great web developers who build beautiful, useful sites. However, their security programming was extremely poor, and really, there’s no excuse. Not even the pressure of scientists who want simple, easy logins.

It’s 2016. No short passwords, no limits on complexity such as preventing special characters (one site recently wouldn’t allow a “,” for me), and no storage in a reversible format. There are lots of best practices, but they require some effort to learn, understand, and implement, as well as revision over time to keep up with changing guidelines.
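On the storage point, here is a minimal sketch of non-reversible storage using nothing but T-SQL. It is only an illustration, not a complete scheme; a slow, salted algorithm such as bcrypt or PBKDF2 in the application layer is the better practice.

-- Store a salt and a one-way hash instead of the password text itself.
DECLARE @password NVARCHAR(128) = N'correct horse battery staple';
DECLARE @salt UNIQUEIDENTIFIER = NEWID();

SELECT @salt AS Salt,
       HASHBYTES('SHA2_512', CAST(@salt AS NVARCHAR(36)) + @password) AS PasswordHash;
-- At login, recompute the hash from the stored salt and the supplied password
-- and compare; the original password is never recoverable from what is stored.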

Or, as I suggested, just stop implementing this yourself. Use some authentication scheme that’s been shown to work well with few vulnerabilities.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 2.6MB) podcast or subscribe to the feed at iTunes and LibSyn.

Loading a Text File from T-SQL

Another simple post from me that hopefully serves as an example for people trying to get started blogging as #SQLNewBloggers.

One of the interesting things I’ve had to work on with the Advent of Code puzzles is loading files into SQL Server. Some of the inputs are large strings, but many are files with lines of code that need to be loaded into SQL Server.

I thought this might be a nice, simple SQLNewBlogger post. Do you know how to load a text file? Certainly the Import/Export wizard can work, but can you quickly load a file from T-SQL itself?

If you can’t, go work that out. If you get stuck, come back or search for help.

Loading a Text File

Obviously you need a place to load the file. I created a table for each puzzle, and here is the table for Day 2.

create table Day2_WrappingPresents
( dimensions varchar(12)
)
go

Now ordering doesn’t matter for this puzzle, so I have a very simple table. If ordering mattered, I’d have to do this differently.

To load this file, I’ll use the BULK INSERT command. This takes a table as a target and, optionally, a number of parameters. Since this is a simple load of a file with one column of data into a table with one column of data, I can use the defaults.

bulk insert Day2_WrappingPresents
from 'C:\Users\Steve\Documents\GitHub\AdventofCode\Day 2 – Wrapping\input.txt'

In this case, the insert will load all 1000 rows into the table. A simple query shows this works:
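As a sketch, a count is enough to confirm that the 1000 rows arrived:

SELECT COUNT(*) AS LoadedRows
FROM Day2_WrappingPresents;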

 

Now I can get on with the rest of my puzzle solution.

SQLNewBlogger

This is a great example of something simple that we might not need to do often, but that we may need at times. Knowing how to do a simple operation like this shows that you are improving your SQL Server skills. This post took me about 5 minutes to write.

What is the True Version of Code?

As I’ve been working on more development tasks, I regularly hear from developers that a version control system (VCS) is the “true” version of our code. We should always look to a VCS for the actual code that we need to work with. I believe that, and I think it’s true.

However, I heard a DBA say that the true version of the code is always in production; that’s what’s being executed, especially in the database world. There is some truth here as well, but in my mind this speaks to a broken process. It means we have a more chaotic, less “engineered” way of producing our software. Perhaps more importantly, we don’t have a good source from which development can proceed.

Certainly hot fixes need to occur, and there are times when you can’t wait for a set of changes to be made in development, tested, approved, and then deployed to production. I think you can overcome most of this, but there may always be cases where it happens, so I don’t think we should argue about how fast a software deployment pipeline we can build.

Instead, I’d say that production changes should always be fed back into the VCS. Our code is an ever-changing set of files that we can only really keep track of by using a proper VCS. We should be able to trace the view of production at any point in time back to a set of versions of our files in the VCS.

Certainly some of you have gotten by for years, perhaps your entire career, without a VCS. However, I’d argue that isn’t a good position for your organization. The habits and skills you have could easily be lost if you leave, and your memory isn’t a system of record. A stable system, a VCS, should record the state of our code, both in development and in production.

And if you’re worried about the cost of a VCS, there are plenty of free (as in beer) options. I recommend Git, but there are plenty to choose from. In terms of overhead, you can certainly use tools like SQL Source Control or ReadyRoll, but even without those tools, using a VCS really just requires building some habits. If you aren’t willing to change your workflow slightly, there’s probably no way you will ever get your environment under control in the long run. However, if you do move to a VCS, I think you’ll find it’s well worth the effort.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio ( 3.2MB) podcast or subscribe to the feed at iTunes and LibSyn.