Loading a Text File with a Line Feed

Loading text files is a skill that probably every DBA needs. I know that the import wizard is available, but there are times that you might want to automate this without using SSIS. In those cases, it’s nice to know how to load data from a programmatic standpoint.

I needed to do this recently with a text file that looked normal. When I opened it in my editor, I saw a single column of data.


I thought this would be easy to load, so I created a simple table:

CREATE TABLE MyTable ( teststring VARCHAR(100))

I then ran a simple BULK INSERT command.

BULK insert MyTable
     from 'C:\SampleFiles\input.txt'

And I received this:

Msg 4863, Level 16, State 1, Line 3
Bulk load data conversion error (truncation) for row 1, column 1 (mychar).

That’s not good. I suspected this was because of the format of the file, so I added a row terminator.

BULK insert MyTable
     from 'C:\SampleFiles\input.txt'
with ( ROWTERMINATOR = '\r')

That didn't help. I still suspected the terminators were the problem, so I also tried the newline (\n) terminator, and then both together, but nothing worked.

Since I was worried about formatting, I decided to look at the file. My first choice here is XVI32, and when I opened the file, I could see that only a line feed (0x0A) was used.


However, I wasn’t sure how to get this in my code.

I tried building the terminator with CHAR(10), and that didn't work.

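For reference, this is roughly the sort of thing I tried. BULK INSERT expects string literals in its WITH clause, not expressions, so a call like this fails with a syntax error (a sketch of the failed attempt, not working code):

BULK insert MyTable
     from 'C:\SampleFiles\input.txt'
with ( ROWTERMINATOR = CHAR(10) )  -- not allowed; the option must be a literal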

I could edit the code itself with XVI32, which seems odd, but let's try that.


I replaced the backslash with 0x0A and then deleted the r. Once I saved this and reloaded it into SSMS (choosing not to normalize line endings to CRLF), I could run the command.


I suppose I could also do this with the ALT key and a number pad, though I couldn't get that to work on my laptop. I need to try it on my desktop, but it's not a great way to code; it's too easy to forget that hidden characters are embedded in the code.

I tried searching a bit and found that SQLDenis had a solution. He used dynamic SQL, but with a little formatting, the code is still easy to read, and this works fine.

DECLARE @cmd VARCHAR(8000)
SELECT @cmd = 'BULK insert Mytable
                from ''C:\SampleFiles\input.txt''
                with ( ROWTERMINATOR = ''' + Char(10) + ''')'
EXEC(@cmd)

I executed this and loaded my file just fine.
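As an aside, more recent versions of SQL Server document a hexadecimal form of the row terminator, which avoids the dynamic SQL entirely. I haven't tested this everywhere, so treat it as a sketch to verify on your version:

BULK insert MyTable
     from 'C:\SampleFiles\input.txt'
     with ( ROWTERMINATOR = '0x0a' )  -- hex notation for a bare line feed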

It's not often you might need to do this, but it's a handy little trick for files that were created on other OSes.

Loading a Text File from T-SQL

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.

One of the interesting things I've had to work on with the Advent of Code puzzles is loading files into SQL Server. Some of the inputs are large strings, but many are files with lines of data that need to be loaded into SQL Server.

I thought this might be a nice, simple SQLNewBlogger post. Do you know how to load a text file? Certainly the Import/Export wizard can work, but can you quickly load a file from T-SQL itself?

If you can’t, go work that out. If you get stuck, come back or search for help.

Loading a Text File

Obviously you need a place to load the file. I created a table for each puzzle, and here is the table for Day 2.

create table Day2_WrappingPresents
( dimensions varchar(12)
)
go

Now ordering doesn’t matter for this puzzle, so I have a very simple table. If ordering mattered, I’d have to do this differently.

To load this file, I’ll use the BULK INSERT command. This takes a table as a target, and optionally has a number of parameters.  Since this is a simple load of a simple file with one column of data to a table with one column of data, I can use the defaults.

bulk insert Day2_WrappingPresents
from 'C:\Users\Steve\Documents\GitHub\AdventofCode\Day 2 - Wrapping\input.txt'

In this case, the insert will load all 1000 rows into the table. A simple query shows this works:
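Something like this is all I used to verify the load; the count should match the 1,000 lines in the input file:

-- quick sanity checks on the load
select count(*) as RowsLoaded
  from Day2_WrappingPresents

select top 10 dimensions
  from Day2_WrappingPresents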

 

Now I can get on with the rest of my puzzle solution.

SQLNewBlogger

This is a great example of a simple thing that we might not need to do often, but we may need to do at times. Knowing how to do this, a simple operation, showcases that you are improving your SQL Server skills. This post took me about 5 minutes to write.

Creating a Database Snapshot

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.

I've rarely dealt with database snapshots, outside of lab experimentation. They didn't exist when I did most of my DBA work, and since then we haven't seen the need for them at SQLServerCentral, though I may suggest we add them to our deployment process, since they'd let us quickly roll back if something breaks.

However, I created one recently for a test and realized that I’d forgotten the syntax. Unlike the quick "create database xx" syntax I often use, with a snapshot I need to be more specific.

The big item that I must specify is the location of the snapshot file. This is the file the copy-on-write process writes to, preserving the original state of the pages as they existed when the snapshot was created.

You do need to give the database a logical name as well, which can be anything, but the reference below has suggestions. I’d say that this is bad:

CREATE DATABASE sandbox_snapshot
ON (NAME = MySnap
, FILENAME = 'h:\SQL Server\MySnap.ss'
)
AS SNAPSHOT OF Sandbox

But this is better

CREATE DATABASE sandbox_snapshot_20150122_1345
ON (NAME = SandBox_Snap_20150122
, FILENAME = 'h:\SQL Server\MySnap.ss'
)
AS SNAPSHOT OF Sandbox

Because the snapshot is based on an existing database at a particular point in time, it's useful to include that time in the name if possible. While you can get this from metadata, if you have people who look at multiple snapshots for information, it's handy for them to see at a glance when each one is from.
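If you do want to pull the creation time from metadata, snapshots are the rows in sys.databases that have a source_database_id set, so a query like this works:

-- list snapshots, their source database, and when each was created
SELECT name
     , DB_NAME(source_database_id) AS source_database
     , create_date
  FROM sys.databases
 WHERE source_database_id IS NOT NULL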

However if you are providing this on a schedule, like daily, for people to report from, you might need to have the same name every day. Think about this, as you cannot rename a snapshot once it’s created.
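Since the quick rollback is the reason I'd consider adding snapshots to a deployment process, it's worth noting the revert syntax as well. Using the snapshot above (and remembering that this discards all changes made to the source database since the snapshot was created):

RESTORE DATABASE Sandbox
FROM DATABASE_SNAPSHOT = 'sandbox_snapshot_20150122_1345'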

SQLNewBlogger

When I realized I had to look up the syntax, I took a few notes and captured code, which meant I combined this writing (< 10 minutes) with other work I was doing.

You should do the same. When you tackle something new, take screenshots, save code, and drop it in a OneNote/EverNote/etc notebook for your weekly blog writing.

Reference

The references I used:

Push Button Deployments

I worked in an Agile team in 2000 and we deployed changes to our web platform every Wednesday for over 18 months. That was about the length of time I was involved before the small startup failed, but not for lack of technology. In fact, I managed a few weeks extra pay by helping sell and deploy our IP to another company.

Deploying regularly and often, to meet business needs and get features into the hands of customers, is important. For most businesses, this is something the CTO and CEO want. It can work, and there's a case study with Nmbrs, a Redgate customer that has used DLM, with this quote:

“The productivity of the team has improved dramatically. We were doing one database update a week, which took almost a full day. Now it takes ten minutes. We can spend the extra time we’ve gained developing new features.”

That's been my experience with tooling and automation. If you get things set up, as I did with my startup, you gain confidence in deployments, are willing to make them more often, and get these small chunks of software to customers on a regular basis. The benefits aren't only for customers; we found we could make some fundamental changes in databases and applications that smoothed later development without affecting customers at all.

I know I'm paid by Redgate to talk about DLM, but the concept, the idea, the processes really do work well. They'll help you get features to customers quicker (and feedback to developers faster), as well as give you confidence that you can release a fix if you find an issue with your application.

Advent of Code–Day 2

I've continued working along, and while I found Day 2 to be straightforward in Python and PowerShell, I thought it was tricky in SQL. I decided this one was worth a post, since I had to futz around a few times to solve it, and I managed a slightly different approach than some others.

If you haven’t solved it, then try. Come back here later and compare solutions, but give it a try first.

 

Solution coming below, so don’t scroll if you don’t need the solution.

 

 

 

 

 

But first,

 

 

 

 

 

Missteps

I had a misstep in this one. I loaded the entire list of packages, one line per row, into a single-column table. My challenge to myself was to avoid ETL work or updates to break this apart; I wanted a simple solution that didn't take up extra space in the database.

As a result, I wanted a single query against the single string column that stored each package's dimensions as one value, '2x3x4' for example.

My first attempt used the Moden splitter function, which seemed to work well. I got three rows for each package. I then used a window function to grab that data, order the sizes, and start performing calculations. When I didn't get the right result, I started digging in.

One of the first things I saw was that I had multiple packages with the same sizes. I had two 22x3x1 packages, for example, and when I partitioned by the dimensions, the calculations went wrong. The window partition doesn't know that three rows belong to one package and three to another.

I could have fixed this with some other value to capture the package, maybe a row_number even, but I decided not to go down this route.
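For what it's worth, here's a rough sketch of that road not taken, assuming the Moden DelimitedSplit8K splitter with its Item and ItemNumber columns. Tagging each row with a package id before splitting would give the window functions a partition that actually means "one package":

-- sketch only: tag each package, then split, so duplicate sizes don't collide
with ctePackages as
(
  select packageid = row_number() over (order by (select null))
       , dimensions
    from day2_wrappingpresents
)
select p.packageid
     , dimension = cast(s.Item as int)
     , sizeorder = row_number() over (partition by p.packageid
                                      order by cast(s.Item as int))
  from ctePackages p
  cross apply dbo.DelimitedSplit8K(p.dimensions, 'x') s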

 

 

 

 

My Solution

I decided to break this down, and I used a series of CTEs to do it. I haven't gone back to optimize things or combine CTEs, which is possible, but instead left the CTEs as I wrote them to solve parts of the puzzle. Multiple CTEs are easy to write, and they help examine the problem in pieces.

My first step was to parse the string. I don't love this solution, as it is limited to packages with exactly three dimensions, but it does seem to be the easiest way to break down the dimensions of the package. My query looks for the string positions of:

  • end of the first dimension
  • start of the second dimension
  • start of the third dimension.

This gives me the simple query:

with cteSplit (d, el, sw, sh)
as
(
select
   dimensions
, endlength = charindex('x', dimensions) - 1
, startwidth = charindex('x', substring(dimensions, charindex('x', dimensions),20)) + charindex('x', dimensions)
, startheight = len(dimensions) - charindex('x', reverse(dimensions)) + 2
from day2_wrappingpresents d
)

Once I had these values, a little math gives me the length, width, and height.

, cteDimensions
as
(select
   d
   , l = cast(substring(d, 1, el) as int)
   , w = cast(substring(d, sw, sh-sw-1) as int)
   , h = cast(substring(d, sh, len(d)) as int)
from cteSplit d
)

Now I’m in business. These two queries were fairly simple, despite all the nested functions. I’ve got integers with the dimensions of each package.

Now the tricky part. I want these ordered. They’re columns, not rows, and I can’t put an ORDER BY in the CTE, so I need to use some comparisons.

, cteOrder
as
( select
   d
, small = case
            when l <= w and l <= h then l
            when w <= l and w <= h then w
            when h <= l and h <= w then h
        end
, middle = case
            when (l >= w and l <= h) or (l <= w and l >= h) then l
            when (w >= l and w <= h) or (w <= l and w >= h) then w
            when (h >= l and h <= w) or (h <= l and h >= w) then h
        end
, large = case
            when l >= w and l >= h then l
            when w >= l and w >= h then w
            when h >= l and h >= w then h
        end
  from cteDimensions
)

Not the prettiest code, and perhaps there are better ways to determine this, but this passed all my tests, and seemed to work.

I could have put the next part in the final query, but I decided to make this a separate CTE to easily read the math. I know some people don’t like lots of CTEs, but in this case, I think they make the query very readable. I should look back at this in six months and see what I think.

, cteFinal
as
(
select
  d
  , area = (2 * small * middle) +
           (2 * small * large) +
           (2 * middle * large)
  , slack = (small * middle)
from cteOrder
)

Now I use a final outer query to sum things up.

select
sum(area + slack)
from cteFinal

The other thing I noticed here is that when I needed to solve the second part, I only had to change the math in the cteFinal to get the new values. It took longer to re-read the second part than to change the code and solve it.
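If I remember the second part correctly (the ribbon is the smallest perimeter of any face, plus the volume for the bow), the change was just a different cteFinal, something like this sketch:

-- sketch of the part two math; puzzle details from memory, so verify
, cteFinal
as
(
select
  d
  , ribbon = (2 * small) + (2 * middle)
  , bow = (small * middle * large)
from cteOrder
)
select
sum(ribbon + bow)
from cteFinal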

I looked over how Wayne Sheffield and Andy Warren solved this in T-SQL, and I thought their approaches were interesting. I didn't want to PIVOT or UNPIVOT anywhere, nor did I look at performance here. This runs so quickly that I'm not sure it matters, though I do wonder: if we were calculating across a million rows, would one approach be better?

I may look, but for now, I’ll leave that to someone else.

Disabling SQL Server Network Protocols

Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.

I ran across a question on network protocols recently, which is something I rarely deal with. Often the default setup for SQL Server is fine, but there are certainly times you should add or remove network connectivity according to your environment.

Here's a short post on turning off (or on) a network protocol for SQL Server.

What’s Enabled?

The easiest way to verify what’s enabled is to use the SQL Server Configuration Manager. You’ll need administrative permissions on the host to run this, but it’s easy to find.


Once you open it, typically you’ll have a list of the items that can be configured.


We want the SQL Server Network Configuration, which is the server-level configuration for this host. The client configurations are for when this host is used as a client to connect to a SQL Server.


On this instance, I have Shared Memory and TCP/IP enabled, but Named Pipes disabled.
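If you'd rather check from a query window than the Configuration Manager, the net_transport column in sys.dm_exec_connections shows the protocol each current connection is actually using (enabled but unused protocols won't show up here):

-- which protocol is each current connection using?
SELECT session_id
     , net_transport
     , auth_scheme
  FROM sys.dm_exec_connections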

Disabling a Protocol

As you might expect, this is easy. I right-click a protocol, and I can change its status. In this case, I'll disable Shared Memory.


Once I do that, the protocol shows as disabled, but the change hasn't taken effect on the running instance. I get a message warning that the setting is saved but won't apply until the service is restarted.


I need to restart the SQL Server instance. Once that's done, no one will be able to use Shared Memory on the host.

I can re-enable the protocol the same way, by right-clicking it and choosing Enable.


Of course, I need to restart my instance again.

Checking the Log

When SQL Server starts, quite a bit of configuration information is written into the log. This is useful for troubleshooting in many cases. One of the things you'll find are the entries for the network listeners.


These entries usually appear after the database startup information. In my log, I can see the local named pipe listener and the TCP/IP settings.
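You can also search the log from T-SQL rather than the Log File Viewer. A quick sketch with sp_readerrorlog, searching the current SQL Server log for the listener entries (the named pipe lines use slightly different wording, so adjust the search string as needed):

-- log 0 = current log, 1 = SQL Server error log, then a search string
EXEC sp_readerrorlog 0, 1, N'listening'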

SQLNewBlogger

After reading a question, this took less than 10 minutes to write, including making screenshots. However, I've done this before. If this were your first time, it might take longer to research and write, but I bet most DBAs could do this in 30-45 minutes.

Chips and Margaritas at SQL Saturday #461

It was a few years back that my wife and I planned a trip to the Austin City Limits music festival to see Stevie Wonder. This was a bucket list item for us, and we thoroughly enjoyed a long weekend in the city. I've been for SQL in the City as well, and each time I've gone, I've enjoyed a lunch at one of the Mexican restaurants south of the river. My kids always joke that my wife and I will fill up on chips and margaritas at Mexican restaurants, and it's somewhat true. Often dinner becomes lunch the next day.

It’s just two weeks to SQL Saturday #461 in Austin, and I’m looking forward to going back. In fact, I’m going to make it a point to at least go get chips and a drink at the same place. I can’t remember the name, but I know how to get there, so I’ll be fine.

However, the main draw is the free, one-day SQL Saturday event itself. I'll be there delivering my Branding for a Dream Job presentation, but there are tons of other great talks. From AlwaysOn to Power BI to Azure to Writing Faster Queries, you'll have a great day of learning on all sorts of SQL Server topics.

If you're anywhere near Austin, register today and come join us for a fun day that kicks off my SQL Saturday 2016 tour.

Rebuilding a Heap–Don’t Do It

I saw someone mention recently that you can run a REBUILD on a heap table. I hadn’t realized that, but when I looked in BOL, I saw that indeed you can run this on a heap. That’s interesting, in that you can now move the pages in a heap around to remove fragmentation. At first glance that sounds good, but I wasn’t sure.  A little more searching was required.
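For reference, the syntax itself is trivial; a sketch against a hypothetical heap table:

-- rebuilds the heap pages, and forces a rebuild of every
-- nonclustered index on the table as well (see below)
ALTER TABLE dbo.MyHeapTable REBUILD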

The authoritative source for me on many things like this is SQLskills, and sure enough, Paul Randal has a myth piece on this. Paul has a great explanation, but basically, if you rebuild the heap structure, you create extra work: every nonclustered index also has to be rebuilt. Why? The pointers back to the heap pages, which are physical locations, will change.

What about adding a clustered index and dropping it? Nooooooo, and again, I learned something new. This causes two rebuilds of the non-clustered indexes as they are rebuilt with the cluster addition and then rebuilt when the table changes back to a heap (to get the heap locations). That’s crazy, and certainly not what we want.

The short answer is that you want a clustered index, for these maintenance reasons as well as others. If you don't have a fundamental reason not to create a clustered index, just add one.

Just do it.

T-SQL Tuesday #74–The Changes

It’s T-SQL Tuesday time, the monthly blog party on the second Tuesday of the month. This is the first T-SQL Tuesday of 2016, and all you have to do to participate is write a blog post. It’s supposed to be today, but write one anytime you can and look for the invitation next month.

I’ve got a list of topics here, but watch the #tsql2sday tag on Twitter.

This month's topic comes from Robert Davis (@SQLSoldier). The topic is Be the Change, and it's a good one.

Quick Changes

I'm going to write about SQLServerCentral here. Years ago we were updating our email system to send a high volume of email. At the time we'd considered purchasing software from others, but found the cost to be significant at our volumes (5-6 figures a year). Instead we needed to handle the emails stored in our SQL Server database in two ways:

  • Thousands of bulk emails sent overnight, as quickly as possible
  • Quick, high priority emails sent in response to actions

These two conflicting requirements meant that a simple queue of emails to send wasn't easy to design around. We also needed to deal with scaling, so we wanted multiple separate machines that could help spread the load. We were building a small .NET process that would run every minute and send a series of emails.

Our design process led us to build priority levels into our table. We couldn't think of more than two priorities at the time, but we allowed for more with a smallint. Our bulk emails were inserted with a priority of 2, and the registration emails, forum notes, etc., were stored with priority 1.

Once we had a separation of emails, we needed a way to determine what had already been sent. To do this, we used a NULL value for the sent date. This allowed each process to determine when new information had been inserted into the table and needed to be processed.

This worked well for a single machine. The process would:

  • query for xx priority 1 emails
  • send priority 1 emails
  • update sent priority 1 emails with the sent date/time.
  • query for yy priority 2 emails
  • send priority 2 emails
  • update priority 2 emails with sent date/time.

The updates actually occurred for each email sent, so we could easily track the time and order of sends for troubleshooting purposes. We would query a few hundred emails each minute, let's say 500, knowing that was the rate at which we could send emails. We wanted all priority 1 emails to go, so our value for yy would be 500 - xx.

As we worked to scale things out, we also needed to track what items were queried by which client. Our solution here was to add a machine name to the data, which was blank when emails were inserted, but would be updated by a client with its name as it queried rows. Since we were looking to determine which emails to send, we’d update xx rows with the name of a client process and then query back those rows. The query used the sent date of NULL with the client name to get the correct rows.
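A rough sketch of the pattern, with table and column names invented for illustration rather than taken from our production schema:

-- hypothetical version of the queue table
CREATE TABLE EmailQueue
( EmailId    INT IDENTITY(1,1) PRIMARY KEY
, Priority   SMALLINT NOT NULL          -- 1 = immediate, 2 = bulk
, ClientName VARCHAR(50) NOT NULL DEFAULT ('')
, SentDate   DATETIME NULL              -- NULL = not yet sent
, Recipient  VARCHAR(200) NOT NULL
, Body       VARCHAR(MAX) NOT NULL
)

-- each client claims a batch of unsent, unclaimed priority 1 rows by
-- stamping its name on them, then queries back the rows it just claimed
-- (priority 2 works the same way with the remaining capacity)
DECLARE @client VARCHAR(50) = 'EmailSender01'

UPDATE TOP (500) EmailQueue
   SET ClientName = @client
 WHERE SentDate IS NULL
   AND ClientName = ''
   AND Priority = 1

SELECT EmailId, Recipient, Body
  FROM EmailQueue
 WHERE SentDate IS NULL
   AND ClientName = @client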

Using a combination of the date sent, the client name, and the priority, we could easily manage detecting and working with changes to this table and build a high volume queue that worked extremely well on SQL Server 2000, and all versions since.

Hash Tables in PowerShell–Advent of Code Day 3

I continue to work on solving the Advent of Code puzzles in both PowerShell and T-SQL after completing them in Python.

When I hit Day 3 in PowerShell, there were a few new tricks to learn, one of which was reading a large string from a file. However, the interesting thing for me was learning to work with hash tables.

The Day 3 puzzle looks at moving Santa around on a grid. We don't know the size or shape of the grid, just that we get directions in one of four ways (^, v, <, >) and then must move to a new house. The first puzzle asks how many houses get at least one present.

This is interesting, and the first thing that occurred to me was a dictionary. I used one in Python, adding new elements as I received new coordinates. I wasn't sure how to do this in PowerShell, and ended up searching a bit about two-dimensional arrays, thinking I could keep a count there, adding indexes as needed. However, I ran into hash tables while searching, and they were a better solution for me.

The short version of working with hash tables is that you declare them with a simple command. In my case, since I had a delivery at coordinates (0,0), I wrote this:

$houses = @{"0,0" = 1}

This uses the @{} syntax to set up a key-value pair, essentially a dictionary. From here, I computed new keys from the directions and could add new values like this:

$houses.Add("1,1", 1)

Of course, I had to check for existing keys, and if a value already existed, increment it.

$houses.Set_Item("1,1", $houses["1,1"] + 1)

With that code, I could easily solve the puzzle. However, I was struck by the various ways I work with hash tables. I use braces, {}, to declare the table; brackets, [], to access elements; and parentheses, (), when calling methods. All of that makes programming sense, but it's something to keep in mind, especially as those three marks mean different things in Python.

I also learned how to search with ContainsKey and ContainsValue, as well as how to Set_Item and Get_Item, which didn’t appear to work with the ISE Intellisense.

All in all, it was interesting working with a hash table and good to learn PowerShell supports them. They are very handy constructs when building up a set of data that you’ll need to work with, and you need more than the simple buckets an array provides.