I’ve continued working along, and while I found Day 2 to be straightforward in Python and PowerShell, I thought it was tricky in SQL I decided this one was worth a post, since I had to futz around a few times to solve it, and I managed a slightly different way than some others.
If you haven’t solved it, then try. Come back here later and compare solutions, but give it a try first.
Solution coming below, so don’t scroll if you don’t need the solution.
I had a misstep in this one. I loaded the entire list of packages as separate lines into separate rows into a single column table. My challenge to myself was not to use ETL work to break this apart, or updates. I wanted a simple solution, thinking I didn’t want to take up extra space in the database.
As a result, I wanted a single query from a single string column that had the package size stored as one column, ‘2x3x4’ as an example.
My first attempt used the Moden Splitter function, which seemed to work well. I got three rows for each package. I then used a WIndow function to grab that data, order by the sizes, and then start performing calculations. When I didn’t get the right result, I started digging in.
One of the first things I saw was that I had multple packages with the same sizes. So I had two 22x3x1 packages, and when I used a partition based on the dimensions, I had calculation problems. That’s because the window partition doesn’t know that three rows are one package and three are another.
I could have fixed this with some other value to capture the package, maybe a row_number even, but I decided not to go down this route.
I decided to break this down, and I used a series of CTEs to do this. I haven’t gone back to optimize things, or combine CTEs, which is possible, but instead left the CTEs as I wrote them to solve parts of the puzzle. Multiple CTEs are east, and they help examine the problem in pieces.
My first step was to parse the string. I don’t love this solution as it is limited to a three dimension package, but it does seem to be the easiest way to break down the dimensions of the package. My query looks to find the string positions for:
- end of the first dimension
- start of the second dimension
- start of the third dimension.
This gives me the simple query:
with cteSplit (d, el, sw, sh)
, endlength = charindex(‘x’, dimensions) – 1
, startwidth = charindex(‘x’, substring(dimensions, charindex(‘x’, dimensions),20)) + charindex(‘x’, dimensions)
, startheight = len(dimensions) – charindex(‘x’, reverse(dimensions)) + 2
from day2_wrappingpresents d
Once I had these values, a little math gives me the length, width, and height.
, l = cast(substring(d, 1, el) as int)
, w = cast(substring(d, sw, sh-sw-1) as int)
, h = cast(substring(d, sh, len(d)) as int)
from cteSplit d
Now I’m in business. These two queries were fairly simple, despite all the nested functions. I’ve got integers with the dimensions of each package.
Now the tricky part. I want these ordered. They’re columns, not rows, and I can’t put an ORDER BY in the CTE, so I need to use some comparisons.
, small = case
when l <= w and l <= h then l
when w <= l and w <= h then w
when h <= l and h <= w then h
, middle = case
when (l >= w and l <= h) or (l <= w and l >= h) then l
when (w >= l and w <= h) or (w <= l and w >= h) then w
when (h >= l and h <= w) or (h <= l and h >= w) then h
, large = case
when l >= w and l >= h then l
when w >= l and w >= h then w
when h >= l and h >= w then h
Not the prettiest code, and perhaps there are better ways to determine this, but this passed all my tests, and seemed to work.
I could have put the next part in the final query, but I decided to make this a separate CTE to easily read the math. I know some people don’t like lots of CTEs, but in this case, I think they make the query very readable. I should look back at this in six months and see what I think.
, area = (2 * small * middle) +
(2 * small * large) +
(2 * middle * large)
, slack = (small * middle)
Now I use a final outer query to sum things up.
sum(area + slack)
The other thing I noticed here is that when I needed to solve the second part, I only had to change the math in the cteFinal to get the new values. It took longer to re-read the second part than to change the code and solve it.
I looked over how Wayne Sheffield and Andy Warren solved this in T-SQL, and I thought their approaches were interesting. I didn’t want to PIVOT or UNPIVOT anywhere, nor did I look at performance here. This runs so quickly, I’m not sure it matters, though I wonder if we were calculating across 1mm rows, would one be better?
I may look, but for now, I’ll leave that to someone else.