It’s been a long time since I wrote about my workspace, but I made a change recently. This was the "short term test" situation I set up a couple of years ago:
After just using boxes for a few days, I decided I liked the arrangement and got monitor stands. However, I left the boxes in place for the keyboard and mouse, mostly because I wanted a few months of playing with heights.
I actually did experiment, adding and removing books a few times, but mostly I let things languish. I wanted to build my own keyboard stand, but kept finding excuses. I lived with the setup, getting annoyed that there was no good place to set a mug of coffee or a piece of paper, and at times precariously balancing my laptop on the books.
A few weeks ago I saw some friends on Facebook mention they were getting standing desks. I’d seen Brent Ozar’s desk, but had no desire to spend that kind of money. I wasn’t even sure I wanted an adjustable one, as when I sit, I usually go to a table or couch with my laptop.
Someone recommended a simple desk, which I liked, but wasn’t sure I wanted to ask my boss for $700 for a hand crank desk. However I did spend a few minutes shopping when my daughter asked for a desk for her room. She got a normal desk from Ikea as she’s starting high school and wanted a workspace.
I saw a $400 electric adjustable one, and was tempted, but decided to go for a $40 upgrade for mine. I got a small table, and added it.
This one isn’t perfect, as it’s a touch high. I did add a couple more foam mats to raise myself to a good level, but I’m not sure I love it.
I do, however, like the extra shelf and space for putting a couple mugs down. I usually have two (coffee and water), so this is handy. I also have space for a few pieces of paper if I need to set them down.
I’m not sure if this is a good move. For now I want to leave things alone as I’m not in a hurry and want to be sure I would use an up/down desk. I certainly have some tasks that work better sitting down, like webinars, so I am tempted to get a chair and work here at times.
The one good thing I have going for me is that Redgate is good about ergonomics and ensuring a good workspace. I am tempted by the treadmill desks, but I think I’d just as soon walk away from the desk if I need a break.
I’m used to working at a chaotic pace. I have lots of diverse projects and deliverables that constantly force me to change my focus. I have short term and long term projects. I work with diverse technologies. My workload is a mix of development and administrative tasks. It’s a hectic schedule that gets worse when I travel since SQLServerCentral must continue to run.
Years ago I was struggling with daily deliverables, the bimonthly SQL Server Standard magazine, and a couple of book projects a year. It was maddening, especially when my business partners also wanted me to experiment with ways to grow our business. I wondered how I would ever find time to dream up new ideas or implement a proof of concept.
Andy Warren helped me slow down by scheduling “thinking time.” He told me it was important, and that he had to make time for it at his job: carve out an hour once a week or so and just spend time thinking about a project. Don’t do; just think.
I have kept that advice in mind over the years, and I try to take some time to just think about the tasks I need to accomplish. I don’t worry about actually getting anything done, but instead try to just think about the best way to move forward. I don’t always succeed, but I continue to try to make time and just think.
Whether you run a business or just write code for one, I think the art of stopping and thinking is a bit lost. Few of us actually stop to spend time planning. Instead we’re quick to start writing code or trying out a configuration change. While I am a proponent of experimenting, I still think it’s worth taking a few minutes to consider different ways to experiment. Often we can choose better experiments if we have a plan.
Try it this week. Take ten or fifteen minutes and think about the next task rather than diving right in. You might find it to be a more valuable tool than just experimenting.
The Voice of the DBA Podcast
I learned a new trick with SQL Data Generator that I wasn’t aware of previously. I think this is a good idea for masking some of that production data that you might not want developers to have.
Let’s start with a production table. In my case, I’ve created a Sandbox_Prod database with a table in it for employees. I’ve added a few records that contain some sensitive information.
CREATE TABLE Employees
(
  empid INT IDENTITY(1,1),
  EmployeeName VARCHAR(250),
  EmployeeEmail VARCHAR(250),
  active TINYINT,
  salary MONEY,
  pwd VARBINARY(MAX)
);
GO
INSERT INTO Employees (EmployeeName, EmployeeEmail, active, salary, pwd)
VALUES
  ('sjones', 'firstname.lastname@example.org', 1, 10000, ENCRYPTBYPASSPHRASE('The User Sample', 'MyS%83ongPa44#word')),
  ('awarren', 'email@example.com', 1, 20000, ENCRYPTBYPASSPHRASE('The User Sample', 'Ano$2therS%83ongPa44#word')),
  ('rsmith', 'firstname.lastname@example.org', 1, 5000, ENCRYPTBYPASSPHRASE('The User Sample', 'Kedj93m@@83ongPa44#word'));
GO
I’ve got a second database, called Sandbox, that simulates my development environment. I’ve got the same table, but without any data in it.
What I want to do is move some of the production data to my development area, but not all of it. Some of the production data needs to be masked.
SQL Data Generator Sources
I can use SQL Data Generator from Redgate to do this, by using a data source that actually exists. In this case, I’ll create a new project and point it at my Sandbox database. I’ve deselected all of the tables except my Employees table.
When I pick my Employees table, I see the familiar generation screen on the right. Most of you, like me, will probably notice the number of rows and the option to delete data.
However, there’s another option: I can select the “Use existing data source” radio button instead. When I do this, I have a few choices for data: an existing table or a CSV file. Both can be good choices, especially if I have sets of data I want to load into the table. Either one can help me build known, specific data sets for development (or testing).
In my case I will choose an existing table. When I do this, I click the “Browse” button and get a connection dialog for SQL Server. I pick my instance and the production database.
I click “Next” and then get the chance to select the table to use. In this case, I’ll pick the Employees table.
When I return to the main SDG screen, I see the table listed as the source, but my preview shows the actual production data. This is because I’ve mapped the production table as a source, and it will be used as it currently exists.
That’s not what I want. I want to mask the email address and the salary. However, now I can change things like I might do for any random data generation.
Let’s first click in the EmployeeEmail column. When I do that, I see the following, the column with its source set as the existing column in the production table.
However the drop down gives me lots of choices, including an Internet email generator.
If I select it, my preview changes. Now the image below shows production data for all columns other than the email.
I can repeat this for the salary (and password to be safe). When I do that, I’ll see random data for those columns and production data for others.
I can repeat this for all tables in my project, mapping through data that isn’t sensitive and masking data that is. It’s a tedious process, but it’s a one-time process for specific data. Once this is done, every restore can have the project run and the data masked. If production DBAs perform this refresh, then developers never see sensitive information.
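As a sanity check after a generation run, a quick query against both databases can confirm the sensitive columns no longer match. This is a sketch, assuming both databases live on the same instance and use the table from the example above:

```sql
-- Look for any rows where the sensitive columns leaked through unmasked.
-- Sandbox_Prod and Sandbox are the example databases from above.
SELECT p.empid, p.EmployeeName
  FROM Sandbox_Prod.dbo.Employees p
  JOIN Sandbox.dbo.Employees d
    ON d.empid = p.empid
 WHERE d.EmployeeEmail = p.EmployeeEmail
    OR d.salary = p.salary;
-- Any rows returned mean the masking run missed something.
```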
It seems that there’s no shortage of re-branding attempts being made in all industries and by all types of people. I still remember when most of us were called computer programmers instead of developers. Not many people writing C# or Java code would want to be called “programmers” today.
One of the latest fads is the call for more data scientists to work on big data, another equally poorly defined term. However, the definition of a data scientist has become so blurred that almost anyone who can write a query using aggregates might call themselves one.
That’s a good thing if you are looking for a job. Many of you might find opportunities (and raises) if you convince a hiring manager that you are a data scientist. However, I’d be wary of living on the new brand without growing your skills. If your company comes to expect more, especially with regard to advanced statistical analysis, you might find yourself in a bind.
I ran across a piece that looks at the skills a data scientist might actually need. I don’t know how many managers understand the difference between simple discrete rules engines and more subtle, complex, multi-variable, adaptive algorithms, but there can be a big difference in how well the system actually performs for your company.
No matter what you choose for your career, I’d certainly encourage you to continue learning more about how to work with data, whether you want to study statistics, pick up R, or improve your visualization skills. Keep learning. Keep your brain active and work to improve before you find yourself without a job and in need of training. Every little bit you learn helps, and the practice of continuous improvement builds a habit that will serve you well over time.
Last week we received the results of the PASS Board of Directors elections for 2015. Jen Stirrup and Tim Ford were re-elected to new terms. They’ve served for the past few years as members of the BOD. Ryan Adams was elected for the first time, and I supported his candidacy, so I’m glad he won.
Argenis Fernandez didn’t get elected, but I still applaud his decision to run. He was one of only four out of the thousands of PASS members who did so. However, reading Andy Warren’s notes, it seems Argenis may end up on the board anyway.
If that’s the case, I’m glad. I think Argenis brings a fresh, different view, which is needed.
In any case, I’ll publicly thank them here, and in person when I see them, as their volunteerism should be appreciated and acknowledged.
There is a bit of a rant from John Welch on testing your data manipulation that I like. I think some of the impact is lost because the end shows an advertisement for a product that helps here, but the points made are good. We all want to test, we think it’s hard, we don’t have time, and our businesses live with the issues from limited testing.
Unlike John, I’m not a fan of Test Driven Development. Usually this is because I’m not always 100% sure of the results I want or have been given. I’ve often been given a request to do x, and as I get involved, I find that the requirements might be incomplete, or even wrong, and they’ll change. As a result, I like to write a little code, get some idea of what I want to return or change, and then write a test that verifies what I’ve done is correct.
It’s a subtle difference, and maybe I’m doing TDD in the wrong order, but I like to get code, test it, then think about potential issues (which I might find as I write code) and write a few tests for the things that I’ve missed.
However I do believe we need to test our code. We all do test our code, even if it’s with a few before/after queries. What I don’t get is why we don’t just mock up a quick test that we can run in an automated fashion. It’s not much more work, and then we can more easily re-run the test later to ensure any refactoring or optimizations we make continue to work as expected.
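As a sketch of what that quick automated test might look like, here is an example using the tSQLt framework. The procedure under test, dbo.GiveRaise, and all the names here are hypothetical, purely for illustration:

```sql
-- Create a test class (a schema) to hold the test
EXEC tSQLt.NewTestClass 'PayrollTests';
GO
CREATE PROCEDURE PayrollTests.[test GiveRaise adds ten percent]
AS
BEGIN
    -- Isolate the table so the test doesn't depend on real data
    EXEC tSQLt.FakeTable 'dbo.Employees';
    INSERT INTO dbo.Employees (empid, salary) VALUES (1, 1000);

    -- dbo.GiveRaise is a hypothetical procedure under test
    EXEC dbo.GiveRaise @empid = 1, @percent = 10;

    DECLARE @actual MONEY = (SELECT salary FROM dbo.Employees WHERE empid = 1);
    EXEC tSQLt.AssertEquals @Expected = 1100, @Actual = @actual;
END;
GO
-- Re-run any time with: EXEC tSQLt.Run 'PayrollTests';
```

The point is that this is barely more work than the before/after queries we already run by hand, and it can be re-run after every refactoring.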
I’ve been looking to work on my programming skills a bit and try some new languages. I’d like to grow my career in a few different ways, as well as investigate where some of the new languages and platforms might be useful for data analysis. As I’ve talked to some developers and been looking around, someone recommended Exercism.io.
Exercism.io is a site that allows you to practice coding exercises and get feedback from others. When you visit the site, it has an interesting look and invites you to log in through GitHub. If you’re not a member of GitHub, and you’re a developer, you probably are making some sort of mistake.
Once you log in, you really need to download the command line client and execute it. Once you do, you can configure it to connect and download exercises. Each of these is placed in a folder, as shown:
For each of these languages, you get an exercise to complete in that language. I’ve been playing with Python, and my first exercise was hello-world. In the python folder were my exercises (you can see I’ve moved on).
The hello-world folder had a readme and a test file (it’s since gotten my program in there). The readme has instructions, and the test file is a set of tests that can be executed to check your program.
When you pass the tests, you submit your solution from the command line client. The submissions appear on the website and people can comment on your code. I am in no way commenting on anyone else’s Python code at this point, but I did get a comment on my leap year calculation.
I looked over the comment and then changed some code. I had to futz with the command line to get this to resubmit, but as you can see, I ended up getting a second iteration in there. No comments on that one, but we’ll see.
It’s an interesting idea to share code and get comments. Mostly I think this is a way to formally practice some exercises and get feedback from experienced users, though the volume means you may not get comments on your solutions. I suspect the same would be true for SQL users commenting on each other’s solutions.
I think this is interesting, and I’m tempted to try to do something like this for SQLServerCentral. The hard part is putting together enough questions that others can practice in an organized fashion.
Last week I did something that many people have already done. However this was my first experience, and I really enjoyed it. We’ll see if I get to do it again.
I went to SQL Saturday #441 – Denver as an attendee. I hadn’t submitted to speak, but I spent half my day over at the event, watching some sessions, networking, and enjoying some time with my fellow data professionals.
I’ve never done that before. I’ve attended many SQL Saturdays, 49 by my count, but always as a speaker. I’ll sit in sessions, but I always have a responsibility and my laptop.
Saturday was a bit more relaxing for me. I could attend sessions and focus more, without worrying about keeping my talk in my head, fussing over setup, or practicing during the day. I could devote all my time to someone else’s presentation, which was nice.
I don’t know how many more SQL Saturdays I’ll get to attend like this. Certainly the ones in the Denver area are events I like to support by volunteering to present on a topic. Denver has grown quite a bit, and we have many speakers from out of town who submit, so I may avoid submitting here in the future and continue to attend, or volunteer in some other fashion.
It was a fun experience, and I’m glad I got to go. It was worth my time, as I learned a few things that I am looking forward to trying out over the next few weeks.
I was chatting in one of our DLM webinars and someone asked a question about logging data. I mentioned a few things, but this individual followed up and wanted more information. Specifically, how do you build a logging table?
It’s an open-ended question, but it’s perfect for a poll. How do you construct a table for capturing lots of information that you may or may not query later, and then usually only in a very ad hoc fashion when something has gone wrong? That’s typically the only time I look through logs: when I’m investigating some type of issue. With that in mind, this week the question is:
How do you design a logging table?
I’m looking for two things here: the DDL and the rationale for the design. What is good about this design? How does this particular structure (with indexing) allow me to find answers to the questions I might need to ask?
I realize this is very open ended, but the concept of logging is also open-ended, used different ways for different purposes. If you have problems with logging structures, let us know what design techniques to avoid as well.
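To seed the discussion, here is one possible starting point, a sketch rather than a recommendation: a narrow, append-only table with a clustered index on the insert-ordered key, since log writes should be cheap and reads are rare and usually time-bounded. All names here are invented for the example:

```sql
CREATE TABLE dbo.AppLog
(
    LogID    BIGINT IDENTITY(1,1) NOT NULL,
    LogDate  DATETIME2(3)   NOT NULL
        CONSTRAINT DF_AppLog_LogDate DEFAULT SYSUTCDATETIME(),
    Severity TINYINT        NOT NULL,  -- e.g. 1 = info, 2 = warning, 3 = error
    Source   VARCHAR(128)   NOT NULL,  -- procedure or module that wrote the entry
    Msg      NVARCHAR(2000) NOT NULL,
    CONSTRAINT PK_AppLog PRIMARY KEY CLUSTERED (LogID)
);
-- A nonclustered index on LogDate supports the ad hoc
-- "what happened around 2 AM?" investigations
CREATE NONCLUSTERED INDEX IX_AppLog_LogDate
    ON dbo.AppLog (LogDate) INCLUDE (Severity, Source);
```

The clustered key on the identity keeps inserts appending at the end of the table, while the date index serves the occasional time-range query without slowing writes much.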
Another post for me that is simple and hopefully serves as an example for people trying to get started blogging as #SQLNewBloggers.
I ran across a question on Facebook, of all places, the other day. Someone had asked a friend how to return a value from a procedure and assign it to a variable. My friend answered, but in the discussion, I noticed the poster was looking to return @@IDENTITY to the calling procedure as the value of the row that was just inserted.
Don’t do that. At least not without understanding the potential issues.
It’s been years since I’ve seen @@IDENTITY in use, and for a number of years before that, this was an easy “weed out” question in interviews.
If you look at the documentation for @@IDENTITY, it notes that SCOPE_IDENTITY() and @@IDENTITY both return the last identity value generated in the current session, but @@IDENTITY is not limited to the current scope. This means that if a trigger or nested procedure performs its own insert, you could receive the identity value from that statement instead of your own. Depending on how you use this value, that may or may not be an issue.
How does this work? Let’s create a simple table with an identity. I also create a logging table and a trigger that will add a message to my logging table when I add a row to the first table.
CREATE TABLE newtable
(
  id INT IDENTITY(1,1),
  mychar VARCHAR(20)
);
GO
CREATE TABLE Logger
(
  logid INT IDENTITY(56,1),
  logdate DATETIME,
  msg VARCHAR(2000)
);
GO
CREATE TRIGGER newtable_logger ON dbo.newtable FOR INSERT
AS
  INSERT INTO Logger (logdate, msg)
  VALUES (GETDATE(), 'New value inserted into newtable.');
  RETURN;
GO
If I run this, what do I expect to be returned?
INSERT INTO dbo.newtable (mychar)
VALUES ('First row');

SELECT @@IDENTITY;
However, I get this: a 56 in my result set for @@IDENTITY.
The reason is that the last identity value generated in the session was 56, from the logging table. The order of operations is:
- insert value into newtable
- @@identity set to 1
- trigger fires
- insert into logger
- @@identity set to 56
That’s often not what we want when capturing an identity value. What’s worse, this behavior can exist, but not manifest itself until someone changes a trigger later.
If I change this to SCOPE_IDENTITY(), I get a different result.
This is because the SCOPE_IDENTITY() function takes the scope into account and doesn’t get reset by the trigger code.
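The same insert, rewritten against the tables above, shows the difference; the trigger still fires and writes to Logger, but the scoped function ignores it:

```sql
INSERT INTO dbo.newtable (mychar)
VALUES ('Second row');

-- SCOPE_IDENTITY() returns the identity generated in newtable by this
-- statement; @@IDENTITY returns the last one from the trigger's insert
-- into Logger, so the two values differ
SELECT SCOPE_IDENTITY() AS ScopedValue,
       @@IDENTITY       AS SessionValue;
```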
This took some time to write, mostly because I had to set up the demo, test things, and then get the explanation straight in my head. It took me 15-20 minutes, including lookup time in BOL, but if you are new to writing, this might take a bit longer. You’d also want someone to review your explanation, since this can be tricky to explain.