
The Value of Data

How valuable is your data? It’s a good question, and certainly the type of data your organization has along with the business in which you are engaged will make your data more or less valuable. More and more we find the differentiation between companies is in the way they collect, manage, and use the data available to them. So much in business is based on guesses, but more and more the guesses have some basis in data. We are starting to see those who make decisions in business feel some need to justify or support their choices with data.

Is data the new oil? Oil was arguably the most important commodity of the twentieth century (and perhaps still is). The SQLRockstar wrote a piece with that same title, with the idea that knowing more about how valuable data can be will make you more successful in business. The post is based on a review of Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die, and talks about some of the challenges of using data to make decisions.

I certainly believe in the power of data, and that more data often gives us more insight into how the world works, as well as allowing us to draw some inferences about the future. Not necessarily better insight, but certainly more. I do think that computer extrapolation of patterns to the future is vastly overrated as most of our algorithms are far too simple, using too little data and discounting the increasing effects of small variables as scale increases. In short, I don’t think we’re anywhere close to a Foundation-like computer that can help us predict the success of new products, much less the future of a country.

However, I think that using analytics to make small decisions and help guide our direction is important. We will still need humans that apply their internal supercomputers to interpret data and continue to evolve the algorithms, and I hope that more and more of you are gaining deeper industry insight in your particular field. After all, many of us data professionals will be needed to help guide analysts in gathering, transforming, interpreting, and displaying data in ways that allow us to make decisions with more confidence.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.6MB) podcast or subscribe to the feed at iTunes and LibSyn.

An Amazing Conference

I said that I’d write about some good things PASS does, and I’ll make it a point to do so. Despite my complaints and criticisms, I like the organization. I’ve been a part of it since October 1999, when the initial conference was in the basement ballroom of a hotel in downtown Chicago.

Since that time, I’ve attended most of the annual Summits. I’ve missed a few, and the last few years I’ve been on an every-other-year pace, but that’s my issue, not that of the organization.

The Summit has grown beyond being a conference like every other conference, where techies gather to watch presentations, furiously scribble down notes, and then return to their hotel rooms to get a little late-night work in for the employer that sent them away to a conference.

That first year was no different for me, though I did meet the amazing Kalen Delaney and managed to ask a semi-intelligent, reasonable question that she answered. I also had my wife, sister-in-law, and infant son with me. We watched the last game of the year at Comiskey Park, smoked cigars and enjoyed wine looking out over the river, and had a great few days in Chicago.

However, something happened. I’d like to think that Andy Warren, Brian Knight, and I had a hand in it with our annual SQLServerCentral parties on opening night. Those grew from a line out the door of the reception hall in 2003, to the Xbox debacle and TV giveaway in 2004, to the casino parties with tons of prizes in later years. However, we weren’t the only ones, and PASS has grown to encompass numerous social events at the Summit and a community of people that greet each other with smiles and hugs each fall.

We have more and more attendees that greet larger and larger groups of fellow attendees by name. We have runs, worship services, Women in Technology, karaoke and more, all of which almost overshadow the event itself. They don’t, and there’s lots of information being passed around, but the PASS Summit is way, way more than a conference.

Kudos to the organization, the HQ staff, and the various board members across the last 15 years that have worked hard to build (at times) and let evolve organically (at others), an amazing event.

Why Have a Nom Com?

I don’t want to get too deeply involved in the complaints and struggles of PASS, but I also want to ensure we have a healthy community. I want to have us continue to grow and prosper as data professionals who work with SQL Server. PASS has a mission, and it does help our user groups, SQL Saturdays, and members thrive as a group.

I also want to say that I think PASS does lots of good things, and there are good people involved. Props for the good things, and I’ll make a point of highlighting them more.

Now…

Every year we have a set of candidates for the PASS board, who will help guide the organization through the next few years. We have had board members who did well, and board members who didn’t. Overall, the organization may move forward, or it may stall, but I haven’t really seen anyone damage PASS through their actions.

To be clear, I have liked most of the people who have served and consider most of them friends. I’d be happy to sit down and have a drink with most of them, and I think they all had the community’s best interests at heart. No one made decisions to maliciously hurt the organization. I thank them all for their volunteer time and efforts.

However, let’s not confuse appreciation with acceptance. Your best intentions do not imply competence or success. Criticism is not a personal attack, but rather an understanding that the process and system have bugs. If you can’t handle that, don’t serve.

I’ve wandered a bit from my title, but for a reason. The process for allowing candidates has changed over time, and while I think the NomCom served a purpose when it was created, I wonder if that’s the case now. I saw a note recently that candidates need to meet minimum criteria, and then they are evaluated by the committee.

If someone passes the minimum criteria, shouldn’t they be on the ballot? That certainly hasn’t been the case, but really, why have a committee? It can serve no other purpose than to influence voting by ranking candidates or removing candidates the committee doesn’t like from the process. That dislike can be for personal reasons, a non-disclosed issue about the candidate, or some reason they aren’t qualified, whatever that means.

However, if I certify I will travel, if I can speak English, work with SQL Server, and if I have some volunteer record, then stick me on. Well, not me, but anyone else.

I know some people worry we might have 25 people running for 3 slots, and then oh no, what will we do? How can the voters decide? Listen, if we ever have 25 people running for 3 slots, I think that’s a good day in the community. I’d view that as a win, not as a problem.

The point I’d make is there are no real decisions being made by the board that require some special training that the board members somehow have. We entrust the running of many civic decisions to people with no real training in some area, and I see no reason why the PASS board is any different. Any reasonably intelligent DBA can listen to information, ask questions, and make a decision.

In fact, I’d argue that while everyone that has served on the board has worked in the technical field, and probably had some success, they aren’t necessarily qualified to lead a non-profit with $1m+ in revenue. At least not more qualified than you or any other member of the community.

Even me.

Let’s grow up a bit. Let’s recognize that the board is a part of the community, and keep it that way. If someone wants to run and meets the criteria, let them run. Anything else smacks of attempts to shape and control the organization in some way.

Any way, whether good or bad, is unnecessary.

Still time to get a SQL Server tattoo

Not for me, and perhaps not for you, but you can still encourage Jason Strate, Gareth Swanepoel, Ed Watson, and Kristen Benzel to go under the needle. If we raise $25,000 for Doctors Without Borders, they’ll get tattoos.

We’re over the $10,000 mark, so we’re already going to have these events at the PASS Summit

That alone will make the event memorable, but we’d still like to do more. There are plenty of places around the world that have medical crises underway, and this is a great charity that sends medical professionals to places that could use the help.

If this cause speaks to you, then please think about donating for a great cause, and a bit of fun.

A SQLServerCentral DR Event

We had a disaster at SQLServerCentral this past weekend. It wasn’t a big disaster, but it was an event that required a restore of data.

An Administrative Error

On Saturday, I was attending SQL Saturday #331 in Denver. In between my sessions, I was prepping a few demos and finished getting ready earlier than expected. Since I had a few minutes, I checked my email and immediately knew we had an issue on the site.

We’ve been fighting spam for months, slowly tweaking our posting system in the forums. With the start of the American football season, we’ve been getting hundreds of posts every Friday and Saturday. I’ve tried to ensure these posts are removed before our newsletters are generated so that they aren’t filled with advertisements, but it’s been a chore.

One of the things I can do in the forums is select a series of posts and mass delete (or open, close, hide, etc.) the group. For most of the SPAM posts we receive, the posts all occur in the same few minutes and are grouped together. I’ve gotten in the habit of deleting these batches of posts, watching for a legitimate SQL Server post at the end.

However, on this day, one of our regular threads was buried in the middle of all the SPAM posts. This was THE Thread, the most active and long-lived discussion on the site, with 45k posts. I inadvertently deleted it and went on to give my presentation.

Afterwards, I got a private message from the site, telling me the discussion had been deleted around 2:00pm MST.

Quick Reaction

The first thing I did when realizing what had happened was connect to our production database cluster through VPN. When I opened Management Studio, I ran a few queries to verify the discussion had been deleted, and not just “marked for deletion”. Logical deletes exist in many applications, and if this is the case in your own disaster, the last thing you want to do is initiate a database restore.

In this case, the data was gone, so I immediately tried to initiate a restore. Since over an hour had passed, I didn’t want to restore over top of the current database. Instead, I wanted to restore a copy as a new database, as of 1:45pm or so.

I selected the proper options, marking the full backup from overnight and the log backups throughout the day. I didn’t have time to worry about using STOPAT and trying to get close to the actual time of data modification, so I chose the last backup I knew would be good. Verifying the database name was a new name, I clicked OK.
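
The selection described above, a full backup plus the log backups up to a known-good point, can be sketched in Python. This is only an illustration under stated assumptions: the `Backup` record, the `restore_chain` function, and the sample schedule are hypothetical, not the tooling Rackspace or Management Studio actually uses. The sketch stops at the last log backup that finished at or before the target time, mirroring the decision to skip STOPAT.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Backup:
    kind: str            # "full" or "log" (hypothetical labels)
    finished: datetime   # when the backup completed


def restore_chain(backups, target):
    """Return the full backup and the log backups to restore, in order.

    Picks the most recent full backup finished at or before `target`,
    then every log backup finished after it and at or before `target`.
    """
    fulls = [b for b in backups if b.kind == "full" and b.finished <= target]
    if not fulls:
        raise ValueError("no full backup exists before the target time")
    base = max(fulls, key=lambda b: b.finished)
    logs = sorted(
        (b for b in backups
         if b.kind == "log" and base.finished < b.finished <= target),
        key=lambda b: b.finished,
    )
    return [base] + logs


# Made-up schedule on an arbitrary date: one overnight full backup
# and a few log backups through the day.
day = datetime(2000, 1, 1)
backups = [
    Backup("full", day.replace(hour=1)),
    Backup("log", day.replace(hour=9)),
    Backup("log", day.replace(hour=13, minute=30)),
    Backup("log", day.replace(hour=14, minute=30)),
]
chain = restore_chain(backups, target=day.replace(hour=13, minute=45))
```

With this made-up schedule, the 2:30pm log backup is left out because it finished after the target time, so the restored copy predates the deletion.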

And nothing happened.

Actually, that’s not true, as I got an error. The backup system in use by Rackspace, our provider, doesn’t keep the files available from SSMS. I don’t have rights to work within the restore system, or even to request a restore from Rackspace, so I opened a ticket with Red Gate’s support for a restore.

Had this been a situation where the site was down or users were unable to read articles or post, I would have escalated this for immediate action. However, since this was a restore of a single thread, and one that exists for entertainment more than education, I chose not to bother our IT staff on a Saturday night or Sunday.

The Fix

When I woke up Monday morning, I had a message that the restore had been completed to my specified new database (SSCForumsOld) as of 1:30pm MST on Saturday. I hadn’t asked anyone to do more than this, so this was the extent of actions taken by Red Gate.

Again, I could have specified actions in more detail, but rather than try to explain to someone in email which thread, and which posts needed to be restored, I decided to handle this myself. After taking my children to school, I sat down and got to work.

I’ve known the PK of this particular thread since I’ve had to work with it in the past. Connecting from Management Studio to the production instance, I verified I could see the 45k messages in the SSCForumsOld database. I ran the same query on the SQLServerCentralForums database, and validated the data was still missing. I then built a query that would perform an INSERT..SELECT of the parent posts from the restored database to the production system. This took longer than expected, with the table having a number of locks for about 2 minutes. However, the post details had been moved.

That left me with the need to move the actual words of each post, which are stored in a separate table (for some strange reason). Rather than lock up the forums for minutes, I spent time rewriting my next insert to use batches of 1000, and only insert those messages which had not already been moved. Since I could join on PKs, this went quickly, in a few seconds. I next changed my batch size to 5000, and this completed in about 15s.

That seemed like a short enough time to run quickly while still moving a good batch of data, so I manually executed this 9 times to move all the data. A quick check on the site showed THE Thread was back, and I posted a few notes to let users know.
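
The batched copy described above can be sketched as follows. This uses Python’s built-in sqlite3 module with in-memory tables so it runs anywhere; the table and column names (`old_posts`, `live_posts`, `post_id`, `body`) and the row counts are hypothetical stand-ins, not the real forum schema, and the actual work was done in T-SQL against SQL Server.

```python
import sqlite3


def copy_missing_in_batches(conn, batch_size):
    """Copy rows from old_posts to live_posts that are not there yet.

    Each batch is committed on its own, so locks are held only briefly.
    Returns the number of batches that actually moved rows.
    """
    batches = 0
    while True:
        before = conn.total_changes
        conn.execute(
            "INSERT INTO live_posts (post_id, body) "
            "SELECT o.post_id, o.body "
            "FROM old_posts AS o "
            "LEFT JOIN live_posts AS l ON l.post_id = o.post_id "
            "WHERE l.post_id IS NULL "   # skip rows already moved
            "ORDER BY o.post_id "
            "LIMIT ?",
            (batch_size,),
        )
        conn.commit()
        if conn.total_changes == before:
            break  # nothing left to move
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE old_posts (post_id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("CREATE TABLE live_posts (post_id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO old_posts VALUES (?, ?)",
    ((i, f"post {i}") for i in range(1, 45001)),
)
# Pretend the first 1,000 rows were moved by an earlier, smaller batch.
conn.execute("INSERT INTO live_posts SELECT * FROM old_posts WHERE post_id <= 1000")
conn.commit()

batches = copy_missing_in_batches(conn, batch_size=5000)
moved_total = conn.execute("SELECT COUNT(*) FROM live_posts").fetchone()[0]
```

Because each pass only inserts rows that are missing from the target, the same statement can be rerun safely after an interruption, which is what makes manual, repeated execution practical.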

Aftermath

There are still a few issues with points for users that have posted to this topic being incorrect, but that is a lower priority item and I am letting our developers look at it. There is at least one known bug with points, and it’s possible we have another here.

My personal lesson learned was that I need to move a little more slowly when removing SPAM. I also don’t want to trust myself to do it regularly, so I also spent part of Sunday morning writing a little code and scheduling a job to delete posts with certain patterns of titles that the spammers use. I tried to limit to those obvious subjects so that no legitimate posts are removed.
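
A rough sketch of that kind of title filter, written in Python rather than as the scheduled job used on the site. The patterns and sample titles below are invented for illustration, since the actual patterns are not given here; the point is only that the rules stay narrow enough that ordinary posts never match.

```python
import re

# Hypothetical patterns; deliberately narrow so ordinary questions never match.
SPAM_TITLE_PATTERNS = [
    re.compile(r"watch .+ vs\.? .+ live stream", re.IGNORECASE),
    re.compile(r"free (nfl|football) streaming", re.IGNORECASE),
]


def is_obvious_spam(title):
    """True only when the title matches one of the known spam patterns."""
    return any(p.search(title) for p in SPAM_TITLE_PATTERNS)


def split_posts(titles):
    """Separate titles into (to_delete, to_keep), preserving order."""
    to_delete = [t for t in titles if is_obvious_spam(t)]
    to_keep = [t for t in titles if not is_obvious_spam(t)]
    return to_delete, to_keep


# Invented sample titles.
titles = [
    "Watch Team A vs Team B live stream online",
    "Long-running discussion thread",
    "FREE NFL streaming here",
    "Index rebuild vs reorganize",
]
to_delete, to_keep = split_posts(titles)
```

Reviewing the `to_delete` list before anything is removed, at least for the first few runs, is one way to confirm the patterns are not catching legitimate posts.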

We are also escalating some of the issues with SPAM, and with the return of my manager from sabbatical, I’m hoping we can build a few more filters to limit the disruptions in the forums.

And of course, everyone that posts to THE Thread has my apologies for the mistake.

Testing in Production

The recent Apple keynote announcing the iPhone 6 was very heavily watched. At least, I think it was heavily viewed because I had trouble connecting, numerous pauses and regular stuttering of the video. Despite the fact that the stream was limited to Apple operating systems (iOS/OSX), it appeared that many other people had trouble watching based on the tweets I saw. On top of the scale issues, there were also different language translations overlaid in the audio and crashes of the Safari browser. Overall, the live event was a disappointment to me, though it hasn’t stopped me from upgrading my iPhone.

The numerous problems that occurred had me wondering if any system-wide testing was previously conducted. Was it possible that Apple was actually having their full-scale, end-to-end system test in production? During the live event? I guess it’s possible, though it would be imprudent and foolish to do so. With all the effort and expense that goes into the “Apple show”, how could a complete system test not be managed?

Certainly a one-time event like a product launch can be hard to simulate. The scale alone is difficult to predict, but certainly there are things that can be simulated. The actual people and applications can be used to record, encode, broadcast, etc. Equipment of the same size, type, and configuration should be used, as close as possible to the conditions of the live event. The same people who will operate it should participate. We know this as technologists, and most of us would perform testing like this if we could.

I know that resources are often constrained, and time is precious. However, we need to perform some testing prior to production if we are to have confidence that everything will work during a deployment of new bits. The best way to do this is to deploy often, to a variety of environments, in the same way we will to production. Execute a variety of tests each time that ensure the application functions as expected. If we find problems, we shouldn’t fix them in that environment. We should start over, fixing the issue in development, and deploying the changes again through test, pre-production, and any other environments we have. By repeating the process over and over, we can build confidence that our production environment will work as we expect.

I hope Apple did this, though the end result has me feeling a bit skeptical.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.9MB) podcast or subscribe to the feed at iTunes and LibSyn.

Automate Yourself to a Coffee Break

I’ve worked as a production DBA in a few companies, and in those positions, I’ve always worked to make my position one of “insurance,” with me able to respond when things go wrong. My goal has been to understand, improve, and enable an environment that runs smoothly, which allows me to take leisurely coffee breaks and not hurried (and harried) sips of coffee as I walk from the coffee pot back to my desk.

There was a good article published recently about the mindset of a DBA and how automation is an important part of your job. If there’s something you can easily automate, then it’s probably something you should automate. There are plenty of tasks that are easy to script into jobs, set alerts for, or have the system perform some action when they occur. If the system is performing that work, then you have time to deal with other, higher value tasks.

What are those higher value tasks? Well, what things do people complain about, but you never get the time to work on? Perhaps tuning queries? Maybe practicing your skills for a disaster? Finding time to analyze the performance of systems and plan for the future? There’s probably no shortage of things that you wish you had time to deal with because there is no shortage of busy work you’re assigned.

Do yourself a favor. Look for places to introduce automation through T-SQL scripts, PowerShell cmdlets, alerts, and more. Practice writing some small program to manage a task for you. It might seem like it will take longer than doing the work, and it will. However, the next few times you’re asked to complete the same task, it should take you much, much less time.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (1.8MB) podcast or subscribe to the feed at iTunes and LibSyn.

The Mentoring Experiment – Closing Thoughts

Andy wrote a post today called The End of the Mentoring Experiment, which is something we’ve talked about for some time. In fitting with the decision, and perhaps justifying it further, we decided to do this a few months back, but as with many things, we haven’t gotten to it until today.

I remember when we started this, and we were very excited and interested. The first cycle of matches, using 8 people we hand picked, consumed a lot of time, and while it was successful, it was difficult to scale. We continued for a couple of other cycles, trying to tweak the process a bit, but never found a way to manage this effectively, given other parts of our lives, and were never quite comfortable with automating too much of the experiment away.

We also struggled with the idea of accidentally making some mistake, breaking someone’s privacy, or causing harm to another’s career or life. Andy wrote a few things about this, and it made sense to us.

We do believe in mentoring, and I hope that those of you looking to grow your careers take time to look for mentors. If you need advice or want help, ask. There are lots of people that will probably give you a little time. It’s not a sign of weakness to need or want a mentor. It’s a sign of maturity.

For those of you with some experience, keep an eye out for someone that might ask for help, or maybe just seem to be lost. Tread lightly, and carefully, but offer to be a sounding board if they’d like one. You can really make a difference in someone’s career with a little effort.

Grant for President

Grant Fritchey is running for the PASS board. He’s not running for President, though I’m not sure why he shouldn’t be able to. The board makes decisions as a group and the President isn’t necessarily more or less powerful than other members. However, the President can be the face of PASS and present an image that motivates others. Grant would be great here, and I’d like to see the President elected directly at some point in the future, without the nonsensical requirement that they serve multiple terms beforehand.

However, that’s a separate discussion. In the next week, the election will take place and I’m voting for Grant. I’ve known him a long time, I’ve watched him work with the community, and I think he might be able to create some change in the organization.

For far too long, I think PASS has been stuck in a bureaucracy that acts out of fear rather than leadership, and I’d like to see that change. I wouldn’t blame any past or present board members as I think many of them are fine community members, but I think the organization has systemic problems that create issues. I’ll cease ranting about that, and no, I’m not going to attempt change myself. I did once, but I have neither the time nor inclination to fight those battles right now.

I’m voting for Grant. You can make your own decisions, but I am choosing Grant because he can create change. I’ll also vote for JRJ for the same reason. I think James has worked to create and implement change, and I’d like to see him do more, especially in non-US areas.

I’m not sure who else I’d vote for, though I like both Wendy and Sri. I’m not sure what they’ve done, and I’d certainly like to get more information. One of my complaints about all board members is that they disclose too little information to the community.

However you vote, please take a minute and vote. We have an amazing community, and the more you participate, the better it gets.

Disclosure: Both Grant and I work for Red Gate Software, so take that for what it means to you. We’re also on the DBA Team together ;)

Watson Freemium

I’ve been intrigued by the Watson project from IBM. It was quite a coup to see the platform win on Jeopardy. That’s an amazing accomplishment for machine learning, though it does seem like the investment and effort to set up the platform and experiment with it was more than most people could afford.

However, this week I saw an announcement that IBM has introduced a freemium version of Watson, aimed at people looking to work with analytics. This product allows users to upload some data and ask natural language questions, which Watson will answer.

I have no idea how this will work, but I like the model they’ve taken. We can upload limited data and experiment with it. If it works well, we can subscribe, pay some money, upload more data, and get more complex analysis. As an IT person, I think this is great. End users can play with it, and I don’t have to mess with a proof of concept. If it works, I can get involved and help automate ETL, query structure, tuning, etc.

It’s a changing way of working with analytics, and one I welcome. To me, as business people play with technologies like this, they become more savvy and willing to invest in technology. And as they demand more, they create more opportunities for IT people to help them, not less.

Steve Jones