The Exciting World of Data

I was honored to have the chance to give the keynote at SQL Saturday #520 in Cambridge this year. It was a quick keynote, delivered at a fast pace. I didn't record it, but people seemed to enjoy it, and I decided to share some of my thoughts on the Exciting World of Data, the title of the talk.

We love data. At least, I do. It’s a way of learning more about the world around us, describing it, modeling it, understanding it, even enhancing it. And the world of data is changing. Size is increasing. We’ve moved from bits to bytes, to kilobytes to megabytes to gigabytes to terabytes, just in our hands. We can’t even really conceive of what this amount of storage means in a physical sense. Our large systems have grown to petabytes, perhaps exabytes and zettabytes one day and eventually to yottabytes and beyond.

We have to transfer data more quickly as well. My first modem was a 300 baud Hayes Smartmodem. This was at university, where I could watch the text crawl across the screen. From there I had a few upgrades and standardized on 28.8k for quite some time before moving briefly to 56k speeds. When I started SQLServerCentral, I had an ISDN line in my house. I've configured T1 lines at work, and upgraded to faster DSL and cable routers at both home and work. I never worked in the OC space, but some of you may have transferred data across OC-256 or even OC-768 lines.

Our mobile data moved from SMS to GPRS to EDGE, where, with a Sidekick, I could actually type real messages on a keyboard. When I got to 3G, I thought that was all the speed I'd need for a long time. I think I spent 2 or 3 years with an iPhone 3GS. However, like many of you, I upgraded to 4G and LTE, which are amazing speeds, faster than many of the early networks I had at work. We're testing 5G and 6G, and maybe we'll keep going to subspace radio? Who knows.

Here on earth, we move more data in our systems. Some of you may have worked with tape storage. My first PC had a tape drive, so I was quite pleased to get a floppy disk drive, first 5.25″ and then 3.5″. I thought we'd have those forever, but I've migrated from floppies to hard disks to solid state disks to 3D drives. I think 3D SSD technology is going to fundamentally change the world, with latencies so low that our software will need to be very, very efficient.

Our interfaces have improved, allowing us to move more data, more quickly. From SMD to ESDI to ATA to IDE to SATA to SCSI to Wide SCSI to Fast SCSI to Fast Wide SCSI to Ultra SCSI to Ultra Wide SCSI. SCSI-2 to SCSI-3 to Fibre Channel, InfiniBand, and beyond. USB to FireWire 400 to USB 2 to FireWire 800 to USB 3, 3.1, eSATA, Thunderbolt, Thunderbolt 2, Thunderbolt 3, and what's next? Who knows?

Our computers used to be the room, but we moved to minis, with the computer in the room. Then we got desktops and portable, luggable machines, moving to laptops that we can carry one-handed and handheld computers in our pockets. We even went to tiny devices that we found were too small, so we've gone the other way with smartphones and phablets and iPads and tablets. Soon the small things will be larger, and the world around us will become enhanced with virtual reality and HoloLens. Maybe.

Our world is using all this technology to monitor, mark, chip, tag, record, watch, measure, and gather data. We get to work with that data. We get to gather, store, manage, index, backup, transfer, clean, and care for all that data. We need to work with it. We’ve got to move it with text files, CSVs, Excel, Word, PDF, MP3, MP4 and more.

We send data over TCP, FTP, SMB, AirDrop, VPN, web services, REST, jQuery, and more. We share data with files, messages, texts, clicks, likes, tweets, pings, drops, shares, snaps, hangouts, and once in a while, we communicate with phones.

What do we do with all that data? Why, we can do anything. We have PowerPivot, Power Query, Power View, Power Map, and Power BI. It seems Microsoft really believes data has power.

We have the chance and potential to build amazing visualizations. We can analyze our business progress, producing tables, charts, graphs, animations, and of course, reports. We can map our own activities and events, tracking how we interact with the world, experience it, perhaps even using the data to relive, remember, or reinvent the world around us.

But we have so many things to learn in order to reach our potential in working with data. Fortunately, we've got all sorts of resources to help us: no shortage of books, articles, blogs, podcasts, tutorials, classes, and most importantly, friends. I hope you take advantage of these resources to learn more.

And you can start today.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (6.8MB) podcast or subscribe to the feed at iTunes and Libsyn.


Finding the Attribute

I was playing with some Extended Events recently. If you haven't tried them, I'd encourage you to do so. However, working with XML is not my favorite. I know I can get the GUI in SSMS 16.x to show me events, but I sometimes want to query the data.

Here was my quick adventure in XML and XQuery. I should know this stuff better, but I work with XML so rarely that I'm constantly re-learning things.

I had a document like this:

DECLARE @x XML = CONVERT (XML, 
'<event name="login" package="sqlserver" timestamp="2016-09-28T01:48:31.743Z">
  <data name="is_cached">
    <value>false</value>
  </data>
  <data name="is_recovered">
    <value>false</value>
  </data>
  <data name="is_dac">
    <value>false</value>
  </data>
  <data name="database_id">
    <value>1</value>
  </data>
  <data name="database_name">
    <value>master</value>
  </data>
  <action name="username" package="sqlserver">
    <value>PLATO\Steve</value>
  </action>
  <action name="session_nt_username" package="sqlserver">
    <value>PLATO\Steve</value>
  </action>
  <action name="session_id" package="sqlserver">
    <value>60</value>
  </action>
</event>')

There was more, but this is fine. I had a query someone else sent me that looked like this:

SELECT 
[message] = @x.value(
                     '(event/data[@name="database_name"]/value)[1]',
                     'nvarchar(250)'
                     )

That's fairly simple, but what I really wanted was to get an attribute at the top. In the "event" node, I wanted the "name" attribute. I can go from the query above to that, right? I could have dug into the XQuery documentation, but I've found XQuery fairly logical in the past, so I thought I could figure this out on my own.

I knew that the path was just event, and I needed to get the attribute from that. I tried this:

SELECT 
[message] = @x.value(
                     '(event[@name="name"]/value)[1]',
                     'nvarchar(250)'
                     )

That didn't work, so I modified things to:

SELECT 
[message] = @x.value(
                     '(event/name/value)[1]',
                     'nvarchar(250)'
                     )

No go.

Hmmmm. What do I need to do? I decided to Google a little and saw a note that the attribute is accessed with the @ symbol. OK, so I need to provide that as the path.

SELECT 
[message] = @x.value(
                     '(event/@name/value)[1]',
                     'nvarchar(250)'
                     )

Still no good, but then I removed the value element from the path:

SELECT      @x.value(
                '(event/@name)[1]', 
                'nvarchar(250)'
               )

That was it.
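Putting the pieces together, here's a quick sketch that pulls the event's name attribute along with a couple of the element values in a single query. This assumes the document keeps the shape shown above; adjust the paths and data types for your own events.

SELECT
    [event_name]    = @x.value('(event/@name)[1]', 'nvarchar(250)'),
    [database_name] = @x.value('(event/data[@name="database_name"]/value)[1]', 'nvarchar(250)'),
    [username]      = @x.value('(event/action[@name="username"]/value)[1]', 'nvarchar(250)')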

XPath and XQuery make sense once you get the rules, but they're still annoying to work with. I'll be trying to use the GUI in SSMS as much as possible with Extended Events.


The Quiet Zone

I've been in a data center when most of the servers turned off. I've actually heard dozens of systems power off quickly, and it's a strange sound. You become so used to the white noise of numerous fans that having them turn off is a little unnerving. It's neat when it's a scheduled patch day and all the servers cleanly shut down together. It's an altogether different experience when there's an unexpected issue and management sees their expensive hardware not working.

However, imagine losing your servers because of a loud noise. That's what happened to ING Bank when a fire extinguishing test caused a number of hard drives to fail. To be fair, the loud noise was north of 130 dB, which is very loud. Since sound is really vibration, the impact on the read/write heads caused numerous failures. The bank needed 10 hours to restart systems in their DR center, and managed to do so. While that might not have been what the bank officials wanted, this is a good DR test, and I hope they learned a few things that might help them fail over much more quickly in the future.

This might be a good reason to think about SSDs, which are less susceptible to vibration than the older, spinning rust drives. I'd guess that there are other issues that could affect SSDs, and someone is going to discover them at an inopportune time. Already we've seen dramatic improvement in SSD technology, driven by numerous early issues related to writes and reliability.

Engineering facilities is hard, and there can be many unexpected issues. I’m sure the people that designed the fire suppression system weren’t concerned about the noise; they were concerned about shutting down flames quickly. I’m sure that the people filling the system didn’t think a little extra pressure would matter. These seemingly innocuous decisions can cascade, which is why we practice and preach DR preparation. Not just backups, but restores and quick fail over.

If it’s not your organization, it might be humorous rather than stressful, but you never know what design flaws might lurk inside your facilities. I once worked in a data center that had only about half the cooling that we expected. Why? The engineers assumed that since we worked an 8 hour day, so did the computers, which we’d turn off at night. Luckily they had built a pad into their calculations so we were only short half the capacity rather than two thirds.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (3.7MB) podcast or subscribe to the feed at iTunes and Libsyn.


What is the Future of Monitoring?

I read Tom LaRock’s post on the future of monitoring, looking at the rise of machine learning and complex algorithms to evaluate systems. Tom notes that the data input into a system is crucial in determining whether the system can produce a prediction or output that has value. Since the humans that decide what data is included may not choose wisely, the system won’t necessarily reduce the time required to manage a system.

I tend to agree. From my conversations with people having success with machine learning systems, the data preparation is the most critical (and difficult) step. Shortcut the cleansing and organizing, or make poor choices about what data to include, and you won't likely get actionable results.

However, I have a different view of monitoring. After watching the DARPA Cyber Grand Challenge (I wrote an editorial on this recently), I think the future of monitoring may actually be making changes to code in-line. Perhaps with approval from humans, but I suspect we can train a process to understand how particular batches can cause issues. We can certainly set filters that might note that index changes on large tables could be problematic and should be investigated before an issue arises. We can teach a system to recognize code deployments, perhaps even roll back certain changes if the application fails. We certainly could have a machine learning system watching index usage and query plans to recommend indexes in real time, perhaps even turning online rebuilds on and off.

I think there is potential here, but where I'd really like to see advanced monitoring is not in production environments. I'd like to see better systems that can watch development, helping to suggest or rewrite SQL before it's deployed. Perhaps applications could warn developers of potential performance or security issues. Such systems could help us build more consistency into our applications. We can have systems that help our developers code better.

As our databases and software become more critical to the functioning of most of our businesses, we certainly need more reliable and robust development practices.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (3.6MB) podcast or subscribe to the feed at iTunes and Libsyn.


DevOps at Microsoft

I’ve been watching Microsoft for quite a few years, with my career being tightly intertwined with the company for 25 years.  I’m amazed by the way the company has changed in the last five years. It’s probably been longer than that, but from the outside, the last 5 years have been surprising. Certainly the way Microsoft produces software is a process worth studying by almost any organization with developers. There are some scale advantages Microsoft has, and their mission is to reach and touch people at a level far beyond what many of us deal with, but there are lessons to be learned.

There's a great piece that shows some of the ways in which Microsoft is practicing DevOps in their own software development. I like that the focus isn't just that developers do more, but that operations staff are involved and working closely with the development teams. In fact, there's a quote I like which notes that, in addition to technology changes and habits, "we found that the organizational changes and cultural shifts required to embrace these practices have been just as significant."

As I talk about smoothing the database development pipeline and creating efficiency in the process, I want people to be aware that cultural change is required. My employer, Redgate Software, can sell you tools that solve the technology challenges, but if you don’t change the way you build software in your culture, with your staff, the tools don’t help. At best they make a few tasks easier for developers, but you will still have issues releasing your software consistently on your schedule, whatever that may be.

There is also an emphasis on developers deciding when to release. In Microsoft's domain, this probably works, but it's not likely to for many line-of-business applications in organizations. However, the focus should be getting changes to customers quickly after development. The sooner they see features, the sooner they can provide feedback on whether the software works as expected or changes are needed. This quick feedback also means that developers have the architecture and flow of the code in recent memory, which is always a recipe for smoother continued development and quicker bug fixes.

Testing isn't emphasized much, though I think as developers become more accountable and can get changes to customers more quickly, they'll start to want automated testing that allows them to check for regressions, as well as ensure requirements are being met. Microsoft has the advantage that many of their internal teams can actually use the end product, something that isn't always easy for those of us in other industries.

DevOps works. The principles are sound, and it doesn’t matter if you’ve followed the principles for decades (maybe under another name) or just started a new process by asking for more coordination and collaboration from your teams. Learn to build a coherent system with your developers and operations staff, throughout your software development process, and your whole organization will benefit.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (4.6MB) podcast or subscribe to the feed at iTunes and Libsyn.


Quick Tips–Using ii in #SQLPrompt

SQL Prompt has lots of great features that can help you write SQL more quickly. However, you've got to train yourself to use a few of these and not just start typing with your old habits. This quick tip looks at one of those areas: inserting data into tables.

There are times I'm building some test data to try a new piece of code. In the past, I'd start typing INSERT. With SQL Prompt, this gives me a suggestion list after I've typed "ins", which is about as far as I get as a quick typist.

[Screenshot: SQL Prompt suggestion list in SSMS after typing "ins"]

Effort: 4 keystrokes.

I’ve tried to write cleaner T-SQL, so I’d add in the INTO, which for me is “i” plus 2 down arrows.

[Screenshot: SQL Prompt suggestion list with INTO added]

Now: 9 keystrokes

Next, I’d begin with the table, in this case, my Employee table. For me, this means hitting “e” and then the down arrow.

[Screenshot: SQL Prompt object suggestion box showing the Employee table]

Now: 11 keystrokes

Hitting Tab at this point would get me the insert statement, with the list of columns and default values. Having the quotes already there, and the data types, can be really handy.

Total: 12 keystrokes

That’s a pretty quick way to get the insert statement. However, there’s a better way. If I type “ii”, I get this:

[Screenshot: SQL Prompt suggestion box showing the ii snippet]

That’s one of the built in snippets for SQL Prompt that builds an insert statement. Now I can add my table.

[Screenshot: the completed INSERT statement generated for the Employee table]

That means in 6 keystrokes I’ve built the same insert statement, and it’s much quicker. As I’ve gotten used to using ii, I can get test data for an algorithm built very quickly.
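For illustration only, here's roughly what the expanded statement might look like for a hypothetical Employee table; the column names, data types, and placeholder values below are assumptions, not what Prompt will generate for your actual schema.

INSERT INTO dbo.Employee
    (EmployeeID, FirstName, LastName, HireDate)
VALUES
    (0,           -- EmployeeID - int
     '',          -- FirstName - varchar
     '',          -- LastName - varchar
     GETDATE()    -- HireDate - datetime
    )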

Hopefully, you'll see the value in SQL Prompt and start using snippets to improve your ability to code quickly and take the hassle and guesswork out of cleanly building SQL code.

Try a SQL Prompt evaluation today and then ask your boss to get you this productivity-enhancing tool, or, if you're already using the tool, practice using ii the next time you need to insert some data.

I've also written a similar piece on the Redgate blog, and you can see a complete list of tips at Redgate.

You can also watch this tip in action:


The Black Boxes

Machine learning and artificial intelligence seem to be the hot topics these days. From bots that can interact with people to systems that learn and grow as they process more data, it seems that science fiction is becoming reality. At least, in limited ways. Autonomous cars, perhaps the highest profile example of these topics, are advancing and being tested in a few locations around the world, but I think we are a long way from having human controlled and autonomous cars interacting freely at any scale. There are still plenty of issues to work out, and the consequences from mistakes require serious consideration.

I was thinking of this as I read an interesting question: Whose black box do you trust? It’s a look at algorithms and machine learning, and the impact they have on the world around us, despite many of us not understanding how they work. The main examples in the piece are in the area of journalism as it relates to social media (primarily Google and Facebook), but also touches on autonomous vehicles, both autos and planes. The latter was a bit of a shock to me as I assumed humans always handled takeoff and landing, something the author says doesn’t happen at SFO. Some searches around pilot sites seem to note that automated landing is done regularly to test systems, but is used in a minority of cases.

The question is, do we trust the black boxes that run our systems, and really, does it matter? In the piece, Tim O’Reilly says he has four tests for trusting an algorithm:

  • the outcome is clear
  • success is measurable
  • the goals of the creators align with the goals of the consumer
  • the algorithm leads creators and users to better long-term decisions

Those are interesting ways to evaluate a system, though I think the problem is that the last two are a bit nebulous. One of the things that I see more and more as I get older is that the same data or the same facts can lead two different people (or groups) to two different results. Our goals, our interpretation of events, even the weights we place on the various factors in a complex system vary dramatically from person to person. In such a world, can we truly evaluate what the goals of a creator are? Forget about consumers, assume one person building a software system. They will have multiple goals, and do we really think their goals can be easily listed? Or weighted/ranked appropriately? What about when the goals change?

I really think that the black boxes need more disclosure, though I freely admit there isn’t a good way I know of to do this. However, I do know one thing that can be better disclosed: data. We can have more openness and analysis of data from software systems along with some accountability by creators for the impacts of their software. Again, I don’t know how to enforce accountability, especially at scales that encompass millions of consumers and easily cross country borders. That is a problem I think we need to find ways to tackle, at least at some manageable level. Maybe using the 80/20 rule where 80% of consumers and creators find the outcome to be a good one.

The world of technology and software is advancing and growing extremely quickly. Certainly hardware advances, but it seems the last 5-10 years have been more about new and different software applications that fundamentally alter the way humans can interact in social, business, and government situations. Underpinning all the changes is data. New data, more data, and novel ways of working with this data that were unheard of 20 years ago. It's an amazing time to be a data professional.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (5.2MB) podcast or subscribe to the feed at iTunes and Libsyn.


What Helps You Learn?

There was a post recently asking how to train a junior DBA: what tasks do you give them, how do you get them productive, etc. The responses were interesting, but I was struck by one note from a person who is a junior DBA. This person mentioned a few things that have helped them get better at their job, some of which I hadn't thought about before.

Since I try to teach people about SQL Server in writing, in presentations, and in longer formats, I decided this might make a good poll question. If you spend a few minutes thinking about how you've improved your skills at your job, let us know what works best for you.

What things can others do to help you learn more about your job?

Are there specific things that have actually worked better for you? Perhaps you prefer to learn alone, but even then, what works well for you to build skills? Is there a format or style of presentation that reaches you?

I’m a big believer in multiple voices and multiple ways to teach people. This is one reason we sometimes have 2-3 articles on the same topic from different authors. Each will reach different people, and speak to them in different ways. I think the format (text, video, live, etc. ) also has some impact on what helps different people learn. Let us know today what works best for you.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (2.1MB) podcast or subscribe to the feed at iTunes and Mevio.


More Azure Database Creation in Powershell

I wrote about Azure SQL Database for T-SQL Tuesday #82, but I had a few mistakes in my process, at least from a cost standpoint. Since I’ll probably use this a bit more as I flesh out some CI/CD, I decided to document a bit more about the database creation process.

The Old Code

One of the problems I noticed right away was that I created an S2 level database in my post, which was too large. That wasn’t needed, and it might eat up my database credits. My original code was:

New-AzureRMSqlDatabase -ResourceGroupName "Default-SQL-WestEurope" -ServerName "dkranchapps" -DatabaseName "TSQLTuesday" -Edition Standard -RequestedServiceObjectiveName "S1"

In here, there are a few things that are obvious to me, and some that might not be. I won’t delve into detail on how to choose some of these, but I’ll define what they mean. There is a doc page for the cmdlet, which I used to check what I knew.

ResourceGroupName

This is the way you organize Azure resources. In my case, I have one resource group, which I named “Default-SQL-WestEurope”. This was the first group I made, intending this to be for SQL servers, and it’s in the West Europe region. Hence the name. 

ServerName

This is self-explanatory, but you create servers to host resources in Azure. The one I created a long time ago for experiments was "dkranchapps". I re-used it here, though I could have created another one.

DatabaseName

I’m not defining this. If you don’t know, stop reading.

Edition

This is the basic edition for your database. There are multiple options, but I used "Standard", which was in the sample documentation code. However, a Standard database incurs a charge from $0.0202/hr to $0.2016/hr. That doesn't sound like much, but there are 720 hours in a 30-day month. That's $14.54 to $145.15 a month. Again, not much, but this is per database. This could cause me issues with my $150/month credit.

RequestedServiceObjectiveName

This is the scale within the Edition. I'm not sure why the Edition is needed if we specify this here, but we can use a name or a value (GUID). In my case, I neglected to check the S2 syntax when I made the database. What I'd want for most things is a Basic, or maybe an S level if the database is short-lived.

Other Options

There are other options. I can specify the subscription in a parameter, do a "WhatIf" to see what this would do, Force the command without user confirmation, set a maximum size, and more.

While you might not need these options, I'd encourage you to look through the list and make sure you're not unintentionally omitting something that might be important in your situation.
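As a rough sketch of how a couple of these could be combined, and assuming the parameters behave as the doc page describes (the MaxSizeBytes value here is just an illustration), a dry run might look something like this:

New-AzureRMSqlDatabase -ResourceGroupName "Default-SQL-WestEurope" -ServerName "dkranchapps" -DatabaseName "TSQLTuesday" -Edition Basic -RequestedServiceObjectiveName "Basic" -MaxSizeBytes 524288000 -WhatIf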

Creating a Basic Database

One of the things I want to try is scripting the creation of a database in my CI process. That’s not quite simple, and it’s a task for another day, but here’s a basic call that I can work on adding to my builds.

New-AzureRMSqlDatabase -ResourceGroupName "Default-SQL-WestEurope" -ServerName "dkranchapps" -DatabaseName "" -Edition Basic -RequestedServiceObjectiveName "Basic"

Once I run this, I can check the portal, and sure enough, I see my database at the correct size.

[Screenshot: the SQL databases blade in the Microsoft Azure portal showing the new Basic database]

That’s the first step to automating some builds. Now I need to worry about security and API keys, but for now, I can build a database.

And, of course, remove it with Remove-AzureRMSqlDatabase.
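A minimal sketch of that cleanup call, assuming the same resource group and server as above:

Remove-AzureRMSqlDatabase -ResourceGroupName "Default-SQL-WestEurope" -ServerName "dkranchapps" -DatabaseName "TSQLTuesday"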


Capture the Flag

I've read a lot of science fiction in my life. Quite a few books talk about cyberspace and the interactions of humans and software. In Snow Crash, there are AI programs defending systems. The Unincorporated Man series has AI systems that not only have sentience, they hide their capabilities from humans. Numerous other examples exist, including movies such as The Matrix, which popularized system programs independently able to evaluate and react to security issues.

Science fiction is here, showcased recently at the DARPA Cyber Grand Challenge. You can watch a bit about the challenge or read a short article about what happened. Overall, it’s amazing. Seven systems ran various pieces of software that were targeted at finding vulnerabilities, patching their own and exploiting those of other systems. All without human intervention.

In a visualization, I picture a team of Agent Smiths (from The Matrix) seeking out rogue programs that try to wreak havoc inside the operating environment. Or maybe a series of automated developers tracking down bugs, such as buffer overflows, and correcting the code to remove the issue.

Is this where static code analysis is heading? Automated repair, or maybe even real-time repair? Could we actually have a software bot that might run through your Entity Framework (EF) application, tracking down SQL Injection vulnerabilities and correcting them? Even at a rudimentary level, I can imagine a bot that examines incorrectly parameterized queries and rewrites the .NET code to properly parameterize them. Perhaps at some point we'll even have bots that can understand queries from an application and build stored procedures on the fly that accept parameters, are called the next time a particular method runs, and can even evolve with schema changes.

I know the simplistic ways in which we might build this code would have their own sorts of issues, but I’m hopeful that at some point we’ll have semi-intelligent bots that can fix code to ensure it’s secure and reliable, while keeping the creativity originally built by the human developer.

Steve Jones

The Voice of the DBA Podcast

Listen to the MP3 Audio (3.2MB) podcast or subscribe to the feed at iTunes and Mevio.
