The first browser I used on the Internet to find information was the text-based Lynx browser. This was quite an improvement (at the time) over Archie and Gopher. When Netscape released their Navigator browser for Windows, I abandoned those old tools, moving on to a new way to access information across the world.
However, I primarily worked in a Microsoft world as my career grew, and that meant using Internet Explorer, or IE. In fact, I remember thinking all the ActiveX extensions and the bundling with Windows installations made the browser convenient. It was even nice to program for, since it dealt with poorly formed HTML fairly well. However, it had plenty of issues and was a never-ending source of calls from customers with stability and security problems.
In fact, sometime around Windows XP, I moved to Firefox and never looked back. I only had IE installed on my systems because Microsoft included it with the OS. For the last five or six years, I’ve run Chrome and Firefox together, only using IE when a few specific sites required it. Even then, I’d give an inward groan each time I clicked the familiar “e” icon.
However times have changed. This month, on January 12, Microsoft ends support for all versions of IE prior to 11. While IE will still be run for many years by some people, just as Windows ME and NT 4.0 still exist in places, Microsoft won’t provide support or patches, and we can only hope that people switch to another browser quickly.
I haven’t been thrilled with the Edge browser in Windows 10, as it hasn’t worked in some sites I use regularly. That seems strange to me, but alas, the idea of a Microsoft browser not working the same as other browsers is nothing new. I can only hope that Microsoft fixes the issues in Edge rather than having web developers include code to handle the finicky nature of Microsoft’s browser technology.
The death of IE shouldn’t matter much to us, as data professionals, but I wonder how many of us still have applications using web browser controls based on IE technology. I suspect quite a few, and it’s entirely possible moving forward that we’ll continue to deal with issues of poor rendering of data from our databases through web controls for many years to come.
How many of you have written code that results in a security issue or data breach? It’s possible many of you have, but are unaware of the issues. I’m sure a few of you have been hacked, have had to clean up data or code, but often it’s not your code. Maybe it’s something you inherited. Maybe it’s code your team wrote together. In any case, the feedback loop between your action (writing code) and the result (a security incident) isn’t tightly coupled.
I ran across a post from Bruce Schneier on how people learn about cybersecurity. The piece links to a research paper, and it’s an interesting read. It turns out the researchers see most non-experts learning from news, friends, and websites, often with biases towards actions that have had immediate negative consequences, but not necessarily those that are more serious.
That has me wondering about us, as supposedly expert, or expert-ish, developers and DBAs. How many of us get security training, or have that training updated? How many of us learn from friends or websites, and re-use this knowledge over and over in our work, not necessarily understanding, or ensuring, that we are building strong security into our systems? I suspect many of us just try to get by, writing the minimal level of security that “works” for us, not really understanding how there might be flaws or holes in our systems.
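To make that concrete, here is a minimal sketch of security that “works” in every test yet fails against a hostile input, next to the parameterized version that doesn’t. It uses Python and SQLite purely as a stand-in for any database driver; the table and inputs are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

def find_user_unsafe(name):
    # String concatenation: passes every normal test, but a crafted
    # input like "' OR '1'='1" rewrites the query and returns every row.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = '" + name + "'"
    ).fetchall()

def find_user_safe(name):
    # Parameterized query: the driver treats the input purely as data,
    # never as SQL, so the same crafted input matches nothing.
    return conn.execute(
        "SELECT name, role FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "' OR '1'='1"
print(find_user_unsafe(payload))  # leaks both rows
print(find_user_safe(payload))    # []
```

The unsafe version is exactly the kind of code whose flaw never shows up in the feedback loop between writing it and shipping it.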
Our code, our configurations, and our systems have a much farther-reaching impact than just ourselves. In some sense, I think a fundamentally broken aspect of information technology is that security practices and architectures are not built into our platforms and applications from the start. While convenience may get the boss off our back, or allow greater profit for our companies, it’s not helping our industry, or even our companies, in the long term.
I just wish I had some idea on how to change things.
Many of us that work with SQL Server do so exclusively through SQL Server Management Studio (SSMS). I find so many people really do the majority of their jobs with SSMS, Outlook, and a web browser. Even back in 2003 when I was a full time DBA, I probably spent the majority of my time in those three applications.
However, I also see more and more people using Visual Studio and other tools to accomplish their jobs. The growth of new tools like PowerShell, the expansion of our work into BI areas, and more mean that people are increasingly using tools besides SSMS to work with SQL Server data.
This past week there was an announcement that MSDN subscriptions were changing. At most of my jobs, I’ve had an MSDN subscription available to me. In fact, some of you might remember the large binders of CDs (and later DVDs) that arrived on a regular basis and contained copies of all Microsoft software. However, many of you out there haven’t had MSDN available to you, or you’ve struggled to justify the yearly $1000+ cost, but you do want to work on your careers and practice with Microsoft software.
At first I saw the yearly cost of MSDN at $799, which is a pretty large investment. However as I looked to the side, I saw a monthly subscription, no large commitment, available for $45. That’s not an extremely low cost for much of the world, but it’s very reasonable in the US. It’s also a great way to build a setup that allows you to work with a variety of Microsoft technologies at an affordable cost. What’s more, you can stop paying at any time. Or start again at any time.
I know that it can be a struggle to invest in your own career, probably more difficult to find time than money. However this is a good way to get access to the various development and server tools for a period of time if you want to tackle a project or force yourself to learn a new skill.
I’m glad that Microsoft has moved to a subscription model for MSDN. I expect to see this subscription growing as small companies use a small investment that scales linearly with new hires to provide their employees with tools. I can only hope that many other vendors adopt this same model and allow us to rent our tools, and upgrade, for a very reasonable cost. I just hope they all let us backup and save our settings in case we interrupt our subscription for a period of time.
There has been quite a debate the last few years between using NoSQL and SQL Server as data stores for different applications. We’ve seen a great deal of interest from developers in document stores as the data structures are closer to what a C# or Java developer sees in their code. Graph databases are becoming popular for certain types of data as well, since their query performance can be extremely powerful in some cases. There are also the scalability aspects of NoSQL systems touted as a better way to build Internet based applications.
However, NoSQL databases aren’t the ultimate data store. There are pros and cons to using them, and while it might be easier for an application developer to begin development on one, I would argue it’s just as easy to get caught misusing a graph database or a document database as it is to poorly program a relational database.
This week, in advance of the SQL Server 2016 release, I caught a post from the SQL Server storage engine team showing how JSON storage inside of a table can provide some NoSQL-like benefits to applications. It’s a good idea, but like many NoSQL solutions, you need to spend a bit of time examining your particular requirements. If you need to update your JSON often, especially in large batches, this might not prove to be a good idea, especially if you need consistency.
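I haven’t reproduced the storage engine team’s T-SQL here, but the pattern of storing a document next to relational columns and querying inside it is easy to sketch. This example uses Python with SQLite, whose `json_extract` plays roughly the role of SQL Server’s `JSON_VALUE`; it assumes a SQLite build with the JSON functions available (standard in modern builds), and the table and document are invented.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, details TEXT)")

# Store a semi-structured document alongside the relational key.
doc = {"customer": "Acme", "items": [{"sku": "A1", "qty": 3}]}
conn.execute("INSERT INTO orders (details) VALUES (?)", (json.dumps(doc),))

# Query inside the stored document from SQL, without a separate
# document database.
customer = conn.execute(
    "SELECT json_extract(details, '$.customer') FROM orders"
).fetchone()[0]
print(customer)  # Acme
```

Note that every update to a nested value still means rewriting the whole document column, which is exactly why heavy JSON update workloads deserve scrutiny.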
I think ultimately that everything does relate to the CAP theorem, precisely because data has to exist somewhere, and anytime you need to duplicate it, move it, aggregate it, or anything to relieve some other issue, then there isn’t any magic taking place. You’re going to have to do work to move the data. Some platforms may make that easier, but I bet there are other drawbacks that you will have to consider.
Ultimately I think you need to pick the platform that best suits your environments. That’s not only the application requirements, but also the developers’ skills, the operational staff’s ability to manage the platform, and the maturity of the platform. Those are tough questions, and while I’m glad people experiment with new systems, I do hope that they think carefully about the positives and negatives, and more importantly, share their experiences with the rest of our industry.
I was looking through the news this week and found a piece by Michael J. Swart that I really enjoyed. It talks about different types of metrics that he has used when monitoring and alerting on his database servers. Michael talks about the difference between alarms and gauges, which are usually based on the time scale in which a DBA or administrator needs to react to a situation. Alarms are created when something grows in the short term and causes an issue. Gauges tend to cover longer-term items that eventually cause a problem.
I’ve had a similar view in the past. However, I look at these types of metrics from the standpoint of how I tackle my job as a DBA. I also try to think about these issues as a developer, ensuring that telemetry helps track issues that occur over time, and that acute issues cause a graceful degradation (or exit) rather than an unexpected one.
When I’ve needed to manage systems, I ensure that I have alarms set up. I need these for the situations where something goes wrong in a short period of time. However I never want alarms to be based on a lack of visibility into the system over time. I want to try and eliminate as many alarms as possible with better long term monitoring that looks for indicators that the system might have issues. I’ve found that many (not all) of the alarms that I receive could have been prevented if I’d been paying close attention.
In other words, I’m a proactive DBA, looking at gauges (using Michael’s terms), not a reactive one counting on alarms.
I can’t predict everything, but I find that looking back at data is helpful. If I find a trend that could have alerted me, I’ll evolve my monitoring system to try to track those potential issues in the future. I’ve written my own custom metrics in the past, which track data, either PerfMon-type counters or business data, and then let me know when the growth, decline, or change is unexpected. In this way, I find I can often proactively prevent issues. If I can use my data to “predict” an issue my boss doesn’t believe in, I’ve also found my budget for resources may increase a bit, allowing me to manage systems even better in the future.
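A custom metric like that doesn’t need to be sophisticated. As a rough sketch of the idea, this flags a reading that deviates sharply from its recent history; the samples and the three-sigma threshold here are made up for illustration, not a recommendation.

```python
from statistics import mean, stdev

def unexpected_change(history, latest, threshold=3.0):
    """Flag a metric reading that deviates from its recent history.

    history: recent samples (e.g. daily database growth in MB);
    latest: the newest sample; threshold: how many standard
    deviations away counts as "unexpected".
    """
    if len(history) < 2:
        return False
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return latest != mu
    return abs(latest - mu) > threshold * sigma

growth = [102, 98, 105, 101, 99, 103]  # steady daily growth in MB
print(unexpected_change(growth, 100))  # an ordinary day
print(unexpected_change(growth, 400))  # a sudden spike worth a look
```

The same shape works whether the samples are PerfMon counters or a nightly row count from a business table.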
It wasn’t that long ago in my career that a survey showed the vast majority of SQL Server databases were < 10GB in size. These days I’m sure the average has risen, as there are some very, very large SQL Server databases. I know a few people managing PB-sized systems, which certainly will move the average up a bit. It’s likely, however, that the majority of databases are still very small, powering smaller applications with < 100GB.
However, almost every database continues to grow, and as systems grow, new challenges come into play. While we can still store a 100GB backup file on a thumb drive or SD card, it’s still a significant amount of data to write, which can take more time than our clients would like. Archiving, backups, and restores all tend to come from non-flash media, so larger systems certainly mean longer RTOs if we have issues, even if we’re just restoring a copy of a database to extract some deleted information.
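The back-of-the-envelope math is sobering. A quick sketch of streaming time at a sustained throughput, ignoring compression, verification, and log replay; the 150 MB/s figure is illustrative, not a benchmark:

```python
def restore_hours(size_gb, mb_per_sec):
    # Hours to stream a backup of size_gb at a sustained throughput,
    # ignoring compression, verification, and log replay.
    return size_gb * 1024 / mb_per_sec / 3600

# Illustrative throughput, not a benchmark of any real system.
for size in (100, 1000):
    print(f"{size} GB at 150 MB/s: {restore_hours(size, 150):.1f} hours")
```

Even before any overhead, a terabyte at that rate is the better part of two hours, which is the kind of number that should feed straight into an RTO conversation.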
However there can be more fundamental problems. I ran across a post from the CSS Engineers at Microsoft that talks about exhausting NTFS resources as more and more writes occur on a filesystem. I’d never heard of this, or expected it to be an issue, but apparently it’s occurring more often. I suspect even the NTFS developers hadn’t considered the scales to which we probably stress the file system daily. What is interesting is that this can occur with DBCC operations, database snapshots, and even bcp operations. There are workarounds, and certainly I’m sure that Microsoft has people working hard to increase the capacities of ReFS, which I expect will take over from NTFS, just as NTFS has largely displaced FAT.
As we grow to larger and larger scales of data sizes, with multi-terabytes on laptops and petabytes on servers, I think we’ll need to adapt a number of our techniques and operations in the future. We might end up needing different types of backups (anyone considering lookup table backups or daily backups of a set of rows, like orders?), as well as the possibility of continuing to spread the load of our server across multiple drives. I just hope that Microsoft recognizes this and looks to ensure mount points work smoothly across all of their technologies and features. At some point, 26 drive letters won’t be enough for the majority of us.
How often have you been caught off guard by some issue with one of your databases? I would guess most people have received a phone call, email, or trouble ticket that alerts them to the fact that something has gone wrong. A few of you might have even had an irate customer or manager wander over to your desk and loudly inform you of an issue. Those are not great times, and most of us would prefer to find out about problems before others bring them to our attention.
This past week was the 66th T-SQL Tuesday blog party. If you haven’t heard of it, I would encourage you to look each month for the announcement on Twitter and participate. It’s a great chance to share some knowledge, and showcase your own experiences or viewpoint on a particular topic. I’d actually encourage all of you to go back and write posts about the past topics at some point. It’s a great way to challenge yourself to document your own experiences in a concrete way.
The theme this month was monitoring (thanks to Catherine Wilhelmsen b | t), and this is exactly the way that you build awareness in your environment. Monitoring allows you to ensure that you, as the DBA, learn about issues, or potential issues, first. Monitoring doesn’t solve your problems; that’s up to you, but it can help you proactively detect issues early and perhaps mitigate them before users notice them.
The blog entries this month cover all sorts of issues that you might think about watching for: the 825 errors in your log, metadata about your backups, jobs, and more. There’s a variety of posts that cover all sorts of home-grown systems, as well as notes that third-party products work well. You have a huge variety of choices, from building your own tools and processes to purchasing something like SQL Monitor from Red Gate. However, no matter your budget, resources, time, or desire, I’d highly recommend that you really think about ensuring that you are monitoring your systems in a way that lets you know how things are working.
What to do when they’re working poorly is up to you. Hopefully the information we publish at SQLServerCentral, and the links we send out every week from Database Weekly, can help you learn to be better at your job.
This week I was scanning through a number of SQL Server posts (in between working on April Fool’s Jokes) and a couple of them caught my eye. They dealt with simple subjects, but subjects whose details are important. I’ve often found people have had performance issues in SQL Server because of simple misunderstandings of how the system works, or because not enough weight is given to the impact of small details.
Kimberly Tripp (of SQLskills) wrote a basic post on parameters, variables, and literals, which I would think most programmers would understand. However, I realize more and more that lots of people who write code in the world taught themselves. That’s amazing, but it does mean that people have many different holes in their knowledge. What I might consider basic or common knowledge might be something that another person has never been exposed to. In any case, Kimberly does a fantastic job of laying out the differences between these three concepts, how they work side by side, and how the choice can dramatically impact the performance of your code. Read this one.
The other post this week was a fairly simple one on data conversions by Rob Sheldon. Again, precedence and conversions are something I dealt with when writing C code, and you do not want to get things wrong there. However, many people might not have the same background and wouldn’t understand why a conversion between varchar and nvarchar isn’t a basic, simple conversion with little cost, or why the type of conversion might produce far different results than you expect. Since we often test with one set of data and our applications work with a (much) wider set of data, this one is a must-read as well.
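The surprising-results half of that trap exists in any language, not just T-SQL. A tiny Python illustration of how the implicit choice of type flips the answer to the “same” comparison:

```python
# Comparing values as strings uses lexicographic order, so the
# "same" comparison gives a different answer depending on type.
as_strings = '10' < '9'            # True: '1' sorts before '9'
as_numbers = int('10') < int('9')  # False: 10 is not less than 9
print(as_strings, as_numbers)
```

Test data that never exercises the boundary, here, any two-digit number, would hide the bug entirely, which is exactly the test-versus-production gap mentioned above.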
There are so many things to consider when writing code that improving your skills in this business is a constant, regular effort. You won’t learn everything, or even learn something every day, but pick things up and practice with them. Learn to build better code, one technique at a time, and cement that knowledge with actual implementation in your work.
You’ve got at least two new things to look at this week, so pick one and try to ensure your code next week considers these details.
One of the problems I’ve seen for years in computing is people seeing the forest or the trees. Not that either is good or bad, but far too often I’ve run across people that only see one of these, meaning they examine problems from a high level perspective, or a very low level perspective. While either of those can help you solve your issues, there will be a problem or a time when you need to examine the situation from another perspective. Knowing when to look at a higher or lower level can be invaluable in troubleshooting.
This past week I ran across a great post from Grant Fritchey called Understand the True Source of Problems. In this post, Grant talks about some strange DBA “best practices” that were implemented to solve problems, but aren’t necessarily the best of anything. It’s an interesting list, and you might read it for entertainment, though I hope none of you are following these ideals at your place of employment.
However, the thing I liked about this post is that Grant opens with an old doctor joke, which is still funny to me, but which is often the approach taken by people trying to solve problems: they just avoid the issue and never solve it. That can work in the short term, but I’ve seen this build up over time to the point where people are afraid to make any changes to a system or code for fear of causing some unknown issue.
I don’t think that every issue needs to be solved down to the root cause. There are plenty of times something goes sideways on one of my computers and a reboot fixes the issue. I’m fine with that, but if it occurs more than two or three times, I do need to solve the problem. I need to be able to dig deeper to diagnose which resources are failing, what performance issue exists, or maybe which line of code is broken. Perhaps I need to step back and examine whether some other part of my system, like the disk or network, is causing an issue that appears to be isolated to code. In other words, I need to be able to look at the wider environment (the forest) or dive into the particulars of a process (the trees).
Knowing how to strike this balance is a bit of an art, and it’s what really makes the best problem solvers experts. I can’t teach you that part of computing, but I can teach you this. When you don’t think you’ve completely solved the issue, stop for a minute and take a wider or narrower view. Or maybe both. Look for what you might not have thought of. If you’re like me, there are probably plenty of possibilities that come to mind when you stop and re-examine the issue. Then work your way through those items, discarding the ones that aren’t relevant. Sometimes opening your mind a bit is the best way to find the exact cause of an issue.
I heard a new phrase this week: the data lake. It comes from a Radar Podcast episode where Edd Dumbill talks about data as an asset that should be exposed everywhere in an organization. There’s a blog post on the subject as well.
The idea is somewhat centered on Hadoop, but it could apply elsewhere. Data comes into an organization and tends to settle and get stored with other data in a large lake. Applications are just a way of accessing the data in the lake, but all the data really lives in a large Hadoop lake of information. In some sense, this isn’t far from the “single view of the truth” that I’ve seen plenty of organizations attempt. In a relational world, this means all data moves from OLTP systems to a large data warehouse, and is then moved to smaller data marts (really warehouse subsets) and accessed from there.
It’s a good idea in theory, but in practice, trying to move data between users with any velocity, with all the copying, cleaning, transforming, and more going on, doesn’t work. OLTP systems are needed because there are transactional actions that must be completed quickly and accurately. Moving this data to other systems becomes harder as data volumes and the sheer number of clients (whether users or other systems) increase. The idea that we can keep all of our processes working quickly enough that users won’t get frustrated is likely a dream. The more value a set of data holds, the more users will access it. The more accesses, the slower it often becomes, which starts a cycle of smaller subsets of data and applications that subsist on those small data puddles.
Excel is probably the most common example of a data puddle that exists in your organization. A set of data, perhaps out of date, but useful enough to make decisions based on. Infinitely flexible and convenient enough that updates, changes, and more often spawn more and more puddles where the information never gets transferred back to the large lake of a database, whether that’s an RDBMS, Hadoop clusters, or something else.
I think the idea of a large data lake is great, but in a practical sense, much of an organization’s data will never live in the lake. If it does, it will most likely be data that’s been superseded by information in a puddle somewhere on an employee’s laptop, tablet, or personal cloud.