As a small child, I learned about the Dewey Decimal System in school. Those were the days when, if you wanted to look up some information or check a fact, you needed to peruse the dead tree books in a library, with information organized by the Dewey Decimal System in the US. At least, all the libraries I used in school adhered to it.
These days we usually use a computer of some sort for learning, research, or really most any work with data. Often I start with Google to find my way to the source of information, but that doesn’t necessarily work well for finding sets of data. It certainly doesn’t work well within an organization.
I saw recently that Microsoft announced the general availability of the Azure Data Catalog, which is designed to provide a catalog of data sets. In essence, the Data Catalog is an index of the data sets that might be produced by your organization, with the information about the data filled in by the producer of that data. Users who are looking for data can query the catalog instead of asking coworkers, wandering through the enterprise databases, or even relying on their own memory of where data might be located.
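To make the idea concrete, here’s a minimal sketch in Python of what a catalog like this does conceptually: producers register a description of each data source, and consumers search those descriptions. This is not the Azure Data Catalog API. The names `CatalogEntry`, `DataCatalog`, `register`, and `search`, along with the sample sources, are all made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Metadata a producer registers about one data source (hypothetical shape)."""
    name: str
    location: str       # where the data lives, e.g. a server/database or file share
    description: str
    tags: set[str] = field(default_factory=set)


class DataCatalog:
    """Toy in-memory catalog: producers register entries, consumers search them."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Registering the same name again overwrites the earlier entry.
        self._entries[entry.name] = entry

    def search(self, term: str) -> list[CatalogEntry]:
        # Case-insensitive match: substring of the name or description,
        # or an exact match on one of the tags.
        needle = term.lower()
        return [
            e for e in self._entries.values()
            if needle in e.name.lower()
            or needle in e.description.lower()
            or needle in {t.lower() for t in e.tags}
        ]


# Illustrative usage with invented sources.
catalog = DataCatalog()
catalog.register(CatalogEntry("SalesOrders", "sql01/Sales",
                              "Order headers and lines", {"sales", "orders"}))
catalog.register(CatalogEntry("WebLogs", "//files/logs/web",
                              "Raw web server logs", {"web"}))

for hit in catalog.search("sales"):
    print(hit.name, "->", hit.location)
```

The point of the sketch is only that the catalog stores descriptions and locations, not the data itself, so its usefulness depends entirely on what producers choose to register.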
At first this seems silly. After all, don’t people inside of an organization know where data is kept? Don’t they learn the servers, databases, and connection methods? Certainly many do, but with the pace of change these days, as well as the rapidly growing number of ways to publish data, it’s entirely possible that many people aren’t aware of all the data sources available inside of an organization. Even at Redgate Software, with a few hundred employees, it is fairly difficult to keep track of what data exists in which location.
The functionality of the Data Catalog seems a bit basic, almost like an extension of adding extended properties to various tables. Certainly things are centralized here, which is good. There are also ways to add other sources, such as SSRS reports, files, and even other relational sources. I’ll have to experiment a bit and see what’s available, and I’d encourage you to do the same. The product relies on crowdsourcing, which can go really well or really poorly, depending on how cooperative your crowd is.
In any case, I do like the idea of having a central catalog that individuals can update as they produce data sources for others to consume, or as they change what’s available. If it works well, with good searching and tagging, it might eliminate some of the redundant work often performed to surface data inside of any organization and let employees know how to find the answers to their questions.