Missing Full Text Extensions in Express

I was tasked recently with removing the full text indexes in Adventureworks for a demo. The full text indexes were causing a few extra items to appear in a SQL Compare demo and weren’t needed. The individual that had set up the VM I was using wasn’t sure what to do, so they asked me.

I logged on to AdventureWorks and right clicked the Production.Document table. I knew that had full text indexes because I’d tested them before. However, what I got was this:


The Full Text index part was grayed out. Strange, since the database was attached, and with a query, I could see FTS indexes below.


I suspected that the FTS extensions weren’t installed. I decided to check by running setup. When it started, I clicked the top item to "add features", as shown here.


That brought up a list of instances. The default is the top radio button below, but I selected the second one, which let me select an existing instance.


Next, I saw the features, and sure enough, FTS wasn’t checked.


I checked it and then clicked next to continue the installation.


Once this was done, I could run SSMS and sure enough, I could delete the FTS indexes (shown below).


I actually had two instances on this VM, but this FTS feature isn’t in SSMS. It comes from the instance. After I deleted these three indexes, I connected to the second instance and tried to delete the FTS indexes, but things were grayed out, as shown in the first image above.

I had to re-run setup for the second instance and add the FTS components there as well to delete the indexes from that database. Once that was done, I could easily delete all the FTS indexes and complete this simple task.

DevConnections Slides and Code

I had a great time at DevConnections this week, It was a fairly quick trip for me, with lots of other events, but a good time. I presented two sessions and both went well. I had some good questions from the audience and people seemed to enjoy the talks. In order to make it easy to find the slides and code, I’m including them here:

Encryption in SQL Server

Searching Office Documents in SQL Server 2012

If you have any questions, please feel free to contact me.

Encryption and Full Text Search at The Mandalay Bay (DevConnections)

It’s just a week until Dev Connections.  This is one of the great conferences for the hybrid technology person, with a variety of technologies being talked about, all for a single admission price. I’ll be speaking at SQL Server Connections, but wandering over to the development side at times since I’m getting back into a little bit of software development as part of my job with Red Gate Software, and of course, for fun.

This fall the Dev Connections event is at the Mandalay Bay in Las Vegas. It’s a great hotel at the end of the strip with nice amenities and a good conference center. It’s easy to get to from the airport, which I like since traffic can be a pain. There’s also the Luxor next door, with Carrot Top playing. I’m hoping to sneak over there for a show.

However I’m primarily there to teach you a few things about SQL Server. I’ve rewritten a bit of my Encryption talk to add a few things and give you a good grounding in how the technology works in SQL Server.  I’ll be looking at how you can encrypt your connections, secure your physical files with TDE (don’t miss the hex editor demo) and also look at data encryption using symmetric and asymmetric keys. I’ll also be talking about full-text search. I’ve reworked my talk on searching binary data to focus on extracting information from office documents.

Both of my talks are on Wednesday, October 2, 2013, but if you are looking for other SQL Server topics, there are some great ones. Denny Cherry talks partitioning, Allen White talks Powershell and SQL Server, Tim Ford talks DMVs and Stacia Misner talks BI. There are a number of other great SQL Server speakers as well as speakers to talk Windows, Visual Studio, Exchange, Sharepoint and more.

If you’re looking for a great conference to attend soon, think about registering for Dev Connections and I hope to see you in Las Vegas.

Full-Text Search – Thesaurus

I would hope that most of us have used a thesaurus at some point in our careers. These allow us to substitute words for one another, providing for richer and more interesting communication.

Full-text search in SQL Server includes a thesaurus that you can customize for your searches. As with the thesaurus some of us use when writing, this features allows the search engine to substitute one word for another in searches.

You actually have to customize it. Here’s the default thesaurus for SQL Server 2008, which is stored in this location:


If you look in this folder, you see a lot of XML files. These are the thesaurus files and they are named as tsxxx.xml, where xxx is the three letter language code. For English, the thesaurus is tseng.xml.


If I open up the English file, you can see there’s not much there in terms of entries.


This looks like the file that came with SQL Server 2005, if not SQL Server 2000. Note also that everything is commented out, and you need to remove these comment lines if you want to edit this file.

The configuration isn’t that complex, but let’s look at a simple example. I’ll set up a small table and create a full text index on it.

( id INT
, Notes varchar(8000)
INSERT ftstemp SELECT 1, 'The quick brown fox jumped over the lazy dog'
INSERT ftstemp SELECT 2, 'I run WinXP.'

I’m going to edit my thesaurus file to include these entries:





And also





I am only using expansion sets here. There are also replacement sets, but that’s for another post. In this case, when I search for any of the terms above, elements matching any other terms will be returned.

NOTE: The US English file is tsenu.xml. The UK English file is tseng.xml.

I’ll now create a full text index on this table, on the Notes column.

I can issue this search, which I expect to work:

, notes
 FROM ftstemp
 WHERE CONTAINS(notes, 'quick')

That returns the data I expect.

Now to check the expansion set. To do that, I’ll need to use a FREETEXT query. Once I do this, I get results from both of my entries.


Note that if you edit the thesaurus, in order for your changes to show up in queries, you need to reload the Thesaurus file with this:

EXEC sys.sp_fulltext_load_thesaurus_file 1033;

The 1033 is for English. This is the LCID, which varies for each language.

If you want to broaden your searches, include acronyms, etc, then the thesaurus is a good way to do this. Beware, however, that your entries will apply to all searches on the instance, so if you have disparate applications on the same instance, you might encounter some strange results.

Full-Text Search – Word Breakers and Stemmers

There are numerous components to the Full-Text Search (FTS) subsystem in SQL Server that help provide efficient, relative answers to queries. Full-text Search is a little complex, and as I’ve been working with the system in an effort to learn more about it, I decided to document how a few things work.

Word Breakers and stemmers are two interesting parts of the FTS system. They deal with certain language specific operations that help searches work better. They are related as the two items are loaded together if you use third party word breakers and stemmers.

I haven’t seen third party word breakers, but some work from other products. As an example, here’s a post to load the Greek FTS search word breaker and stemmer from Sharepoint server if you are on SQL Server 2008. It’s included in SQL Server 2012.

Word Breakers

Let’s start with word breakers, which do just what the term implies: they break words. It would seem to be obvious that spaces are the word boundaries, and they are in English, but not necessarily in all languages. There are also the issues of characters in Asian languages like Japanese and Chinese. You can’t count spaces as the word boundaries on those languages.

Word breakers use the lexical rules of the language to determine word boundaries. Essentially they find what the words are, and then further action can be taken in building the FTS index or processing the query.

The “words” that the word breaker spits out are seen a “tokens” to the FTS index, and each can then be processed by stemmers, stoplists, thesaurus, etc.


Stemmers are an interesting part of the full-text search system. They remind me of my high school Latin classes, where we had to conjugate words. A stemmer takes a word and generates inflectional forms, or conjugations. The example in Books Online, and an easy one to understand is “run”. There are various forms of "run” that we would want to consider as equivalent when performing a search. For example, you would want to consider:

  • ran
  • running
  • runs
  • runner (perhaps)

The same could be said for “lay”. That would generate

  • lie
  • laying
  • lain
  • lays

This is one of the big advantages over the LIKE predicate in that stemmers can match these forms of the word being searched for. The index would relate all of these to the core, base word.


The Books Online page for Word Breakers and Stemmers has technical information on checking what’s installed, some troubleshooting, language settings, and some drier documentation on what you can do with word breakers, but not a lot of explanatory detail.

I used a little of the information in Books Online, and some from Pro Full-Text Search in SQL Server 2008. You can read more, but unfortunately I haven’t found a lot more documentation on the details of how things work.

You would probably learn more if you write your own Word Breaker and Stemmer, and there is a sample in the Windows SDK to get you started, but that’s beyond what I want to do.

Full-Text Search – Stoplists in SQL Server

Full-text search is an interesting subsystem in SQL Server. It allows you to implement searches through a variety of text formats stored in SQL Server. This is a one of a series of posts that looks at different facets of full-text search.

What is a Stoplist?

A stoplist is a list of stopwords that SQL Server should not include in a full-text index.  These are words that are seen as not adding any value to the full-text index. We typically see these words as important in language for structure, but not for content. Examples of stopwords are:

  • the
  • a
  • an
  • is
  • are

In previous versions of SQL Server, these were also known as noise words and a noise word list. You can read about the topic in Books Online.

How are they used?

Stoplists are used when building the index. The words that are contained in the text, and also in the stoplist are ignored and not populated inside the index. This makes for a smaller index, and it also means that the stopwords are not

The position of these words in the text being indexed, however, still do matter. This is to be sure that searches using proximity are still correctly carried out.

Each index can have a specific stoplist associated with it. You can associate a stoplist at index creation time, or alter the index later to add or change the stoplist.

Creating a Stoplist

For each language supported in the full-text system, there is a stoplist installed with SQL Server. These are the commonly used words that should be ignored for each language.

You can, however, create your own stoplist of word with the CREATE FULLTEXT STOPLIST command. The creation can be for a new stoplist, or you can copy an existing stoplist.This includes system stoplists, which you can use as a basis for your custom stoplist. The commands are simple, and they are well documented in BOL.

To add or remove words from a stoplist, the ALTER FULLTEXT STOPLIST command is used with the ADD or DROP parameters. Alterations to a stoplist must be for a specific language, which is specified with the LCID or name of the language.

Practical Points

The stoplists are important for limiting the size of the stoplist and making a more efficient index. Full-text indexes are very efficient and scalable in SQL Server, but the less data that needs to be indexed and searched, the most efficient the system will operate.

System stoplists works well for many natural language searches, but are not necessarily adequate for domain specific searches. For example, if I were indexing all white papers on SQL Server, I might want to ignore extremely common words or phrases that are in all documents. For example, I might consider “SQL” to be so common as to be useless in searches. Rather than bloat the size of the index with this word, I may add this to a stoplist for the full-text index and assume it’s a word like “the”, which I would not use for searches of these documents.

I haven’t necessarily found a reason to use custom stoplists in the past, but if my full-text index were extremely large or I had a large volume of searches, I might consider using stoplists to prune down my indexes.

If you have used these in your system, I’d be interested in knowing the reasons and effects.

Building a Full Text Index

I hadn’t used full-text indexing in production throughout my career. We hadn’t had the need in the applications I worked on, all of them depending on LIKE searches in specific, normalized data.

However I have always been interested in it and as I try to find data more often in various systems, I’ve been playing with it on the SQLServerCentral systems. One of the first things I had to do on a copy of the system was build an index. It’s surprisingly easy. I’ll build a basic index and explain a few options.

If you don’t know what a full-text index and full-text search (FTS) is, here’s a short introduction from BOL.

Creating an Index

Let’s take a basic table. In this case, let’s look at the AdventureWorks 2008 database. There’s a table called ProductDescription in there without an FTS index. Let’s add one there. First we right click the table and select the full-text index item.


This starts a wizard that we can use to pick the full text index. The first step is basing this on a unique index. The wizard is smart enough to only show those valid indexes for you to choose. I tend to choose the PK in most cases. That’s what I’ll do here. If your situation calls for something different, be sure you understand why.


The next step for is to choose from the available columns that are valid for full text indexes. In this case we only have one, so I’ll pick it.


Once I do that, I then can examine the other options. The middle item is for Word Breakers. These are the rules by which we decide where word boundaries are.


These rules can vary by language, and as you see above, there are multiple language choices. We’ll stick with English.

The last column is for the “type” of data stored in the column we are indexing. This is for the use of iFilters for our data.


In this case there is no other column that makes sense for a type. If you examine the full text index on the Production.Document table, there is a “FileExtension” column that is specified here.


That column is named appropriately, but the name has nothing to do with the use here. If the column were named “Type” or “Extention” or even “Blue” it would still work as long as it contained the correct file type extension for the data stored in the full-text indexed column.

Next we have the change tracking. You have three choices: automatic, manual, or do not track. The default is automatic, which I’ve often used.


If you have a large index, with large changes, you might choose to manually update the index, but you then need to do that. If you don’t want an automatic population to start after the wizard, then you need to choose “Do not track changes” as noted at the bottom.

The catalog is like a filegroup for the full text index. There can be multiple indexes in one catalog, but an index is only in one catalog. Note that you can select the actual filegroup in which this will be stored at the bottom.


I haven’t had enough experience here to give guidance. I’ve just had one catalog for the systems I’ve worked on.

The stoplist is the list of words ignored in the index. This was called the “noise word list” in SQL 2005 and prior. You can create your own stoplist, or use the system one.


Here I only have one, though I could not use one and allow all words in the index, which tends to bloat the size.

Next we have the standard scheduling mechanism in SQL Server. Here it’s applied to the index population, or what you might see as the rebuild.


No guidance here other than choose what works. If population is an issue, pick a low time to schedule things.

The last part is the summary.


My one complaint here is the fact that there is no “Script” button for this. That’s a royal PIA and it’s why I’m showing the GUI here. This is the easiest way to build the index. Once it’s done, you can go into SSMS and get the properties.There’s a script button there.


However if you press this, you get this


No scripting for the index, which is a hole. I’ve submitted an item on Connect, which you can vote for.

Searching Binary Data in SQL Server

More and more data collected in organizations is in an encoded format, essentially a binary classification of data. These can be images, audio files, video, or even common formats like Word and Excel files. This data contains lots of important data, but the formatting must be stripped out in order for users to effectively search this data.

This presentation starts with a discussion of the three types of data in SQL Server to set the framework. It demos and explains:

  • structured data
  • semi-structured data
  • unstructured data

The talk then looks at how unstructured data is stored in SQL Server, specifically briefly looking at Filestream and Filetable.

There is a short discussion of full text search, with a look at the changes in SQL Server 2012 before moving on to the iFilter interfaces which are used to search the binary data while ignoring the encoding. There are demos of the basics of CONTAINS and FREETEXT searches, along with some of the more advanced options, like customizable NEAR and weighting of search terms.

The talk finishes with a short look at the new semantic search feature in SQL Server 2012.

Level: 200

Length: 60 minutes

Downloads: PPTX, code

Presentation Schedule:

June 1, 2013 – SQL Saturday #200 – Philadelphia

May 3, 2013 – SQL Bits XI

April 7, 2013 – SQL Saturday #197 – Omaha

October 30, 2012 – SQL Connections, Fall 2012

Full Text Search – CONTAINS

I’ve been working on a new presentation for full text search and brushing up on some of my T-SQL operators. Part of my talk goes into the CONTAINS operator, which is one of the full text search keywords you need to know.

This operator is only used with full text indexes, so if you have a column that isn’t full-text indexed, it returns an error. If I issue this:

 FROM dbo.salary
  WHERE CONTAINS(empname, 'Steve')

I get this:

Msg 7601, Level 16, State 2, Line 3

Cannot use a CONTAINS or FREETEXT predicate on table or indexed view ‘dbo.salary’ because it is not full-text indexed.

I have a table that is full text indexed and I can issue a basic query, which looks like so many other T-SQL queries.

 FROM authordrafts
 WHERE CONTAINS(*, 'AlwaysOn')

This returns me all the rows where the columns in the full-text index (I used the star, *), have the term “AlwaysOn” in them. In this case, I’m hitting a FileTable table with lots of whitepapers in there.


This query is essentially a LIKE search, but it isn’t doing character matching. Instead it is working with those keywords in the full text index. I’ve used a simple search above. I could replace the * with the column, in this case the file_stream column.

 FROM authordrafts
 WHERE CONTAINS(file_stream, 'AlwaysOn')

I could also use a prefix term and the * wildcard, similar to LIKE.

 FROM authordrafts
 WHERE CONTAINS(file_stream, 'Always*')

These match other rows where “always” is the document, which matches “always” as a standalone word as well as “alwayson” as a term.

I could also limit the search to particular columns, using parenthesis and commas to separate them out. The BOL example from the CONTAINS page does a nice job of showing this.

Use AdventureWorks2012;
SELECT Name, Color
 FROM Production.Product
 WHERE CONTAINS((Name, Color), 'Red');

This is just a very basic look at CONTAINS. In another post, I’ll look at a few more possibilities with this term.