Posts

Showing posts from June 19, 2011

Google to FTS Syntax Cheat Sheet

EXAMPLE: DESCRIPTION
nut: Searches for inflectional forms of the word nut.
crank arm, crank AND arm: Searches for documents containing inflectional forms of the words crank and arm. The keyword AND is optional.
tire OR air: Searches for documents containing inflectional forms of the words tire or air.
"reflector bracket": Performs a phrase search for the phrase "reflector bracket".
hardware -bracket: Searches for documents containing inflectional forms of the word hardware but not the word bracket.
+clamp: Searches for the word clamp without generating inflectional forms.
~seat: Searches for thesaurus forms of the word seat.
Assemb*: Searches for words that begin with the prefix assemb.
<washer nut>: Searches for documents that contain the word washer in close proximity to the word nut.
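As a rough illustration of how the operators in the cheat sheet could be told apart in code, here is a minimal Python sketch that splits a Google-style query into labelled terms. The function name `classify_query` and the labels ("inflectional", "exclude", and so on) are made up for this example; the sketch does not generate any actual full-text search syntax.

```python
import re

# One token is a quoted phrase, a <proximity group>, or a run of non-space characters.
TOKEN = re.compile(r'"([^"]+)"|<([^<>]+)>|(\S+)')

def classify_query(query):
    """Label each term of a Google-style query (labels are hypothetical)."""
    # Treat curly quotes like straight quotes.
    query = query.replace("\u201c", '"').replace("\u201d", '"')
    terms = []
    for phrase, near, word in TOKEN.findall(query):
        if phrase:
            terms.append(("phrase", phrase))
        elif near:
            terms.append(("near", near.split()))
        elif word.upper() == "AND":
            continue  # AND is optional, so it carries no extra meaning
        elif word.upper() == "OR":
            terms.append(("or", word))
        elif word.startswith("-") and len(word) > 1:
            terms.append(("exclude", word[1:]))
        elif word.startswith("+") and len(word) > 1:
            terms.append(("exact", word[1:]))
        elif word.startswith("~") and len(word) > 1:
            terms.append(("thesaurus", word[1:]))
        elif word.endswith("*") and len(word) > 1:
            terms.append(("prefix", word[:-1]))
        else:
            terms.append(("inflectional", word))
    return terms

print(classify_query('hardware -bracket "reflector bracket" <washer nut>'))
```

Each labelled term would then still need to be translated into whatever the target full-text engine expects, which is outside the scope of this sketch.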

Soundex Limitations

Names that sound alike do not always have the same soundex code. For example, Lee (L000) and Leigh (L200) are pronounced identically, but have different soundex codes because the silent g in Leigh is given a code. Names that sound alike but start with a different first letter will always have a different soundex code. Thus, names such as Carr (C600) and Karr (K600) have different soundex codes even though they sound alike. When names sound alike but have different first letters, each name's code should be calculated and searched for separately. Since the soundex system is based on English pronunciation, some European names may not soundex correctly. For example, some French surnames with silent last letters will not code according to pronunciation. An example is the French name Beaux, where the x is silent. While Beau (B000) is pronounced identically to Beaux (B200), they will have different soundex codes. This could be true of any surname that does no

Sql Index

Indexes help us retrieve data from tables more quickly. Let's use an example to illustrate this point: Say we are interested in reading about how to grow peppers in a gardening book. Instead of reading the book from the beginning until we find a section on peppers, it is much quicker for us to go to the index section at the end of the book, locate which pages contain information on peppers, and then go to these pages directly. Going to the index first saves us time and is by far a more efficient method for locating the information we need. The same principle applies for retrieving data from a database table. Without an index, the database system reads through the entire table (this process is called a 'table scan') to locate the desired information. With the proper index in place, the database system can first consult the index to find out where the data is stored, and then go to these locations directly to get the needed data. This is much faster.
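To make the difference visible, here is a small sketch using Python's built-in sqlite3 module. SQLite is chosen only because it needs no setup; the table `book`, its columns, and the index name `idx_book_topic` are invented for this example. It asks SQLite for its query plan before and after the index is created.

```python
import sqlite3

def query_plan(conn, sql):
    """Return SQLite's description of how it would run the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)  # column 3 holds the plan text

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (page INTEGER, topic TEXT)")

# Filler rows plus the one page we care about (made-up data).
rows = [(n, "topic%d" % (n % 50)) for n in range(1, 501)]
rows.append((512, "peppers"))
conn.executemany("INSERT INTO book VALUES (?, ?)", rows)

lookup = "SELECT page FROM book WHERE topic = 'peppers'"

# Without an index the plan is a scan of the whole table.
plan_without_index = query_plan(conn, lookup)

# With an index on topic the plan becomes an index search.
conn.execute("CREATE INDEX idx_book_topic ON book (topic)")
plan_with_index = query_plan(conn, lookup)

pages = [row[0] for row in conn.execute(lookup)]

print("before:", plan_without_index)
print("after: ", plan_with_index)
print("pages: ", pages)
```

The exact wording of the plan text varies between SQLite versions, but the first plan should mention a scan and the second should name the index. Other database systems expose the same information through their own tools.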

The SOUNDEX coding algorithm

The SOUNDEX code is a substitution code using the following rules:

The first letter of the surname is always retained. The rest of the surname is compressed to a three-digit code using the following coding scheme:

A E I O U Y H W: not coded
B F P V: coded as 1
C G J K Q S X Z: coded as 2
D T: coded as 3
L: coded as 4
M N: coded as 5
R: coded as 6

Consonants after the initial letter are coded in the order they occur: HOLMES = H-452, ADOMOMI = A-355
The code always uses the initial letter plus three digits. Further consonants in long names are ignored: VONDERLEHR = V-536
Zeros are used to pad out shorter names: BALL = B-400, SHAW = S-000
Double consonants are treated as one letter: BALL = B-400
As are adjacent consonants from the same code group: JACKSON = J-250
A consonant following an initial letter from the same code group is ignored: SCANLON = S-545
Abbreviated prefixes should be spelt out in full: ST JOHN = SAINTJOHN = S-532
Apostrophes and hyphens are
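The rules listed so far can be sketched in a few lines of Python. This is only an illustration under some assumptions: the function name `soundex` is arbitrary, H and W are treated like vowels (they are not coded and they break up a run of same-group consonants), any character that is not a letter A to Z is simply dropped, and abbreviated prefixes such as ST are not expanded, so the caller has to spell them out first.

```python
# Map each coded consonant to its digit, following the coding scheme in the post.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit

def soundex(name):
    """Return a code such as 'H-452' (a sketch, not a reference implementation)."""
    # Assumption: keep only the letters A-Z, dropping spaces and punctuation.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")  # so a same-group consonant after the initial is ignored
    for ch in letters[1:]:
        code = CODES.get(ch, "")  # "" for A E I O U Y H W
        if code and code != prev:
            digits.append(code)
        prev = code  # an uncoded letter resets the run
    # Keep three digits, padding short names with zeros.
    return first + "-" + "".join(digits)[:3].ljust(3, "0")

for surname in ("HOLMES", "ADOMOMI", "VONDERLEHR", "BALL", "SHAW",
                "JACKSON", "SCANLON", "SAINTJOHN"):
    print(surname, "=", soundex(surname))
```

Run on the surnames used in the post, the sketch reproduces the codes given there. Other Soundex variants handle H and W differently, so codes for names containing those letters may not match other tools.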