The SOUNDEX coding algorithm
The SOUNDEX code is a substitution code using the following rules:
The first letter of the surname is always retained.
The rest of the surname is compressed to a three-digit code using the following coding scheme:
A E I O U Y H W    not coded
B F P V            coded as 1
C G J K Q S X Z    coded as 2
D T                coded as 3
L                  coded as 4
M N                coded as 5
R                  coded as 6
Consonants after the initial letter are coded in the order they occur:
HOLMES = H-452
ADOMOMI = A-355
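The coding table maps directly onto a lookup dictionary. The Python sketch below covers only this basic substitution step and assumes plain A-Z input. The helper name consonant_digits is hypothetical. It does not yet truncate, pad or merge repeated codes; those rules follow below.

```python
# Letter groups from the coding table; A E I O U Y H W are not coded.
GROUPS = [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
          ("L", "4"), ("MN", "5"), ("R", "6")]
CODES = {letter: digit for letters, digit in GROUPS for letter in letters}


def consonant_digits(surname: str) -> str:
    """Code each consonant after the initial letter, in the order it occurs.

    Basic substitution only: no truncation, padding or merging.
    """
    rest = surname.upper()[1:]  # the initial letter is kept, not coded
    return "".join(CODES[ch] for ch in rest if ch in CODES)


print(consonant_digits("HOLMES"))   # 452  (L, M, S)
print(consonant_digits("ADOMOMI"))  # 355  (D, M, M)
```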
The code always consists of the initial letter plus three digits. Further consonants in long names are ignored:
VONDERLEHR = V-536
Zeros are used to pad out shorter names:
BALL = B-400
SHAW = S-000
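Fixing the code at three digits is a single truncate-or-pad step. In this minimal sketch the function names fit_to_three and format_code are hypothetical. The digit strings passed in are assumed to be already coded and merged. VONDERLEHR codes N, D, R, L, R as 53646, for example.

```python
def fit_to_three(digits: str) -> str:
    """Keep the first three digits, padding with zeros if there are fewer."""
    return (digits + "000")[:3]


def format_code(initial: str, digits: str) -> str:
    """Join the retained initial letter and the three-digit code."""
    return f"{initial.upper()}-{fit_to_three(digits)}"


print(format_code("V", "53646"))  # V-536 (VONDERLEHR: extra consonants dropped)
print(format_code("B", "4"))      # B-400 (BALL: LL counted once)
print(format_code("S", ""))       # S-000 (SHAW: nothing coded)
```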
Double consonants are treated as one letter:
BALL = B-400
Adjacent consonants from the same code group are also treated as one:
JACKSON = J-250
A consonant that immediately follows the initial letter and belongs to the same code group is ignored:
SCANLON = S-545
Abbreviated prefixes should be spelt out in full:
ST JOHN = SAINTJOHN = S-532
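Spelling out prefixes is a lookup that runs before any coding. This is a minimal sketch only. PREFIXES and expand_prefix are hypothetical names, and the only entry taken from the rule above is ST to SAINT. It assumes the abbreviation is the first word of the name.

```python
# Hypothetical lookup table: only ST -> SAINT comes from the rule above;
# other abbreviations would have to be added by the user.
PREFIXES = {"ST": "SAINT"}


def expand_prefix(surname: str) -> str:
    """Spell out an abbreviated first word, then run the words together."""
    words = surname.upper().replace(".", " ").split()
    if words and words[0] in PREFIXES:
        words[0] = PREFIXES[words[0]]
    return "".join(words)


print(expand_prefix("ST JOHN"))   # SAINTJOHN
print(expand_prefix("St. John"))  # SAINTJOHN
print(expand_prefix("STONE"))     # STONE (not an abbreviation, left alone)
```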
Apostrophes and hyphens are ignored:
KING-SMITH = KINGSMITH = K-525
Consonants from the same code group separated by W or H are treated as one:
BOOTH-DAVIS = BOOTHDAVIS = B-312
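The rules above can be combined into one encoder. This is a minimal Python sketch, not an official implementation. It assumes A-Z surnames and expects any abbreviated prefix to be spelt out already. The function name soundex and its error on empty input are our own choices. It reproduces every example code given above.

```python
import re

GROUPS = [("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
          ("L", "4"), ("MN", "5"), ("R", "6")]
CODES = {letter: digit for letters, digit in GROUPS for letter in letters}


def soundex(surname: str) -> str:
    # Apostrophes, hyphens and spaces are ignored: keep letters only.
    letters = re.sub(r"[^A-Z]", "", surname.upper())
    if not letters:
        raise ValueError("surname contains no letters")
    initial = letters[0]
    digits = []
    # Start from the initial's own code, so a same-group consonant right
    # after it is skipped (SCANLON).
    last = CODES.get(initial)
    for ch in letters[1:]:
        code = CODES.get(ch)
        if code is None:
            # H and W are transparent: same-group consonants on either side
            # still count as one (BOOTHDAVIS). Vowels and Y end a run, so a
            # repeated code after a vowel is written again (ADOMOMI).
            if ch not in "HW":
                last = None
            continue
        if code != last:  # doubled letters / adjacent same-group: code once
            digits.append(code)
        last = code
    # Truncate long codes and pad short ones with zeros.
    return f"{initial}-{''.join(digits)[:3].ljust(3, '0')}"


for name in ["HOLMES", "ADOMOMI", "VONDERLEHR", "BALL", "SHAW", "JACKSON",
             "SCANLON", "SAINTJOHN", "KING-SMITH", "BOOTH-DAVIS"]:
    print(name, soundex(name))
```

A single variable, last, records the previous code, and it handles three rules at once: doubled letters, adjacent consonants from the same group, and a consonant that matches the initial letter's group.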