I dropped support for Soundex [1] in the project I'm working on [2]. While going over the diagnostic output from importing the data, I found that Soundex produced over 6,000 collisions, while Metaphone [3] produced fewer than 1,000, with shorter collision chains (most Metaphone collisions have only two possibilities). At that point it just wasn't worth the disk space to keep Soundex.
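To make the collision numbers concrete, here is a minimal Python sketch of American Soundex, following the National Archives coding rules in [1], plus a way to count collisions by grouping names under their codes. The `collisions()` helper and its definition of a collision (a code shared by two or more distinct names, with that group being the collision chain) are assumptions for illustration; the project's import diagnostics may count differently. Metaphone has far more rules and is not reproduced here.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set

# American Soundex digit for each coded consonant; vowels, y, h and w get none.
_CODES: Dict[str, str] = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name: str) -> str:
    """Return the four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits: List[str] = []
    prev = _CODES.get(first, "")   # the first letter's code suppresses an identical next code
    for ch in letters[1:]:
        if ch in "hw":             # h and w do not separate letters with the same code
            continue
        code = _CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code                # a vowel resets prev, so it does separate codes
    return (first.upper() + "".join(digits) + "000")[:4]


def collisions(names: Iterable[str], encode: Callable[[str], str]) -> Dict[str, List[str]]:
    """Map each code shared by two or more distinct names to those names."""
    buckets: Dict[str, Set[str]] = defaultdict(set)
    for name in names:
        buckets[encode(name)].add(name)
    return {code: sorted(group) for code, group in buckets.items() if len(group) > 1}
```

For example, `collisions(["Robert", "Rupert", "Rubin", "Ashcraft"], soundex)` returns `{"R163": ["Robert", "Rupert"]}`: Robert and Rupert share a code, while Rubin (R150) and Ashcraft (A261) do not. Passing a different `encode` function lets the same helper compare phonetic schemes.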
Then it was on to building a mock-up on the web [4]. The logic is pretty much:
```
if city exists in latlong.database
then
  fetch data from latlong.database using city
  print data
  exit
end

tag = metaphone(city)
if tag exists in metaphone.database
then
  fetch cities from metaphone.database using tag
  if count(cities) is 1
  then
    fetch data from latlong.database using the single entry in cities
    print data
  else
    print "select one from the list:"
    for each name in cities
      print name
    end
  end
end
exit
```
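The pseudocode above can be sketched in Python as follows. This is a hypothetical illustration, not the project's code: plain dictionaries stand in for `latlong.database` and `metaphone.database`, the phonetic function is passed in as a parameter, and `toy_key()` is a deliberately crude vowel-stripping stand-in, not Metaphone. Unlike the pseudocode, it also reports a miss when neither lookup finds anything.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

LatLong = Tuple[float, float]


def build_index(names: List[str], phonetic: Callable[[str], str]) -> Dict[str, List[str]]:
    """Group city names by phonetic key (the role metaphone.database plays)."""
    index: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        index[phonetic(name)].append(name)
    return dict(index)


def format_location(name: str, where: LatLong) -> str:
    lat, lon = where
    return f"{name}: {lat:.2f}, {lon:.2f}"


def lookup(city: str,
           latlong_db: Dict[str, LatLong],
           metaphone_db: Dict[str, List[str]],
           phonetic: Callable[[str], str]) -> str:
    # Exact hit: return the data and stop.
    if city in latlong_db:
        return format_location(city, latlong_db[city])
    # Otherwise fall back to the phonetic key.
    cities = metaphone_db.get(phonetic(city), [])
    if len(cities) == 1:
        only = cities[0]
        return format_location(only, latlong_db[only])
    if cities:
        return "select one from the list:\n" + "\n".join(cities)
    # Not in the original logic: report a miss instead of printing nothing.
    return f"no match for {city!r}"


def toy_key(name: str) -> str:
    """Toy stand-in for metaphone(): uppercase and drop vowels. NOT Metaphone."""
    return "".join(c for c in name.upper() if c not in "AEIOU")


if __name__ == "__main__":
    # Approximate coordinates, for illustration only.
    latlong = {"Boston": (42.36, -71.06), "Austin": (30.27, -97.74)}
    index = build_index(list(latlong), toy_key)
    print(lookup("Bostin", latlong, index, toy_key))
```

Here the misspelling "Bostin" misses the exact lookup but shares the toy key `BSTN` with "Boston", so the single candidate is returned directly; a key shared by several cities produces the selection list instead.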
The mock-up is quite plain in appearance, but that is easy to change since most of the output is template-based anyway. And it only works for the United States.
Next up, code in time zone and Daylight Saving Time information for each city.
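One possible shape for that per-city data is a standard UTC offset plus a flag for whether the city observes DST at all, since most of Arizona and Hawaii do not. This is a hypothetical sketch: the `CityTime` record and helper names are illustrative, it hard-codes the current US rule (since 2007, DST runs from the second Sunday in March to the first Sunday in November), and it works at date granularity, ignoring the 2:00 a.m. local changeover. A real implementation would more likely rely on the IANA time zone database, for example through Python's `zoneinfo` module.

```python
import datetime as dt
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CityTime:
    """Hypothetical per-city time record."""
    std_offset: int        # hours from UTC in standard time, e.g. -5 for US Eastern
    observes_dst: bool     # False for most of Arizona and for Hawaii


def nth_sunday(year: int, month: int, n: int) -> dt.date:
    """Date of the n-th Sunday of the given month."""
    first = dt.date(year, month, 1)
    days_to_sunday = (6 - first.weekday()) % 7   # weekday(): Monday=0 .. Sunday=6
    return first + dt.timedelta(days=days_to_sunday + 7 * (n - 1))


def us_dst_period(year: int) -> Tuple[dt.date, dt.date]:
    """Current US rule (since 2007): second Sunday in March to first Sunday in November."""
    return nth_sunday(year, 3, 2), nth_sunday(year, 11, 1)


def utc_offset(city: CityTime, day: dt.date) -> int:
    """UTC offset in hours on the given date, ignoring the 2:00 a.m. changeover."""
    if city.observes_dst:
        start, end = us_dst_period(day.year)
        if start <= day < end:
            return city.std_offset + 1
    return city.std_offset
```

With `boston = CityTime(-5, True)` and `phoenix = CityTime(-7, False)`, `utc_offset(boston, dt.date(2024, 7, 4))` is `-4`, while `utc_offset(phoenix, dt.date(2024, 7, 4))` stays `-7`.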
[1] http://www.archives.gov/research_room/genealogy/census/soundex.html
[3] http://www.nist.gov/dads/HTML/metaphone.html