
Python Program for Soundex Algorithm

This is a Python implementation of the Soundex algorithm.

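The program keeps a JSON document, repository.txt, that stores a dictionary mapping each Soundex code to the words seen so far with that code. The dictionary is loaded at the start of every run, consulted and extended with the word entered, and written back at the end, so it keeps growing with each execution.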
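Stripped of the Soundex details, that load, update and save cycle can be sketched as follows. This is a simplified illustration under stated assumptions, not the program's exact code: load_repository and remember are hypothetical helper names, and a missing file is assumed to mean an empty repository.

```python
import json
import os
import tempfile


def load_repository(path):
    """Return the code -> words dictionary stored at path, or {} on the first run."""
    # Assumption: a missing file simply means nothing has been stored yet
    if not os.path.exists(path):
        return {}
    with open(path) as fp:
        return json.load(fp)


def remember(path, code, word):
    """Add word under code (only once) and write the whole dictionary back to path."""
    repository = load_repository(path)
    words = repository.setdefault(code, [])
    if word not in words:
        words.append(word)
    with open(path, "w") as fp:
        json.dump(repository, fp)
    return repository


if __name__ == "__main__":
    # Hypothetical file in a temporary directory, so the demo leaves nothing behind
    demo_path = os.path.join(tempfile.mkdtemp(), "repository.txt")
    remember(demo_path, "M622", "markus")
    remember(demo_path, "M622", "merkus")
    remember(demo_path, "M622", "markus")  # already stored, so not duplicated
    print(load_repository(demo_path))  # {'M622': ['markus', 'merkus']}
```

Because the file is rewritten after every addition, the next run always starts from everything stored so far.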

Program:

from re import sub

def remove_symbols(input_string):
    # Use a regular expression to remove everything except the uppercase
    # letters A-Z (callers upper-case the word first)
    return sub('[^A-Z]+', '', input_string)

def clean(input_string):
    # Convert the characters to lower case and then use a
    # regular expression to remove non a-z characters
    return sub('[^a-z]+', '', input_string.lower())

def soundex(word):
    # Step 1: Capitalize all letters in the word and drop all punctuation marks.
    word = remove_symbols(word.upper())
    if not word:
        raise ValueError("the word must contain at least one letter")

    # Step 2: Retain the first letter of the word.
    first_letter = word[0]
    word = word[1:]

    # Steps 3 & 4: Change ('A', 'E', 'I', 'O', 'U', 'H', 'W', 'Y') to 0,
    # ('B', 'F', 'P', 'V') => 1, ('C', 'G', 'J', 'K', 'Q', 'S', 'X', 'Z') => 2,
    # ('D', 'T') => 3, ('L') => 4, ('M', 'N') => 5 and ('R') => 6
    pre = ['[AEIOUWHY]', '[BFPV]', '[CGJKQSXZ]', '[DT]', '[L]', '[MN]', '[R]']
    post = ['0', '1', '2', '3', '4', '5', '6']
    for find, replace in zip(pre, post):
        word = sub(find, replace, word)

    # Step 5: Collapse each run of identical adjacent digits into a single digit.
    # A digit is kept when the next one differs; the last digit is always kept,
    # so a trailing run (or a single remaining digit) is not lost.
    new_word = ""
    maxpos = len(word) - 1
    for i in range(maxpos + 1):
        if i == maxpos or word[i] != word[i + 1]:
            new_word += word[i]

    # Step 6: Remove all zeros (placed there in Step 3) from the string that
    # resulted from Step 5, retaining the first letter in front.
    word = first_letter + sub('0', '', new_word)

    # Step 7: Pad the string that resulted from Step 6 with trailing zeros and
    # return only the first four positions, which will be of the form
    # <uppercase letter> <digit> <digit> <digit>
    length = len(word)
    if length >= 4:
        word = word[:4]
    else:
        word = word + ("0" * (4 - length))

    return word

import json
from os import listdir, path

# The repository maps each Soundex code to the words seen so far with that code.
# It lives next to the script (alongside the "data" folder), is read at start-up
# and written back at the end, so it grows with every run.
REPOSITORY = "repository.txt"

try:
    with open(REPOSITORY) as fp:
        dic = json.load(fp)
except FileNotFoundError:
    # First run: start with an empty repository
    dic = {}

# Read every file in the "data" folder into a list of cleaned, lower-case words
file_words = {}
files = []
for fle in listdir("data"):
    # Ignore the ~ and . i.e. hidden / system files (checked before opening them)
    if fle.startswith('~') or fle.startswith('.'):
        continue
    file_words[fle] = []
    with open(path.join("data", fle)) as f:
        for line in f:
            file_words[fle].extend(clean(token) for token in line.split())
    files.append(fle)

# Clean the input the same way as the file contents so the lookups below match
word = clean(input("Enter a word: "))
code = soundex(word)

if code in dic:
    print(word + " has the following similar words:")
    print(dic[code])
else:
    print(word + " has no phonetically similar words.")
    dic[code] = []

# Insert the new word into the repository under its code (only once)
if word not in dic[code]:
    dic[code].append(word)

print("And it's present in the following files:")
for fle in files:
    for similar in dic[code]:
        if similar in file_words[fle]:
            print(similar + " is found in --> " + fle)

# Write the updated repository back so the next run can use it
with open(REPOSITORY, 'w') as fp:
    json.dump(dic, fp)
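To see what each step of soundex() does to a word, the same steps can be re-implemented as a stand-alone function that also records the intermediate strings. This is an illustrative sketch rather than the listing itself: soundex_trace and CODES are hypothetical names, it assumes the word contains at least one letter, and it swaps the index loop of Step 5 for itertools.groupby, which yields one key per run of equal adjacent digits.

```python
import re
from itertools import groupby

# Same letter groups as Steps 3 & 4 of the program above
CODES = {letter: digit
         for letters, digit in [("AEIOUWHY", "0"), ("BFPV", "1"),
                                ("CGJKQSXZ", "2"), ("DT", "3"), ("L", "4"),
                                ("MN", "5"), ("R", "6")]
         for letter in letters}


def soundex_trace(word):
    """Return (code, steps); steps records the string after each stage.

    Assumes the word contains at least one letter A-Z.
    """
    steps = {}
    word = re.sub("[^A-Z]+", "", word.upper())          # Step 1
    steps["letters"] = word
    first_letter, rest = word[0], word[1:]               # Step 2
    digits = "".join(CODES[c] for c in rest)             # Steps 3 & 4
    steps["digits"] = digits
    collapsed = "".join(d for d, _ in groupby(digits))   # Step 5
    steps["collapsed"] = collapsed
    code = first_letter + collapsed.replace("0", "")     # Step 6
    steps["no_zeros"] = code
    code = (code + "000")[:4]                            # Step 7
    return code, steps


if __name__ == "__main__":
    code, steps = soundex_trace("calpurnia")
    for name, value in steps.items():
        print(name, value)
    print("code", code)
    # letters CALPURNIA
    # digits 04106500
    # collapsed 0410650
    # no_zeros C4165
    # code C416
```

Padding with "000" and slicing to four characters covers both branches of Step 7 in one expression.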
 
------------------------------------------------------------------------------

Repository.txt
{
  "C416": ["calpurnia", "calpoornia", "calpornia"],
  "V265": ["vikrant", "vikramjeet", "veekram", "vikram"],
  "A123": ["abheejit"],
  "V220": ["vishakha"],
  "M622": ["markus", "markoos", "merkus"],
  "J310": ["jaydeep", "jaydip"],
  "V240": ["vishal"],
  "V625": ["virkam"],
  "V230": ["viksto"],
  "B632": ["brutus", "brutoos"]
}
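Every key in repository.txt should be the Soundex code of the words filed under it, so recomputing the codes is a quick consistency check. The sketch below is self-contained, so it restates the steps of soundex() compactly (str.translate for Steps 3 & 4, itertools.groupby for Step 5); compact_soundex, mismatches and SAMPLE_REPOSITORY are hypothetical names, and the data is the repository listed above.

```python
import re
from itertools import groupby

# Letter -> digit table from Steps 3 & 4 (vowels, H, W and Y map to 0)
TABLE = str.maketrans("AEIOUWHYBFPVCGJKQSXZDTLMNR",
                      "00000000111122222222334556")


def compact_soundex(word):
    """Compact restatement of soundex(); assumes the word has at least one letter."""
    word = re.sub("[^A-Z]+", "", word.upper())
    digits = word[1:].translate(TABLE)
    collapsed = "".join(d for d, _ in groupby(digits))
    return (word[0] + collapsed.replace("0", "") + "000")[:4]


SAMPLE_REPOSITORY = {
    "C416": ["calpurnia", "calpoornia", "calpornia"],
    "V265": ["vikrant", "vikramjeet", "veekram", "vikram"],
    "A123": ["abheejit"],
    "V220": ["vishakha"],
    "M622": ["markus", "markoos", "merkus"],
    "J310": ["jaydeep", "jaydip"],
    "V240": ["vishal"],
    "V625": ["virkam"],
    "V230": ["viksto"],
    "B632": ["brutus", "brutoos"],
}


def mismatches(repository):
    """Return (stored_code, word, computed_code) for every word filed under the wrong code."""
    return [(code, word, compact_soundex(word))
            for code, words in repository.items()
            for word in words
            if compact_soundex(word) != code]


if __name__ == "__main__":
    print(mismatches(SAMPLE_REPOSITORY))  # []
```

An empty list means every word sits under its own code.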


Input Files:
 
F1.txt 
 
Brutus killed calpurnia, brutoos the evil brother of calpornia,
avenged her death. vikram gets angry when called veekram. 
 
F2.txt

Markus is the step brother of brutus. Brutus and markus are each others good friends. calpoornia, is also a friend.
Vikram is a friend of Markus.
 
 
F3.txt

brutus and markus were classmates but they changed roads after college. They do not have any similar interests now. brutoos is  a butcher and merkus weaves. 

veekram and markus were also classmates but they rarely spoke.
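The program answers "which files contain this word?" by scanning each file's word list for every similar word. An alternative design, sketched here under assumptions, is to build an inverted index once that maps each cleaned word to the files containing it. The three files are inlined as strings so the example runs without a data folder; FILES and build_index are hypothetical names, and clean() mirrors the program's rule.

```python
import re
from collections import defaultdict

# The three sample input files, inlined (hypothetical stand-in for the data folder)
FILES = {
    "F1.txt": "Brutus killed calpurnia, brutoos the evil brother of calpornia,\n"
              "avenged her death. vikram gets angry when called veekram.",
    "F2.txt": "Markus is the step brother of brutus. Brutus and markus are each "
              "others good friends. calpoornia, is also a friend.\n"
              "Vikram is a friend of Markus.",
    "F3.txt": "brutus and markus were classmates but they changed roads after "
              "college. They do not have any similar interests now. brutoos is "
              "a butcher and merkus weaves.\n"
              "veekram and markus were also classmates but they rarely spoke.",
}


def clean(token):
    # Same rule as clean() in the program: lower-case, keep only a-z
    return re.sub("[^a-z]+", "", token.lower())


def build_index(files):
    """Map each cleaned word to the sorted names of the files that contain it."""
    index = defaultdict(set)
    for name, text in files.items():
        for token in text.split():
            word = clean(token)
            if word:  # skip tokens that were pure punctuation
                index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}


if __name__ == "__main__":
    index = build_index(FILES)
    for similar in ["markus", "markoos", "merkus"]:
        for name in index.get(similar, []):
            print(similar + " is found in --> " + name)
    # markus is found in --> F2.txt
    # markus is found in --> F3.txt
    # merkus is found in --> F3.txt
```

Each lookup then costs one dictionary access instead of a scan over every file; the price is holding the whole index in memory.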
 
 
Output:
Enter a word: merkus
merkus has the following similar words:
['markus', 'markoos']
And it's present in the following files:
markus is found in --> f2.txt
markus is found in --> f3.txt
merkus is found in --> f3.txt
---------------------------------------------------------------------
Enter a word: vikram
vikram has the following similar words:
['vikrant', 'vikramjeet', 'veekram', 'vikram']
And it's present in the following files:
veekram is found in --> f1.txt
vikram is found in --> f1.txt
vikram is found in --> f2.txt
veekram is found in --> f3.txt
---------------------------------------------------------------------
Enter a word: santiago
santiago has no phonetically similar words.
And it's present in the following files:
