
Python Program for Soundex Algorithm

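This is a Python implementation of the Soundex algorithm. The listing is written for Python 2 (it uses raw_input and print statements).

The program keeps a JSON document, loaded as a dictionary, that maps each Soundex code to the words seen so far. The dictionary is read at startup, consulted and extended while the program runs, and written back at the end, so it grows with every execution.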

Program:

from re import sub
def remove_symbols(input_string):
    #Use a regular expression to remove everything except A-Z
    #(the caller upper-cases the string first)
    return sub('[^A-Z]+', '', input_string)

def clean(input_string):
    #Convert the characters to lower case and then use
    #Regular expressions to remove non a-z chars
    return sub('[^a-z]+', '', input_string.lower())

word = "Input"

def soundex(word):
    #Step 1: Capitalize all letters in the word and drop all punctuation marks.
    word = remove_symbols(word.upper())
    
    #Step 2: Retain the first letter of the word. 
    first_letter = word[0]
    word = word[1:]
    
    #Step 3 & 4: Change ('A','E','I','O','U','H','W','Y') to 0
    #and ('B','F','P','V') => 1
    #('C','G','J','K','Q','S','X','Z') => 2
    #('D','T') => 3 , ('L') => 4 , ('M','N') => 5 and ('R') => 6
    pre = ['[AEIOUWHY]','[BFPV]','[CGJKQSXZ]','[DT]','[L]','[MN]','[R]']
    post= ['0','1','2','3','4','5','6']
    for find , replace in zip(pre, post):
        word = sub(find, replace, word)
    
    #Step 5: Collapse every run of identical adjacent digits left by Step 4 into a single digit.
    new_word = ""
    maxpos = len(word) - 1
    for i in range(maxpos+1):
        #Keep a digit only if it is the last one or differs from its right-hand neighbour
        if i == maxpos or word[i] != word[i+1]:
            new_word += word[i]
            
    #Step 6: Remove all zeros (placed there in Step 3) from the string that results from Step 5
    #and put the retained first letter back in front
    word = first_letter + sub('0','', new_word)
    
    #Step 7: Pad the string that resulted from Step 6 with trailing zeros and return only the first four positions,
    #which will be of the form <uppercase letter> <digit> <digit> <digit>
    length = len(word)
    if length >= 4:
        word = word[:4]
    else:
        word = word + ("0" * (4 - length))
        
    #print input, word
    return word

import json
fp = open("D:\\Vikram Projects\\Eclipse Workspace\\Soundex Algorithm\\repository.txt")
dic  = json.load(fp)
#dic = dic[0]
fp.close()

from os import listdir, path
#Maps each file name in the "data" folder to the list of cleaned words it contains
#(note: the name dict shadows the built-in type)
dict = {}
files = []
for fle in listdir("data"):
    #Ignore the ~ and . i.e. hidden / system files
    if fle.startswith('~') or fle.startswith('.'): continue
    f = open(path.join("data", fle))
    dict[fle] = list()
    for line in f.readlines():
        dict[fle] += map(clean, line.split())
    f.close()
    files.append(fle)

word = raw_input("Enter a word: ").lower()
code = soundex(word)

if dic.has_key(code):
    print word+" has  following similar words: "
    print dic[code]
else:
    print word + " has no phonetically similar words."
    dic[code] = list()

#Remember the word under its code unless it is already listed
if word not in dic[code]:
    dic[code].append(word)

print "And it's present in following files: "
for fle in files:
    for similar in dic[code]:
        #Report every similar word that occurs in this file
        if str(similar) in dict[fle]:
            print similar +" is found in --> " + fle
#Write the updated repository back so the next run can reuse it
#(the file is created in the current working directory)
fp = open("repository.txt",'w')
json.dump(dic, fp)
fp.close()
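
The encoding steps of the listing can also be written more compactly. The following is a minimal Python 3 sketch, not a replacement for the program above: the name soundex_sketch is hypothetical, and the sketch assumes that Step 5 means collapsing each run of equal digits into a single digit. Like the listing, it codes H, W and Y as 0 and does not compare the first letter with the digit that follows it.

```python
import re
from itertools import groupby

# Letter -> digit groups taken from Steps 3 and 4 of the listing.
_GROUPS = {
    "0": "AEIOUHWY",
    "1": "BFPV",
    "2": "CGJKQSXZ",
    "3": "DT",
    "4": "L",
    "5": "MN",
    "6": "R",
}
_TABLE = str.maketrans(
    {letter: digit for digit, letters in _GROUPS.items() for letter in letters}
)


def soundex_sketch(word):
    """Return a four-character code following Steps 1-7 of the listing."""
    # Step 1: upper-case and drop everything that is not A-Z.
    letters = re.sub("[^A-Z]+", "", word.upper())
    if not letters:
        raise ValueError("word contains no letters")
    # Step 2: keep the first letter aside.
    first, rest = letters[0], letters[1:]
    # Steps 3 and 4: map the remaining letters to digits.
    digits = rest.translate(_TABLE)
    # Step 5: collapse each run of identical digits into one digit.
    collapsed = "".join(key for key, _ in groupby(digits))
    # Step 6: drop the zeros and restore the first letter.
    code = first + collapsed.replace("0", "")
    # Step 7: truncate or pad with zeros to exactly four characters.
    return code[:4].ljust(4, "0")


print(soundex_sketch("merkus"))   # M622
print(soundex_sketch("veekram"))  # V265
```

Both sample words map to the codes under which they are stored in Repository.txt.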
 
------------------------------------------------------------------------------

Repository.txt
{
"C416": ["calpurnia", "calpoornia", "calpornia"],
"V265": ["vikrant", "vikramjeet", "veekram", "vikram"],
"A123": ["abheejit"],
"V220": ["vishakha"],
"M622": ["markus", "markoos", "merkus"],
"J310": ["jaydeep", "jaydip"],
"V240": ["vishal"],
"V625": ["virkam"],
"V230": ["viksto"],
"B632": ["brutus", "brutoos"]
}
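
The repository is plain JSON that maps a Soundex code to a list of words. A minimal Python 3 sketch of the load, update and save cycle might look like this. The helper names load_repository, add_word and save_repository and the file name repository.json are assumptions made for the example, not part of the program above.

```python
import json
import os
import tempfile


def load_repository(path):
    """Return the stored code -> words mapping, or an empty dict if no file exists yet."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fp:
        return json.load(fp)


def add_word(repository, code, word):
    """List word under code unless it is already there; return True if it was added."""
    words = repository.setdefault(code, [])
    if word in words:
        return False
    words.append(word)
    return True


def save_repository(repository, path):
    """Write the mapping back as JSON so the next run can pick it up."""
    with open(path, "w", encoding="utf-8") as fp:
        json.dump(repository, fp, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical location, used only for this demonstration.
    path = os.path.join(tempfile.mkdtemp(), "repository.json")
    repo = load_repository(path)
    add_word(repo, "M622", "markus")
    add_word(repo, "M622", "merkus")
    save_repository(repo, path)
    print(load_repository(path))
```

Using the with statement closes the file even if json.load or json.dump raises, which avoids leaving a half-written repository open.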


Input Files:
 
F1.txt 
 
Brutus killed calpurnia, brutoos the evil brother of calpornia,
avenged her death. vikram gets angry when called veekram. 
 
F2.txt

Markus is the step brother of brutus. Brutus and markus are each others good friends. calpoornia, is also a friend.
Vikram is a friend of Markus.
 
 
F3.txt

brutus and markus were classmates but they changed roads after college. They do not have any similar interests now. brutoos is  a butcher and merkus weaves. 

veekram and markus were also 

classmates but they rarely spoke.
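
How words are matched against these files can be sketched in Python 3 as follows. This is an illustration only: build_index and files_containing are hypothetical helper names, and the texts of F1.txt and F2.txt are held in memory here rather than read from the data folder.

```python
import re


def clean(token):
    """Lower-case a token and strip everything except a-z."""
    return re.sub("[^a-z]+", "", token.lower())


def build_index(documents):
    """Map each document name to the set of cleaned words it contains."""
    index = {}
    for name, text in documents.items():
        # Tokens made only of punctuation clean to "", so drop that entry.
        index[name] = {clean(token) for token in text.split()} - {""}
    return index


def files_containing(index, words):
    """Return (word, document name) pairs for every word found in a document."""
    return [
        (word, name)
        for name in sorted(index)
        for word in words
        if word in index[name]
    ]


documents = {
    "F1.txt": "Brutus killed calpurnia, brutoos the evil brother of calpornia, "
              "avenged her death. vikram gets angry when called veekram.",
    "F2.txt": "Markus is the step brother of brutus. Brutus and markus are each "
              "others good friends. calpoornia, is also a friend. "
              "Vikram is a friend of Markus.",
}

index = build_index(documents)
similar = ["vikrant", "vikramjeet", "veekram", "vikram"]
for word, name in files_containing(index, similar):
    print(word + " is found in --> " + name)
```

Storing the words of each file in a set makes the membership test cheap, and the loop order (files first, then words) matches the order of the program's report.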
 
 
Output
Enter a word: merkus
merkus has  following similar words:  

[u'markus', u'markoos']
And it's present in following files:
markus is found in --> f2.txt
markus is found in --> f3.txt
merkus is found in --> f3.txt
---------------------------------------------------------------------
Enter a word: vikram
vikram has  following similar words:
[u'vikrant', u'vikramjeet', u'veekram', u'vikram']
And it's present in following files:
veekram is found in --> f1.txt
vikram is found in --> f1.txt
vikram is found in --> f2.txt
veekram is found in --> f3.txt
---------------------------------------------------------------------
Enter a word: santiago
santiago has no phonetically similar words.
And it's present in following files:
