Posts

Showing posts from 2016

Python Program for Soundex Algorithm

This is a Python implementation of the Soundex algorithm. The program builds a JSON document as a dictionary and keeps extending it on every execution: the document is appended to and referenced while the program runs.

Program:

    from re import sub

    def remove_symbols(input_string):
        # Use a regular expression to remove every character outside A-Z
        # (the caller upper-cases the input before calling this)
        return sub('[^A-Z]+', '', input_string)

    def clean(input_string):
        # Convert the characters to lower case, then use a
        # regular expression to remove non a-z characters
        return sub('[^a-z]+', '', input_string.lower())

    word = "Input"

    def soundex(word):
        # Step 1: Capitalize all letters in the word and drop all punctuation marks.
        word = remove_symbols(word.upper())
        # Step 2: Retain the first letter of the word.
        first_letter = word[0]
        word = word[1:]
        # Step 3 & 4: Change ( 'A...
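Since the excerpt above is cut off at steps 3 and 4, here is a minimal, self-contained sketch of one common textbook variant of Soundex, for orientation only. It is not the post's own code: the name `soundex_sketch`, the `CODES` table, and the remaining steps (map vowels and H, W, Y to 0, map consonants to digits, collapse repeated digits, drop zeros, pad to four characters) are assumptions based on that variant. Other Soundex variants treat H, W, and the first letter's own code differently and can give different results for some names.

```python
from re import sub

# Assumed letter-to-digit table of the textbook variant:
# vowels and H, W, Y map to "0"; consonants map to "1".."6".
CODES = {}
for letters, digit in (("AEIOUHWY", "0"), ("BFPV", "1"), ("CGJKQSXZ", "2"),
                       ("DT", "3"), ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex_sketch(word):
    """Hypothetical helper: return a 4-character Soundex-style code."""
    # Upper-case the word and drop everything that is not A-Z.
    word = sub("[^A-Z]+", "", word.upper())
    if not word:
        return ""
    # Retain the first letter; encode only the remaining letters.
    first_letter, rest = word[0], word[1:]
    digits = [CODES[ch] for ch in rest]
    # Collapse runs of identical consecutive digits into one digit.
    collapsed = []
    for d in digits:
        if not collapsed or collapsed[-1] != d:
            collapsed.append(d)
    # Drop the zeros, then pad with trailing zeros and keep four characters.
    code = "".join(d for d in collapsed if d != "0")
    return (first_letter + code + "000")[:4]


print(soundex_sketch("Robert"), soundex_sketch("Rupert"))
```

Under this variant, similar-sounding names such as "Robert" and "Rupert" receive the same code, which is the property a Soundex lookup relies on.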

Python Program to Print an Inverted Index

This is a Python program that prints an inverted index for the files provided to it. Building an inverted index is one of the most important tasks performed by every information retrieval system. The output and the files used are also included below.

    from os import listdir, path

    input_path = "input"
    word_list = []
    all_files = []
    i = 0
    filenamelist = []
    for f in listdir(input_path):
        # Ignore hidden files
        if f.startswith('.') or f.startswith('~'):
            continue
        file_handle = open(path.join(input_path, f))
        word_list = file_handle.read().split()
        file_handle.close()
        # Populate dictionary
        all_files.insert(i, {})
        # Start the word positions with 1, so that first word gets recognized as first
        pos = 1
        for word in word_list:
            if word in all_files[i]:
                all_files[i][word].append(pos)
            else:
                # Create a list and insert the position
                all_files[i][word] = ...
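Because the excerpt above stops partway through the loop, the following is a small sketch of the general idea of a positional inverted index, not the post's actual code. To stay self-contained it reads from an in-memory dictionary instead of the `input` directory; the function name `build_positional_index` and the sample documents are hypothetical. Positions start at 1, as in the excerpt.

```python
from collections import defaultdict


def build_positional_index(documents):
    """Hypothetical helper: map each word to {document name: [positions]}.

    `documents` is assumed to be a dict of document name -> text.
    """
    index = defaultdict(dict)
    for name, text in documents.items():
        # Number the words from 1 so the first word has position 1.
        for pos, word in enumerate(text.split(), start=1):
            index[word].setdefault(name, []).append(pos)
    return dict(index)


# Made-up sample data, for illustration only.
docs = {
    "a.txt": "to be or not to be",
    "b.txt": "be quick",
}
index = build_positional_index(docs)
for word in sorted(index):
    print(word, index[word])
```

Looking up a word in the resulting dictionary gives every document that contains it together with the positions where it occurs, which is what makes phrase and proximity queries possible.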