Multiple Word Search not Working Correctly (Python)


I am working on a project that requires me to be able to search for multiple keywords in a file. For example, if I had a file with 100 occurrences of the word "tomato", 500 of the word "bread", and 20 of "pickle", I would want to be able to search the file for "tomato" and "bread" and get the number of times each occurs in the file. I was able to find people with the same issue/question, but only for other languages on this site.

I already have a working program that allows me to search for a column name and tally how many times something shows up in that column, but I want to make something a bit more precise. Here is my code:

    import os

    def start():
        location = raw_input("In what folder is the data to be processed located? ")
        #location = "c:/code/samples/dates/2015-06-07/large-scale data parsing/data files"
        if os.path.exists(location) == True:  # tests to see if the user entered a valid path
            file_extension = raw_input("What is the file type (.txt for example)? ")
            search_for(location, file_extension)
        else:
            print "I'm sorry, but the file location you have entered does not exist. Please try again."
            start()

    def search_for(location, file_extension):
        querylist = []
        n = 5
        while n == 5:
            search_query = raw_input("What would you like to search for in each file? Use 'done' to indicate you have finished your request. ")
            #list = ["cd90-n5722-15c", "cd90-nb810-4c", "cp90-n2475-8", "cd90-vn530-22b"]
            if search_query == "done":
                print "Your queries are:", querylist
                print ""
                content = os.listdir(location)
                run(content, file_extension, location, querylist)
                n = 0
            else:
                querylist.append(search_query)
                continue

    def run(content, file_extension, location, querylist):
        for item in content:
            if item.endswith(file_extension):
                search(location, item, querylist)
        quit()

    def search(location, item, querylist):
        with open(os.path.join(location, item), 'r') as f:
            countlist = []
            for search in querylist:  # any search value after the first one is incorrectly reporting "0"
                countsearch = 0
                for line in f:
                    if search in line:
                        countsearch = countsearch + 1
                countlist.append(search)
                countlist.append(countsearch)  # the mechanism to update countsearch is not working for any value after the first
            print item, countlist

    start()

If you use this code, the last part (def search) does not work correctly. Any time I put a search in, every search after the first one I enter returns "0", despite there being up to 500,000 occurrences of the search word in a file.

I was also wondering, since I have to index 5 files of 1,000,000 lines each, whether there is a way to write an additional function or something similar to count how many times "lettuce" occurs across all the files.

I cannot post the files here due to their size and content. Any help would be greatly appreciated.

Edit

I also have this piece of code here. If I use this, I get the correct count of each, but it would be much better to have the user be able to enter as many searches as they want:

    import os

    def check_start():
        #location = raw_input("In what folder is the data to be processed located? ")
        location = "c:/code/samples/dates/2015-06-07/large-scale data parsing/data files"
        content = os.listdir(location)
        for item in content:
            if item.endswith("processed"):
                # one hard-coded counter per search term
                countcol1 = 0
                countcol2 = 0
                countcol3 = 0
                countcol4 = 0
                #print os.path.join(currentdir,item)
                with open(os.path.join(location, item), 'r') as f:
                    for line in f:
                        if "cd90-n5722-15c" in line:
                            countcol1 = countcol1 + 1
                        if "cd90-nb810-4c" in line:
                            countcol2 = countcol2 + 1
                        if "cp90-n2475-8" in line:
                            countcol3 = countcol3 + 1
                        if "cd90-vn530-22b" in line:
                            countcol4 = countcol4 + 1
                print item, "cd90-n5722-15c", countcol1, "cd90-nb810-4c", countcol2, "cp90-n2475-8", countcol3, "cd90-vn530-22b", countcol4

You are trying to iterate over the file more than once. After the first time, the file pointer is at the end, so subsequent searches fail because there's nothing left to read.
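To see this in isolation, here is a minimal sketch. It uses an in-memory `io.StringIO` object as a stand-in for an open file (the sample text is made up for illustration) and is written for Python 3, unlike the Python 2 code in the question:

```python
import io

# In-memory stand-in for an open text file (hypothetical sample data).
f = io.StringIO("tomato\nbread\ntomato\n")

# First loop over the file: reads every line, leaving the pointer at the end.
first_pass = sum(1 for line in f if "tomato" in line)

# Second loop over the same object: nothing is left to read, so the loop body never runs.
second_pass = sum(1 for line in f if "bread" in line)

print(first_pass, second_pass)
```

The second count comes out as zero even though "bread" is in the data, which is the same symptom as in the question.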

If you add the line f.seek(0), it will reset the pointer before every read:

    def search(location, item, querylist):
        with open(os.path.join(location, item), 'r') as f:
            countlist = []
            for search in querylist:  # any search value after the first one is incorrectly reporting "0"
                countsearch = 0
                for line in f:
                    if search in line:
                        countsearch = countsearch + 1
                countlist.append(search)
                countlist.append(countsearch)  # the mechanism to update countsearch is not working for any value after the first
                f.seek(0)  # rewind to the start of the file for the next query
        print item, countlist

PS. I've guessed at the indentation... you really shouldn't use tabs.
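Since the files are large, another option is to read each file only once and update a counter per query on every line, which is essentially the hard-coded version from the edit generalized to a user-supplied list. The following is only a sketch under assumptions: `count_queries` is a hypothetical helper name, the sample file is created just for the demonstration, and the code is written for Python 3. Like the original, it counts lines that contain a query, not every occurrence within a line.

```python
import os
import tempfile

def count_queries(path, querylist):
    """Return a dict mapping each query to the number of lines containing it."""
    counts = dict((query, 0) for query in querylist)
    with open(path, 'r') as f:
        for line in f:            # the file is read exactly once
            for query in counts:  # check every query against the current line
                if query in line:
                    counts[query] += 1
    return counts

# Demonstration on a throwaway sample file (made-up contents).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("tomato bread\nbread\npickle tomato\nbread\n")
sample_path = tmp.name

result = count_queries(sample_path, ["tomato", "bread", "lettuce"])
os.remove(sample_path)
print(result)
```

With this approach no rewinding is needed, and the cost is one pass over each file regardless of how many queries the user enters. Calling it once per file and adding up the returned counts would give totals across all files.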

