Multiple Word Search not Working Correctly (Python)
I am working on a project that requires me to be able to search for multiple keywords in a file. For example, if I had a file with 100 occurrences of the word "tomato", 500 of the word "bread", and 20 of "pickle", I would want to be able to search the file for "tomato" and "bread" and get the number of times each occurs in the file. I was able to find people with the same issue/question, but only for other languages on this site.
I have a working program that allows me to search for a column name and tally how many times it shows up in that column, but I want to make it a bit more precise. Here is my code:
    import os

    def start():
        location = raw_input("Where is the folder containing the data to be processed located? ")
        #location = "c:/code/samples/dates/2015-06-07/large-scale data parsing/data files"
        if os.path.exists(location) == True: #tests to see if the user entered a valid path
            file_extension = raw_input("What is the file type (.txt for example)? ")
            search_for(location,file_extension)
        else:
            print "I'm sorry, but the file location you have entered does not exist. Please try again."
            start()

    def search_for(location,file_extension):
        querylist = []
        n = 5
        while n == 5:
            search_query = raw_input("What would you like to search for in each file? Use 'done' to indicate you have finished your request. ")
            #list = ["cd90-n5722-15c", "cd90-nb810-4c", "cp90-n2475-8", "cd90-vn530-22b"]
            if search_query == "done":
                print "Your queries are:",querylist
                print ""
                content = os.listdir(location)
                run(content,file_extension,location,querylist)
                n = 0
            else:
                querylist.append(search_query)
                continue

    def run(content,file_extension,location,querylist):
        for item in content:
            if item.endswith(file_extension):
                search(location,item,querylist)
        quit()

    def search(location,item,querylist):
        with open(os.path.join(location,item), 'r') as f:
            countlist = []
            for search in querylist: #any search value after the first one is incorrectly reporting "0"
                countsearch = 0
                for line in f:
                    if search in line:
                        countsearch = countsearch + 1
                countlist.append(search)
                countlist.append(countsearch) #mechanism to update countsearch is not working for any value after the first
        print item, countlist

    start()
If I use this code, the last part (def search) does not work correctly. Whenever I enter more than one search, every search after the first one returns "0", despite there being up to 500,000 occurrences of the search word in a file.
I was wondering, since I have to index 5 files of 1,000,000 lines each, if there is a way to write either an additional function or something else to count how many times "lettuce" occurs across all the files.
I cannot post the files here due to their size and content. Any help would be greatly appreciated.
Edit
I also have this piece of code here. If I use this, I get the correct count for each term, but it would be much better to have the user be able to enter as many searches as they want:
    def check_start():
        #location = raw_input("Where is the folder containing the data to be processed located? ")
        location = "c:/code/samples/dates/2015-06-07/large-scale data parsing/data files"
        content = os.listdir(location)
        for item in content:
            if item.endswith("processed"):
                countcol1 = 0
                countcol2 = 0
                countcol3 = 0
                countcol4 = 0
                #print os.path.join(currentdir,item)
                with open(os.path.join(location,item), 'r') as f:
                    for line in f:
                        if "cd90-n5722-15c" in line:
                            countcol1 = countcol1 + 1
                        if "cd90-nb810-4c" in line:
                            countcol2 = countcol2 + 1
                        if "cp90-n2475-8" in line:
                            countcol3 = countcol3 + 1
                        if "cd90-vn530-22b" in line:
                            countcol4 = countcol4 + 1
                print item, "cd90-n5722-15c", countcol1, "cd90-nb810-4c", countcol2, "cp90-n2475-8", countcol3, "cd90-vn530-22b", countcol4
You are trying to iterate over your file more than once. After the first pass, the file pointer is at the end of the file, so subsequent searches fail because there's nothing left to read.
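To see the effect in isolation, here is a minimal sketch. It assumes Python 3 and uses an in-memory `io.StringIO` object as a stand-in for a real file (the sample text is made up for illustration); a file opened with `open()` behaves the same way when iterated.

```python
import io

# io.StringIO stands in for a real file object; the three lines are sample data.
f = io.StringIO("tomato\nbread\ntomato\n")

# First pass: reads all three lines and leaves the pointer at the end.
first_pass = sum(1 for line in f if "tomato" in line)

# Second pass without rewinding: the iterator is exhausted, so nothing is read.
second_pass = sum(1 for line in f if "bread" in line)

# Rewind to the start, then the search finds the line again.
f.seek(0)
third_pass = sum(1 for line in f if "bread" in line)

print(first_pass, second_pass, third_pass)  # prints: 2 0 1
```

The second count is 0 even though "bread" is in the data, which is exactly the symptom described in the question.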
If you add the line:
    f.seek(0)
it will reset the pointer before every read:
    def search(location,item,querylist):
        with open(os.path.join(location,item), 'r') as f:
            countlist = []
            for search in querylist:
                countsearch = 0
                for line in f:
                    if search in line:
                        countsearch = countsearch + 1
                countlist.append(search)
                countlist.append(countsearch)
                f.seek(0) #rewind so the next query starts reading from the top of the file
        print item, countlist
PS. I've guessed at the indentation... you really shouldn't use tabs.
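Since the files are large, an alternative to rewinding with `f.seek(0)` for every query is to read each file only once and test every query against each line. The following is only a sketch under some assumptions: it targets Python 3, the helper names `count_queries` and `count_in_folder` are made up for illustration, and, like the code in the question, it counts the lines that contain a query rather than every individual occurrence. It also sums the per-file counts into a total across all files.

```python
import os
from collections import Counter


def count_queries(path, queries):
    """Count, for each query, how many lines of the file at `path` contain it."""
    counts = Counter({query: 0 for query in queries})
    with open(path, "r") as f:
        for line in f:              # the file is read exactly once
            for query in queries:
                if query in line:
                    counts[query] += 1
    return counts


def count_in_folder(location, file_extension, queries):
    """Return (per-file counts, total counts) for files ending in file_extension."""
    per_file = {}
    total = Counter({query: 0 for query in queries})
    for item in sorted(os.listdir(location)):
        if item.endswith(file_extension):
            per_file[item] = count_queries(os.path.join(location, item), queries)
            total.update(per_file[item])   # Counter.update adds the counts together
    return per_file, total
```

Calling `count_in_folder(location, ".txt", ["tomato", "bread"])` would return a dictionary of counts keyed by file name plus a combined `Counter`, so the user can enter as many queries as they like without one counter variable per term.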