xml parsing - Identify and replace elements of XML using BeautifulSoup in Python -
i trying use beautifulsoup4 find , replace specific elements within xml. more specifically, want find instances of 'file_name'(in example below file name 'cyp26a1_atra_minus_tet_plus.txt') , replace full path document - saved in 'file_name_replacement_dir' variable. first task, bit i'm stuck on, isolate section of interest can replace using replacewith() method.
the xml
<parametergroup name="experiment_22"> <parameter name="data row oriented" type="bool" value="1"/> <parameter name="experiment type" type="unsignedinteger" value="0"/> <parameter name="file name" type="file" value="cyp26a1_atra_minus_tet_plus.txt"/> <parameter name="first row" type="unsignedinteger" value="1"/>
there 44 experiments 4 different file names (so 11 file name 1, 11 file name 2 , on). above snippet of xml repeated 44 times, different files stored in "file name" line.
my code far
xml_dir = 'd:\mphil\model_building\models\retinoic_acid\[06]\rar_models\model_line_2' xml_file_name = 'rara_rxr_m22.cps' xml=model_dir+'\\'+model_name file_name_replacement_dir = d:\mphil\model_building\models\retinoic_acid\[06]\rar_models soup = beautifulsoup(open(xml)) print soup.find_all('parametergroup name="experiment_22"')
this returns empty list. i've tried few other functions in place of 'soup.findall()' still haven't been able find handle filename. know how i'm trying do?
xml = '<parametergroup name="experiment_22">\ <parameter name="data row oriented" type="bool" value="1"/>\ <parameter name="experiment type" type="unsignedinteger" value="0"/>\ <parameter name="file name" type="file" value="cyp26a1_atra_minus_tet_plus.txt"/>\ <parameter name="first row" type="unsignedinteger" value="1"/>\ </parametergroup>' bs4 import beautifulsoup import os soup = beautifulsoup(xml) tag in soup.find_all("parameter", {'name': 'file name'}): tag['value'] = os.path.join('new_dir', tag['value']) print soup
- close xml 'parametergroup' tag.
- capitalisation of tags may not work beautifulsoup, try
parameter
in lower case. - use
os.path
manipulate paths works cross-platforms.
Comments
Post a Comment