Python: how to generate a unique identifier for an XML document? -
xslt has generate-id(xml-document) function. can create unique identifier xml document.
in python how generate unique identifier xml document?
note: unique identifier should based on content of xml document, not file name of xml document. example, xml document
<root> <comment>hello, world</comment> </root>
and xml document
<document> <test>blah, blah</test> </document>
must generate different identifiers, if file names identical.
i have graph of xml documents. need way recognize, "hey, i've seen xml document." don't want compare entire xml documents. instead, want compare uuids correspond xml.
a colleague sent me answer:
for mapping text id, we’ve used md5 1 hash digest. give md5() function xml document (string) , return 32-character identifier.
more details:
genid.py
import sys import stdio hashlib import md5 def digest_md5(obj): if type(obj) unicode: obj = obj.encode('utf8') return md5(obj).hexdigest() s = sys.stdin.readline() stdio.writeln(digest_md5(s))
then turned exe file.
then dos command prompt typed command:
type input.txt | genid
where input.txt is:
<document>hello, world</document>
and got output:
df6f8283335bf3f657a89733e3d36b84
beautiful!
Comments
Post a Comment