python - How to deal with strings where encoding is unclear
I know there is quite a lot on the web, and on Stack Overflow, about Python and character encoding, but I haven't found the answer I'm looking for. At the risk of creating a duplicate, I'm going to ask anyway.
It's a script that gets a dictionary whose keys are unicode. The values are strings of unknown encoding. The keys wouldn't matter much, since the keys are simple, unlike the values. The values can (and do) contain a large variety of encodings. There are dictionaries where some values are in ASCII, others in UTF-16BE, and yet others in cp1250.
That totally messes up further processing, which consists of printing or concatenating (yes, that simple).
The work-around I came up with, which makes the Python print statements work, is:
    import chardet

    for key in data.keys():
        # hope they did not choose a funky encoding
        try:
            print key + ":" + data[key]  # triggers UnicodeDecodeError on many encodings
            current_data = data[key]
        except UnicodeDecodeError:
            # trying to cope with a funky encoding;
            # doing this on each value, because the dictionary contains multiple encodings
            current_data = data[key].decode(chardet.detect(data[key])['encoding'])
        # printing without a newline as a workaround, because concatenating didn't work
        print key + ":",
        print current_data.encode('utf-8')
In Python this works fine. In Jython 2.7rc1, which I use in the project (switching is not an option), it prints characters that are not in the original encoding (funky looking characters). If anyone has an idea how I can make this work in Jython, that'd be great!
Edit (example): sample value:
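As an illustration only, the per-value decoding step of the workaround could be isolated into a small helper that always returns unicode, so that concatenation happens on text rather than on raw bytes. This is a sketch, not the code from the script: the name `to_text` is hypothetical, and it swaps `chardet` for a fixed list of candidate encodings, which is an assumption about the data. It is written to be version-neutral, but it has not been checked under Jython.

```python
def to_text(value, candidates=("ascii", "utf-8", "cp1250")):
    """Return value as unicode text, trying each candidate encoding in order."""
    if not isinstance(value, bytes):
        return value  # already decoded text, nothing to do
    for encoding in candidates:
        try:
            return value.decode(encoding)
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    # Last resort: never raise; undecodable bytes become U+FFFD.
    return value.decode("ascii", "replace")


# Hypothetical sample dictionary with values in different encodings.
data = {u"plain": b"hello", u"utf8": b"caf\xc3\xa9", u"legacy": b"\x8elut\xfd"}

for key in sorted(data):
    line = key + u": " + to_text(data[key])  # safe: both sides are text
    print(repr(line))
```

The order of the candidates matters, because a permissive single-byte encoding such as cp1250 accepts almost any input. UTF-16BE is left out of the default list for the same reason: nearly any even-length byte string decodes without error, so it would need a separate check.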
our latest scenarios explore 2 possible versions of future seen through fresh “lenses”.
This creates a string where the right and left double quotes turn into \x8d and \x8e. I don't know what encoding that is. In Python, after using the code above, it strips them. In Jython it turns them into white squares.
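One way to narrow down what \x8d and \x8e could be is to decode the same bytes with several candidate encodings and compare the results. The sketch below does that. The byte string is a shortened stand-in for the sample value, and the list of candidates is an assumption, not something known about the data.

```python
# Shortened stand-in for the sample value; only the two quote bytes matter.
sample = b"fresh \x8dlenses\x8e"

# Candidate encodings are a guess; extend the list as needed.
candidates = ["cp1250", "cp1252", "mac_roman", "latin-1"]

results = {}
for encoding in candidates:
    try:
        results[encoding] = sample.decode(encoding)
    except UnicodeDecodeError:
        results[encoding] = None  # these bytes are not valid in this encoding

for encoding in candidates:
    print("%-10s %r" % (encoding, results[encoding]))
```

If no candidate yields curly quotes, the bytes may already have been altered before they reached the script, in which case no decode call can recover the original characters.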
I'm not familiar with Jython, but the following link I found may prove useful: http://python.6.x6.nabble.com/character-encoding-issues-td1766833.html
It says you should keep unicode strings in files separate from the source, and read them with codecs.open. That seemed to work for the person experiencing a problem similar to yours.
The following link mentions specifying an encoding parameter for the JVM: https://answers.launchpad.net/sikuli/+question/156443
Without seeing the actual error output, this is the extent of what I can provide.
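A minimal sketch of the codecs.open approach, assuming the external file is UTF-8. The file name and its content are made up for the example, and the sketch first writes the file so that it is self-contained.

```python
import codecs
import os
import tempfile

# Hypothetical file standing in for the external text resource.
path = os.path.join(tempfile.mkdtemp(), "strings.txt")

# Write the file with an explicit encoding so its bytes are known.
with codecs.open(path, "w", encoding="utf-8") as handle:
    handle.write(u"fresh \u201clenses\u201d\n")

# Read it back; codecs.open decodes on the fly and returns unicode text.
with codecs.open(path, "r", encoding="utf-8") as handle:
    text = handle.read()

print(repr(text))
```

This only helps when the encoding of the file is known in advance. For values of unknown encoding, the encoding still has to be guessed first.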