python - How to deal with strings whose encoding is unclear


I know there is quite a lot on the web, and on Stack Overflow, about Python and character encoding, but I haven't found the answer I'm looking for. At the risk of creating a duplicate, I'm going to ask anyway.

It's a script that gets a dictionary whose keys are unicode. The values are strings of unknown encoding. The keys wouldn't matter much, because they are simple, unlike the values. The values can (and do) come in a large variety of encodings: there are dictionaries whose values are in ASCII, others in UTF-16BE, and yet others in cp1250.
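To make the problem concrete, here is a small Python sketch (written to run on Python 2.7 and 3). The sample text and the per-value encoding labels are assumptions for illustration, since the real dictionary carries no such labels. It shows that the same text has different bytes in UTF-16BE and cp1250, so no single `decode` call can handle every value, and that decoding with the wrong codec can silently produce garbage instead of raising an error:

```python
# -*- coding: utf-8 -*-
# Illustrative data: each value is paired with the encoding it was written in.
# In the real dictionary that label is missing, which is the whole problem.
TEXT = u"Za\u017c\u00f3\u0142\u0107"  # "Zażółć", representable in cp1250

values = {
    u"plain": (u"plain text".encode("ascii"), "ascii"),
    u"utf16": (TEXT.encode("utf-16-be"), "utf-16-be"),
    u"cp1250": (TEXT.encode("cp1250"), "cp1250"),
}

def normalize(values):
    """Decode every (raw_bytes, encoding) pair into a unicode string."""
    return dict((key, raw.decode(enc)) for key, (raw, enc) in values.items())

decoded = normalize(values)

# Same text, different bytes.
same_text = decoded[u"utf16"] == decoded[u"cp1250"] == TEXT
different_bytes = values[u"utf16"][0] != values[u"cp1250"][0]

# Decoding UTF-16BE bytes as cp1250 does not raise here; it just yields garbage.
wrong = values[u"utf16"][0].decode("cp1250", "replace")
```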

That totally messes up further processing, which consists of printing or concatenating (yes, that simple).
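The concatenation failure can be reproduced in isolation. Mixing a unicode key with undecoded bytes raises `UnicodeDecodeError` in Python 2 (the bytes are implicitly decoded as ASCII) and `TypeError` in Python 3; decoding first and then concatenating text with text avoids both. This sketch passes the encoding in explicitly, which assumes it is known; `join_entry` is a hypothetical helper name:

```python
def join_entry(key, raw, encoding):
    """Decode the raw value first, then concatenate text with text."""
    return key + u":" + raw.decode(encoding)

mixing_error = None
try:
    u"title:" + b"caf\xe9"  # unicode + undecoded bytes
except (TypeError, UnicodeDecodeError) as exc:
    # "UnicodeDecodeError" on Python 2.x, "TypeError" on Python 3.x
    mixing_error = type(exc).__name__

joined = join_entry(u"title", b"caf\xe9", "cp1250")  # u"title:caf\xe9"
```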

The work-around I came up with, which makes Python's print statements work, is:

```python
import chardet

for key in data.keys():
    # I hope it did not choose a funky encoding
    try:
        print key + ":" + data[key]  # triggers UnicodeDecodeError on many encodings
        current_data = data[key]
    except UnicodeDecodeError:
        # trying to cope with a funky encoding; detection runs on each value,
        # because the dictionary contains multiple encodings
        current_data = data[key].decode(chardet.detect(data[key])['encoding'])
        print key + ":",  # print without a newline as a workaround, because concatenating didn't work
        print current_data.encode('utf-8')
```
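A variation on the workaround above, sketched under a few assumptions: decode each value to unicode once, build the whole `key:value` line as unicode, and encode exactly once at the output boundary, so the print/concatenation split is no longer needed. To keep the sketch self-contained it uses a naive guesser instead of chardet; the candidate list, the UTF-16BE heuristic and the cp1250 fallback are guesses tuned to the encodings named in the question, not a general detector. `chardet.detect` could be plugged in via `guess_encoding=lambda raw: chardet.detect(raw)['encoding']`.

```python
# Hypothetical fallback order: utf-8 is strict enough to reject most non-UTF-8
# input, while cp1250 accepts almost any byte, so it goes last.
CANDIDATES = ("ascii", "utf-8", "cp1250")

def naive_guess(raw):
    """Return a plausible encoding for raw bytes, or None if nothing fits."""
    if raw[:2] in (b"\xfe\xff", b"\xff\xfe"):
        return "utf-16"  # a byte-order mark is the one reliable hint
    # Heuristic (assumption): mostly-Latin UTF-16BE text has a NUL in at least
    # half of the even positions; cp1250 or UTF-8 text has none.
    if len(raw) >= 2 and len(raw) % 2 == 0 and raw[::2].count(b"\x00") * 2 >= len(raw) // 2:
        return "utf-16-be"
    for enc in CANDIDATES:
        try:
            raw.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None

def to_text(raw, guess_encoding=naive_guess):
    """Decode raw bytes to unicode; text passes through unchanged."""
    if not isinstance(raw, bytes):
        return raw  # already unicode
    enc = guess_encoding(raw) or "cp1250"  # assumed last-resort code page
    return raw.decode(enc, "replace")

def format_entry(key, raw, guess_encoding=naive_guess):
    """Build the whole line as unicode, then encode it once for output."""
    line = key + u":" + to_text(raw, guess_encoding)
    return line.encode("utf-8")
```

`format_entry(key, data[key])` returns UTF-8 bytes. Whether the console renders them correctly still depends on the encoding it expects, which may be exactly what differs under Jython.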

In Python this works fine. In Jython 2.7rc1, which I use in the project (switching is not an option), it prints characters that are not in the original encoding (funky-looking characters). If anyone has an idea how I can make this work in Jython, that'd be great!
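One plausible but unverified cause of the odd characters is that the bytes reaching the console are in a different encoding from the one the console, or Jython's `sys.stdout`, expects. A sketch of making the output encoding explicit instead of relying on the runtime default, using `codecs.getwriter` from the standard library. It writes to an in-memory buffer so the result can be inspected; pointing it at the real console (for example `sys.stdout` in Python 2 or Jython, or `sys.stdout.buffer` in Python 3) is an untested assumption for Jython 2.7rc1:

```python
import codecs
import io

def make_utf8_writer(byte_stream):
    """Wrap a byte stream so that unicode written to it is encoded as UTF-8."""
    return codecs.getwriter("utf-8")(byte_stream)

buf = io.BytesIO()
out = make_utf8_writer(buf)
out.write(u"sample:" + u"\u201clenses\u201d" + u"\n")
written = buf.getvalue()  # the exact UTF-8 bytes that would reach the console
```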

Edit (example): a sample value:

Our latest scenarios explore 2 possible versions of the future seen through fresh “lenses”.

This creates a string in which the right and left double quotes turn into \x8d and \x8e. I don't know what encoding that is. In Python, after using the code above, they are stripped. In Jython they turn into white squares.
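Since the encoding of the sample is unknown, one way to narrow it down is to try a shortlist of single-byte code pages and keep those that turn the suspect bytes into the expected characters. The shortlist below is an assumption, and an empty result is possible; it would mean none of these code pages explains the bytes. The demo uses cp1252's curly quotes (0x93 and 0x94) instead of the question's \x8d and \x8e, because the correct mapping for those is not known here:

```python
# Hypothetical shortlist of legacy code pages to test; extend as needed.
CANDIDATES = ("cp1250", "cp1252", "latin_1", "mac_roman", "cp437", "cp850")

def matching_codecs(raw, expected, candidates=CANDIDATES):
    """Return the candidate codecs that decode raw bytes to exactly `expected`."""
    hits = []
    for enc in candidates:
        try:
            if raw.decode(enc) == expected:
                hits.append(enc)
        except (UnicodeDecodeError, LookupError):
            pass  # undefined byte for this code page, or codec not available
    return hits

# Demo with known data: cp1252 (and cp1250) store the curly quotes as 0x93 / 0x94.
demo = matching_codecs(b"\x93lenses\x94", u"\u201clenses\u201d")
```

For the question's data it would be called with the raw bytes and the quote characters they are supposed to represent; what it returns for \x8d and \x8e has not been checked here.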

I'm not familiar with Jython, but the following link I found may prove useful: http://python.6.x6.nabble.com/character-encoding-issues-td1766833.html

It says that you should keep the unicode strings in files separate from the source code, and read them with codecs.open. That seemed to work for the person there, who was experiencing a problem similar to yours.
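A minimal sketch of that suggestion, assuming the strings live one per line in a UTF-8 text file (the file name and layout are hypothetical). `codecs.open` decodes while reading, so every line arrives as unicode and no guessing is needed; on Python 3 the built-in `open(path, encoding=...)` does the same and is the preferred spelling there. The demo writes the file first so the example runs on its own:

```python
import codecs
import os
import tempfile

def load_strings(path, encoding="utf-8"):
    """Read one unicode string per line from a file with a known encoding."""
    with codecs.open(path, "r", encoding=encoding) as f:
        return [line.rstrip(u"\r\n") for line in f]

# Demo: create the separate strings file, then read it back.
demo_path = os.path.join(tempfile.mkdtemp(), "strings.txt")  # hypothetical file name
with codecs.open(demo_path, "w", encoding="utf-8") as f:
    f.write(u"\u201clenses\u201d\n")
    f.write(u"Za\u017c\u00f3\u0142\u0107\n")

strings = load_strings(demo_path)
```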

The following link mentions specifying the encoding as a parameter to the JVM: https://answers.launchpad.net/sikuli/+question/156443
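Before changing JVM options, it may help to see which encodings the runtime actually reports. The sketch below collects the usual Python-side defaults and, only when running under Jython, the JVM's `file.encoding` system property (read through `java.lang.System.getProperty`). That `file.encoding` is the parameter the linked thread refers to is an assumption; it is commonly set on the `java` command line as `-Dfile.encoding=UTF-8`.

```python
import locale
import sys

def report_encodings():
    """Collect the encodings the interpreter may fall back to when printing."""
    info = {
        "default": sys.getdefaultencoding(),
        "filesystem": sys.getfilesystemencoding(),
        "locale": locale.getpreferredencoding(False),
        "stdout": getattr(sys.stdout, "encoding", None),  # may be None when redirected
    }
    try:
        from java.lang import System  # importable only under Jython
        info["jvm file.encoding"] = System.getProperty("file.encoding")
    except ImportError:
        pass  # CPython: there is no JVM to ask
    return info

encodings = report_encodings()
```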

Without seeing the actual error output, that is the extent of what I can provide.
