python - Pandas: cannot filter based on string equality


I am using pandas 0.16.2 on Python 2.7, OS X.

I read a DataFrame from a CSV file like this:

import pandas as pd

data = pd.read_csv("my_csv_file.csv", sep='\t', skiprows=0, header=0)

The output of data.dtypes is:

name          object
weight       float64
ethnicity     object
dtype: object

I was expecting string types for name and ethnicity, but I found explanations here on Stack Overflow of why they are reported as "object" in newer pandas versions.

Now, I want to select rows based on ethnicity, for example:

data[data['ethnicity']=='asian']
Out[3]:
Empty DataFrame
Columns: [name, weight, ethnicity]
Index: []

I get the same result with data[data.ethnicity=='asian'] or data[data['ethnicity']=="asian"].

But when I try the following:

data[data['ethnicity'].str.contains('asian')].head(3) 

I get the results I want.

However, I do not want to use "contains"; I want to check for direct equality.

Please note that data[data['ethnicity'].str=='asian'] raises an error.

What am I doing wrong? How do I do this correctly?

There may be whitespace in your strings; for example,

data = pd.DataFrame({'ethnicity':[' asian', '  asian']})
data.loc[data['ethnicity'].str.contains('asian'), 'ethnicity'].tolist()
# [' asian', '  asian']
print(data[data['ethnicity'].str.contains('asian')])

yields

  ethnicity
0     asian
1     asian
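The printed frame hides the padding because values are right-aligned. One way to check for it, sketched here assuming Python 3 and a reasonably recent pandas, is to look at the repr of each value or at the string lengths. The two-row frame is the same illustrative data as above, not the asker's file.

```python
import pandas as pd

# Same illustrative frame as above: values padded with leading spaces
data = pd.DataFrame({'ethnicity': [' asian', '  asian']})

# repr() includes the quotes, so leading/trailing spaces become visible
quoted = data['ethnicity'].map(repr).tolist()
print(quoted)

# Lengths greater than len('asian') == 5 also point to padding
lengths = data['ethnicity'].str.len().tolist()
print(lengths)
```

If any length exceeds that of the value you expect, an equality test against the bare string will not match that row.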

To strip leading or trailing whitespace off the strings, use

data['ethnicity'] = data['ethnicity'].str.strip() 

After which,

data.loc[data['ethnicity'] == 'asian'] 

yields

  ethnicity
0     asian
1     asian
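If you would rather not overwrite the column, or want the values cleaned as the file is read, the following sketch shows two possible variants. It assumes Python 3 and a recent pandas; the inline tab-separated text and the names in it are made up to stand in for my_csv_file.csv.

```python
import io
import pandas as pd

# Hypothetical tab-separated data standing in for my_csv_file.csv
raw = (
    "name\tweight\tethnicity\n"
    "ann\t60.5\t asian\n"
    "bob\t70.0\tasian  \n"
    "cat\t55.0\tother\n"
)

# Variant 1: compare against a stripped copy; the column itself is unchanged
data = pd.read_csv(io.StringIO(raw), sep='\t')
mask = data['ethnicity'].str.strip() == 'asian'
matches = data[mask]
print(matches['name'].tolist())

# Variant 2: strip while reading, using the converters argument of read_csv
cleaned = pd.read_csv(io.StringIO(raw), sep='\t',
                      converters={'ethnicity': str.strip})
print(cleaned[cleaned['ethnicity'] == 'asian']['name'].tolist())
```

Variant 1 keeps the raw values available for inspection, while variant 2 means every later comparison can use plain equality.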
