python - Pandas: cannot filter based on string equality
I am using pandas 0.16.2 on Python 2.7, OS X.
I read a DataFrame from a CSV file like this:
import pandas as pd
data = pd.read_csv("my_csv_file.csv", sep='\t', skiprows=(0), header=(0))
The output of data.dtypes is:
name          object
weight       float64
ethnicity     object
dtype: object
I was expecting string types for name and ethnicity, but I found the reasons here on Stack Overflow why they are "object" in newer pandas versions.
Now I want to select rows based on ethnicity, for example:
data[data['ethnicity']=='asian']
Out[3]:
Empty DataFrame
Columns: [name, weight, ethnicity]
Index: []
I get the same result with data[data.ethnicity=='asian']
or data[data['ethnicity']=="asian"].
But when I try the following:
data[data['ethnicity'].str.contains('asian')].head(3)
I get the results I want.
However, I do not want to use "contains"; I want to check for direct equality.
Please note that data[data['ethnicity'].str=='asian'] raises an error.
What am I doing wrong? How do I do this correctly?
There may be whitespace in your strings. For example,
data = pd.DataFrame({'ethnicity': [' asian', ' asian']})
data.loc[data['ethnicity'].str.contains('asian'), 'ethnicity'].tolist()
# [' asian', ' asian']
print(data[data['ethnicity'].str.contains('asian')])
yields
  ethnicity
0     asian
1     asian
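The printed frame hides leading spaces, so one way to confirm that stray whitespace is the cause is to look at the raw values. A minimal sketch, assuming the same two-row frame as above (the names raw_values and lengths are only for illustration):

```python
import pandas as pd

data = pd.DataFrame({'ethnicity': [' asian', ' asian']})

# unique() returns the raw strings, so the leading space becomes visible
raw_values = data['ethnicity'].unique().tolist()
print(raw_values)

# lengths greater than len('asian') == 5 also hint at padding
lengths = data['ethnicity'].str.len().tolist()
print(lengths)

# exact equality matches nothing because ' asian' != 'asian'
print((data['ethnicity'] == 'asian').any())
```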
To strip leading or trailing whitespace off the strings, use
data['ethnicity'] = data['ethnicity'].str.strip()
after which,
data.loc[data['ethnicity'] == 'asian']
yields
  ethnicity
0     asian
1     asian
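If you would rather not overwrite the column, the same comparison can be made on a stripped copy of the values. A short sketch with made-up sample values (the frame contents and the name mask are assumptions for illustration); it also shows how exact equality differs from str.contains, which matches any value that merely contains the word:

```python
import pandas as pd

# hypothetical sample data with padded and unrelated values
data = pd.DataFrame({'ethnicity': [' asian', 'asian ', 'south asian', 'white']})

# strip whitespace on the fly and test exact equality;
# the original column is left unchanged
mask = data['ethnicity'].str.strip() == 'asian'
print(data[mask])

# str.contains also counts 'south asian' as a match
print(data['ethnicity'].str.contains('asian').sum())
```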