python - Pandas: cannot filter based on string equality
I am using pandas 0.16.2 on Python 2.7, OS X.
I read a data frame from a CSV file like this:

    import pandas as pd
    data = pd.read_csv("my_csv_file.csv", sep='\t', skiprows=(0), header=(0))

The output of data.dtypes is:

    name          object
    weight       float64
    ethnicity     object
    dtype: object

I was expecting string types for name and ethnicity, but I found the reasons here on Stack Overflow for why they are "object" in newer pandas versions.
Now, I want to select rows based on ethnicity, for example:

    data[data['ethnicity']=='asian']
    Out[3]:
    Empty DataFrame
    Columns: [name, weight, ethnicity]
    Index: []

I get the same result with data[data.ethnicity=='asian'] or data[data['ethnicity']=="asian"].
But when I try the following:

    data[data['ethnicity'].str.contains('asian')].head(3)

I get the results I want.
However, I do not want to use "contains"; I want to check for direct equality.
Please note that data[data['ethnicity'].str=='asian'] raises an error.
What am I doing wrong? How do I do this correctly?
There is whitespace in your strings. For example,

    data = pd.DataFrame({'ethnicity': [' asian', '  asian']})
    data.loc[data['ethnicity'].str.contains('asian'), 'ethnicity'].tolist()
    # [' asian', '  asian']
    print(data[data['ethnicity'].str.contains('asian')])

yields

      ethnicity
    0     asian
    1     asian

To strip the leading or trailing whitespace off the strings, you could use

    data['ethnicity'] = data['ethnicity'].str.strip()

after which,

    data.loc[data['ethnicity'] == 'asian']

yields

      ethnicity
    0     asian
    1     asian
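If you are not sure whether stray whitespace is the culprit, one way to check is to print the unique values as a Python list, since the list repr puts quotes around each string and makes padding visible. The following is only a sketch: the sample frame is hypothetical data standing in for the CSV from the question, and the variable names raw_values, mask and matches are made up for illustration. It also shows that you can compare against the stripped values without overwriting the column.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV in the question;
# two of the values carry stray leading or trailing spaces.
data = pd.DataFrame({
    'name': ['a', 'b', 'c'],
    'ethnicity': [' asian', 'asian ', 'white'],
})

# The list repr quotes each string, so padding becomes visible.
raw_values = data['ethnicity'].unique().tolist()
print(raw_values)

# Exact equality on the raw column finds nothing because of the padding.
exact = data[data['ethnicity'] == 'asian']
print(len(exact))

# Compare against the stripped values without modifying the column.
mask = data['ethnicity'].str.strip() == 'asian'
matches = data[mask]
print(matches)
```

If the padding should never be there in the first place, assigning the stripped column back, as shown above, is probably the simpler choice; the mask approach is useful when you want to leave the original data untouched.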