regex - Lookup table with subset/grepl in R


I'm analyzing a set of URLs and values extracted using a crawler. While I could extract substrings from the URL, I'd rather not bother with regex. Is there a simple way to do lookup-table-style replacement using subset/grepl without resorting to dplyr (doing a conditional mutate on variables)?

My current process:

    test <- data.frame(
      url = c('google.com/testing/duck', 'google.com/evaluating/dog', 'google.com/analyzing/cat'),
      content = c(1, 2, 3),
      subdir = NA
    )

    test[grepl('testing', test$url), ]$subdir <- 'testing'
    test[grepl('evaluating', test$url), ]$subdir <- 'evaluating'
    test[grepl('analyzing', test$url), ]$subdir <- 'analyzing'

Obviously, this is a little clumsy and doesn't scale well. With dplyr, I'd be able to do conditionals like:

    library(dplyr)
    library(magrittr)  # for %<>%

    test %<>% tbl_df() %>%
      mutate(subdir = ifelse(
        grepl('testing', url),
        'test r',
        ifelse(
          grepl('evaluating', url),
          'eval r',
          ifelse(
            grepl('analyzing', url),
            'anal r',
            NA
          ))))
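For comparison, the same nested conditional runs in base R alone: `within()` evaluates the expression with the data frame's columns in scope, so neither dplyr nor magrittr is needed. A minimal sketch that matches on `url` and reuses the labels above (the extra `'banana'` row is illustrative, to show the `NA` fallback):

```r
# stringsAsFactors = FALSE keeps url as character on R < 4.0
test <- data.frame(
  url = c('google.com/testing/duck', 'google.com/evaluating/dog',
          'google.com/analyzing/cat', 'banana'),
  content = c(1, 2, 3, 4),
  stringsAsFactors = FALSE
)

# within() lets the expression refer to columns (url) by bare name
# and returns the modified data frame
test <- within(test, {
  subdir <- ifelse(grepl('testing', url), 'test r',
            ifelse(grepl('evaluating', url), 'eval r',
            ifelse(grepl('analyzing', url), 'anal r', NA)))
})
test
```

This still has the readability problems of nested `ifelse`; it only shows that the conditional itself does not require a package.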

But, again, that's goofy, and I don't want to incur a package dependency if at all possible. Is there a way to do regex-based subsetting with some sort of lookup table?
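One base-R way to get a lookup table is a named character vector whose names are regex patterns and whose values are the replacements. A minimal sketch, assuming first match wins (mirroring the precedence of the nested `ifelse`); `lookup_first_match` is a hypothetical helper name, not an existing function:

```r
# Hypothetical helper: for each element of x, return the value whose
# regex name is the first (in dict order) to match it, or NA if none do
lookup_first_match <- function(x, dict, ignore.case = FALSE) {
  x <- as.character(x)
  # hits[i, j] is TRUE when pattern names(dict)[j] matches x[i]
  hits <- vapply(names(dict), grepl, logical(length(x)),
                 x = x, ignore.case = ignore.case)
  # vapply returns a plain vector when length(x) == 1; force a matrix
  hits <- matrix(hits, nrow = length(x))
  # Column index of the first matching pattern per row (NA if none)
  first <- apply(hits, 1, function(row) which(row)[1])
  unname(dict[first])
}

lookup <- c("testing"       = "testing important",
            "eval.*"        = "eval in r",
            "analy(z|s)ing" = "r fun")
urls <- c('google.com/testing/duck', 'google.com/evaluating/dog',
          'google.com/analysing/cat', 'banana')

lookup_first_match(urls, lookup)
# c("testing important", "eval in r", "r fun", NA)
```

Returning a vector (rather than modifying a data frame) keeps the helper usable as `test$subdir <- lookup_first_match(test$url, lookup)`.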

Edit: a few clarifications:

  1. For extracting subdirectories, yes, regex would be efficient; however, I was hoping for a more general pattern that matches a dictionary-like structure of strings to other, arbitrary values.
  2. Of course, the nested ifelse is ugly and prone to error; I just wanted a quick-and-dirty example to show how I'd do it in dplyr.
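For the subdirectory case in point 1, a single vectorized `sub()` call extracts the first path segment of every URL at once, with no per-value line. A minimal sketch, assuming URLs look like `host/segment/...` and that entries without such a segment should become `NA`:

```r
urls <- c('google.com/testing/duck', 'google.com/evaluating/dog',
          'google.com/analyzing/cat', 'banana')

# Capture the text between the first and second slash
subdir <- sub('^[^/]+/([^/]+).*$', '\\1', urls)

# sub() returns non-matching input unchanged, so blank those entries out
subdir[!grepl('^[^/]+/[^/]+', urls)] <- NA
subdir
```

This only covers extraction; mapping arbitrary strings to arbitrary values still needs a lookup structure.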

Edit 2: I thought I'd close the loop and post what I ended up with, based upon BondedDust's approach. I decided to practice mapping and non-standard evaluation while I was at it:

    test <- data.frame(
      url = c(
        'google.com/testing/duck',
        'google.com/testing/dog',
        'google.com/testing/cat',
        'google.com/evaluating/duck',
        'google.com/evaluating/dog',
        'google.com/evaluating/cat',
        'google.com/analyzing/duck',
        'google.com/analyzing/dog',
        'google.com/analyzing/cat',
        'banana'
      ),
      content = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
      subdir = NA
    )

    # named vector used as a key/value lookup; names can be regexes
    lookup <- c(
      "testing" = "testing important",
      "eval.*" = 'eval in r',
      "analy(z|s)ing" = 'r fun'
    )

    # dumb test of error handling:
    # lookup <- c('test', 'hey')

    # defining the new lookup function
    regexlookup <- function(data, dict, searchcolumn, targetcolumn, ignore.case = TRUE){
      # basic check; needs separate errors/handling
      if(is.null(names(dict)) || is.null(dict[[1]])) {
        stop("not a valid replacement value; use a key/value store for `dict`.")
      }

      # non-standard evaluation of column names; not sure if I should
      # add safety/type checks for these
      searchcolumn <- eval(substitute(searchcolumn), data)
      targetcolumn <- deparse(substitute(targetcolumn))

      # define the find-and-replace utility
      findandreplace <- function (key, val){
        data[grepl(key, searchcolumn, ignore.case = ignore.case), targetcolumn] <- val
        data <<- data
      }

      # map over the key/value store
      mapply(findandreplace, names(dict), dict)

      # return the result; non-matching rows are preserved
      return(data)
    }

    regexlookup(test, lookup, url, subdir, ignore.case = FALSE)
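The `<<-` inside `findandreplace` works, but a variant that threads the data frame through `Reduce()` avoids modifying an enclosing frame. A minimal sketch; `regex_lookup_reduce` is a hypothetical name, it takes column names as strings instead of using non-standard evaluation, and, like the `mapply` version, later keys overwrite earlier ones when several patterns match the same row:

```r
# Hypothetical variant: fold over the patterns, returning an updated
# copy of the data frame at each step instead of using <<-
regex_lookup_reduce <- function(data, dict, searchcolumn, targetcolumn,
                                ignore.case = TRUE) {
  if (is.null(names(dict))) {
    stop("`dict` must be a named vector (names are regex patterns).")
  }
  x <- as.character(data[[searchcolumn]])
  Reduce(function(acc, key) {
    acc[grepl(key, x, ignore.case = ignore.case), targetcolumn] <- dict[[key]]
    acc
  }, names(dict), init = data)
}

test <- data.frame(
  url = c('google.com/testing/duck', 'google.com/evaluating/dog',
          'google.com/analyzing/cat', 'banana'),
  content = 1:4,
  subdir = NA,
  stringsAsFactors = FALSE
)
lookup <- c("testing" = "testing important",
            "eval.*" = "eval in r",
            "analy(z|s)ing" = "r fun")

out <- regex_lookup_reduce(test, lookup, "url", "subdir", ignore.case = FALSE)
out
```

Because each step returns a new data frame, the caller's `test` is left untouched and only `out` carries the replacements.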

    for (target in c('testing', 'evaluating', 'analyzing')) {
      test[grepl(target, test$url), 'subdir'] <- target
    }
    test
                            url content     subdir
    1   google.com/testing/duck       1    testing
    2 google.com/evaluating/dog       2 evaluating
    3  google.com/analyzing/cat       3  analyzing

The vector of targets could instead have been the name of a vector in the workspace:

    targets <- c('testing', 'evaluating', 'analyzing')
    for (target in targets) { ... }
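When the workspace vector holds literal strings rather than patterns, passing `fixed = TRUE` to `grepl()` stops characters such as `.` from being read as regex metacharacters. A minimal sketch of the loop with that change (the data frame is illustrative):

```r
targets <- c('testing', 'evaluating', 'analyzing')
test <- data.frame(
  url = c('google.com/testing/duck', 'google.com/evaluating/dog',
          'google.com/analyzing/cat'),
  content = c(1, 2, 3),
  subdir = NA,
  stringsAsFactors = FALSE
)

for (target in targets) {
  # fixed = TRUE: match target as a plain substring, not a regex
  test[grepl(target, test$url, fixed = TRUE), 'subdir'] <- target
}
test

# Why it matters: as a regex, '.' matches any character
grepl('google.com', 'googleXcom')                # TRUE
grepl('google.com', 'googleXcom', fixed = TRUE)  # FALSE
```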
