xml - Why is XmlInputFormat not provided by Hadoop?
I am working with Hadoop MapReduce. I have to process data from an .xml
file, parse it, and store the output in a database.
While working on this, when I needed to pass XML to the mapper, I found that XmlInputFormat.class
is not provided by Hadoop by default, and I have to use Mahout's XmlInputFormat for it.
I wonder: when XML is so widely used, why hasn't Hadoop provided an XmlInputFormat,
rather than leaving users to explicitly create a custom XmlInputFormat by extending TextInputFormat
for it?
Well, although XML is widely used, having a framework provide special features for one particular technology might not be a good idea. It could be seen as an endorsement. At a high level, MapReduce is designed to accept different formats. In fact, these days JSON is widely used because of its size and features compared to XML. I had a similar issue.
But the user decides the input of the MapReduce job and can use different parsers (Jackson or Gson for JSON, JAXB for XML) directly if each record is on a single line, or otherwise handle the format with a RecordReader implementation.
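To make the RecordReader idea a bit more concrete, here is a minimal sketch of the core step such a reader would perform for XML: scanning the input for a start tag and the matching end tag, and emitting everything in between as one record. It is plain Java with no Hadoop dependency, and the class name `XmlRecordScanner`, the method `extractRecords`, and the `<record>` tag are hypothetical names chosen for illustration. It is not Hadoop's or Mahout's actual code, and it assumes records are not nested and that the start tag carries no attributes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of Hadoop or Mahout.
final class XmlRecordScanner {

    // Returns every substring that begins with startTag and ends with the
    // next endTag. Assumes records are not nested inside each other.
    static List<String> extractRecords(String input, String startTag, String endTag) {
        List<String> records = new ArrayList<>();
        int pos = 0;
        while (true) {
            int start = input.indexOf(startTag, pos);
            if (start < 0) {
                break; // no further records
            }
            int end = input.indexOf(endTag, start + startTag.length());
            if (end < 0) {
                break; // unterminated record: skip it
            }
            end += endTag.length();
            records.add(input.substring(start, end));
            pos = end; // continue scanning after this record
        }
        return records;
    }

    public static void main(String[] args) {
        String xml = "<data><record>a</record>\n<record>b</record></data>";
        for (String record : extractRecords(xml, "<record>", "</record>")) {
            System.out.println(record);
        }
    }
}
```

In a real job, each extracted string would be handed to the mapper as one value, where a parser such as JAXB could turn it into an object. A real RecordReader would also have to read from a stream and deal with records that cross input split boundaries, which this sketch leaves out.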