pmml - Spark JPMML import issue


I am trying to import a PMML model file, generated in R, into a Spark context and use it to predict scores. The code used in Spark:

    JavaRDD<String> scoreData = data.map(new Function<String, String>() {
        @Override
        public String call(String line) throws Exception {
            String[] row = line.split(",");
            PMML pmml;
            Evaluator evaluator;
            FileSystem fs = FileSystem.get(new Configuration());
            FSDataInputStream inStr = fs.open(new Path("path_to_pmml_file"));
            Source transformedSource = ImportFilter.apply(new InputSource(inStr));
            pmml = JAXBUtil.unmarshalPMML(transformedSource);
            System.out.println(pmml.getModels().get(0).getModelName());
            ModelEvaluatorFactory modelEvaluatorFactory = ModelEvaluatorFactory.newInstance();
            ModelEvaluator<?> modelEvaluator = modelEvaluatorFactory.newModelManager(pmml);
            System.out.println(modelEvaluator.getSummary());
            evaluator = (Evaluator) modelEvaluator;

            List<FieldName> activeFields = evaluator.getActiveFields();
            double[] features = new double[row.length - 2]; // row - {contact_id,label}
            StringBuilder strBld = new StringBuilder();
            Map<FieldName, FieldValue> arguments = new LinkedHashMap<FieldName, FieldValue>();
            strBld.append(row[0]);
            for (int i = 3; i <= row.length - 1; i++) {
                // from f1 - f16
                FieldValue activeValue = evaluator.prepare(activeFields.get(i - 3), Double.parseDouble(row[i]));
                arguments.put(activeFields.get(i - 3), activeValue);
            }
            // The original snippet was truncated here; the stack trace below shows the
            // model being evaluated next, presumably along these lines:
            Map<FieldName, ?> results = evaluator.evaluate(arguments);
            strBld.append(",").append(results);
            return strBld.toString();
        }
    });

The code worked fine when run in a core Java environment (without a Spark context), but when running the above code in Spark I get the following exception:

    java.lang.NoSuchMethodError: com.google.common.collect.Range.closed(Ljava/lang/Comparable;Ljava/lang/Comparable;)Lcom/google/common/collect/Range;
        at org.jpmml.evaluator.Classification$Type.<clinit>(Classification.java:278)
        at org.jpmml.evaluator.ProbabilityDistribution.<init>(ProbabilityDistribution.java:26)
        at org.jpmml.evaluator.GeneralRegressionModelEvaluator.evaluateClassification(GeneralRegressionModelEvaluator.java:333)
        at org.jpmml.evaluator.GeneralRegressionModelEvaluator.evaluate(GeneralRegressionModelEvaluator.java:107)
        at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:266)
        at org.zcoe.spark.pmml.PMMLSpark_2$1.call(PMMLSpark_2.java:146)
        at org.zcoe.spark.pmml.PMMLSpark_2$1.call(PMMLSpark_2.java:1)
        at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:999)
        at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
        at scala.collection.Iterator$class.foreach(Iterator.scala:727)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
        at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
        at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
        at scala.collection.AbstractIterator.to(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
        at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
        at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:813)
        at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:813)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1503)
        at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1503)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

The issue seems to be a compatibility problem with the Guava JAR file required to run the code. I removed the JARs containing the com.google.common.collect.Range class from Spark's classpath, but the same issue still persists.
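(One way to confirm which Guava JAR actually wins on the executor classpath, not part of the original post but a common diagnostic for NoSuchMethodError conflicts, is to print the code source of the conflicting class from inside the mapper:

    // Diagnostic sketch: prints the JAR that Range was loaded from on the executor.
    System.out.println(com.google.common.collect.Range.class
            .getProtectionDomain().getCodeSource().getLocation());

If this prints a Spark-bundled JAR rather than guava-15.0.jar, the executor is still picking up an older Guava.)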

The Spark job submission command is below:

spark-submit --jars ./lib/pmml-evaluator-1.2.0.jar,./lib/pmml-model-1.2.2.jar,./lib/pmml-manager-1.1.20.jar,./lib/pmml-schema-1.2.2.jar,./lib/guava-15.0.jar --class

    [Stage 0:> (0 + 2) / 2]15/06/26 14:39:15 ERROR YarnScheduler: Lost executor 1 on hslave2: remote Akka client disassociated
    15/06/26 14:39:15 ERROR YarnScheduler: Lost executor 2 on hslave1: remote Akka client disassociated
    [Stage 0:> (0 + 2) / 2]15/06/26 14:39:33 ERROR YarnScheduler: Lost executor 4 on hslave1: remote Akka client disassociated
    15/06/26 14:39:33 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job

    Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, hslave1): ExecutorLostFailure (executor 4 lost)
    Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)

Please let me know if I have made any mistakes.

You should let both Spark and JPMML have their own versions of the Guava library. It is not a good idea to modify the Spark base installation when you can achieve your goal by re-working the packaging of your Spark application.

If you move your Spark application to Apache Maven, it becomes possible to use the relocation feature of the Maven Shade Plugin to move JPMML's version of the Guava library to another package, such as org.jpmml.com.google. The example application of the JPMML-Cascading project does exactly this trick.
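For illustration, a minimal sketch of what such a relocation could look like in your pom.xml (the shaded package prefix here mirrors the one mentioned above; everything else is an assumption, not taken from the JPMML-Cascading project verbatim):

    <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-shade-plugin</artifactId>
        <executions>
            <execution>
                <phase>package</phase>
                <goals>
                    <goal>shade</goal>
                </goals>
                <configuration>
                    <relocations>
                        <!-- Rewrite JPMML's Guava dependency into a private package so
                             it cannot collide with the Guava bundled with Spark. -->
                        <relocation>
                            <pattern>com.google.common</pattern>
                            <shadedPattern>org.jpmml.com.google.common</shadedPattern>
                        </relocation>
                    </relocations>
                </configuration>
            </execution>
        </executions>
    </plugin>

The shade plugin rewrites both the bytecode of the bundled Guava classes and every reference to them in the JPMML classes, so at runtime JPMML resolves its own private copy while Spark keeps using its own.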

Also, the upside of moving to Apache Maven is that your Spark application becomes available as a single uber-JAR file, which simplifies deployment. For example, at the moment you are specifying pmml-manager-1.1.20.jar on the command line, which is not needed.
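With the shaded uber-JAR in place, the submission command shrinks to something like the following (the class and JAR names are placeholders, not from the original post):

    spark-submit --class com.example.ScoringJob ./target/scoring-app-1.0-shaded.jar

No --jars list is needed anymore, because all the JPMML and relocated Guava classes travel inside the one application JAR.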

