pmml - Spark JPMML import issue
I am trying to import a PMML model file, generated in R, into a Spark context and use it to predict scores. This is the code used in Spark:
```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.transform.Source;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function;
import org.dmg.pmml.FieldName;
import org.dmg.pmml.PMML;
import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.FieldValue;
import org.jpmml.evaluator.ModelEvaluator;
import org.jpmml.evaluator.ModelEvaluatorFactory;
import org.jpmml.model.ImportFilter;
import org.jpmml.model.JAXBUtil;
import org.xml.sax.InputSource;

JavaRDD<String> scoreData = data.map(new Function<String, String>() {
    @Override
    public String call(String line) throws Exception {
        String[] row = line.split(",");
        FileSystem fs = FileSystem.get(new Configuration());
        FSDataInputStream inStr = fs.open(new Path("path_to_pmml_file"));
        Source transformedSource = ImportFilter.apply(new InputSource(inStr));
        PMML pmml = JAXBUtil.unmarshalPMML(transformedSource);
        System.out.println(pmml.getModels().get(0).getModelName());
        ModelEvaluatorFactory modelEvaluatorFactory = ModelEvaluatorFactory.newInstance();
        ModelEvaluator<?> modelEvaluator = modelEvaluatorFactory.newModelManager(pmml);
        System.out.println(modelEvaluator.getSummary());
        Evaluator evaluator = (Evaluator) modelEvaluator;
        List<FieldName> activeFields = evaluator.getActiveFields();
        double[] features = new double[row.length - 2]; // row - {contact_id,label}
        StringBuilder strBld = new StringBuilder();
        Map<FieldName, FieldValue> arguments = new LinkedHashMap<FieldName, FieldValue>();
        strBld.append(row[0]);
        for (int i = 3; i <= row.length - 1; i++) { // from f1 - f16
            FieldValue activeValue = evaluator.prepare(activeFields.get(i - 3), Double.parseDouble(row[i]));
            arguments.put(activeFields.get(i - 3), activeValue);
        }
        // ... (evaluation and return statement truncated in the original post)
    }
});
```
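As an aside, the snippet above opens and unmarshals the PMML file inside `map()`, i.e. once per input record; loading the model once per partition (`mapPartitions`) or broadcasting it is usually much cheaper. The index arithmetic of the argument-preparation loop can be checked in isolation; below is a minimal sketch without Spark or JPMML, where the class name `RowToArguments` and the field names `f1..fN` are hypothetical stand-ins for the PMML active fields:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class RowToArguments {
    // Sketch: turn one CSV row {contact_id,label,...,f1..fN} into an ordered
    // field -> value map, mirroring the index arithmetic of the Spark code:
    // feature columns start at index 3, and column i maps to field f(i - 2).
    static Map<String, Double> toArguments(String line) {
        String[] row = line.split(",");
        Map<String, Double> arguments = new LinkedHashMap<>();
        for (int i = 3; i <= row.length - 1; i++) {
            arguments.put("f" + (i - 2), Double.parseDouble(row[i]));
        }
        return arguments;
    }

    public static void main(String[] args) {
        // Columns 0-2 (id, label, extra) are skipped; f1..f3 are collected.
        System.out.println(toArguments("id1,0,0.5,1.0,2.0,3.0"));
    }
}
```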
The code worked fine when run in a core Java environment (without a Spark context), but when running the above code in Spark I get the following exception:
```
java.lang.NoSuchMethodError: com.google.common.collect.Range.closed(Ljava/lang/Comparable;Ljava/lang/Comparable;)Lcom/google/common/collect/Range;
    at org.jpmml.evaluator.Classification$Type.<clinit>(Classification.java:278)
    at org.jpmml.evaluator.ProbabilityDistribution.<init>(ProbabilityDistribution.java:26)
    at org.jpmml.evaluator.GeneralRegressionModelEvaluator.evaluateClassification(GeneralRegressionModelEvaluator.java:333)
    at org.jpmml.evaluator.GeneralRegressionModelEvaluator.evaluate(GeneralRegressionModelEvaluator.java:107)
    at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:266)
    at org.zcoe.spark.pmml.PMMLSpark_2$1.call(PMMLSpark_2.java:146)
    at org.zcoe.spark.pmml.PMMLSpark_2$1.call(PMMLSpark_2.java:1)
    at org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1.apply(JavaPairRDD.scala:999)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
    at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
    at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
    at scala.collection.AbstractIterator.to(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
    at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
    at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
    at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
    at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:813)
    at org.apache.spark.rdd.RDD$$anonfun$17.apply(RDD.scala:813)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1503)
    at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1503)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
    at org.apache.spark.scheduler.Task.run(Task.scala:64)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:745)
```
The issue seems to be a compatibility problem with the Guava jar file required to run the code. I removed the jars containing the com.google.common.collect.Range class from Spark's classpath, but the same issue still persists.
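When a NoSuchMethodError points at a dependency conflict like this, it often helps to ask the JVM which jar a class was actually loaded from. A minimal, self-contained sketch (the class name `WhichJar` is hypothetical; in the Spark job you would pass `com.google.common.collect.Range.class` to see which Guava jar won on the executor classpath):

```java
import java.net.URL;
import java.security.CodeSource;

public class WhichJar {
    // Returns the jar or directory a class was loaded from, or null for
    // classes loaded by the bootstrap class loader (e.g. java.lang.String).
    static URL locationOf(Class<?> clazz) {
        CodeSource src = clazz.getProtectionDomain().getCodeSource();
        return (src == null) ? null : src.getLocation();
    }

    public static void main(String[] args) {
        // On a Spark executor, log locationOf(com.google.common.collect.Range.class)
        // to see whether Spark's Guava or guava-15.0.jar was picked up.
        System.out.println(locationOf(WhichJar.class));
    }
}
```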
The Spark job details are below:
```
spark-submit --jars ./lib/pmml-evaluator-1.2.0.jar,./lib/pmml-model-1.2.2.jar,./lib/pmml-manager-1.1.20.jar,./lib/pmml-schema-1.2.2.jar,./lib/guava-15.0.jar --class
```
```
[Stage 0:> (0 + 2) / 2]15/06/26 14:39:15 ERROR YarnScheduler: Lost executor 1 on hslave2: remote Akka client disassociated
15/06/26 14:39:15 ERROR YarnScheduler: Lost executor 2 on hslave1: remote Akka client disassociated
[Stage 0:> (0 + 2) / 2]15/06/26 14:39:33 ERROR YarnScheduler: Lost executor 4 on hslave1: remote Akka client disassociated
15/06/26 14:39:33 ERROR TaskSetManager: Task 0 in stage 0.0 failed 4 times; aborting job
```
```
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, hslave1): ExecutorLostFailure (executor 4 lost)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1203)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1192)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1191)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1191)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:693)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:693)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1393)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1354)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
```
Please let me know what mistakes I have made.
You should let both Spark and JPMML have their own versions of the Guava library. It is not a good idea to modify the Spark base installation when you can achieve your goal by re-working the packaging of your Spark application.
If you move your Spark application to Apache Maven, then it becomes possible to use the relocation feature of the Maven Shade Plugin to move JPMML's version of the Guava library to another package, such as org.jpmml.com.google. The example application of the JPMML-Cascading project uses this trick.
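A minimal sketch of such a relocation in the application's pom.xml (the plugin version and the relocated package name are illustrative; adjust them to your build):

```xml
<!-- Relocate the Guava classes bundled into the uber-jar so that the
     Guava 15 required by JPMML cannot clash with the Guava version that
     Spark puts on the executor classpath. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>2.3</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <pattern>com.google.common</pattern>
            <shadedPattern>org.jpmml.com.google.common</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With this in place, the JPMML classes in the shaded jar reference the relocated Guava package, so whichever Guava version Spark loads is irrelevant to them.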
Also, an upside of moving your Spark application to Apache Maven is that it becomes available as an uber-JAR file, which simplifies deployment. For example, at the moment you are specifying pmml-manager-1.1.20.jar on the command line, which is not needed.