Java - SparkException: local class incompatible
I'm trying to submit a Spark job from a client to a Cloudera cluster. The cluster runs CDH 5.3.2 with Spark 1.2.0 and Hadoop 2.5.0. To test the cluster, we submitted the WordCount sample taken from the Spark web site. We can submit our Spark job written in Java; however, we cannot write the result file to HDFS. We get the following error:
    20/06/25 09:38:16 INFO DAGScheduler: Job 0 failed: saveAsTextFile at SimpleWordCount.java:36, took 5.450531 s
    Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 1.0 failed 4 times, most recent failure: Lost task 1.3 in stage 1.0 (TID 8, obelix2): java.io.InvalidClassException: org.apache.spark.rdd.PairRDDFunctions; local class incompatible: stream classdesc serialVersionUID = 8789839749593513237, local class serialVersionUID = -4145741279224749316
        at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1622)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:57)
        at org.apache.spark.scheduler.Task.run(Task.scala:56)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)

    Driver stacktrace:
        at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1214)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1203)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1202)
        at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1202)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
        at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:696)
        at scala.Option.foreach(Option.scala:236)
        at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:696)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1420)
        at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
        at org.apache.spark.scheduler.DAGSchedulerEventProcessActor.aroundReceive(DAGScheduler.scala:1375)
        at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
        at akka.actor.ActorCell.invoke(ActorCell.scala:487)
        at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:238)
        at akka.dispatch.Mailbox.run(Mailbox.scala:220)
        at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:393)
        at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
        at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
        at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
        at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
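The InvalidClassException means the task bytes the executor received were serialized from a different build of org.apache.spark.rdd.PairRDDFunctions than the one on the executor's own classpath: Java serialization compares the serialVersionUID recorded in the stream against the local class and refuses to deserialize on a mismatch. As a quick diagnostic sketch (the class name CheckSparkSerialVersionUID is mine; it only assumes spark-core is on the classpath), you can print the UID of the client's local class and compare it with the two values in the error:

    import java.io.ObjectStreamClass;

    import org.apache.spark.rdd.PairRDDFunctions;

    public class CheckSparkSerialVersionUID {
        public static void main(String[] args) {
            // Look up the serialization descriptor of the PairRDDFunctions
            // class on *this* JVM's classpath and print its serialVersionUID.
            ObjectStreamClass desc = ObjectStreamClass.lookup(PairRDDFunctions.class);
            if (desc == null) {
                System.out.println("PairRDDFunctions is not serializable on this classpath");
            } else {
                System.out.println("local PairRDDFunctions serialVersionUID = "
                        + desc.getSerialVersionUID());
            }
        }
    }

Run on the submitting client, the printed value should match the "stream classdesc" UID in the error (the driver wrote the stream); the "local class" UID is what the executor has, so a difference pins down which side is running the odd-one-out Spark build.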
Here is our code sample:
    import java.util.Arrays;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.api.java.function.FlatMapFunction;
    import org.apache.spark.api.java.function.Function2;
    import org.apache.spark.api.java.function.PairFunction;

    import scala.Tuple2;

    public class SimpleWordCount {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("Simple Application");
            JavaSparkContext spark = new JavaSparkContext(conf);

            JavaRDD<String> textFile = spark
                    .textFile("hdfs://obelix1:8022/user/u079681/deneme/example.txt");

            // Split each line into words
            JavaRDD<String> words = textFile
                    .flatMap(new FlatMapFunction<String, String>() {
                        public Iterable<String> call(String s) {
                            return Arrays.asList(s.split(" "));
                        }
                    });

            // Map each word to a (word, 1) pair
            JavaPairRDD<String, Integer> pairs = words
                    .mapToPair(new PairFunction<String, String, Integer>() {
                        public Tuple2<String, Integer> call(String s) {
                            return new Tuple2<String, Integer>(s, 1);
                        }
                    });

            // Sum the counts per word
            JavaPairRDD<String, Integer> counts = pairs
                    .reduceByKey(new Function2<Integer, Integer, Integer>() {
                        public Integer call(Integer a, Integer b) {
                            return a + b;
                        }
                    });

            // System.out.println(counts.collect());
            counts.saveAsTextFile("hdfs://obelix1:8022/user/u079681/deneme/result");
        }
    }
And the Maven dependencies are:
    <dependency>
        <groupId>org.scala-lang</groupId>
        <artifactId>scala-library</artifactId>
        <version>2.10.5</version>
    </dependency>
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.2.0-cdh5.3.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.5.0-mr1-cdh5.3.2</version>
    </dependency>
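As an aside (a general packaging practice, not something from the original post): when the cluster already ships its own Spark jars, spark-core is commonly declared with provided scope, so an assembled application jar never bundles Spark classes that could shadow or disagree with the cluster's copies:

    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.10</artifactId>
        <version>1.2.0-cdh5.3.2</version>
        <!-- provided: compile against Spark, but rely on the cluster's jars at runtime -->
        <scope>provided</scope>
    </dependency>

This only matters if the application is packaged as an uber/assembly jar; it does not by itself fix a client/cluster version mismatch like the one below.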
I have absolutely no idea where the error comes from, since to my understanding the application's Spark version and Cloudera's Spark version are the same. Any idea is more than welcome.
Note: I can see the result when I write it to the console.
After spending hours, I have solved the problem. The root cause was that we had downloaded Spark from the official Apache site and built it ourselves; those jars are not compatible with the Cloudera distribution. As I learned today, the Cloudera Spark distribution is available on GitHub (https://github.com/cloudera/spark/tree/cdh5-1.2.0_5.3.2), and after building it I was able to save the job result to HDFS.
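For completeness, a possible alternative to building Cloudera's fork from source (assuming your build machine can reach the internet) is to resolve the cdh-versioned artifacts, such as the spark-core_2.10 1.2.0-cdh5.3.2 dependency above, from Cloudera's public Maven repository by adding it to the pom.xml:

    <repositories>
        <repository>
            <!-- Cloudera's repository, which hosts the *-cdh* artifact versions -->
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>

Either way, the point is the same: the jars you compile and run against on the client must come from the same Spark build the cluster is running.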