Environment:
Linux CentOS 6.5
g++ (gcc) 4.8.2
maven 3.5.4
java 1.8.0_181
spark cluster: 1.6.1
scala 2.10
Hello, could somebody help? Our team needs to use XGBoost on an older Spark cluster (1.6.1), so I believe v0.60 is the version to choose. I followed the steps described in a similar issue (#1364). However, I get the same error again and again, so those steps did not solve my problem.
The steps I followed, one by one:
(1) git clone https://github.com/dmlc/xgboost
(2) git submodule init
(3) git submodule update
(4) git checkout v0.60
(5) cp make/config.mk .
(6) make -j4
(7) cd jvm-packages
(8) mvn package
This produced a JAR package that I can use in my project. However, I got an error when I submitted my project JAR to the Spark cluster.
The error is as follows:
18/10/08 15:33:04 INFO yarn.Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host:
ApplicationMaster RPC port: 0
queue: default
start time: 1538983980069
final status: UNDEFINED
tracking URL: http://iZbp16azw0apkefoxy0zi8Z:20888/proxy/application_1511057284989_2902/
user: suser
18/10/08 15:33:04 INFO cluster.YarnClientSchedulerBackend: Application application_1511057284989_2902 has started running.
18/10/08 15:33:04 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43081.
18/10/08 15:33:04 INFO netty.NettyBlockTransferService: Server created on 43081
18/10/08 15:33:04 INFO storage.BlockManagerMaster: Trying to register BlockManager
18/10/08 15:33:04 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.28.29.236:43081 with 511.1 MB RAM, BlockManagerId(driver, 10.28.29.236, 43081)
18/10/08 15:33:04 INFO storage.BlockManagerMaster: Registered BlockManager
18/10/08 15:33:04 INFO scheduler.EventLoggingListener: Logging events to hdfs://emr-header-1:9000/spark-history/application_1511057284989_2902
18/10/08 15:33:09 INFO cluster.YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (emr-work-2:53361) with ID 2
18/10/08 15:33:09 INFO storage.BlockManagerMasterEndpoint: Registering block manager emr-work-2:36872 with 511.1 MB RAM, BlockManagerId(2, emr-work-2, 36872)
18/10/08 15:33:09 INFO cluster.YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (emr-worker-5.cluster-35705:59296) with ID 1
18/10/08 15:33:09 INFO storage.BlockManagerMasterEndpoint: Registering block manager emr-worker-5.cluster-35705:60861 with 511.1 MB RAM, BlockManagerId(1, emr-worker-5.cluster-35705, 60861)
18/10/08 15:33:09 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
18/10/08 15:33:10 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 291.4 KB, free 291.4 KB)
18/10/08 15:33:10 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 24.0 KB, free 315.4 KB)
18/10/08 15:33:10 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.28.29.236:43081 (size: 24.0 KB, free: 511.1 MB)
18/10/08 15:33:10 INFO spark.SparkContext: Created broadcast 0 from textFile at MLUtils.scala:71
18/10/08 15:33:10 INFO mapred.FileInputFormat: Total input paths to process : 1
18/10/08 15:33:10 INFO spark.SparkContext: Starting job: reduce at MLUtils.scala:105
18/10/08 15:33:10 INFO scheduler.DAGScheduler: Got job 0 (reduce at MLUtils.scala:105) with 2 output partitions
18/10/08 15:33:10 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at MLUtils.scala:105)
18/10/08 15:33:10 INFO scheduler.DAGScheduler: Parents of final stage: List()
18/10/08 15:33:10 INFO scheduler.DAGScheduler: Missing parents: List()
18/10/08 15:33:10 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[5] at map at MLUtils.scala:103), which has no missing parents
18/10/08 15:33:10 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.9 KB, free 319.2 KB)
18/10/08 15:33:10 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.1 KB, free 321.3 KB)
18/10/08 15:33:10 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.28.29.236:43081 (size: 2.1 KB, free: 511.1 MB)
18/10/08 15:33:10 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
18/10/08 15:33:10 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[5] at map at MLUtils.scala:103)
18/10/08 15:33:10 INFO cluster.YarnScheduler: Adding task set 0.0 with 2 tasks
18/10/08 15:33:10 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, emr-worker-5.cluster-35705, partition 0,PROCESS_LOCAL, 2251 bytes)
18/10/08 15:33:10 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, emr-work-2, partition 1,PROCESS_LOCAL, 2251 bytes)
18/10/08 15:33:15 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on emr-worker-5.cluster-35705:60861 (size: 2.1 KB, free: 511.1 MB)
18/10/08 15:33:15 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on emr-work-2:36872 (size: 2.1 KB, free: 511.1 MB)
18/10/08 15:33:15 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on emr-worker-5.cluster-35705:60861 (size: 24.0 KB, free: 511.1 MB)
18/10/08 15:33:16 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on emr-work-2:36872 (size: 24.0 KB, free: 511.1 MB)
18/10/08 15:33:16 INFO storage.BlockManagerInfo: Added rdd_4_0 in memory on emr-worker-5.cluster-35705:60861 (size: 83.6 KB, free: 511.0 MB)
18/10/08 15:33:16 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 6001 ms on emr-worker-5.cluster-35705 (1/2)
18/10/08 15:33:16 INFO storage.BlockManagerInfo: Added rdd_4_1 in memory on emr-work-2:36872 (size: 83.1 KB, free: 511.0 MB)
18/10/08 15:33:16 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 6078 ms on emr-work-2 (2/2)
18/10/08 15:33:16 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at MLUtils.scala:105) finished in 6.111 s
18/10/08 15:33:16 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool
18/10/08 15:33:16 INFO scheduler.DAGScheduler: Job 0 finished: reduce at MLUtils.scala:105, took 6.232215 s
18/10/08 15:33:16 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 291.4 KB, free 612.7 KB)
18/10/08 15:33:16 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 24.0 KB, free 636.8 KB)
18/10/08 15:33:16 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.28.29.236:43081 (size: 24.0 KB, free: 511.1 MB)
18/10/08 15:33:16 INFO spark.SparkContext: Created broadcast 2 from textFile at MLUtils.scala:71
18/10/08 15:33:16 INFO mapred.FileInputFormat: Total input paths to process : 1
18/10/08 15:33:16 INFO spark.SparkContext: Starting job: reduce at MLUtils.scala:105
18/10/08 15:33:16 INFO scheduler.DAGScheduler: Got job 1 (reduce at MLUtils.scala:105) with 2 output partitions
18/10/08 15:33:16 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (reduce at MLUtils.scala:105)
18/10/08 15:33:16 INFO scheduler.DAGScheduler: Parents of final stage: List()
18/10/08 15:33:16 INFO scheduler.DAGScheduler: Missing parents: List()
18/10/08 15:33:16 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[12] at map at MLUtils.scala:103), which has no missing parents
18/10/08 15:33:16 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 3.9 KB, free 640.6 KB)
18/10/08 15:33:16 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 2.1 KB, free 642.7 KB)
18/10/08 15:33:16 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 10.28.29.236:43081 (size: 2.1 KB, free: 511.1 MB)
18/10/08 15:33:16 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006
18/10/08 15:33:16 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 1 (MapPartitionsRDD[12] at map at MLUtils.scala:103)
18/10/08 15:33:16 INFO cluster.YarnScheduler: Adding task set 1.0 with 2 tasks
18/10/08 15:33:16 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, emr-worker-5.cluster-35705, partition 0,PROCESS_LOCAL, 2251 bytes)
18/10/08 15:33:16 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 3, emr-work-2, partition 1,PROCESS_LOCAL, 2251 bytes)
18/10/08 15:33:16 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on emr-work-2:36872 (size: 2.1 KB, free: 511.0 MB)
18/10/08 15:33:16 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on emr-worker-5.cluster-35705:60861 (size: 2.1 KB, free: 511.0 MB)
18/10/08 15:33:16 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on emr-work-2:36872 (size: 24.0 KB, free: 511.0 MB)
18/10/08 15:33:16 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on emr-worker-5.cluster-35705:60861 (size: 24.0 KB, free: 511.0 MB)
18/10/08 15:33:16 INFO storage.BlockManagerInfo: Added rdd_11_0 in memory on emr-worker-5.cluster-35705:60861 (size: 83.6 KB, free: 510.9 MB)
18/10/08 15:33:16 INFO storage.BlockManagerInfo: Added rdd_11_1 in memory on emr-work-2:36872 (size: 83.1 KB, free: 510.9 MB)
18/10/08 15:33:16 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 2) in 113 ms on emr-worker-5.cluster-35705 (1/2)
18/10/08 15:33:16 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 3) in 114 ms on emr-work-2 (2/2)
18/10/08 15:33:16 INFO scheduler.DAGScheduler: ResultStage 1 (reduce at MLUtils.scala:105) finished in 0.115 s
18/10/08 15:33:16 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
18/10/08 15:33:16 INFO scheduler.DAGScheduler: Job 1 finished: reduce at MLUtils.scala:105, took 0.138700 s
18/10/08 15:33:16 INFO spark.SparkContext: Starting job: collect at xgboost_try.scala:19
18/10/08 15:33:16 INFO scheduler.DAGScheduler: Got job 2 (collect at xgboost_try.scala:19) with 2 output partitions
18/10/08 15:33:16 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (collect at xgboost_try.scala:19)
18/10/08 15:33:16 INFO scheduler.DAGScheduler: Parents of final stage: List()
18/10/08 15:33:16 INFO scheduler.DAGScheduler: Missing parents: List()
18/10/08 15:33:16 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (MapPartitionsRDD[13] at map at MLUtils.scala:108), which has no missing parents
18/10/08 15:33:16 INFO storage.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 3.9 KB, free 646.6 KB)
18/10/08 15:33:16 INFO storage.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 2.1 KB, free 648.8 KB)
18/10/08 15:33:16 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on 10.28.29.236:43081 (size: 2.1 KB, free: 511.1 MB)
18/10/08 15:33:16 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1006
18/10/08 15:33:16 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 2 (MapPartitionsRDD[13] at map at MLUtils.scala:108)
18/10/08 15:33:16 INFO cluster.YarnScheduler: Adding task set 2.0 with 2 tasks
18/10/08 15:33:16 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 4, emr-worker-5.cluster-35705, partition 0,PROCESS_LOCAL, 2251 bytes)
18/10/08 15:33:16 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 2.0 (TID 5, emr-work-2, partition 1,PROCESS_LOCAL, 2251 bytes)
18/10/08 15:33:16 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on emr-work-2:36872 (size: 2.1 KB, free: 510.9 MB)
18/10/08 15:33:16 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on emr-worker-5.cluster-35705:60861 (size: 2.1 KB, free: 510.9 MB)
18/10/08 15:33:17 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on 10.28.29.236:43081 in memory (size: 2.1 KB, free: 511.1 MB)
18/10/08 15:33:17 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on emr-work-2:36872 in memory (size: 2.1 KB, free: 510.9 MB)
18/10/08 15:33:17 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on emr-worker-5.cluster-35705:60861 in memory (size: 2.1 KB, free: 510.9 MB)
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 2.0 (TID 5) in 225 ms on emr-work-2 (1/2)
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 4) in 227 ms on emr-worker-5.cluster-35705 (2/2)
18/10/08 15:33:17 INFO scheduler.DAGScheduler: ResultStage 2 (collect at xgboost_try.scala:19) finished in 0.228 s
18/10/08 15:33:17 INFO cluster.YarnScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool
18/10/08 15:33:17 INFO scheduler.DAGScheduler: Job 2 finished: collect at xgboost_try.scala:19, took 0.244199 s
18/10/08 15:33:17 INFO spark.ContextCleaner: Cleaned accumulator 2
18/10/08 15:33:17 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on 10.28.29.236:43081 in memory (size: 2.1 KB, free: 511.1 MB)
18/10/08 15:33:17 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on emr-work-2:36872 in memory (size: 2.1 KB, free: 510.9 MB)
18/10/08 15:33:17 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on emr-worker-5.cluster-35705:60861 in memory (size: 2.1 KB, free: 510.9 MB)
18/10/08 15:33:17 INFO spark.ContextCleaner: Cleaned accumulator 1
Tracker started, with env={DMLC_NUM_SERVER=0, DMLC_TRACKER_URI=10.28.29.236, DMLC_TRACKER_PORT=9091, DMLC_NUM_WORKER=10}
18/10/08 15:33:17 INFO XGBoostSpark: repartitioning training set to 10 partitions
18/10/08 15:33:17 INFO java.RabitTracker$TrackerProcessLogger: 2018-10-08 15:33:17,156 INFO start listen on 10.28.29.236:9091
18/10/08 15:33:17 INFO spark.SparkContext: Starting job: foreachPartition at XGBoost.scala:125
18/10/08 15:33:17 INFO scheduler.DAGScheduler: Registering RDD 14 (repartition at XGBoost.scala:48)
18/10/08 15:33:17 INFO scheduler.DAGScheduler: Got job 3 (foreachPartition at XGBoost.scala:125) with 10 output partitions
18/10/08 15:33:17 INFO scheduler.DAGScheduler: Final stage: ResultStage 4 (foreachPartition at XGBoost.scala:125)
18/10/08 15:33:17 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 3)
18/10/08 15:33:17 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 3)
18/10/08 15:33:17 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 3 (MapPartitionsRDD[14] at repartition at XGBoost.scala:48), which has no missing parents
18/10/08 15:33:17 INFO storage.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 4.8 KB, free 641.7 KB)
18/10/08 15:33:17 INFO storage.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 2.6 KB, free 644.3 KB)
18/10/08 15:33:17 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on 10.28.29.236:43081 (size: 2.6 KB, free: 511.1 MB)
18/10/08 15:33:17 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1006
18/10/08 15:33:17 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 3 (MapPartitionsRDD[14] at repartition at XGBoost.scala:48)
18/10/08 15:33:17 INFO cluster.YarnScheduler: Adding task set 3.0 with 2 tasks
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 6, emr-worker-5.cluster-35705, partition 0,PROCESS_LOCAL, 2240 bytes)
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 3.0 (TID 7, emr-work-2, partition 1,PROCESS_LOCAL, 2240 bytes)
18/10/08 15:33:17 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on emr-work-2:36872 (size: 2.6 KB, free: 510.9 MB)
18/10/08 15:33:17 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on emr-worker-5.cluster-35705:60861 (size: 2.6 KB, free: 510.9 MB)
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 3.0 (TID 7) in 97 ms on emr-work-2 (1/2)
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 6) in 102 ms on emr-worker-5.cluster-35705 (2/2)
18/10/08 15:33:17 INFO cluster.YarnScheduler: Removed TaskSet 3.0, whose tasks have all completed, from pool
18/10/08 15:33:17 INFO scheduler.DAGScheduler: ShuffleMapStage 3 (repartition at XGBoost.scala:48) finished in 0.103 s
18/10/08 15:33:17 INFO scheduler.DAGScheduler: looking for newly runnable stages
18/10/08 15:33:17 INFO scheduler.DAGScheduler: running: Set()
18/10/08 15:33:17 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 4)
18/10/08 15:33:17 INFO scheduler.DAGScheduler: failed: Set()
18/10/08 15:33:17 INFO scheduler.DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[18] at mapPartitions at XGBoost.scala:60), which has no missing parents
18/10/08 15:33:17 INFO storage.MemoryStore: Block broadcast_6 stored as values in memory (estimated size 7.6 KB, free 651.9 KB)
18/10/08 15:33:17 INFO storage.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 3.5 KB, free 655.4 KB)
18/10/08 15:33:17 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on 10.28.29.236:43081 (size: 3.5 KB, free: 511.1 MB)
18/10/08 15:33:17 INFO spark.SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:1006
18/10/08 15:33:17 INFO scheduler.DAGScheduler: Submitting 10 missing tasks from ResultStage 4 (MapPartitionsRDD[18] at mapPartitions at XGBoost.scala:60)
18/10/08 15:33:17 INFO cluster.YarnScheduler: Adding task set 4.0 with 10 tasks
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 4.0 (TID 8, emr-worker-5.cluster-35705, partition 0,NODE_LOCAL, 2251 bytes)
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 4.0 (TID 9, emr-work-2, partition 1,NODE_LOCAL, 2251 bytes)
18/10/08 15:33:17 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on emr-work-2:36872 (size: 3.5 KB, free: 510.9 MB)
18/10/08 15:33:17 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on emr-worker-5.cluster-35705:60861 (size: 3.5 KB, free: 510.9 MB)
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Starting task 2.0 in stage 4.0 (TID 10, emr-work-2, partition 2,NODE_LOCAL, 2251 bytes)
18/10/08 15:33:17 WARN scheduler.TaskSetManager: Lost task 1.0 in stage 4.0 (TID 9, emr-work-2): java.lang.UnsatisfiedLinkError: ml.dmlc.xgboost4j.java.XGBoostJNI.RabitInit([Ljava/lang/String;)I
at ml.dmlc.xgboost4j.java.XGBoostJNI.RabitInit(Native Method)
at ml.dmlc.xgboost4j.java.Rabit.init(Rabit.java:42)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$buildDistributedBoosters$1.apply(XGBoost.scala:63)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$buildDistributedBoosters$1.apply(XGBoost.scala:61)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Starting task 1.1 in stage 4.0 (TID 11, emr-worker-5.cluster-35705, partition 1,NODE_LOCAL, 2251 bytes)
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Lost task 0.0 in stage 4.0 (TID 8) on executor emr-worker-5.cluster-35705: java.lang.UnsatisfiedLinkError (ml.dmlc.xgboost4j.java.XGBoostJNI.RabitInit([Ljava/lang/String;)I) [duplicate 1]
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 4.0 (TID 12, emr-work-2, partition 0,NODE_LOCAL, 2251 bytes)
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Lost task 2.0 in stage 4.0 (TID 10) on executor emr-work-2: java.lang.UnsatisfiedLinkError (ml.dmlc.xgboost4j.java.XGBoostJNI.RabitInit([Ljava/lang/String;)I) [duplicate 2]
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Starting task 2.1 in stage 4.0 (TID 13, emr-work-2, partition 2,NODE_LOCAL, 2251 bytes)
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Lost task 0.1 in stage 4.0 (TID 12) on executor emr-work-2: java.lang.UnsatisfiedLinkError (ml.dmlc.xgboost4j.java.XGBoostJNI.RabitInit([Ljava/lang/String;)I) [duplicate 3]
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 4.0 (TID 14, emr-worker-5.cluster-35705, partition 0,NODE_LOCAL, 2251 bytes)
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Lost task 1.1 in stage 4.0 (TID 11) on executor emr-worker-5.cluster-35705: java.lang.UnsatisfiedLinkError (ml.dmlc.xgboost4j.java.XGBoostJNI.RabitInit([Ljava/lang/String;)I) [duplicate 4]
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Starting task 1.2 in stage 4.0 (TID 15, emr-work-2, partition 1,NODE_LOCAL, 2251 bytes)
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Lost task 2.1 in stage 4.0 (TID 13) on executor emr-work-2: java.lang.UnsatisfiedLinkError (ml.dmlc.xgboost4j.java.XGBoostJNI.RabitInit([Ljava/lang/String;)I) [duplicate 5]
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Starting task 2.2 in stage 4.0 (TID 16, emr-worker-5.cluster-35705, partition 2,NODE_LOCAL, 2251 bytes)
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Lost task 0.2 in stage 4.0 (TID 14) on executor emr-worker-5.cluster-35705: java.lang.UnsatisfiedLinkError (ml.dmlc.xgboost4j.java.XGBoostJNI.RabitInit([Ljava/lang/String;)I) [duplicate 6]
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Starting task 0.3 in stage 4.0 (TID 17, emr-work-2, partition 0,NODE_LOCAL, 2251 bytes)
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Lost task 1.2 in stage 4.0 (TID 15) on executor emr-work-2: java.lang.UnsatisfiedLinkError (ml.dmlc.xgboost4j.java.XGBoostJNI.RabitInit([Ljava/lang/String;)I) [duplicate 7]
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Starting task 1.3 in stage 4.0 (TID 18, emr-worker-5.cluster-35705, partition 1,NODE_LOCAL, 2251 bytes)
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Lost task 2.2 in stage 4.0 (TID 16) on executor emr-worker-5.cluster-35705: java.lang.UnsatisfiedLinkError (ml.dmlc.xgboost4j.java.XGBoostJNI.RabitInit([Ljava/lang/String;)I) [duplicate 8]
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Starting task 2.3 in stage 4.0 (TID 19, emr-work-2, partition 2,NODE_LOCAL, 2251 bytes)
18/10/08 15:33:17 INFO scheduler.TaskSetManager: Lost task 0.3 in stage 4.0 (TID 17) on executor emr-work-2: java.lang.UnsatisfiedLinkError (ml.dmlc.xgboost4j.java.XGBoostJNI.RabitInit([Ljava/lang/String;)I) [duplicate 9]
18/10/08 15:33:17 ERROR scheduler.TaskSetManager: Task 0 in stage 4.0 failed 4 times; aborting job
18/10/08 15:33:17 INFO cluster.YarnScheduler: Cancelling stage 4
18/10/08 15:33:17 INFO cluster.YarnScheduler: Stage 4 was cancelled
18/10/08 15:33:17 INFO scheduler.DAGScheduler: ResultStage 4 (foreachPartition at XGBoost.scala:125) failed in 0.156 s
18/10/08 15:33:17 INFO scheduler.DAGScheduler: Job 3 failed: foreachPartition at XGBoost.scala:125, took 0.309975 s
Exception in thread "Thread-45" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 4.0 failed 4 times, most recent failure: Lost task 0.3 in stage 4.0 (TID 17, emr-work-2): java.lang.UnsatisfiedLinkError: ml.dmlc.xgboost4j.java.XGBoostJNI.RabitInit([Ljava/lang/String;)I
at ml.dmlc.xgboost4j.java.XGBoostJNI.RabitInit(Native Method)
at ml.dmlc.xgboost4j.java.Rabit.init(Rabit.java:42)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$buildDistributedBoosters$1.apply(XGBoost.scala:63)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$buildDistributedBoosters$1.apply(XGBoost.scala:61)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1431)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1419)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1418)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1418)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1640)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1599)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1588)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1832)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1845)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1858)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1929)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1.apply(RDD.scala:918)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.foreachPartition(RDD.scala:918)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anon$2.run(XGBoost.scala:125)
Caused by: java.lang.UnsatisfiedLinkError: ml.dmlc.xgboost4j.java.XGBoostJNI.RabitInit([Ljava/lang/String;)I
at ml.dmlc.xgboost4j.java.XGBoostJNI.RabitInit(Native Method)
at ml.dmlc.xgboost4j.java.Rabit.init(Rabit.java:42)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$buildDistributedBoosters$1.apply(XGBoost.scala:63)
at ml.dmlc.xgboost4j.scala.spark.XGBoost$$anonfun$buildDistributedBoosters$1.apply(XGBoost.scala:61)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:69)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
18/10/08 15:33:17 WARN server.TransportChannelHandler: Exception in connection from emr-worker-5.cluster-35705/10.26.92.243:59296
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
18/10/08 15:33:17 WARN server.TransportChannelHandler: Exception in connection from emr-work-2/10.28.31.55:53361
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
at io.netty.buffer.PooledUnsafeDirectByteBuf.setBytes(PooledUnsafeDirectByteBuf.java:313)
at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:881)
at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:242)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:119)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
at java.lang.Thread.run(Thread.java:745)
18/10/08 15:33:17 INFO cluster.YarnClientSchedulerBackend: Disabling executor 2.
18/10/08 15:33:17 INFO scheduler.DAGScheduler: Executor lost: 2 (epoch 1)
18/10/08 15:33:17 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster.
18/10/08 15:33:17 INFO cluster.YarnClientSchedulerBackend: Disabling executor 1.
18/10/08 15:33:17 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, emr-work-2, 36872)
18/10/08 15:33:17 INFO storage.BlockManagerMaster: Removed 2 successfully in removeExecutor
18/10/08 15:33:17 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 2)
18/10/08 15:33:17 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster.
18/10/08 15:33:17 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, emr-worker-5.cluster-35705, 60861)
18/10/08 15:33:17 INFO storage.BlockManagerMaster: Removed 1 successfully in removeExecutor
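Since the UnsatisfiedLinkError above is raised on a native method (XGBoostJNI.RabitInit), one check I can run is whether the JAR I submitted contains a native library at all. The following is only a rough sketch: I am assuming that the xgboost4j JAR is supposed to bundle its native library (a file such as libxgboost4j.so) inside the archive, and the function name and the JAR path in the usage comment are hypothetical, not taken from my build.

```python
import zipfile

# File suffixes that usually indicate a native shared library.
NATIVE_SUFFIXES = (".so", ".dylib", ".dll")


def native_libs_in_jar(jar):
    """Return the entries of a JAR (which is a zip archive) that look like
    native libraries. `jar` may be a path or a file-like object."""
    with zipfile.ZipFile(jar) as archive:
        return [name for name in archive.namelist()
                if name.endswith(NATIVE_SUFFIXES)]


# Usage (hypothetical path, replace with the JAR that is actually submitted):
#   print(native_libs_in_jar("my-project-with-dependencies.jar"))
```

If the list comes back empty, the native code never reached the executors, which would be consistent with the error. If a library is listed, the problem is more likely a mismatch between the library and the Java classes, or a library built for a different platform than the worker nodes.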