[java-scala]Is XGBoost-Spark training thread-safe?


I have 3 small datasets with the same column schema (less than 1 GB in total), and I want to train an XGBoost model for each of them (in a Spark environment). Code snippet:

def myPipeline(): Pipeline = {
    val indexer = new StringIndexer()
    val assembler = new VectorAssembler()
    val xgbClassifier = new XGBoostClassifier()

    // Return the assembled pipeline
    new Pipeline().setStages(Array(indexer, assembler, xgbClassifier))
}

def main(args: Array[String]): Unit = {
    val d1 = spark.read.format("csv").load("path1")
    val d2 = spark.read.format("csv").load("path2")
    val d3 = spark.read.format("csv").load("path3")

    val data = Array(d1, d2, d3)
    val pipeline = myPipeline()

    data.foreach { d =>
        val model = pipeline.fit(d)         // fit returns a PipelineModel
        val prediction = model.transform(d) // transform with the fitted model, not the unfitted pipeline
        prediction.write.mode("overwrite").json(" ... ")
    }
}


Despite the large number of executors I allocated when submitting the Spark application, the training process always failed with errors like:

BTW, I allocated 8 GB of memory for each executor when submitting the job.
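For context, the submission looked roughly like this (a sketch; the class name, jar path, and executor count are placeholders, not my actual values):

```shell
# Hypothetical spark-submit invocation illustrating the memory settings described above.
# --executor-memory 8g matches the 8 GB per executor mentioned; other values are made up.
spark-submit \
  --class com.example.TrainXGBoost \
  --master yarn \
  --num-executors 20 \
  --executor-memory 8g \
  my-app.jar
```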

Does anybody know why? Thanks!