I am attempting to implement a custom evaluation metric for a SparkXGBClassifier. Specifically, I want my eval metric to maximize recall for class 1 under a binary logistic objective. In my old, non-Spark code, I could implement this as:
from sklearn.metrics import recall_score
import numpy as np

def recall_eval(y_pred, dtrain):
    # Return an error to minimize: 1 - recall for class 1
    y_true = dtrain.get_label()
    err = 1 - recall_score(y_true, np.round(y_pred))
    return 'recall_err', err

…

clf.fit(train, np.ravel(self.labels), eval_metric=recall_eval)
################
In the new SparkXGBClassifier, it is unclear to me how to implement this. Specifically, in the example below, I want to replace "aucpr" with recall, and maximize it for class 1:
self.xgbParams = dict(
    missing=0.0,
    evalMetric="aucpr",
    maximizeEvaluationMetrics=True,
    numRound=5,
    numWorkers=6,
)
self.xgb = SparkXGBClassifier(**self.xgbParams)
Could someone help me understand how to write a custom recall eval metric with the new PySpark bindings?
Thank you!