I am attempting to implement a custom evaluation metric for a SparkXGBClassifier. Specifically, I want my eval metric to maximize recall for class 1 under a binary logistic objective. In my old, non-Spark code, I could implement this as:
from sklearn.metrics import recall_score
import numpy as np

def recall_eval(y_pred, dtrain):
    # Return an error to minimize: 1 - recall for class 1
    y_true = dtrain.get_label()
    err = 1 - recall_score(y_true, np.round(y_pred))
    return 'recall_err', err

…

clf.fit(train, np.ravel(self.labels), eval_metric=recall_eval)
################
In the new SparkXGBClassifier, it is unclear to me how to implement this. Specifically, in the example below, I want to replace "aucpr" with recall, and maximize it for class 1:
self.xgbParams = dict(
    missing=0.0,
    evalMetric="aucpr",
    maximizeEvaluationMetrics=True,
    numRound=5,
    numWorkers=6,
)
self.xgb = SparkXGBClassifier(**self.xgbParams)
Could someone help me understand how to write a custom recall eval metric with the new PySpark bindings?
Thank you!