Cost matrix optimization

Would it be possible to support a cost matrix instead of only a cost function? For example, assign different costs to the different cases: false positive, false negative, true positive, and true negative. This is useful in medical applications, e.g. to reduce false positives and false negatives via a cost matrix. In this case the cost of each of the four cases would be used in the cost calculation, instead of only y and y_hat.
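
To make the idea concrete, here is a minimal sketch of a 2x2 cost matrix for a screening problem (the cost values are illustrative assumptions, not from any real application):

import numpy as np

# Illustrative 2x2 cost matrix, rows = actual class, columns = predicted class.
# Assumed values: a missed disease (false negative) is far more costly than a
# false alarm (false positive); correct predictions cost nothing.
#                        pred 0  pred 1
cost_matrix = np.array([[ 0.0,    1.0],   # actual 0: TN, FP
                        [10.0,    0.0]])  # actual 1: FN, TP

y, y_hat = 1, 0                           # a single false negative
print(cost_matrix[y, y_hat])              # -> 10.0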

Any idea?

Thanks!

Not possible at the moment, but if you have related papers feel free to share.

It’s a cost matrix instead of a weight

For example:

Today:
error = sum(error_metric(y, y_hat) * weight)

Cost matrix:
error = sum(
    error_metric(y, y_hat) * weight_true_positive * is_true_positive +
    error_metric(y, y_hat) * weight_true_negative * is_true_negative +
    error_metric(y, y_hat) * weight_false_positive * is_false_positive +
    error_metric(y, y_hat) * weight_false_negative * is_false_negative
)

The error is weighted according to the confusion-matrix cell each prediction falls into, not only by a per-row weight.
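
A minimal NumPy sketch of the same idea (the weight values are assumptions for illustration, and error_metric here is plain 0/1 misclassification):

import numpy as np

def cost_matrix_error(y, y_hat, w_tp=0.0, w_tn=0.0, w_fp=1.0, w_fn=5.0):
    # Indicator for each confusion-matrix cell.
    is_tp = (y == 1) & (y_hat == 1)
    is_tn = (y == 0) & (y_hat == 0)
    is_fp = (y == 0) & (y_hat == 1)
    is_fn = (y == 1) & (y_hat == 0)
    per_row_error = (y != y_hat).astype(float)   # 0/1 error per row
    # Each row's error is scaled by the cost of the cell it falls into.
    return np.sum(per_row_error * (w_tp * is_tp + w_tn * is_tn +
                                   w_fp * is_fp + w_fn * is_fn))

y     = np.array([1, 0, 1, 0])
y_hat = np.array([1, 1, 0, 0])
print(cost_matrix_error(y, y_hat))   # one FP (cost 1) + one FN (cost 5) = 6.0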

Examples from the costcla Python cost-sensitive decision tree library:

http://albahnsen.github.io/CostSensitiveClassification/Tutorials.html
https://nbviewer.jupyter.org/github/albahnsen/CostSensitiveClassification/blob/master/doc/tutorials/tutorial_edcs_credit_scoring.ipynb

I will open a feature request on xgboost.

But I think a better way to do this is by using a customized metric. You can define one in Python; see the example in: https://github.com/dmlc/xgboost/blob/master/demo/guide-python/custom_rmsle.py
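
For example, a cost-sensitive evaluation metric could look roughly like this (a sketch only; the cost values are assumptions, and depending on the xgboost version the function is passed as feval or custom_metric):

import numpy as np
import xgboost as xgb

# Assumed costs per confusion-matrix cell; correct predictions are free.
COSTS = {"tp": 0.0, "tn": 0.0, "fp": 1.0, "fn": 5.0}

def cost_sensitive_eval(predt, dtrain):
    # Assumes predt are probabilities (binary:logistic with a built-in objective).
    y = dtrain.get_label()
    y_hat = (predt > 0.5).astype(int)
    cost = np.where(y == 1,
                    np.where(y_hat == 1, COSTS["tp"], COSTS["fn"]),
                    np.where(y_hat == 1, COSTS["fp"], COSTS["tn"]))
    return "avg_cost", float(cost.mean())

# dtrain = xgb.DMatrix(X_train, label=y_train)
# booster = xgb.train({"objective": "binary:logistic"}, dtrain,
#                     num_boost_round=100, evals=[(dtrain, "train")],
#                     custom_metric=cost_sensitive_eval)   # feval= on older versions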

The problem with a custom metric is knowing which rows of the dataframe are being used, especially in the cross-validation function, where the data is split at "random" sizes/points, so inside the metric function we don't know how to find the corresponding indices in the cost matrix.
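
One possible workaround (my own assumption, not an official cost-matrix API): store the per-row cost as the DMatrix weight, since cv() carries labels and weights along when it slices folds, so the metric can recover the right rows without tracking indices. Note this carries only one scalar per row (not a full per-example cost matrix) and the weight also affects training.

import numpy as np
import xgboost as xgb

# Hypothetical per-row cost: errors on positives are 5x more costly.
row_cost = np.where(y == 1, 5.0, 1.0)          # y is the full label vector
dall = xgb.DMatrix(X, label=y, weight=row_cost)

def fold_cost_eval(predt, dmat):
    y_true = dmat.get_label()
    cost = dmat.get_weight()                   # per-row costs for this fold's rows
    y_hat = (predt > 0.5).astype(int)
    errors = (y_hat != y_true).astype(float)
    return "fold_cost", float((errors * cost).mean())

# results = xgb.cv({"objective": "binary:logistic"}, dall, nfold=5,
#                  custom_metric=fold_cost_eval)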

I opened a feature request at https://github.com/dmlc/xgboost/issues/6790. I will go through the notebook later.