RMSE XGBoost check


#1

I have two questions: can I use RMSE as a valid metric of model accuracy, or perhaps a normalized RMSE? (I doubt the latter, since I have very skewed data; please let me know what you think.)
Sorry for all of these questions, but I am new to XGBoost and am trying to understand how it relates to statistical concepts.
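To make the question concrete, here is what RMSE and one common normalized variant compute. This is a minimal pure-Python sketch with made-up numbers, not anyone's actual data:

```python
import math

def rmse(y_true, y_pred):
    # root mean squared error
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def nrmse(y_true, y_pred):
    # one common normalization: divide RMSE by the mean of the observed values
    # (dividing by the range or the standard deviation is also seen)
    return rmse(y_true, y_pred) / (sum(y_true) / len(y_true))

# made-up skewed target: mostly small values plus one large one
y_true = [2.0, 3.0, 4.0, 5.0, 100.0]
y_pred = [2.5, 3.1, 4.4, 4.2, 60.0]
print(rmse(y_true, y_pred))
print(nrmse(y_true, y_pred))
```

Note that with skewed data both numbers are dominated by the error on the single large value, which is exactly the concern raised above.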

Thank you so much


#2

Let me make an attempt at this and learn from any corrections in later posts.

I think RMSE might benefit from the central limit theorem if the minimum child weight (`min_child_weight`) is large enough.
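As a quick sanity check of that intuition, here is a toy simulation (not XGBoost itself): averaging a few hundred draws from a heavily skewed distribution, which is roughly what a leaf sees when the child weight is large, produces a far less skewed quantity.

```python
import math
import random

random.seed(1)

def skewness(xs):
    # sample skewness: third central moment over variance^(3/2)
    n = len(xs)
    m = sum(xs) / n
    s2 = sum((x - m) ** 2 for x in xs) / n
    s3 = sum((x - m) ** 3 for x in xs) / n
    return s3 / s2 ** 1.5

# heavily skewed raw values (exponential draws, theoretical skewness 2)
raw = [random.expovariate(1.0) for _ in range(5000)]

# means over groups of 500 draws, mimicking a leaf with a large child weight
leaf_means = [sum(random.expovariate(1.0) for _ in range(500)) / 500
              for _ in range(200)]

print(skewness(raw), skewness(leaf_means))
```

The averaged values come out much closer to symmetric, which is the central limit theorem at work.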

On a practical note, I have run multiple validations with different metrics. Note that I use a log transformation of my data, which already addresses skew and outliers. But for this data, MAE and RMSE give almost exactly the same result. Also note that this is very noisy data, which I take as evidence that RMSE can work. I use large child weights, e.g. 500, and I would think the central limit theorem should apply in this case.

Still, with the results being about the same, I use MAE. When I merge all of my data to make out-of-sample predictions, I do not want an extreme outlier affecting my results, especially if I am not sure what effect it may be having. And I do not want to start removing outliers: I always remove too many, which seems to cause a decline in predictive performance when a hold-out test sample is available.
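The outlier sensitivity described above is easy to see numerically. Toy numbers, not real data:

```python
import math

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# 99 well-predicted points plus one extreme outlier (made-up numbers)
y_true = [10.0] * 99 + [1000.0]
y_pred = [10.5] * 99 + [10.0]   # the model misses the outlier badly

print(mae(y_true, y_pred))   # about 10.4: the outlier contributes linearly
print(rmse(y_true, y_pred))  # about 99: the squared term dominates everything
```

One badly missed point moves MAE modestly but makes RMSE almost entirely a function of that single error, which is why MAE is the safer choice when you cannot vouch for every extreme value.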

Hope this helps some and that I learn from additional answers correcting any of my errors.


#3

Thank you so much for your answer. This is very helpful!

Thank you!!


#4

I have never used the count:poisson objective, but I am trying to learn and searched for it on Google.

I found this link, which I think is someone asking a similar question for similar reasons. I would expand on the answer given if I had a more informed opinion.

Link: https://stackoverflow.com/questions/42945557/intuition-behind-nloglikelihood-value-in-xgboost-poisson-run
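For what it is worth, the value that thread discusses is a Poisson negative log-likelihood. Here is a sketch of the standard formula; whether XGBoost's reported metric includes the constant log(y!) term is something I would verify against the docs rather than take from this sketch:

```python
import math

def poisson_nloglik(y_true, mu_pred):
    # mean negative log-likelihood of counts y under Poisson(mu):
    #   mu - y * log(mu) + log(y!)
    # some implementations drop the log(y!) term, since it does not
    # depend on the model -- check which convention your library uses
    total = 0.0
    for y, mu in zip(y_true, mu_pred):
        total += mu - y * math.log(mu) + math.lgamma(y + 1)
    return total / len(y_true)

# toy counts and predicted rates (made up)
print(poisson_nloglik([0, 1, 2, 4], [0.3, 1.2, 2.0, 3.5]))
```

Lower is better, and predicting a rate close to the observed count gives a smaller value, which matches the intuition in the linked answer.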