XGBoost: modelling rates with an offset



Hello,

I will make this pretty short.

I am working on a problem that involves predicting a frequency (a rate). The objective is `count:poisson` in the `xgb.train` function.

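I set the base margin of the test matrix to the log of the offset, to effectively include the offset in the model:

```r
library(xgboost); library(dplyr)  # setinfo(); %>% and pull()
setinfo(xgbMatrixTest, "base_margin", test_baked %>% pull(offset_var) %>% log())
```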
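A base-R sketch of how the margin turns into a prediction. It assumes a log link for `count:poisson` (the prediction is exp of the raw margin) and that `base_margin` is simply added to the summed tree output. The tree outputs `f` and the exposures are hypothetical numbers, not taken from my data.

```r
# Hypothetical summed tree outputs f(x) for three rows, on the log-rate scale
f <- c(-1.2, 0.0, 0.7)
# Hypothetical exposures (the offset variable before taking logs)
exposure <- c(0.5, 1, 3)

# With base_margin = log(exposure), the raw margin is log(exposure) + f(x),
# and exp() of that margin is an expected count
count_pred <- exp(log(exposure) + f)
# Without the exposure in the margin, exp(f(x)) is an expected rate per unit exposure
rate_pred <- exp(f)

all.equal(count_pred, exposure * rate_pred)  # TRUE: count = exposure * rate
```

Under these assumptions, adding `log(exposure)` to the margin is the same as multiplying the rate by the exposure.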

My question: why do I get rates when predicting on `test_baked` (the test-set features, without the offset), but counts when predicting on `xgbMatrixTest` (with the base margin set as above)? My guess is that it comes from the log transformation:

$$\log(\mu) = \alpha + \beta x + \log(t)$$

where μ is the expected count and t is the exposure (the offset variable).
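The same equation is what a Poisson GLM with an offset fits, so base R's `glm()` gives a compact reference point. This is a sketch on simulated data; the feature `x`, the exposure and the true coefficients are hypothetical. Predicting with the real exposure returns counts, while predicting with the exposure set to 1 (so log(t) = 0) returns rates.

```r
set.seed(42)
n <- 2000
x <- runif(n)                         # hypothetical feature
exposure <- runif(n, 0.5, 3)          # hypothetical exposure t
alpha <- -0.5; beta <- 1.2            # hypothetical true coefficients

# Simulate counts from log(mu) = alpha + beta * x + log(t)
y <- rpois(n, exposure * exp(alpha + beta * x))

# Poisson GLM with log(t) as an offset: coefficients are on the log-rate scale
fit <- glm(y ~ x + offset(log(exposure)), family = poisson)
coef(fit)  # should be close to alpha and beta

new <- data.frame(x = c(0.2, 0.8), exposure = 2)
# Offset evaluated from newdata: log(2) is added, so these are expected counts
counts <- predict(fit, newdata = new, type = "response")
# Exposure set to 1, so log(1) = 0 is added: these are expected rates
rates <- predict(fit, newdata = transform(new, exposure = 1), type = "response")

all.equal(unname(counts), unname(2 * rates))  # TRUE: counts = exposure * rates
```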

To go from the rates to the counts, I found that the two sets of predictions become equal when I compute

`count = exp(log(offset)) * rate * exp(log(2))`

that is, count = offset × rate × 2.
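One possible source of the constant factor, sketched in base R. The assumptions are: when a matrix has a `base_margin`, XGBoost does not add `base_score`; when it has none, the margin starts from `log(base_score)`; and `base_score` is 0.5 (an assumed value, the long-standing default, though newer XGBoost versions may estimate it from the data). Under those assumptions the ratio between the two predictions is 1 / 0.5 = 2 on every row, the same as exp(log(2)). The tree outputs and exposures are hypothetical.

```r
base_score <- 0.5                   # assumed global intercept (XGBoost's base_score)
f <- c(-1.2, 0.0, 0.7)              # hypothetical summed tree outputs f(x)
exposure <- c(0.5, 1, 3)            # hypothetical exposures (offset_var before log)

# Assumed prediction with base_margin = log(exposure): base_score not added
count_pred <- exp(log(exposure) + f)
# Assumed prediction without base_margin: margin starts at log(base_score)
rate_pred <- exp(log(base_score) + f)

# Per-row factor linking the two predictions
ratio <- count_pred / (exposure * rate_pred)
print(ratio)  # [1] 2 2 2, i.e. 1 / base_score
```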

Has anyone encountered the same situation? Do you have a better explanation, or do you agree with the above?

Thank you