I am using xgboost to separate simulated signal from simulated noise. My events each carry a weight (from the simulation, due to an energy spectrum), and the classes are imbalanced (many more background events than signal events). Taking this as an example, how should I represent both effects in the xgboost parameters?
At the moment, I leave scale_pos_weight at its default of 1 and normalize the total weight of the background and of the signal to 0.5 each. Is this the best way to do this?
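To make the current setup concrete, here is a minimal sketch of the per-class normalization described above. The helper name normalize_class_weights and the toy arrays are made up for illustration; only NumPy is used, and the xgboost call is shown as a comment.

```python
import numpy as np

def normalize_class_weights(weights, labels, target=0.5):
    """Rescale event weights so that each class sums to `target`.

    Hypothetical helper for illustration, not part of xgboost.
    """
    weights = np.asarray(weights, dtype=float)
    labels = np.asarray(labels)
    out = np.empty_like(weights)
    for cls in (0, 1):  # 0 = background, 1 = signal
        mask = labels == cls
        out[mask] = weights[mask] * target / weights[mask].sum()
    return out

# Toy data: 2 signal events, 4 background events, with spectrum weights.
labels = np.array([1, 1, 0, 0, 0, 0])
raw_w = np.array([2.0, 6.0, 1.0, 1.0, 3.0, 5.0])

w = normalize_class_weights(raw_w, labels)

# The normalized weights would then be passed as per-event weights, e.g.
#   dtrain = xgboost.DMatrix(X, label=labels, weight=w)
# with scale_pos_weight left at its default of 1.
```

The relative weights inside each class are preserved; only the overall scale of each class changes.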
Also, how does this interact with other parameters such as min_child_weight? I guess it uses the summed weight of the events going into each child instead of just the number of events, but with normalized weights this becomes very hard to interpret. Or should I keep the weights unnormalized and scale up the signal with scale_pos_weight instead?
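The two options can be compared numerically. The sketch below rests on two assumptions about xgboost that should be checked against the documentation: that scale_pos_weight acts as an extra multiplicative factor on the weights of positive (signal) events, and that min_child_weight is compared against the sum of weighted hessians in a node, which for binary:logistic is w * p * (1 - p) per event. The toy arrays and the starting prediction p = 0.5 are illustrative choices.

```python
import numpy as np

# Toy data: 2 signal events, 4 background events, with spectrum weights.
labels = np.array([1, 1, 0, 0, 0, 0])
raw_w = np.array([2.0, 6.0, 1.0, 1.0, 3.0, 5.0])
sig = labels == 1

# Option A: normalize each class to a total weight of 0.5.
w_norm = np.where(sig,
                  raw_w * 0.5 / raw_w[sig].sum(),
                  raw_w * 0.5 / raw_w[~sig].sum())

# Option B: keep the raw weights and balance the classes via
# scale_pos_weight = sum(background weights) / sum(signal weights).
scale_pos_weight = raw_w[~sig].sum() / raw_w[sig].sum()
w_scaled = np.where(sig, raw_w * scale_pos_weight, raw_w)

# Under the assumption above, both options give the same relative
# weights; they differ only by one global factor.
ratio = w_scaled / w_norm

# Hessian sum in the root node for the logistic loss, assuming every
# event starts at p = 0.5, so each event contributes w * 0.25.
p = 0.5
hess_norm = (w_norm * p * (1 - p)).sum()
hess_scaled = (w_scaled * p * (1 - p)).sum()
```

If these assumptions hold, the class balance is identical in both options, but the overall scale is not: with the weights normalized to a total of 1, the root node already has a hessian sum of only 0.25, so min_child_weight would have to be chosen on that scale for any split to be allowed.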
I am only interested in binary classification, not in calibrated class probabilities (I read that there might be problems getting proper probabilities for each class when using scale_pos_weight).
Thanks for any help understanding proper weighting and its effect in xgboost!