Can anyone help me understand the behavior below?
- Using a lower learning rate (say 0.01) and more trees, I end up with a poorer model as measured by AUC.
- Using a higher learning rate (say 0.4) and fewer trees, I end up with a superior model as measured by AUC.
A smaller, more performant model is preferable to a larger, less performant one, so I am happy with the results. Still, the outcome differs from what I have learned about how learning rate and number of trees interact in boosting. I suppose it is simply a difference between classifiers. Even so, I would like to understand the why as well as possible, since I plan to replace the old classifier with XGBoost in a production environment next year.
The dataset is highly imbalanced. The parameter setup is below.
    trainlabel <- trainloandata$event
    trainpred <- trainloandata[,!(names(trainloandata) %in% c('event'))]
    dtrain <- xgb.DMatrix(data = data.matrix(trainpred), label = trainlabel)

    testlabel <- testloandata$event
    testpred <- testloandata[,!(names(testloandata) %in% c('event'))]
    dtest <- xgb.DMatrix(data = data.matrix(testpred), label = testlabel)

    params <- list(booster = "gbtree",
                   objective = "binary:logistic",
                   tree_method = 'approx',
                   eta = 0.4,
                   gamma = 0,
                   max_depth = 8,
                   max_delta_step = 2,
                   min_child_weight = 1,
                   subsample = .5,
                   colsample_bytree = 1.0)
    params_constrained <- params

    watchlist <- list(train = dtrain, test = dtest)

    xgboost_model <- xgb.train(params = params,
                               data = dtrain,
                               nrounds = 500,
                               watchlist = watchlist,
                               print_every_n = 1,
                               early_stopping_rounds = 5,
                               maximize = TRUE,
                               eval_metric = "auc",
                               metric_name = "dtrain_auc")
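
For anyone who wants to reproduce the comparison without my loan data, here is a minimal sketch on synthetic, imbalanced data. It cross-validates both learning rates and also varies the early-stopping patience, since I am not sure whether stopping after 5 rounds plays a role when `eta` is small. The synthetic data, the helper name `cv_for_eta`, the trimmed parameter list, and the patience value of 50 are assumptions made for illustration only; the sketch assumes an R `xgboost` version in which `xgb.cv` returns an `evaluation_log` with a `test_auc_mean` column.

```r
library(xgboost)

# Synthetic, imbalanced stand-in for the loan data (hypothetical; not the real dataset).
set.seed(42)
n <- 5000
x <- matrix(rnorm(n * 10), ncol = 10)
# Positives are rare and driven by the first two columns.
p <- plogis(-3.5 + 1.2 * x[, 1] - 0.8 * x[, 2])
y <- rbinom(n, 1, p)
dall <- xgb.DMatrix(data = x, label = y)

# Subset of the parameters from the question; eta is supplied per run.
base_params <- list(booster = "gbtree",
                    objective = "binary:logistic",
                    eval_metric = "auc",
                    max_depth = 8,
                    subsample = 0.5)

# Cross-validate one learning rate with a given early-stopping patience and
# report the round with the highest mean test AUC.
cv_for_eta <- function(eta, patience) {
  cv <- xgb.cv(params = c(base_params, list(eta = eta)),
               data = dall,
               nrounds = 500,
               nfold = 3,
               stratified = TRUE,
               early_stopping_rounds = patience,
               maximize = TRUE,
               verbose = 0)
  log <- as.data.frame(cv$evaluation_log)
  data.frame(eta = eta,
             patience = patience,
             best_iter = which.max(log$test_auc_mean),
             best_auc = max(log$test_auc_mean))
}

results <- rbind(cv_for_eta(0.4, 5),
                 cv_for_eta(0.01, 5),
                 cv_for_eta(0.01, 50))
print(results)
```

If the low learning rate only catches up when given the larger patience, that would suggest the gap comes from where training stops rather than from the classifier itself, but I have not confirmed this on the real data.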