How can I calculate the most optimal values for the following parameters: max_depth, colsample_bytree, min_child_weight, eta, nrounds in R

How can I calculate the most optimal values for the following parameters: max_depth, colsample_bytree, min_child_weight, eta, nrounds in R?

Please!

These parameters are highly problem dependent, so you would need to do hyperparameter optimization for your particular problem/dataset.

Take a look here https://xgboost.readthedocs.io/en/latest/tutorials/param_tuning.html
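
For nrounds in particular, a common approach is to let cross-validation with early stopping pick it for a given eta. Below is a minimal sketch, not a definitive recipe: it assumes the agaricus example data that ships with the xgboost package, and the parameter values are placeholders rather than recommendations. It reads the result from `evaluation_log`; other fields of the returned object differ between package versions.

```r
library(xgboost)

# Example data bundled with xgboost; substitute your own matrix and labels.
data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# Placeholder values (assumptions), not tuned for any real problem.
params <- list(
  objective = "binary:logistic",
  eval_metric = "auc",
  eta = 0.1,
  max_depth = 4,
  min_child_weight = 1,
  colsample_bytree = 0.8
)

set.seed(123)  # the R package takes its randomness from R's RNG
cv <- xgb.cv(
  params = params,
  data = dtrain,
  nrounds = 50,                # upper bound on boosting rounds
  nfold = 5,
  early_stopping_rounds = 10,  # stop once test AUC stalls for 10 rounds
  verbose = FALSE
)

# Round with the highest mean test AUC across the folds.
cv_log <- as.data.frame(cv$evaluation_log)
best_nrounds <- cv_log$iter[which.max(cv_log$test_auc_mean)]
best_auc <- max(cv_log$test_auc_mean)
```

A smaller eta usually needs a larger nrounds, so it makes sense to rerun this whenever eta changes instead of tuning the two independently.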


Hi @cairo86, I think you want to find the best hyperparameters for your model.

There are three steps.

  1. Choose a set of candidate values for each hyperparameter.
  2. Use expand.grid to build every possible combination of those values.
  3. Use purrr::pmap to fit a model for each combination, then keep the combination with the best score.

Here is some sample code.

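library(xgboost)
library(dplyr)
library(purrr)
library(tidyr)

## Assumes `dtrain` (an xgb.DMatrix) and `watchlist`, e.g.
## list(train = dtrain, eval = dvalid), already exist.
## Written against the xgboost 1.x R API.

eta <- c(0.05, 0.01, 0.03)
nrounds <- c(5, 10, 15)
max_depth <- seq(2, 15, 5)
min_child_weight <- seq(10, 25, 5)
gamma <- seq(0.4, 0.8, 0.2)
subsample <- seq(0.5, 0.9, 0.2)
colsample_bytree <- seq(0.7, 1, 0.2)
hyper_grid <-
    expand.grid(
        eta = eta,
        nrounds = nrounds,
        max_depth = max_depth,
        min_child_weight = min_child_weight,
        gamma = gamma,
        subsample = subsample,
        colsample_bytree = colsample_bytree
    )
hyper_grid

## Fit one model for one row of the grid and return its score.
xgb_mod <-
    function(eta, nrounds, max_depth, min_child_weight, gamma,
             subsample, colsample_bytree) {
        set.seed(123)
        xgb <- xgb.train(
            params = list(
                ## 1
                eta = eta,
                ## 2
                max_depth = max_depth,
                min_child_weight = min_child_weight,
                gamma = gamma,
                ## 3
                subsample = subsample,
                colsample_bytree = colsample_bytree,
                ## evaluation metric
                ## eval_metric = "error",
                ## eval_metric = "rmse",
                ## (a custom metric such as ks_value goes in xgb.train(feval = ...))
                eval_metric = "auc",
                ## eval_metric = "logloss",
                ## objective
                ## objective = "reg:squarederror", ## for a regression problem
                objective = "binary:logistic",
                ## other
                nthread = 8
            ),
            data = dtrain,
            nrounds = nrounds,
            watchlist = watchlist,
            ## Scored on the last element of watchlist. 2000 never triggers with
            ## these small nrounds, but enabling early stopping makes xgboost
            ## record best_score and best_iteration.
            early_stopping_rounds = 2000,
            maximize = TRUE,
            verbose = 0
        )
        data.table::data.table(
            best_score = xgb$best_score[1]
            ,best_iteration = xgb$best_iteration
            ,niter = xgb$niter
        )
    }

## Run every combination and sort with the best AUC first.
find_auc <-
    hyper_grid %>%
    mutate(mod = pmap(
                      list(eta = eta
                           ,nrounds = nrounds
                           ,max_depth = max_depth
                           ,min_child_weight = min_child_weight
                           ,gamma = gamma
                           ,subsample = subsample
                           ,colsample_bytree = colsample_bytree)
                      ,xgb_mod
                      )
           ) %>%
    unnest(mod) %>%
    arrange(desc(best_score))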
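
The sample above relies on `dtrain` and `watchlist` being defined elsewhere. For something you can run as-is, here is a self-contained sketch of the same three steps that scores each combination with cross-validation (xgb.cv) instead of a single validation set. The agaricus example data, the deliberately tiny grid, and the helper name `cv_one` are all assumptions made for illustration.

```r
library(xgboost)
library(purrr)

data(agaricus.train, package = "xgboost")
dtrain <- xgb.DMatrix(agaricus.train$data, label = agaricus.train$label)

# Steps 1 and 2: candidate values and all their combinations (16 rows).
small_grid <- expand.grid(
  eta = c(0.1, 0.3),
  max_depth = c(2, 4),
  colsample_bytree = c(0.7, 1),
  min_child_weight = c(1, 5)
)

# Hypothetical helper: cross-validate one combination and return the best
# mean test AUC together with the round at which it was reached.
cv_one <- function(eta, max_depth, colsample_bytree, min_child_weight) {
  set.seed(123)
  cv <- xgb.cv(
    params = list(
      objective = "binary:logistic",
      eval_metric = "auc",
      eta = eta,
      max_depth = max_depth,
      colsample_bytree = colsample_bytree,
      min_child_weight = min_child_weight
    ),
    data = dtrain,
    nrounds = 30,
    nfold = 3,
    early_stopping_rounds = 5,
    verbose = FALSE
  )
  cv_log <- as.data.frame(cv$evaluation_log)
  best <- which.max(cv_log$test_auc_mean)
  data.frame(best_auc = cv_log$test_auc_mean[best],
             best_nrounds = cv_log$iter[best])
}

# Step 3: pmap passes each grid row to cv_one as named arguments.
scores <- do.call(rbind, pmap(small_grid, cv_one))
results <- cbind(small_grid, scores)

# Keep the winning row and refit on the full training data.
best <- results[which.max(results$best_auc), ]
final_model <- xgb.train(
  params = list(
    objective = "binary:logistic",
    eta = best$eta,
    max_depth = best$max_depth,
    colsample_bytree = best$colsample_bytree,
    min_child_weight = best$min_child_weight
  ),
  data = dtrain,
  nrounds = best$best_nrounds
)
```

A full grid grows multiplicatively with every parameter you add, so it is usually worth starting coarse and then narrowing the ranges around the best rows.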

Hi thanks very much!