[migrated from https://github.com/dmlc/xgboost/issues/3455]
I found that with a large dataset (a 1,000,000 x 150 matrix), nthread = 40 has no effect in R xgboost 0.71.2. @hetong007 told me that dmlc-core was updated recently and that this might have affected nthread. Here is the code to reproduce it:
devtools::install_version("xgboost", version = "0.71.2", repos = "http://cran.us.r-project.org")
library(xgboost)
# 1,000,000 x 150 feature matrix and a random response
x <- matrix(rnorm(1000000 * 150), ncol = 150)
y <- rnorm(1000000)
set.seed(2015)  # set after the data is generated, so x and y differ between runs
model <- xgboost(data = x, label = y, nrounds = 1,
                 save_name = "/dev/null", objective = "reg:linear",
                 max_depth = 6, min_child_weight = 10, nthread = 40,
                 colsample_bytree = 0.3, eta = 1, subsample = 0.6,
                 num_parallel_tree = 60)
Run it and you will find that only about 4 or 5 threads are used. With a small dataset (10,000 x 150), all 40 threads are used. My machine has 40 CPU cores and 80 threads. The R sessionInfo() output is below:
> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS
Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8
[8] LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] xgboost_0.71.2
Another question: when I change the xgboost version to 0.71.1 or an earlier version and test with the large dataset, only about 30 threads are used rather than 40.
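For anyone who wants to put numbers on the small-versus-large comparison instead of watching top, here is a rough sketch. The helpers `busy_cores` and `run_case` are hypothetical names made up for this post, not part of xgboost. The idea assumes that on Linux the CPU time reported by `system.time()` is summed over all threads of the R process, so CPU time divided by wall-clock time gives an approximate count of busy cores. The timing also covers the conversion of the matrix to xgboost's internal format, so treat the result as a rough indicator only.

```r
# Approximate number of busy cores: CPU time (summed over threads)
# divided by elapsed wall-clock time.
busy_cores <- function(timing) {
  cpu <- sum(timing[c("user.self", "sys.self")])
  cpu / timing[["elapsed"]]
}

# Hypothetical helper: trains one round with the settings from the
# report on an n_rows x n_cols random matrix and returns the estimate.
run_case <- function(n_rows, n_cols = 150, nthread = 40) {
  x <- matrix(rnorm(n_rows * n_cols), ncol = n_cols)
  y <- rnorm(n_rows)
  timing <- system.time(
    xgboost::xgboost(data = x, label = y, nrounds = 1, verbose = 0,
                     save_name = "/dev/null", objective = "reg:linear",
                     max_depth = 6, min_child_weight = 10,
                     nthread = nthread, colsample_bytree = 0.3,
                     eta = 1, subsample = 0.6, num_parallel_tree = 60)
  )
  busy_cores(timing)
}
```

Calling `run_case(10000)` and then `run_case(1000000)` should give two figures that can be compared directly and pasted into a report. If the behaviour described above holds, the second figure would be far below 40.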