Big data will break the nthread setting in R-xgboost 0.71.2

[migrated from https://github.com/dmlc/xgboost/issues/3455]

I found that big data (a 1,000,000 × 150 matrix) makes nthread = 40 ineffective in R-xgboost 0.71.2. @hetong007 told me that dmlc-core was updated recently and it might have affected nthread. Here is the experimental code:

devtools::install_version("xgboost", version = "0.71.2", repos = "http://cran.us.r-project.org")
library(xgboost)
x <- matrix(rnorm(1000000 * 150), ncol = 150)
y <- rnorm(1000000)
set.seed(2015)
model <- xgboost(data = x, label = y, nrounds = 1,
  save_name = "/dev/null", objective = "reg:linear",
  max_depth = 6, min_child_weight = 10, nthread = 40,
  colsample_bytree = 0.3, eta = 1, subsample = 0.6,
  num_parallel_tree = 60)

Run it and you will find that only about 4 or 5 threads are used. If you test small data (a 10,000 × 150 matrix), all 40 threads are used. My machine has 40 CPUs and 80 threads. The R sessionInfo is below:

> sessionInfo()
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/openblas-base/libblas.so.3
LAPACK: /usr/lib/libopenblasp-r0.2.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8
 [8] LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] xgboost_0.71.2

Another question: when I change the xgboost version to 0.71.1 or earlier and test the same big data, only about 30 threads are used instead of all 40.

@joegaotao Thanks for posting the experimental data. I’m wondering which tree_method you are using. I am aware that tree_method=hist doesn’t scale very well for a high number of threads, but I’m not aware of any similar phenomenon for tree_method=exact or tree_method=approx.

I ran the code with the default tree_method = "exact".

According to the XGBoost doc, the default choice for tree_method is exact in some situations and approx in others. I'm hoping to take a look at this soon, but in the meantime, can you try specifying tree_method explicitly? Try exact, approx, and hist for comparison.
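For reference, here is a minimal sketch of such a comparison (assuming xgboost 0.71.2 and the same synthetic data as the original report; the timing loop is my own addition, not part of the report):

```r
library(xgboost)

# Same synthetic data as in the original report
set.seed(2015)
x <- matrix(rnorm(1000000 * 150), ncol = 150)
y <- rnorm(1000000)

# Time one boosting round under each tree_method so wall-clock
# time and thread utilization can be compared directly
for (tm in c("exact", "approx", "hist")) {
  elapsed <- system.time(
    xgboost(data = x, label = y, nrounds = 1, nthread = 40,
            tree_method = tm, objective = "reg:linear",
            max_depth = 6, verbose = 0)
  )[["elapsed"]]
  cat(sprintf("tree_method = %-6s elapsed = %.1f s\n", tm, elapsed))
}
```

While each round runs, you can watch per-thread CPU usage in a separate terminal (e.g. with top in thread mode) to see how many threads are actually busy.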

According to the same doc:

Because old behavior is always use exact greedy in single machine, user will get a message when approximate algorithm is chosen to notify this choice.

So I think users will be aware if approx is used.


I have tested that and got similar results with exact and approx, while hist seems to utilize more threads, though still not the full 40. As @hetong007 mentioned, if I feed too much data into xgboost, around 5,000,000 rows, the approx method is triggered automatically.


Related: R-xgboost 0.71.2 multi-thread is much slower than 0.6.4

I am looking at this issue right now. For now, you can downgrade to 0.6.4 or use the Python package.

@joegaotao I ran an experiment on my EC2 instance and so far I haven't seen the same issue. In particular, all threads appear to be used, at roughly 70–90% utilization. See the performance results at R-xgboost 0.71.2 multi-thread is much slower than 0.6.4

The data size might be a factor. Could you test 1e6 rows of data on your machine? I tested with small data and it was OK.