CPU faster than GPU using xgb and XGBClassifier

Hi Everyone,

I apologize in advance as I am a beginner. I am running GPU vs CPU timing tests with both the native xgb.train API and XGBClassifier. The results are as follows:

   passed time with xgb (gpu): 0.390s
   passed time with XGBClassifier (gpu): 0.465s
   passed time with xgb (cpu): 0.412s
   passed time with XGBClassifier (cpu): 0.421s

I am wondering why the CPU seems to perform on par with, if not better than, the GPU.
This is my setup:

  • Python 3.6.1
  • OS: Windows 10 64-bit
  • GPU: NVIDIA RTX 2070 Super, 8 GB VRAM
  • CPU: Intel i7-10700 @ 2.9 GHz
  • Running in Jupyter Notebook
  • XGBoost 1.2.0 nightly build installed via pip
    (also tried the pre-built binary wheel installed via pip: same issue)

Here is the test code I'm using (lifted from here):

import time
import xgboost as xgb

# X_train2 and y_train were prepared earlier (not shown)

# native API, GPU
param = {'max_depth': 5, 'objective': 'binary:logistic', 'subsample': 0.8,
         'colsample_bytree': 0.8, 'eta': 0.5, 'min_child_weight': 1,
         'tree_method': 'gpu_hist'}
num_round = 100

dtrain = xgb.DMatrix(X_train2, y_train)
tic = time.time()
model = xgb.train(param, dtrain, num_round)
print('passed time with xgb (gpu): %.3fs' % (time.time() - tic))

# scikit-learn API, GPU
xgb_param = {'max_depth': 5, 'objective': 'binary:logistic', 'subsample': 0.8,
             'colsample_bytree': 0.8, 'learning_rate': 0.5, 'min_child_weight': 1,
             'tree_method': 'gpu_hist'}
model = xgb.XGBClassifier(**xgb_param)
tic = time.time()
model.fit(X_train2, y_train)
print('passed time with XGBClassifier (gpu): %.3fs' % (time.time() - tic))

# native API, CPU
param = {'max_depth': 5, 'objective': 'binary:logistic', 'subsample': 0.8,
         'colsample_bytree': 0.8, 'eta': 0.5, 'min_child_weight': 1,
         'tree_method': 'hist'}
num_round = 100

dtrain = xgb.DMatrix(X_train2, y_train)
tic = time.time()
model = xgb.train(param, dtrain, num_round)
print('passed time with xgb (cpu): %.3fs' % (time.time() - tic))

# scikit-learn API, CPU
xgb_param = {'max_depth': 5, 'objective': 'binary:logistic', 'subsample': 0.8,
             'colsample_bytree': 0.8, 'learning_rate': 0.5, 'min_child_weight': 1,
             'tree_method': 'hist'}
model = xgb.XGBClassifier(**xgb_param)
tic = time.time()
model.fit(X_train2, y_train)
print('passed time with XGBClassifier (cpu): %.3fs' % (time.time() - tic))

Any idea why I'm not getting a speedup from using the GPU?

Thank you very much!

Thanks for raising the question. Do you have data that takes more than 20 seconds to train? When the dataset is small, it can't fill the GPU pipeline.
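
For example, something along these lines (a rough sketch on synthetic data, not your dataset; the sizes are arbitrary) should show at what scale gpu_hist starts to pull ahead of hist:

# Rough sketch on synthetic data (sizes are arbitrary, adjust to your memory):
# compare hist vs gpu_hist once training takes more than a few seconds.
import time
import xgboost as xgb
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=1_000_000, n_features=50,
                           n_informative=30, random_state=42)
dtrain = xgb.DMatrix(X, y)

for method in ('hist', 'gpu_hist'):
    params = {'max_depth': 5, 'objective': 'binary:logistic',
              'eta': 0.5, 'tree_method': method}
    tic = time.time()
    xgb.train(params, dtrain, num_boost_round=100)
    print('%s: %.3fs' % (method, time.time() - tic))

With only a fraction of a second of training, fixed overheads (moving the data to the GPU, launching kernels) dominate the measurement.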

Thanks for your reply!

I am using a dataset with 75k observations. How large does a dataset usually need to be before there is a gain from using the GPU?

I have also tried running a grid search on the same dataset and compared GPU vs CPU performance. Here are the results:

passed time with XGBClassifier (gpu): 209.414s
Best parameter (CV score=0.329):
{'xgbclass__eta': 0.01, 'xgbclass__gamma': 0.6, 'xgbclass__n_estimators': 100}


passed time with XGBClassifier (cpu): 53.189s
Best parameter (CV score=0.333):
{'xgbclass__eta': 0.01, 'xgbclass__gamma': 0.2, 'xgbclass__n_estimators': 100}

With a dataset like mine, is this behavior normal? If I increase the grid search space, will GPU performance start to catch up and eventually overtake the CPU? Or does training time mostly depend on the size of the data itself rather than on how big the grid search space is?

Here is the code I used to test grid search performance:

import time
import xgboost as xgb
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, train_test_split

# num_trans, cat_trans, numpredcols, catpredcols, mcc_scorer and the
# train/test split were defined earlier (not shown)

#create pipe for all features
preprocessor = ColumnTransformer(transformers=[('numtrans', num_trans, numpredcols),
                                               ('cattrans', cat_trans, catpredcols)],
                                 remainder='passthrough')

#create pipeline steps
steps_cpu = [('preprocess', preprocessor),
             ('xgbclass', xgb.XGBClassifier(alpha=10, tree_method='hist'))]

steps_gpu = [('preprocess', preprocessor),
             ('xgbclass', xgb.XGBClassifier(alpha=10, tree_method='gpu_hist',
                                            n_jobs=-1))]

#define search space
param_grid = {
              'xgbclass__eta': [0.001, 0.01],
              'xgbclass__n_estimators': [20, 50, 100, 500],
              'xgbclass__gamma': [0.2, 0.4, 0.6]
             }

#create pipes
gpu_pipe = Pipeline(steps_gpu)
cpu_pipe = Pipeline(steps_cpu)


#fit with gpu (GridSearchCV runs the CV fits sequentially here)
gscv_gpu = GridSearchCV(gpu_pipe, param_grid, scoring=mcc_scorer, cv=5)
tic = time.time()
gscv_gpu.fit(X_train, y_train)
print('passed time with XGBClassifier (gpu): %.3fs' % (time.time() - tic))
print("Best parameter (CV score=%0.3f):" % gscv_gpu.best_score_)
print(gscv_gpu.best_params_)

print()
print()


#fit with cpu (n_jobs=-1 lets GridSearchCV run CV fits in parallel across cores)
gscv_cpu = GridSearchCV(cpu_pipe, param_grid, scoring=mcc_scorer, cv=5, n_jobs=-1)
tic = time.time()
gscv_cpu.fit(X_train, y_train)
print('passed time with XGBClassifier (cpu): %.3fs' % (time.time() - tic))
print("Best parameter (CV score=%0.3f):" % gscv_cpu.best_score_)
print(gscv_cpu.best_params_)
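
For scale, this grid has 2 x 4 x 3 = 24 parameter combinations, so with cv=5 each search trains 24 x 5 = 120 models, every one of them on roughly 4/5 of the same 75k rows. A bigger grid therefore multiplies the number of fits, but it does not make any single fit larger. Here is a rough sketch (assuming the gscv_gpu and gscv_cpu objects above have been fitted) that compares the per-candidate fit times GridSearchCV records, rather than the total wall-clock time of each search:

# Rough sketch, assuming gscv_gpu and gscv_cpu above have already been fitted.
# GridSearchCV stores per-candidate fit times in cv_results_['mean_fit_time'].
import numpy as np

n_candidates = len(gscv_cpu.cv_results_['params'])   # 2 * 4 * 3 = 24 combinations
print('fits per search:', n_candidates * 5)          # cv=5 -> 120 fits

gpu_fit = np.mean(gscv_gpu.cv_results_['mean_fit_time'])
cpu_fit = np.mean(gscv_cpu.cv_results_['mean_fit_time'])
print('mean time per fit  gpu: %.2fs  cpu: %.2fs' % (gpu_fit, cpu_fit))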

Thanks for your help!

I retried it with a larger grid search space, and the CPU still seems to be much faster. Here are the results:

passed time with XGBClassifier (gpu): 2457.510s
Best parameter (CV score=0.490):
{'xgbclass__alpha': 100, 'xgbclass__eta': 0.01, 'xgbclass__gamma': 0.2, 'xgbclass__max_depth': 5, 'xgbclass__n_estimators': 100}


passed time with XGBClassifier (cpu): 383.662s
Best parameter (CV score=0.487):
{'xgbclass__alpha': 100, 'xgbclass__eta': 0.1, 'xgbclass__gamma': 0.2, 'xgbclass__max_depth': 2, 'xgbclass__n_estimators': 20}

Here is the code used:

import time
import xgboost as xgb
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, train_test_split

# num_trans, cat_trans, numpredcols, catpredcols, mcc_scorer and the
# train/test split were defined earlier (not shown)

#create pipe for all features
preprocessor = ColumnTransformer(transformers=[('numtrans', num_trans, numpredcols),
                                               ('cattrans', cat_trans, catpredcols)],
                                 remainder='passthrough')

#create pipeline steps
steps_cpu = [('preprocess', preprocessor),
             ('xgbclass', xgb.XGBClassifier(tree_method='hist'))]

steps_gpu = [('preprocess', preprocessor),
             ('xgbclass', xgb.XGBClassifier(tree_method='gpu_hist'))]

#define search space (3 * 4 * 3 * 3 * 3 = 324 combinations, x5 folds = 1620 fits)
param_grid = {
              'xgbclass__eta': [0.001, 0.01, 0.1],
              'xgbclass__n_estimators': [20, 50, 100, 200],
              'xgbclass__gamma': [0.2, 0.4, 0.6],
              'xgbclass__alpha': [1, 10, 100],
              'xgbclass__max_depth': [2, 5, 10]
             }


#create pipes
gpu_pipe = Pipeline(steps_gpu)
cpu_pipe = Pipeline(steps_cpu)


#fit with gpu (CV fits run sequentially)
gscv_gpu = GridSearchCV(gpu_pipe, param_grid, scoring=mcc_scorer, cv=5)
tic = time.time()
gscv_gpu.fit(X_train, y_train)
print('passed time with XGBClassifier (gpu): %.3fs' % (time.time() - tic))
print("Best parameter (CV score=%0.3f):" % gscv_gpu.best_score_)
print(gscv_gpu.best_params_)

print()
print()


#fit with cpu (n_jobs=-1 runs CV fits in parallel across cores)
gscv_cpu = GridSearchCV(cpu_pipe, param_grid, scoring=mcc_scorer, cv=5, n_jobs=-1)
tic = time.time()
gscv_cpu.fit(X_train, y_train)
print('passed time with XGBClassifier (cpu): %.3fs' % (time.time() - tic))
print("Best parameter (CV score=%0.3f):" % gscv_cpu.best_score_)
print(gscv_cpu.best_params_)

I was expecting GPU performance to catch up as the search space gets bigger, but that does not seem to be the case. Perhaps the gains from the GPU depend more on the size of the data itself than on how big the grid search space is?
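
One way to check this directly might be to time a single fit per tree method on growing slices of the data, outside the grid search. A rough sketch (reusing the preprocessor, X_train and y_train from above; assumes y_train supports positional slicing):

# Rough sketch: time one XGBClassifier fit per tree method on growing slices
# of the training data, to see whether gpu_hist catches up as the data grows.
# Assumes preprocessor, X_train and y_train from the code above.
import time
import xgboost as xgb

Xt = preprocessor.fit_transform(X_train)   # preprocess once, outside the timing

for frac in (0.25, 0.5, 1.0):
    n = int(len(y_train) * frac)
    for method in ('hist', 'gpu_hist'):
        clf = xgb.XGBClassifier(tree_method=method, n_estimators=100, max_depth=5)
        tic = time.time()
        clf.fit(Xt[:n], y_train[:n])
        print('%d rows, %s: %.3fs' % (n, method, time.time() - tic))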