Speed comparison of xgboost: GTX 1060 vs RTX 3090

Hello,

I have moved from a 12-year-old PC with a GTX 1060 graphics card to a new PC with an RTX 3090 graphics card (see details below).

I was expecting a huge increase in speed. However, when I run a hyperparameter-tuning script in R on a dataset with 2 million observations and 110 features (104 of which come from a one-hot-encoded factor), the processing time only halves.

To check whether this could be caused by an improper R installation, I ran a test script (see below) in both R and Python on both my old and new PC.

These are the processing times in seconds:

I am most interested in the speed improvement for R, since I am an R user. The improvement between the old and new PC for R on the GPU is 60.4%; for Python it is a similar 67.8%, so I conclude the R installation is fine.
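To be clear about what I mean by "improvement", here is a minimal R sketch of the calculation, assuming the percentages refer to the reduction in training time; the times below are hypothetical placeholders, not my measured values:

# Hypothetical placeholder times in seconds (not the measured values)
time_old_pc <- 100   # e.g. training time on the old PC (GTX 1060)
time_new_pc <- 40    # e.g. training time on the new PC (RTX 3090)

# Percentage reduction in training time between the two machines
(time_old_pc - time_new_pc) / time_old_pc * 100   # 60: the new PC needs 60% less time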

Since the difference between the two configurations is quite substantial, I was expecting a much larger improvement in speed. Is that expectation justified, or is a speed improvement of 60.4% all that can be expected?

Thanks a lot!

Python script:

import pandas as pd 
import numpy as np
import time

from sklearn.datasets import fetch_covtype
from sklearn.model_selection import train_test_split

import xgboost as xgb

# Fetch dataset using sklearn
cov = fetch_covtype()
X = cov.data
y = cov.target

# code snippet to save data as csv for use in R:
#df = pd.DataFrame(data=cov['data'], columns = cov['feature_names'])
#df.to_csv('my_cov.csv', sep = ',', index = False)
#np.savetxt("my_cov2.csv", y, delimiter=",")

# Create 0.75/0.25 train/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, train_size=0.75,
                                                    random_state=42)

# Specify sufficient boosting iterations to reach a minimum
num_round = 3000

# Leave most parameters as default
param = {'objective': 'multi:softmax', # Specify multiclass classification
         'num_class': 8, # Number of possible output classes (covtype labels run 1-7, so 8 covers them with class 0 unused)
         'tree_method': 'gpu_hist' # Use GPU accelerated algorithm
         }

# Convert input data from numpy to XGBoost format
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# The GPU timing block below is disabled in this paste (wrapped in a triple-quoted string);
# remove the quotes to time the GPU run
'''

gpu_res = {} # Store accuracy result
tmp = time.time()
# Train model
xgb.train(param, dtrain, num_round, evals=[(dtest, 'test')], evals_result=gpu_res)
print("GPU Training Time: %s seconds" % (str(time.time() - tmp)))

'''

# Repeat for CPU algorithm
tmp = time.time()
param['tree_method'] = 'hist'
cpu_res = {}
xgb.train(param, dtrain, num_round, evals=[(dtest, 'test')], evals_result=cpu_res)
print("CPU Training Time: %s seconds" % (str(time.time() - tmp)))

R script:

library(xgboost)
library(dplyr) 

setwd("D:~/python_datafiles/")

# load the dataset (CSV files saved from the Python script):
mydata1 <- read.csv(file="my_cov.csv", header=TRUE, sep=",")
mydata2 <- read.csv(file="my_cov2.csv", header=FALSE, sep=",")
colnames(mydata2) <- "target"
mydata3 <- bind_cols(mydata1, mydata2)

# Create 0.75/0.25 train/test split
mydata3$id <- 1:nrow(mydata3)
train <- mydata3 %>% dplyr::sample_frac(.75)
test  <- dplyr::anti_join(mydata3, train, by = 'id')

train$id <- NULL
test$id <- NULL

# Separate the test labels from the features and convert to xgb.DMatrix
val_targets <- test %>%
  dplyr::select(target)
val_targets <- data.matrix(val_targets)
test$target <- NULL
val <- data.matrix(test)
xgb_val <- xgb.DMatrix(data = val, label = val_targets)

# Same for the training data
trainval_targets <- train %>%
  dplyr::select(target)
trainval_targets <- data.matrix(trainval_targets)
train$target <- NULL
trainval <- data.matrix(train)
xgb_trainval <- xgb.DMatrix(data = trainval, label = trainval_targets)


# Time training with the CPU hist algorithm
start.time <- Sys.time()

model_n <- xgb.train(data = xgb_trainval,
                     tree_method = "hist",
                     objective = "multi:softmax",
                     num_class = 8,
                     nrounds = 3000,
                     print_every_n = 100,
                     watchlist = list(train = xgb_trainval, val = xgb_val)
)

end.time <- Sys.time()
time.taken <- end.time - start.time
time.taken
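
For the GPU timing in R, only the tree method needs to change; below is a minimal sketch of that variant, assuming the R xgboost build has GPU support and using the same gpu_hist setting as in the Python script:

# GPU variant: identical call, only tree_method changes to the GPU algorithm
start.time <- Sys.time()

model_gpu <- xgb.train(data = xgb_trainval,
                       tree_method = "gpu_hist",
                       objective = "multi:softmax",
                       num_class = 8,
                       nrounds = 3000,
                       print_every_n = 100,
                       watchlist = list(train = xgb_trainval, val = xgb_val)
)

end.time <- Sys.time()
end.time - start.time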