C++ API XGBoosterPredict takes too much time!

I use the following parameters to train:

XGBoosterSetParam(h_booster, "booster", "gbtree");
XGBoosterSetParam(h_booster, "objective", "binary:logistic");
XGBoosterSetParam(h_booster, "max_depth", "5");
XGBoosterSetParam(h_booster, "eta", "0.1");
XGBoosterSetParam(h_booster, "silent", "1");
XGBoosterSetParam(h_booster, "min_child_weight", "1");
XGBoosterSetParam(h_booster, "subsample", "0.5");
XGBoosterSetParam(h_booster, "colsample_bytree", "1");
XGBoosterSetParam(h_booster, "num_parallel_tree", "1");
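For context, these calls sit inside the usual C API training loop. A minimal hedged sketch, assuming an existing training DMatrixHandle named train_dmat (built earlier, e.g. with XGDMatrixCreateFromFile) and 100 boosting rounds:

```c
#include <xgboost/c_api.h>

/* Hedged sketch of the training loop the parameters above belong to,
 * following the C API tutorial pattern.  `train_dmat` is an assumed
 * pre-built DMatrixHandle, not part of the original post. */
DMatrixHandle cache[1] = {train_dmat};
BoosterHandle h_booster;
XGBoosterCreate(cache, 1, &h_booster);
/* ... the XGBoosterSetParam calls shown above ... */
for (int iter = 0; iter < 100; ++iter) {
  XGBoosterUpdateOneIter(h_booster, iter, train_dmat);
}
```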

It takes 800 ms to make a prediction for a single instance with 40-dimensional features!

more info: https://github.com/dmlc/xgboost/issues/3512

Is 800 ms too slow for you? What is your application requirement?

I need less than 100 ms.
Will a GPU speed it up?

I don’t think a GPU will help you here, since copying data to the GPU introduces additional latency.

XGBoosterPredict() is not very fast for single-instance prediction, since there is the extra step of forming a DMatrix. Can you assemble your data so that you have multiple instances per batch?
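To illustrate the batching suggestion, here is a minimal hedged sketch: many rows go into one DMatrix so the fixed per-call overhead is paid once rather than once per instance. It assumes the classic XGBoosterPredict signature; newer releases insert an extra `training` argument before the output parameters.

```c
#include <math.h>   /* NAN */
#include <stdio.h>
#include <xgboost/c_api.h>

/* Hedged sketch: predict a whole batch with one DMatrix + one call.
 * `rows` is a row-major buffer of num_rows * num_features floats. */
int predict_batch(BoosterHandle booster, const float *rows,
                  bst_ulong num_rows, bst_ulong num_features) {
  DMatrixHandle dmat;
  if (XGDMatrixCreateFromMat(rows, num_rows, num_features, NAN, &dmat) != 0)
    return -1;

  bst_ulong out_len = 0;
  const float *out_result = NULL;
  int rc = XGBoosterPredict(booster, dmat, 0, 0, &out_len, &out_result);
  if (rc == 0) {
    for (bst_ulong i = 0; i < out_len; ++i)
      printf("prediction[%lu] = %f\n", (unsigned long)i, out_result[i]);
  }
  XGDMatrixFree(dmat);  /* out_result is owned by the booster; do not free it */
  return rc;
}
```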

For single-instance prediction, you may want to consider https://github.com/dmlc/treelite. This project provides an interface for fast single-instance prediction. Keep in mind that Treelite is still in beta.

Thank you very much !!

Is it a bug? With the Python API I can predict 2000 instances in 20 ms.

import time

def get_time_stamp():
    # Return the current time as "YYYY-MM-DD HH:MM:SS.mmm".
    ct = time.time()
    local_time = time.localtime(ct)
    data_head = time.strftime("%Y-%m-%d %H:%M:%S", local_time)
    data_secs = (ct - int(ct)) * 1000
    return "%s.%03d" % (data_head, data_secs)

print(get_time_stamp())
begin_time = int(time.time() * 1000)
valid_predict = xgb_trained_model.predict(valid_data)
print(int(time.time() * 1000) - begin_time)  # elapsed milliseconds
print(get_time_stamp())

Sorry, the timing printed by my C++ code was wrong.

Has anything changed in XGBoost since this post to improve single-instance prediction time?
If not, do you still recommend Treelite?

XGBoost now offers an in-place prediction function.

I apologize if this is obvious; I’m very new to C/C++. I’m struggling to work out which function this is in the C API. I looked at the Python implementation, and it looks like XGBoosterPredictFromDense, XGBoosterPredictFromCSR, XGBoosterPredictFromCudaArray, and XGBoosterPredictFromCudaColumnar all predict in place?

If that’s the case, I think I want to use XGBoosterPredictFromDense. However, I’m missing how that differs from XGBoosterPredictFromDMatrix, as I thought the “D” in DMatrix stood for Dense.

With XGBoosterPredictFromDense, you don’t need to build a DMatrix object. Instead, you pass a reference to a dense array to XGBoosterPredictFromDense. When passing the reference, you have to use an “array interface”. See https://numpy.org/doc/stable/reference/arrays.interface.html
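To make the “array interface” part concrete, here is a hedged sketch of building that NumPy __array_interface__-style JSON string for a row-major float32 buffer. "<f4" means little-endian 4-byte float, and "data" carries the raw pointer value plus a read-only flag; the exact set of keys XGBoost requires may vary by version, so treat this as an illustration.

```c
#include <stdio.h>
#include <stdint.h>

/* Hedged sketch: serialize a dense float32 buffer's location and shape
 * into the array-interface JSON that in-place prediction entry points
 * such as XGBoosterPredictFromDense consume. */
int make_array_interface(char *buf, size_t buf_size,
                         const float *data, unsigned long rows,
                         unsigned long cols) {
  return snprintf(buf, buf_size,
                  "{\"data\": [%lu, true], \"shape\": [%lu, %lu], "
                  "\"typestr\": \"<f4\", \"version\": 3}",
                  (unsigned long)(uintptr_t)data, rows, cols);
}
```

For example, a 2x3 feature matrix produces a string like {"data": [<pointer>, true], "shape": [2, 3], "typestr": "<f4", "version": 3}.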

Maybe this isn’t what I need then. Or it’s possible I’m misunderstanding what you mean?
My current workflow is to create a multidimensional array of floats, then call:
safe_xgboost(XGDMatrixCreateFromMat(reinterpret_cast<float*>(feature_matrix_), num_rows_, num_features_, NAN, &input_dmatrix_)), where input_dmatrix_ is a DMatrixHandle and feature_matrix_ is my multidimensional array of floats.

I got the above function call from the C API tutorial.
After I create my DMatrix, I call XGBoosterPredict, also as in the C API tutorial. Like the OP, this takes roughly 800 ms. However, I notice that the function is marked as deprecated in the documentation.

What I think you’re saying is that I can skip creating the DMatrix and predict directly on my multidimensional array in place with a different function, and that should be much faster?

Again, apologies if what I’m saying doesn’t make sense or is completely off. I’m quite new to C++.

Currently, there isn’t an accessible tutorial on the use of XGBoosterPredictFromDense. For now, please use Treelite instead.
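In the absence of a tutorial, here is my best guess at what a call might look like, based on reading recent c_api.h headers. Treat the field names in both JSON documents (the array interface and the prediction config) as assumptions rather than documented API, and check them against the header for your XGBoost version.

```c
#include <stdio.h>
#include <stdint.h>
#include <xgboost/c_api.h>

/* Hedged sketch: in-place prediction on a dense row-major float32
 * buffer, skipping DMatrix construction entirely.  Both JSON strings
 * below are assumptions based on recent headers, not documented API. */
int predict_inplace(BoosterHandle booster, const float *rows,
                    bst_ulong n_rows, bst_ulong n_cols) {
  char values[256];
  snprintf(values, sizeof values,
           "{\"data\": [%lu, true], \"shape\": [%lu, %lu], "
           "\"typestr\": \"<f4\", \"version\": 3}",
           (unsigned long)(uintptr_t)rows,
           (unsigned long)n_rows, (unsigned long)n_cols);

  const char *config =
      "{\"type\": 0, \"training\": false, \"iteration_begin\": 0, "
      "\"iteration_end\": 0, \"strict_shape\": false, \"missing\": NaN}";

  bst_ulong const *out_shape = NULL;
  bst_ulong out_dim = 0;
  const float *out_result = NULL;
  /* The fourth argument is an optional proxy DMatrix for metadata;
   * NULL appears acceptable when no metadata is needed. */
  return XGBoosterPredictFromDense(booster, values, config, NULL,
                                   &out_shape, &out_dim, &out_result);
}
```

Because the caller's buffer is read directly, the 800 ms DMatrix-construction cost in the posts above should disappear, but please verify against your installed version before relying on this.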