Single Instance Prediction time vs Multi Instance Prediction time

I have an XGBoost-trained model and need to predict a single instance at a time. But I see that the prediction time is the same (~90 sec) for all of these matrix sizes (num_instances x num_features):
[1 (num_instances) x 300k (num_features)]
[500 (num_instances) x 300k (num_features)]
[20k (num_instances) x 300k (num_features)]

import xgboost as xgb

bst = xgb.Booster()
bst.load_model('trained.model')
dtest = xgb.DMatrix('test_data_libsvm_format.txt')
ypred = bst.predict(dtest)

My data is very sparse. I tried both libsvm format and scipy.sparse format; both take almost the same time for a single instance.
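For reference, this is a minimal sketch of building one sparse instance with scipy (the feature count, indices, and values here are made up for illustration). A scipy.sparse matrix can be passed directly to xgb.DMatrix, so only the non-zero entries are stored, regardless of how wide the feature space is:

```python
import numpy as np
from scipy import sparse

# Hypothetical single instance: 300k feature columns, only 3 non-zero.
n_features = 300_000
values = np.array([0.5, 1.2, 3.0])           # non-zero values (made up)
indices = np.array([10, 2500, 199_999])      # their feature indices (made up)
indptr = np.array([0, len(values)])          # one row: non-zeros 0..3

row = sparse.csr_matrix((values, indices, indptr), shape=(1, n_features))

print(row.shape)   # (1, 300000)
print(row.nnz)     # 3 -- storage scales with non-zeros, not with columns

# The sparse row can then be handed to xgboost, e.g.:
# dtest = xgb.DMatrix(row)
# ypred = bst.predict(dtest)
```

Construction cost here is proportional to the number of non-zeros, so if building the test matrix dominates your 90 seconds, the sparse path should make it negligible.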

How can I get single-instance prediction time under 1 sec? How can XGBoost be used in a time-sensitive production system where users will not wait 90 sec?

It is surprising that prediction time remains the same regardless of the number of instances.

For 300k features it's likely that prediction times will be non-trivial, but they shouldn't be that long.

I can suggest taking a look at treelite as a way to speed up prediction and prepare models for deployment.

Treelite is an interesting project and I look forward to trying it out. It seems to focus on improving batch-prediction performance; I wonder whether it will also be useful for single-instance prediction. Thanks!

Thanks @thvasilo @everglory99 for your replies. I think I found the problem: the number of features is actually around 300M, not 300k. I verified that with 300k features, prediction takes < 1 sec.
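A quick way to catch this kind of dimensionality surprise is to scan the libsvm file for the largest feature index before predicting. A minimal sketch (libsvm_num_features is a hypothetical helper; it assumes plain "label idx:value ..." lines with no comment or qid fields):

```python
def libsvm_num_features(lines):
    """Return 1 + the largest feature index seen in libsvm-format lines."""
    max_idx = -1
    for line in lines:
        for tok in line.split()[1:]:          # skip the leading label
            idx = int(tok.split(':', 1)[0])   # "idx:value" -> idx
            max_idx = max(max_idx, idx)
    return max_idx + 1

# Small inline sample; in practice pass open('test_data_libsvm_format.txt'):
sample = ["1 0:0.5 2999:1.2", "0 10:3.0 299999:0.7"]
print(libsvm_num_features(sample))   # 300000
```

Running this over the real test file would have shown ~300M immediately, instead of discovering it through a 90-second prediction.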