[Cpp] Non-Usable interface of xgboost/predictor.h

The class Predictor (https://github.com/dmlc/xgboost/blob/master/include/xgboost/predictor.h) looks like a very useful interface for scenarios where the model is trained in Python/R and prediction runs in C++.

However, the interface is not usable in practice, because it refers to the gbm::GBTreeModel class for the model definition, and that class is not exposed for public use.

We only expose interfaces/classes defined in include/, while gbm::GBTreeModel is defined under src/ (https://github.com/dmlc/xgboost/blob/master/src/gbm/gbtree_model.h).

That is probably why there are no examples of the Predictor class in use, even though it fits this requirement.

If others agree, I can go ahead and create an issue so this can be tracked and fixed.

The interface is not meant for external consumption. As you may have noticed, it exposes lots of XGBoost internals and is thus subject to frequent change.

For the time being, users are directed to use the stable C API instead. See https://github.com/dmlc/xgboost/tree/master/demo/c-api/inference for a demo.

We have yet to expose a C++ API that’s fit to use from external applications. We currently have a feature request open: https://github.com/dmlc/xgboost/issues/4895

Thanks for your reply. I have 2 follow-up questions:
1. a. What’s the current purpose of the Predictor class in predictor.h? I’m curious to understand the direction we are moving in.
   b. The current Predictor interface looks decent from a functionality and stability perspective, except that gbtree_model.h is not exposed under include/.
2. Do you also not recommend the Learner interface (https://github.com/dmlc/xgboost/blob/master/include/xgboost/learner.h)?
   This one appears to be complete, though it carries extra training-related functionality that is unnecessary when the use case is prediction only.

Also, I believe C++ users are generally accepting of API changes across upgrades, so I guess this factor should not keep users from having a working C++ API.

Predictor currently serves as an internal interface for different modules within XGBoost.

The predictor.h header may appear stable, but we’ve made substantial changes to related classes, such as Learner, GBTreeModel, and especially DMatrix. In addition, usability is a concern. For example, users would need to carefully construct an object of type PredictionCacheEntry, and it’s easy to get that wrong.

  2. No, we also don’t recommend the Learner interface.

I see your point about C++ users being more accepting of breakage with upgrades. However, as one of the maintainers of the project, I am reluctant to expose an API that’s hard to use (due to the clutter introduced by internals) and would thus invite lots of questions and issue reports.


@mohitk08 if your goal is model serving and inference, you may want to eschew taking XGBoost as a direct dependency and instead use libraries that are dedicated to model serving. Two examples are Treelite and Forest Inference Library.

In the meantime, we will really try to prioritize https://github.com/dmlc/xgboost/issues/4895 in the upcoming release. I understand that using the C API is awkward in the context of a C++ application, and I want a better alternative myself.
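Until such an API exists, one common way to reduce the awkwardness of a C API inside a C++ application is a thin RAII wrapper that owns the handle and turns return codes into exceptions. The sketch below only illustrates that pattern. The `ToyModel*` functions are hypothetical stand-ins written for this example, not XGBoost functions, and the `Model` class is likewise an assumption about how a wrapper could look, not the design planned for the C++ API.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// --- Hypothetical C-style API (stand-in only, NOT the XGBoost C API) ---
// Opaque handle, integer return codes (0 = success), manual free.
struct ToyModel { float bias; };
static int g_live_models = 0;  // counts handles that have not been freed

int ToyModelCreate(float bias, void** out) {
  if (out == nullptr) return -1;
  *out = new ToyModel{bias};
  ++g_live_models;
  return 0;
}

// Toy "prediction": bias plus the sum of the features.
int ToyModelPredict(void* handle, const float* features, std::size_t n, float* out) {
  if (handle == nullptr || features == nullptr || out == nullptr) return -1;
  float sum = static_cast<ToyModel*>(handle)->bias;
  for (std::size_t i = 0; i < n; ++i) sum += features[i];
  *out = sum;
  return 0;
}

int ToyModelFree(void* handle) {
  if (handle == nullptr) return 0;
  assert(g_live_models > 0);
  delete static_cast<ToyModel*>(handle);
  --g_live_models;
  return 0;
}

// --- C++ wrapper: owns the handle, frees it exactly once, throws on error ---
class Model {
 public:
  explicit Model(float bias) { Check(ToyModelCreate(bias, &handle_), "create"); }
  ~Model() { ToyModelFree(handle_); }

  // Copying would lead to a double free, so only moves are allowed.
  Model(const Model&) = delete;
  Model& operator=(const Model&) = delete;
  Model(Model&& other) noexcept : handle_(std::exchange(other.handle_, nullptr)) {}
  Model& operator=(Model&& other) noexcept {
    if (this != &other) {
      ToyModelFree(handle_);
      handle_ = std::exchange(other.handle_, nullptr);
    }
    return *this;
  }

  // Throws std::runtime_error if the underlying call reports failure,
  // for example when called on a moved-from Model.
  float Predict(const std::vector<float>& features) const {
    float out = 0.0f;
    Check(ToyModelPredict(handle_, features.data(), features.size(), &out), "predict");
    return out;
  }

 private:
  static void Check(int rc, const char* what) {
    if (rc != 0) throw std::runtime_error(std::string("ToyModel call failed: ") + what);
  }
  void* handle_ = nullptr;
};
```

With a wrapper like this, calling code never touches the raw handle: constructing a `Model` acquires it, `Predict` reports failures as exceptions, and the destructor releases it when the object goes out of scope.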

Thanks for your time and reply.

I have seen other threads too where you have recommended Treelite. You have also recommended Treelite over XGBoost for single-instance prediction, for speed.

Given that the training happens in XGBoost Python, wouldn’t using Treelite instead of XGBoost for prediction introduce debugging issues, and maybe implementation-dependent differences in predictions? I am not very familiar with Treelite at the moment.

Treelite has been battle tested in conjunction with Forest Inference Library and Amazon SageMaker Neo, both of which are used by enterprises. However, I understand if you are reluctant to put trust in a new package you haven’t used before. In that case, you may choose to wait for the addition of the C++ API in XGBoost.

@mohitk08 Hello, I started working on a pull request to create an easy-to-use C++ API. The API will be designed so that it will be easy to use from an external C++ application.

That’s great news. Thanks for the update. I’ll be happy to help with review.