Is there a C++ API document and usage examples?

I am looking for C++ API documentation with some usage examples, for integrating XGBoost into a C++ environment. I searched for such a document but could not find one. If anyone can provide a pointer, it would be helpful.

I don’t think we provide any “C++ API,” per se. The C++ functions are all internal and thus subject to change without notice. Your best bet would be to use the C API functions listed in https://github.com/dmlc/xgboost/blob/master/include/xgboost/c_api.h, since the C functions don’t tend to change much. (Python and R wrappers depend on the C functions)

Thanks for the quick response. I have looked at the c_api.h. It has very terse comments. Could you please give me an API usage document or any write up showing usage scenarios?

I agree that the C API functions are currently not well documented. For now, you should refer to the Python wrapper. In particular, pay attention to how the wrapper invokes the C API functions. They are easy to spot, since every call to a C function is wrapped with _check_call.

For instance, the following snippet invokes XGDMatrixCreateFromFile() to load a data matrix (DMatrix) from a given file.
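
A C++ caller can follow the same convention. The sketch below is a rough C++ analogue of the wrapper's _check_call, assuming the C API convention of returning 0 on success and nonzero on failure, with the message retrievable afterwards. `demo_create_from_file` and `demo_get_last_error` are hypothetical stand-ins (not XGBoost functions) so that the block compiles without the library.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for a C API that follows the XGBoost convention:
// return 0 on success, nonzero on failure, and expose the last error text.
static std::string g_last_error;

extern "C" const char* demo_get_last_error() { return g_last_error.c_str(); }

extern "C" int demo_create_from_file(const char* path, void** out) {
  if (path == nullptr || *path == '\0') {
    g_last_error = "empty file name";
    return -1;
  }
  static int dummy_matrix = 0;  // placeholder object behind the opaque handle
  *out = &dummy_matrix;
  return 0;
}

// C++ counterpart of the Python wrapper's _check_call: turn a nonzero
// return code into an exception carrying the library's error message.
inline void check_call(int return_code) {
  if (return_code != 0) {
    throw std::runtime_error(demo_get_last_error());
  }
}
```

With the real library, the same helper would presumably wrap calls such as `XGDMatrixCreateFromFile(fname, silent, &handle)` and read the message from `XGBGetLastError()`.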

Thank you for the help. I am looking at both the Python and Julia integration code. However, having C API documentation with some examples would immensely help library integrators.

Here is a small snippet I wrote, in the thread “How do I set class labels?”.
Hope it helps. C API documentation is currently in the backlog and has no ETA at this point.

The example is of great help. A C API document is a necessity for all developers. Please prioritize it. In the meantime, I am looking for a list of all the parameters that can be set through the XGBoosterSetParam function for training and prediction, and also the model names supported by xgboost.

I, too, would greatly appreciate a C API document.

C API demo: https://github.com/dmlc/xgboost/tree/master/demo/c-api
C API doc: https://xgboost.readthedocs.io/en/latest/dev/c__api_8h.html

But there are legitimate use cases and also examples for using a »C++ API« that are not currently supported by the C-API.
E.g. currently we derive from DataSource to provide an in-memory data source mapping from our own data structures to XGB. Also, we implement our own Objective derived from ObjFunction. I know there is the plugin idea around, but extending XGB in our own code base by derivation and treating XGB as an external dependency feels much easier to handle in practice.

What’s the reasoning behind hiding C++ interface from the user?
Seems like this is mostly just a matter of adding the XGB includes to the CMake install target. Btw. why are there both CMake AND Makefile based build systems around? Which one is the reference?

This is because the C++ codebase may be refactored in ways that can break backward compatibility. On the other hand, we do make a fairly strong guarantee about backward compatibility when it comes to C API functions.

why are there both CMake AND Makefile based build systems around?

CMake is the reference. Makefile exists mostly as a legacy.

Ok. I understand this, but if I’m willing to accept this, why make it difficult to install a C++ interface using CMake, e.g. by hiding all the headers?
Btw. I think providing the interface will also improve the XGB code base and avoid stuff like this.
I mean why is there an include dir and additional headers spread around the source if you don’t intend to provide some kind of interface? This is not consistent.

Personally I would be much happier to adapt my code base to follow modest XGB API changes once in a while rather than have to deal with all the ugly patterns a C-API implies. E.g. resource management for data handles and correctly freeing everything seems much more natural when done right in C++, from the view of a C++ code base.
If I switched the existing code to the C-API now, I would probably implement something to wrap all the *Create and *Free calls, but I’m wondering why I’m not allowed to keep using your C++ interface in the first place.
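
Such a wrapper around *Create and *Free calls could look roughly like the following. This is a minimal sketch, assuming a C API with opaque `void*` handles and paired create/free functions; `demo_handle_create`, `demo_handle_free`, and the live-handle counter are hypothetical stand-ins, not XGBoost functions.

```cpp
#include <memory>
#include <stdexcept>

// Hypothetical stand-ins for a *Create / *Free pair in a C API.
using DemoHandle = void*;
static int g_live_handles = 0;  // counts handles not yet freed

extern "C" int demo_handle_create(DemoHandle* out) {
  *out = new int(0);
  ++g_live_handles;
  return 0;
}

extern "C" int demo_handle_free(DemoHandle handle) {
  delete static_cast<int*>(handle);
  --g_live_handles;
  return 0;
}

// Deleter that forwards to the C free function.
struct DemoHandleDeleter {
  void operator()(void* handle) const noexcept { demo_handle_free(handle); }
};

// Owning wrapper: the handle is released when the object goes out of scope.
using ScopedHandle = std::unique_ptr<void, DemoHandleDeleter>;

// Creates a handle and transfers ownership to the wrapper.
inline ScopedHandle make_handle() {
  DemoHandle raw = nullptr;
  if (demo_handle_create(&raw) != 0) {
    throw std::runtime_error("demo_handle_create failed");
  }
  return ScopedHandle(raw);
}
```

The same shape should carry over to real pairs such as `XGDMatrixCreateFromFile` / `XGDMatrixFree` or `XGBoosterCreate` / `XGBoosterFree`, with `.get()` passed wherever the C API expects the raw handle.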

The headers in include/ directory are meant to be an interface exposed to internal XGBoost code only, e.g. to expose class definitions to different translation units. We (XGBoost developers) didn’t intend to provide an interface that’s externally exposed.

The recent changes were not really “modest”. See https://github.com/dmlc/xgboost/pull/4833/files, https://github.com/dmlc/xgboost/pull/4686/files

You are more than welcome to do anything. However, it’s a different matter for us if we were to 1) document and 2) expose C++ interface to external libraries. By documenting C++ functions, we would implicitly make promises about API stability (many users will take it that way) and this assurance is something we are currently not willing to make.

I agree that the linked fragment of code needs to be refactored, by moving the header into include/ directory. However, making C++ API public will limit our ability to do the refactoring in the first place.

If you continue to strongly feel about having a C++ API, I suggest that you post a proposal (Request for Comment) in our GitHub Issues tracker.

One good way forward is to have a separate layer of public C++ API functions that are to remain relatively stable. The layer will be separate from internal C++ codebase.

My request/idea/need would just be to modify the CMakeLists.txt to (maybe optionally) include the C++ headers in the installation, and to fix problems like the one I showed you above to make this possible. I’m not requesting/proposing a whole new interface. So if you are ok with that I can prepare an RFC.

@codingforfun Take a look at https://pytorch.org/cppdocs/frontend.html#end-to-end-example. PyTorch has many internal C++ functions, but not every C++ function is exposed to users via CMake targets. Instead, PyTorch provides a modern C++ frontend. Personally, this is a route I am willing to go with.

If you plan to draft an RFC, please address my concern about the implicit promise of API stability, should C++ headers be exposed as CMake targets.

I’m afraid I don’t really get your point about a C++ API.
All the essential interface classes already exist. Do you mean wrapping everything, hiding it behind something official? I don’t think this makes much sense: if a change is really necessary to provide new functionality, you can’t do it without changing the interface anyway, and if it’s not, you could also just restrict yourself to not changing interfaces too much in the first place.
Also I don’t think your concerns about API stability really matter from the perspective of a project directly accessing the C++ interface.

  1. ABI stability – I think this is the number one reason for C-Interfaces. You don’t have that anyway when using C++.
  2. A lot of stuff that can be done in C++ easily is a PITA or impossible using the C-Interface. This outweighs the need to refactor for interface changes.
    Example: Currently we
    • Derive from DataSource to feed data in memory into XGB
    • Derive from ObjFunction to define our own objective
    • Derive from Metric to define our own metric
    • Derive from dmlc::Stream to get the trained classifier into QByteArray

I mean XGB is C++, a lot of projects are C++. If you don’t have only R or Python in mind, it seems somewhat ridiculous to force people to write wrappers around your C-API to do resource management in their C++ projects, for using a library which basically is C++ hiding behind a C interface.

XGB even already provides means to register custom implementations. You call it plugins, but in fact it’s not a plugin because I need to modify the source tree and rebuild the library. That doesn’t fit very well into a project where XGB is just an external dependency. To treat extensions as real plugin code we need to access C++ interface anyway.

Okay, I suppose we can make the C++ interfaces public, on the condition that we make very clear the lack of API stability in C++. (That is, your program may break in the future.) I’ll create a new RFC post to get other maintainers involved.

@codingforfun Here it is: https://github.com/dmlc/xgboost/issues/4894. Thanks for taking time to discuss.

Thanks.
I think more people also using XGB through C++ will help improve XGB itself.
E.g. while moving our code from v0.80 to v0.90 I found several things in XGB that caused me problems and that are not optimal.
Of course this is due to refactoring in XGB and you might argue that’s exactly what you told me, but that’s not the problem in the first place but just reveals problems with the code itself.

E.g.:
I found that there are some dynamic_casts in the code that are not checked (not even with an assert). E.g. in SimpleDMatrix::GetRowBatches() it’s silently assumed that the underlying DataSource is a SimpleCSRSource, but technically this is not guaranteed, since you may construct a SimpleDMatrix from any DataSource subclass, and which one is used is decided by DMatrix::Create(). So this is most probably wrong usage on our side, but it also reveals some open issues in the code.
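
To illustrate what a checked downcast could look like, here is a minimal sketch. The class names `Source`, `CsrSource`, and `OtherSource` are hypothetical and only mirror the shape of the situation described above; this is not the XGBoost code.

```cpp
#include <stdexcept>

// Hypothetical class names for illustration; not the XGBoost classes.
struct Source {
  virtual ~Source() = default;
};
struct CsrSource : Source {
  int num_rows = 3;
};
struct OtherSource : Source {};

// Downcast that fails loudly instead of yielding a null pointer that is
// dereferenced later.
inline CsrSource& as_csr(Source& source) {
  auto* csr = dynamic_cast<CsrSource*>(&source);
  if (csr == nullptr) {
    throw std::invalid_argument("expected a CsrSource");
  }
  return *csr;
}
```

Passing an `OtherSource` to `as_csr` then raises an exception at the point of the wrong assumption, rather than causing undefined behaviour further down.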