How do I set class labels?


#1

I have a classification task with more than 2 classes. How do I prepare the class label data for the training data? Is it a vector with a class number for each training sample, or is it a matrix with a one-hot encoding that has as many columns as classes with a 1 indicating membership in that class and a 0 for every other class?

How do I pass that vector/matrix to the BoosterHandle? I think it might be XGDMatrixSetGroup, or it could be XGDMatrixSetUIntInfo. The documentation of the C code is not very helpful. Thank you!


#2

The easiest way is to use the libsvm format, described here.

In short, the format is: <label> <index1>:<value1> <index2>:<value2> ...
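
For a multi-class problem, <label> is a single class index per row (0 up to num_class - 1), not a one-hot vector. As a rough sketch of producing such lines from C, the helper below formats one sample; `format_libsvm_row` is a hypothetical name, not part of XGBoost, and writing feature indices 1-based is an assumption you should check against what your XGBoost version expects.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper (not part of XGBoost): writes one sample as a
 * libsvm line "<label> <index>:<value> ..." into buf.
 * label is the class index; zero-valued features are skipped because
 * the format is sparse. Indices are written 1-based (an assumption).
 * Returns the number of characters written, or -1 if buf is too small. */
int format_libsvm_row(int label, const float *values, size_t n_features,
                      char *buf, size_t buf_size)
{
    int written = snprintf(buf, buf_size, "%d", label);
    if (written < 0 || (size_t)written >= buf_size) return -1;
    size_t used = (size_t)written;

    for (size_t j = 0; j < n_features; ++j) {
        if (values[j] == 0.0f) continue;  /* sparse: omit zeros */
        written = snprintf(buf + used, buf_size - used, " %zu:%g",
                           j + 1, (double)values[j]);
        if (written < 0 || (size_t)written >= buf_size - used) return -1;
        used += (size_t)written;
    }
    return (int)used;
}
```

For example, a sample of class 2 with features {0.5, 0, 2.0} comes out as the line "2 1:0.5 3:2". Write one such line per training sample to the file.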


#3

Thank you for this. I can certainly put my data into this format. What do I do then? How do I load this into XGBoost using the C functions? I see functions like XGDMatrixCreateFromCSREx or XGDMatrixCreateFromCSCEx …


#4

Is there a specific reason to use C functions? For most uses, Python and R are recommended.


#5

My entire environment and user interface are in C++. If I were to use another language, I'd have to code up a message-passing interface. I would rather not do that because this runs in real time.


#6

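You should use XGDMatrixCreateFromFile() to first obtain a DMatrixHandle:

#include <stdio.h>
#include <stdlib.h>
#include <xgboost/c_api.h>

// Check return codes outside assert(): a call wrapped in assert() is compiled out when NDEBUG is defined
#define safe_xgboost(call) \
  do { \
    if ((call) != 0) { \
      fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__, XGBGetLastError()); \
      exit(1); \
    } \
  } while (0)

DMatrixHandle dmat;
// Second argument is the silent flag (0 = print messages while loading).
// Newer XGBoost releases may require the format in the URI, e.g. "training_data.libsvm?format=libsvm"
safe_xgboost(XGDMatrixCreateFromFile("training_data.libsvm", 0, &dmat));

Then you create the booster handle:

DMatrixHandle dmats[1] = {dmat};
BoosterHandle booster;

// Create booster
safe_xgboost(XGBoosterCreate(dmats, 1, &booster));

// Set training parameters, always use strings for values
safe_xgboost(XGBoosterSetParam(booster, "eta", "0.1"));
safe_xgboost(XGBoosterSetParam(booster, "max_depth", "6"));
safe_xgboost(XGBoosterSetParam(booster, "objective", "multi:softprob"));
safe_xgboost(XGBoosterSetParam(booster, "eval_metric", "mlogloss"));
safe_xgboost(XGBoosterSetParam(booster, "num_class", "3"));  // must match the number of distinct labels

// Run training iterations
for (int i = 0; i < 10; ++i) {   // 10 iterations
  safe_xgboost(XGBoosterUpdateOneIter(booster, i, dmat));
}

// Save trained model
safe_xgboost(XGBoosterSaveModel(booster, "my.model"));

// Release the handles when done
safe_xgboost(XGBoosterFree(booster));
safe_xgboost(XGDMatrixFree(dmat));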
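
With the multi:softprob objective, predictions come back as one flat array of n_rows * num_class probabilities, laid out row by row, so you still need to pick the most probable class for each sample. A minimal sketch of that step, assuming you already have the flat array from the prediction call (XGBoosterPredict in the C API; its exact signature varies between versions); `softprob_to_class` is a hypothetical helper name, not an XGBoost function:

```c
#include <stddef.h>

/* Hypothetical helper (not part of XGBoost): probs holds
 * n_rows * n_classes probabilities, stored row by row.
 * Writes the index of the most probable class for each row
 * into out_class, which must have room for n_rows entries. */
void softprob_to_class(const float *probs, size_t n_rows,
                       size_t n_classes, int *out_class)
{
    for (size_t i = 0; i < n_rows; ++i) {
        const float *row = probs + i * n_classes;
        size_t best = 0;
        for (size_t k = 1; k < n_classes; ++k) {
            if (row[k] > row[best]) best = k;  /* keep the first maximum on ties */
        }
        out_class[i] = (int)best;
    }
}
```

If you only need the class index and not the probabilities, the multi:softmax objective returns the predicted class directly, which makes this step unnecessary.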

#7

Ah. Perfect. Thank you very much. XGBoost is a great tool but the documentation needs improvement. This is a great help!