I am using XGBoost4J for a multiclass classification problem with 4 features currently (I will be increasing this further soon). Since documentation for the Java implementation of XGBoost seems very limited, I figured out how to get it working on my own. I have a few questions I am not sure about, so I thought I would ask here. Currently this is my implementation for training,
Map<String, DMatrix> watches = new HashMap<String, DMatrix>() {
    {
        put("train", trainingDM);
        put("test", validationDM);
    }
};
Booster booster = XGBoost.train(trainingDM, params, 100, watches, null, null);
And this is my configuration,
num_rounds = 100
params.put("objective", "multi:softmax");
params.put("verbosity", 1);
params.put("eta", 0.3);
params.put("alpha", 2);
params.put("lambda", 3);
params.put("gamma", 0);
params.put("num_class", 4);
I also test it to get the accuracy using,
float[][] pred = booster.predict(testDM);
The validation set is 20% of the input, the test set is 10%, and the training set is the remaining 70%. Of course, the data is shuffled and there is no ordering pattern in the input.
My questions are:
I use this constructor of DMatrix to create it, since I receive the input via a REST call,
DMatrix(float[] data, int num_rows, int num_cols);
But since there are a lot of categorical and string features in my data, the feature set becomes huge after I one-hot encode the string and categorical features (my own implementation), and I crash because I run out of memory. How can I work around this? Is there a converter to LibSVM format so that I can use that instead? What is a good solution to this?
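For what it's worth, one way around the dense one-hot blow-up is to build the DMatrix in sparse CSR form, storing only the non-zero entries (a one-hot column then costs one entry per row instead of one per level). XGBoost4J has a sparse constructor, `new DMatrix(rowHeaders, colIndices, values, DMatrix.SparseType.CSR)` (depending on the version it may also want the number of columns). Below is a minimal sketch, under my own assumptions about the row layout, of assembling the three CSR arrays directly from categorical indices without ever materializing the dense one-hot matrix; `build`, `catIndex` and `catCardinality` are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

public class CsrBuilder {
    // CSR arrays: headers is the row-pointer array (length numRows + 1),
    // indices holds the column of each stored value, values the value itself.
    static long[] headers;
    static int[] indices;
    static float[] values;

    // numeric[r] = dense numeric features of row r (occupy columns [0, numNumeric)),
    // catIndex[r][c] = the "hot" level of categorical column c in row r,
    // catCardinality[c] = number of levels of categorical column c.
    static void build(float[][] numeric, int[][] catIndex, int[] catCardinality) {
        List<Long> h = new ArrayList<>();
        List<Integer> idx = new ArrayList<>();
        List<Float> val = new ArrayList<>();
        h.add(0L);
        for (int r = 0; r < numeric.length; r++) {
            for (int c = 0; c < numeric[r].length; c++) {
                if (numeric[r][c] != 0f) { idx.add(c); val.add(numeric[r][c]); }
            }
            // each categorical column contributes exactly one 1.0 entry
            int offset = numeric[r].length;
            for (int c = 0; c < catIndex[r].length; c++) {
                idx.add(offset + catIndex[r][c]);
                val.add(1f);
                offset += catCardinality[c];  // next one-hot block starts here
            }
            h.add((long) idx.size());
        }
        headers = h.stream().mapToLong(Long::longValue).toArray();
        indices = idx.stream().mapToInt(Integer::intValue).toArray();
        values = new float[val.size()];
        for (int i = 0; i < val.size(); i++) values[i] = val.get(i);
    }

    public static void main(String[] args) {
        // two rows, one numeric feature, two categorical columns with 3 and 2 levels
        build(new float[][]{{0.5f}, {0f}},
              new int[][]{{2, 0}, {1, 1}},
              new int[]{3, 2});
        System.out.println(java.util.Arrays.toString(headers)); // [0, 3, 5]
        // the arrays can then feed the sparse constructor, e.g.:
        // DMatrix dm = new DMatrix(headers, indices, values, DMatrix.SparseType.CSR);
    }
}
```

Alternatively, writing the same triples out as `label col:value col:value ...` lines gives you a LibSVM file you can load with `new DMatrix("path")`.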
Why does predict return a float[][] instead of a float[]? Is this so that a vector result (e.g. per-class probabilities) can be returned? My target class is currently label encoded. Is that wrong?
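As I understand it, predict returns one float[] per row of the input DMatrix. With multi:softmax each inner array has length 1 holding the predicted class index directly; if you switch the objective to multi:softprob, each inner array holds num_class probabilities and you take the argmax yourself. A sketch covering both cases (the helper name `toClasses` is my own):

```java
public class PredictShape {
    // Converts raw predict() output to class indices for both multiclass objectives.
    // multi:softmax: each row is {classIndex}. multi:softprob: each row is a
    // probability vector of length num_class, so we take the argmax.
    static int[] toClasses(float[][] pred) {
        int[] classes = new int[pred.length];
        for (int r = 0; r < pred.length; r++) {
            if (pred[r].length == 1) {
                classes[r] = (int) pred[r][0];  // softmax: class id directly
            } else {
                int best = 0;
                for (int c = 1; c < pred[r].length; c++) {
                    if (pred[r][c] > pred[r][best]) best = c;  // softprob: argmax
                }
                classes[r] = best;
            }
        }
        return classes;
    }

    public static void main(String[] args) {
        // softprob-style output for 4 classes, two rows
        float[][] pred = {{0.1f, 0.2f, 0.6f, 0.1f}, {0.7f, 0.1f, 0.1f, 0.1f}};
        System.out.println(java.util.Arrays.toString(toClasses(pred))); // [2, 0]
    }
}
```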
Is there a way I can plot curves to evaluate whether I am overfitting or underfitting?
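One approach, assuming your XGBoost4J version has the train overload that accepts a metrics array (`XGBoost.train(trainingDM, params, rounds, watches, metrics, null, null)` — worth checking against your version's javadoc), is to capture the per-round train/test merror, plot both curves, and look for the point where test error stops falling while train error keeps dropping: a widening gap is the overfitting signal. A sketch with a helper that finds the round where the watched test error bottoms out (the placeholder numbers in main are made up for illustration):

```java
public class LearningCurve {
    // metrics[w][i] = evaluation value for watch w at boosting round i.
    // Row order follows the watches map insertion order, so with
    // {"train": ..., "test": ...} row 0 is train and row 1 is test.
    static int bestRound(float[][] metrics, int watchRow) {
        int best = 0;
        for (int i = 1; i < metrics[watchRow].length; i++) {
            if (metrics[watchRow][i] < metrics[watchRow][best]) best = i;
        }
        return best;
    }

    public static void main(String[] args) {
        // placeholder curves: train error keeps falling, test error turns up after round 3
        float[][] metrics = {
            {0.50f, 0.40f, 0.30f, 0.20f, 0.10f},   // train-merror per round
            {0.55f, 0.48f, 0.45f, 0.44f, 0.47f}    // test-merror per round
        };
        System.out.println("best round: " + bestRound(metrics, 1)); // best round: 3
    }
}
```

In practice you would pass `new float[watches.size()][numRounds]` into train, let the library fill it, then dump the two rows to CSV and plot them with any tool.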
Also, very importantly, I see that my errors at the end of training finish at,
[84] test-merror:0.636145 train-merror:0.490371
What is considered a good error? Is something like test-merror:0.111 and train-merror:0.111 a good value to aim for? I am asking because I have not been able to figure out (or find online) a good baseline for these numbers, to judge whether they are good or bad.