[jvm-packages] xgboost4j does not learn on Linux when eta < 1

Hi,

I have encountered odd behaviour from xgboost4j on Linux (Ubuntu 17.10).
Namely, if I set eta to a value smaller than 1.0, e.g. 0.3 (the default listed in the documentation), and use the multi:softprob objective, the resulting model does not seem to learn anything: it outputs the same probabilities for all inputs.
This happens with both version 0.72 and version 0.81.
Has anyone else encountered this issue, or can anyone tell me how to avoid it?

Thank you very much.

Best,

AtR1an

Here is a minimal example that reproduces the issue for me:

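import java.util.HashMap;
import java.util.Map;

import ml.dmlc.xgboost4j.java.Booster;
import ml.dmlc.xgboost4j.java.DMatrix;
import ml.dmlc.xgboost4j.java.XGBoost;
import ml.dmlc.xgboost4j.java.XGBoostError;

// libsvmFile: path to a LIBSVM-format training file
try {
	final DMatrix dmat = new DMatrix(libsvmFile);
	final Map<String, Object> params = new HashMap<>();
	params.put("objective", "multi:softprob");
	params.put("num_class", 2);
	params.put("eta", 0.3);
	final Map<String, DMatrix> watches = new HashMap<>();
	watches.put("train", dmat);
	final int nround = 100;
	final Booster booster = XGBoost.train(dmat, params, nround, watches, null, null);
} catch (final XGBoostError e) {
	e.printStackTrace();
}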

The same issue occurs if eta is passed as a float (0.3f) or as a string ("0.3") instead of a double.
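To confirm the symptom programmatically (the same probabilities for every input), one could compare the rows returned by `booster.predict(dmat)`, which in xgboost4j yields a `float[][]` with one row per instance and, for `multi:softprob`, one column per class. The helper below is a hypothetical, dependency-free sketch; the class name `PredictionCheck` and the tolerance parameter are illustrative assumptions, not part of xgboost4j.

```java
public final class PredictionCheck {
    private PredictionCheck() {}

    /**
     * Returns true if every row of the prediction matrix equals the first row
     * within the given tolerance, i.e. the model outputs the same class
     * probabilities for all inputs.
     */
    public static boolean allRowsIdentical(float[][] preds, float tol) {
        if (preds.length == 0) {
            return true;
        }
        final float[] first = preds[0];
        for (float[] row : preds) {
            if (row.length != first.length) {
                return false;
            }
            for (int j = 0; j < row.length; j++) {
                if (Math.abs(row[j] - first[j]) > tol) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Stand-in for booster.predict(dmat) on a model that learned nothing.
        float[][] stuck = { {0.5f, 0.5f}, {0.5f, 0.5f} };
        // Stand-in for a model that separates the two classes.
        float[][] learned = { {0.9f, 0.1f}, {0.2f, 0.8f} };
        System.out.println(allRowsIdentical(stuck, 1e-6f));   // prints "true"
        System.out.println(allRowsIdentical(learned, 1e-6f)); // prints "false"
    }
}
```

In the reproduction, calling `PredictionCheck.allRowsIdentical(booster.predict(dmat), 1e-6f)` after training would return `true` for a model that produces identical probabilities for every row.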

Best,

AtR1an

Can you post the data too?

The file I used is too large to share, but this happens with any data I use.
Here is a small snippet that creates a dummy matrix which also exhibits the issue:

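import java.util.ArrayList;
import java.util.List;
import java.util.Random;

import ml.dmlc.xgboost4j.LabeledPoint;
import ml.dmlc.xgboost4j.java.DMatrix;
import ml.dmlc.xgboost4j.java.XGBoostError;

// Two classes of 50 points each, separated by a shift of +/-0.5 on a single Gaussian feature
private static DMatrix createLabeledPointMatrix() throws XGBoostError {
	final Random random = new Random();
	final List<LabeledPoint> train = new ArrayList<>(100);
	for (int i = 0; i < 100; i++) {
		final float label = i < 50 ? 0 : 1;
		float feature = (float) random.nextGaussian();
		if (i < 50) {
			feature += 0.5f;
		} else {
			feature -= 0.5f;
		}
		train.add(new LabeledPoint(label, new int[] { 0 }, new float[] { feature }));
	}
	return new DMatrix(train.iterator(), null);
}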
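Since the original training file is too large to share, one option is to write the same kind of dummy data to a small LIBSVM text file (one `<label> <index>:<value>` line per instance) that can be attached to the thread and loaded with the same `new DMatrix(libsvmFile)` call as in the first example. This is a hypothetical sketch: the class name `DummyLibsvm`, the fixed seed, and the output path are assumptions. A fixed seed makes the file reproducible, unlike the unseeded `new Random()` above.

```java
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Locale;
import java.util.Random;

public final class DummyLibsvm {
    private DummyLibsvm() {}

    /** Writes 100 two-class points (single feature, index 0) in LIBSVM text format. */
    public static void write(Path out, long seed) throws IOException {
        final Random random = new Random(seed);
        try (PrintWriter w = new PrintWriter(Files.newBufferedWriter(out, StandardCharsets.UTF_8))) {
            for (int i = 0; i < 100; i++) {
                final int label = i < 50 ? 0 : 1;
                float feature = (float) random.nextGaussian();
                // Same class shift as createLabeledPointMatrix(): +0.5 for label 0, -0.5 for label 1
                feature += i < 50 ? 0.5f : -0.5f;
                // Locale.ROOT keeps '.' as the decimal separator regardless of the default locale
                w.printf(Locale.ROOT, "%d 0:%f\n", label, feature);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical output path; pass a different one as the first argument
        final Path out = Paths.get(args.length > 0 ? args[0] : "dummy.libsvm");
        write(out, 42L);
    }
}
```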

Best,

AtR1an

@hcho3, do you have any idea what is going on here? This is something of a blocker for us at the moment :frowning:
