NaN values with early stopping

I have a dataset with a small number of rows (~10) and a large number of features (~100). I use early stopping and CV and keep getting the following error. The dataset does not contain NaN. I have tried tweaking many parameters, but I still keep getting this error. Any idea how to resolve this?

0%| | 0/100 [22:09<?, ?trial/s, best loss=?]/home/ubuntu/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/ FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details:
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/lib/python3.7/", line 214, in onecmd
func = getattr(self, 'do_' + cmd)
AttributeError: 'Pdb' object has no attribute 'do_score'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/sklearn/model_selection/", line 531, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/xgboost/", line 436, in inner_f
return f(**kwargs)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/xgboost/", line 1187, in fit
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/xgboost/", line 197, in train
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/xgboost/", line 76, in _train_internal
bst = callbacks.before_training(bst)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/xgboost/", line 376, in before_training
model = c.before_training(model=model)
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/xgboost/", line 515, in before_training
self.starting_round = model.num_boosted_rounds()
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/xgboost/", line 2007, in num_boosted_rounds
_check_call(_LIB.XGBoosterBoostedRounds(self.handle, ctypes.byref(rounds)))
File "/home/ubuntu/anaconda3/lib/python3.7/site-packages/xgboost/", line 210, in _check_call
raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [14:53:47] …/src/metric/ Unknown metric function l
Stack trace:
[bt] (0) /home/ubuntu/anaconda3/lib/python3.7/site-packages/xgboost/lib/ [0x7f6014a5f33f]
[bt] (1) /home/ubuntu/anaconda3/lib/python3.7/site-packages/xgboost/lib/ [0x7f6014bcad0f]
[bt] (2) /home/ubuntu/anaconda3/lib/python3.7/site-packages/xgboost/lib/ [0x7f6014ba0378]
[bt] (3) /home/ubuntu/anaconda3/lib/python3.7/site-packages/xgboost/lib/ [0x7f6014a4d39a]
[bt] (4) /home/ubuntu/anaconda3/lib/python3.7/lib-dynload/…/…/ [0x7f604d1f79dd]
[bt] (5) /home/ubuntu/anaconda3/lib/python3.7/lib-dynload/…/…/ [0x7f604d1f7067]
[bt] (6) /home/ubuntu/anaconda3/lib/python3.7/lib-dynload/ [0x7f604bc86794]
[bt] (7) /home/ubuntu/anaconda3/lib/python3.7/lib-dynload/ [0x7f604bc86ff8]
[bt] (8) python(_PyObject_FastCallKeywords+0x48b) [0x56210a01ea5b]


…/src/metric/ Unknown metric function l

Can you check the hyperparameters? It appears that you are trying to use a metric that does not exist in XGBoost.

For whatever reason, the errors don’t make much sense, because the same code with the same hyperparameters works for many other datasets. My snippet looks like the below. I do have a very small dataset with a lot of features, but I don’t know why the scores would all be NaN. I’ve played with different parameters but end up with a similar failure.

import xgboost as xgb
from sklearn.model_selection import cross_val_score

# trainer_params = {'learning_rate': '0.040', 'n_estimators': 110, 'max_depth': 7, 'colsample_bytree': '0.700', 'subsample': '0.700', 'min_child_weight': '3.000', 'gamma': '3.000', 'reg_lambda': '10.000', 'reg_alpha': '4.000', 'tree_method': 'hist'}
# fit_params = {'early_stopping_rounds': 1, 'eval_metric': 'logloss', 'verbose': 3, 'eval_set': [[array([[0., 0., 0., ..., 0., 0., 0.],...SNIPPED..
# eval_metric = "logloss"

clf = xgb.XGBClassifier(nthread=1, use_label_encoder=False, **trainer_params)

# This fails
score = cross_val_score(clf, x_train, y_train, cv=2, verbose=3, scoring=cross_val_scoring, fit_params=fit_params)

# This works (using the same dataset for early stopping and training)
score = cross_val_score(clf, x_valid, y_valid, cv=2, verbose=3, scoring=cross_val_scoring, fit_params=fit_params)


Here’s a minimal repro with an extreme number of features.

Appreciate any pointers!


Scikit-learn somehow treats “mlogloss” as indexable data and splits it up.
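To illustrate the failure mode: the CV wrapper treats any fit param whose length equals the number of samples as per-sample data and slices it per fold. A string like "logloss" has a length too, so when a training fold happens to have exactly 7 rows, the metric name itself gets sliced. Here is a simplified pure-Python sketch of that logic (the function name is illustrative, not scikit-learn's actual internal):

```python
def subset_fit_params(fit_params, n_samples, train_idx):
    """Slice fit params that look like per-sample data (illustrative sketch)."""
    out = {}
    for key, value in fit_params.items():
        # Anything with a length equal to n_samples is assumed to be
        # per-sample data (like sample_weight) and is sliced per fold.
        if hasattr(value, "__len__") and len(value) == n_samples:
            out[key] = [value[i] for i in train_idx]
        else:
            out[key] = value
    return out

fit_params = {"eval_metric": "logloss", "early_stopping_rounds": 1}
# With 7 training rows, len("logloss") == 7, so the metric string is sliced.
sliced = subset_fit_params(fit_params, n_samples=7, train_idx=[0, 2, 4])
print(sliced["eval_metric"])  # ['l', 'g', 'o'] -- no longer a valid metric name
```

XGBoost then receives "l" as the metric name, which matches the "Unknown metric function l" error above. Until this is addressed, one workaround worth trying is to write the CV loop manually and call `clf.fit(...)` per fold, so no string-valued parameter passes through `fit_params`.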

Thanks for looking into it. Are there any best practices for dealing with a small number of training rows compared to the number of features? I wonder if there are any XGBoost-specific tricks to reduce model complexity.

You can look at the SHAP values or the global feature importance from the trained model, select the features that are important, and then train the model again with the unimportant features removed, or with column sampling and feature weights. I think there are many techniques and some literature around using tree models for feature selection. Feel free to post your discoveries.