As-yet-unexplained NaNs from Booster.predict


#1

Hi folks.

I’m working on a large CPython project that mostly uses CPython 3.6 or 3.7.

It also uses XGBoost 0.90 together with scikit-learn 0.21.3.

I’ve been intermittently getting NaNs that cause a traceback in a “simple” pytest test with a rather large call stack beneath it. The test feeds in random values.

I’ve put an SSCCE, without the input files for now, at http://stromberg.dnsalias.org/svn/xgboost-predict-nans/trunk/ttt-sscce, and I’m also pasting it below:

```python
#!/usr/bin/python3.6

"""
An SSCCE for our NaN issue in XGBoost.

This is with regard to Grokstream issue RM-454, the test_train transient error.
This script fails every time and completes quickly.

The matter could easily be an input problem rather than an XGBoost bug.
"""

import xgboost.sklearn
import xgboost.core
import numpy


def main():
    """Replicate."""
    classifier = xgboost.sklearn.XGBClassifier()

    # Load the saved model, then work with the underlying Booster directly.
    classifier.load_model('xgboost-sklearn-model-file')

    booster = classifier.get_booster()

    # Load the test data from its binary DMatrix file.
    test_dmatrix = xgboost.core.DMatrix('test-dmatrix')

    class_probs = booster.predict(test_dmatrix, ntree_limit=0, validate_features=True)

    print(class_probs)

    # numpy.all works for both 1-D and 2-D prediction arrays.
    if numpy.all(numpy.isfinite(class_probs)):
        print('Good, all values are finite.')
    else:
        raise SystemExit('Uh oh, one or more values are not finite.')


if __name__ == '__main__':
    main()
```

Is there any way of telling, without the inputs, why this gives all NaNs?

Hopefully on Tuesday my employer will be able to make a final decision on whether I can share the two input files here.

PS: I asked how to print a DMatrix in the thread “Is there a way to print a DMatrix as ASCII or JSON?”
The question also arises: is there a way of printing an XGBoost XGBModel from CPython as ASCII or JSON?
I’m hoping these two print operations will let me (and possibly someone more familiar with the algorithms involved) scan the two input files for bad values.
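Short of printing a whole DMatrix, one quick check is to count the non-finite entries in the float arrays you can get at, including the predictions themselves. A minimal sketch, assuming only NumPy; the helper name count_non_finite is hypothetical and not part of XGBoost. It reports flat indices, so a 2-D (multi-class) prediction array is flattened first. It does not inspect the feature values inside the DMatrix.

```python
import numpy as np


def count_non_finite(named_arrays):
    """Return {name: (n_non_finite, first_bad_flat_indices)} for each array.

    Hypothetical helper. Empty arrays (e.g. a DMatrix with no weights set)
    are reported as (0, []).
    """
    report = {}
    for name, values in named_arrays.items():
        # Flatten so 1-D and 2-D arrays are handled the same way.
        arr = np.asarray(values, dtype=np.float64).ravel()
        bad = np.flatnonzero(~np.isfinite(arr))
        # Keep only the first few offending positions so the output stays short.
        report[name] = (int(bad.size), bad[:5].tolist())
    return report


if __name__ == "__main__":
    preds = np.array([0.2, np.nan, 0.7, np.inf], dtype=np.float32)
    print(count_non_finite({"predictions": preds, "weight": np.array([])}))
```

With a real DMatrix, the inputs could come from its get_label(), get_weight() and get_base_margin() methods plus the output of booster.predict(); treat the exact set of accessors available in your XGBoost version as an assumption to check.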

Thanks!


#2

Hi again.

I got permission to include the two binary input files. They are now at http://stromberg.dnsalias.org/svn/xgboost-predict-nans/trunk/test-dmatrix and http://stromberg.dnsalias.org/svn/xgboost-predict-nans/trunk/xgboost-sklearn-model-file

I’m not attaching the files here because they are binary and don’t have a file type that discuss.xgboost.ai accepts (the forum software won’t let me upload them). I could base64-encode them; please let me know if that would help.

Does anyone have any thoughts on why this small program produces NaNs with the linked inputs, and on whether it is an input problem or an XGBoost problem?

Thanks!


#3

Thanks a lot for uploading a reproducible example. I will try to run it myself.


#4

Thanks for taking an interest in it. I hope it reproduces on your end.


#5

Does the SSCCE reproduce the NaNs for anyone else reading this thread?

I just checked on one of my computers at home, and it replicated there.


#6

I managed to reproduce the bug with XGBoost 0.90. I will take a look.

Curiously, when I tried using the latest source from https://github.com/dmlc/xgboost, I got this error instead:

```
Traceback (most recent call last):
  File "test.py", line 26, in <module>
    main()
  File "test.py", line 10, in main
    classifier.load_model('xgboost-sklearn-model-file')
  File "/home/ubuntu/xgboost/python-package/xgboost/sklearn.py", line 375, in load_model
    self._Booster.load_model(fname)
  File "/home/ubuntu/xgboost/python-package/xgboost/core.py", line 1532, in load_model
    self.handle, c_str(os_fspath(fname))))
  File "/home/ubuntu/xgboost/python-package/xgboost/core.py", line 189, in _check_call
    raise XGBoostError(py_str(_LIB.XGBGetLastError()))
xgboost.core.XGBoostError: [07:40:13] ../src/objective/./regression_loss.h:89: Check failed: base_score > 0.0f && base_score < 1.0f: base_score must be in (0,1) for logistic loss, got: -0
Stack trace:
  [bt] (0) /home/ubuntu/xgboost/python-package/xgboost/../../lib/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x7c) [0x7f718335b24c]
  [bt] (1) /home/ubuntu/xgboost/python-package/xgboost/../../lib/libxgboost.so(xgboost::obj::LogisticRegression::ProbToMargin(float)+0xe7) [0x7f71834d1cf7]
  [bt] (2) /home/ubuntu/xgboost/python-package/xgboost/../../lib/libxgboost.so(xgboost::LearnerImpl::Configure()+0xd1b) [0x7f718344df1b]
  [bt] (3) /home/ubuntu/xgboost/python-package/xgboost/../../lib/libxgboost.so(xgboost::LearnerImpl::LoadModel(dmlc::Stream*)+0xf22) [0x7f718344fb92]
  [bt] (4) /home/ubuntu/xgboost/python-package/xgboost/../../lib/libxgboost.so(XGBoosterLoadModel+0x7a2) [0x7f71833507e2]
  [bt] (5) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call_unix64+0x4c) [0x7f71b6111dae]
  [bt] (6) /usr/lib/x86_64-linux-gnu/libffi.so.6(ffi_call+0x22f) [0x7f71b611171f]
  [bt] (7) /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(_ctypes_callproc+0x2b4) [0x7f71b63255c4]
  [bt] (8) /usr/lib/python3.6/lib-dynload/_ctypes.cpython-36m-x86_64-linux-gnu.so(+0x11c33) [0x7f71b6325c33]
```
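The check in this error concerns the logistic link: a probability-scale base_score p is turned into a margin with the logit, margin = -ln(1/p - 1), which is finite only for 0 < p < 1. The float32 sketch below is a hypothetical stand-in; it is assumed, not confirmed, to match what ProbToMargin computes. It deliberately skips the range check so the values at and outside the boundary are visible.

```python
import numpy as np


def prob_to_margin(base_score):
    """Logit transform -ln(1/p - 1) evaluated in float32.

    Hypothetical stand-in for a logistic ProbToMargin; the (0, 1) range
    check is omitted on purpose so the IEEE boundary results show through.
    """
    p = np.float32(base_score)
    one = np.float32(1.0)
    # Silence divide-by-zero and invalid-log warnings; we want the raw values.
    with np.errstate(divide="ignore", invalid="ignore"):
        return -np.log(one / p - one)


if __name__ == "__main__":
    for p in (0.5, 0.9, 0.0, -0.0, 1.0):
        print(p, prob_to_margin(p))
```

In this sketch, a base_score of -0 gives NaN (1/-0 is -inf, and the log of a negative number is NaN), 0 and 1 give infinite margins, and the default 0.5 maps to a margin of exactly -0.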

#7

Your model appears to contain NaNs. You can get a text representation of the model by running

```python
booster.dump_model('model.txt', dump_format='text')
```

which produces

```
...
booster[22]:
0:leaf=-0
booster[23]:
0:leaf=-nan
booster[24]:
0:leaf=-0
booster[25]:
0:leaf=-0
booster[26]:
0:leaf=-0
booster[27]:
0:leaf=-0
booster[28]:
0:leaf=-0
booster[29]:
0:leaf=-0
booster[30]:
0:leaf=-0
booster[31]:
0:leaf=-0
booster[32]:
0:leaf=-nan
booster[33]:
0:leaf=-0
booster[34]:
0:leaf=-0
booster[35]:
0:leaf=-0
booster[36]:
0:leaf=-0
booster[37]:
0:leaf=-0
booster[38]:
0:leaf=-0
booster[39]:
0:leaf=-0
booster[40]:
0:leaf=-0
booster[41]:
0:leaf=-nan
...
```
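Rather than reading a long dump by eye, you can scan the per-tree text for non-finite leaf values. This minimal sketch assumes the text format shown above (one "N:leaf=VALUE" entry per leaf, optionally followed by ",cover=..." stats) and one dump string per tree, which is what Booster.get_dump() is expected to return. The helper name is hypothetical. Python's float() accepts spellings such as "nan", "-nan", "inf" and "-inf", which keeps the parsing simple.

```python
import math
import re

# Matches the value after "leaf=", stopping at a comma (stats) or whitespace.
LEAF_RE = re.compile(r"leaf=([^,\s]+)")


def trees_with_non_finite_leaves(tree_dumps):
    """Return the indices of trees that have at least one NaN/inf leaf.

    `tree_dumps` is a list of text dumps, one string per tree, in the
    format shown above (hypothetical helper, not part of XGBoost).
    """
    bad_trees = []
    for index, dump in enumerate(tree_dumps):
        leaves = (float(value) for value in LEAF_RE.findall(dump))
        if any(not math.isfinite(leaf) for leaf in leaves):
            bad_trees.append(index)
    return bad_trees


if __name__ == "__main__":
    sample = ["0:leaf=-0\n", "0:leaf=-nan\n",
              "0:[f0<0.5] yes=1,no=2,missing=1\n\t1:leaf=0.1\n\t2:leaf=inf\n"]
    print(trees_with_non_finite_leaves(sample))  # → [1, 2]
```

Applied to this model, trees_with_non_finite_leaves(booster.get_dump()) would be expected to include trees 23, 32 and 41 from the excerpt above.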

#8

Thank you! I really appreciate it.