XGBoost predictor unstable


#1
  • Observations

    • For the same batch of samples, all results were sometimes 0.5, but when I retried the prediction later, the results looked normal. In particular, when running multiple processes (started in the background with bash '&'), the always-0.5 result happens quite often (1/22). Each process has two predictors, and the always-0.5 result appears in the second one.
    • For a single sample, the result was sometimes 0.209669 and sometimes 0.201879. Is this stable or not?
  • Environment (prediction machine)

    • os: CentOS release 6.7
    • python: Python 3.5.2 |Anaconda 4.2.0 (64-bit)
    • gcc: GCC 4.4.7
    • xgboost: 0.7 (using pip to install)
  • Details
    I used the same Python environment, but trained and predicted on different machines. Training was done on CentOS Linux release 7.2.1511 (GCC 4.8.5). Could the OS or GCC version cause the problem?
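To answer "is this stable or not?" with a number, one option is to run the same prediction several times and measure the largest per-sample spread. A deterministic predictor should give a spread of exactly 0.0. This is a minimal sketch that only assumes a zero-argument callable returning a sequence of floats. `prediction_spread` is a hypothetical helper name, and in practice the callable would wrap something like `model.predict(dtest)`.

```python
from typing import Callable, List, Sequence


def prediction_spread(predict: Callable[[], Sequence[float]], runs: int = 5) -> float:
    """Return the largest (max - min) gap seen for any sample across `runs` calls.

    A deterministic predictor returns 0.0.
    """
    if runs < 2:
        raise ValueError("need at least two runs to compare")
    results: List[List[float]] = [list(predict()) for _ in range(runs)]
    n = len(results[0])
    if any(len(r) != n for r in results):
        raise ValueError("runs returned different numbers of predictions")
    spread = 0.0
    for i in range(n):
        column = [r[i] for r in results]  # all predictions for sample i
        spread = max(spread, max(column) - min(column))
    return spread


if __name__ == "__main__":
    # Fake predictor alternating between the two values reported above
    # for one sample, to show what a non-zero spread looks like.
    values = iter([[0.209669], [0.201879]] * 3)
    print(round(prediction_spread(lambda: next(values), runs=4), 6))  # 0.00779
```

Running this on the real model (same DMatrix, same process) separates "the model itself is nondeterministic" from "something differs between machines or processes".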

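```python
import pickle

import xgboost as xgb


class XGBModel:
    def __init__(self, model_path):
        self.model_path = model_path
        # The pickled object is assumed to be an xgboost Booster trained elsewhere.
        self.model = self._load(self.model_path)

    def _load(self, path):
        with open(path, 'rb') as fr:
            data = pickle.load(fr)
        return data

    def predict(self, libsvm_filename):
        # Build a DMatrix directly from a LIBSVM-format text file.
        dtest = xgb.DMatrix(libsvm_filename)
        # No ntree_limit is passed, so predict() uses its default behaviour.
        pred = self.model.predict(dtest)
        return pred
```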
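For a DART booster, the XGBoost documentation from the 0.7x era notes that `predict()` performs dropout on trees by default and that a nonzero `ntree_limit` disables it, which could by itself make repeated predictions differ. Newer releases use `iteration_range` instead (and `ntree_limit` no longer exists in 2.x). The sketch below, with the hypothetical helper name `predict_all_trees`, picks whichever argument the installed `Booster.predict` accepts. It assumes `num_rounds` is the number of boosting rounds used in training and that `num_parallel_tree` was left at 1, so trees and rounds coincide.

```python
import inspect
from typing import Any


def predict_all_trees(booster: Any, dmatrix: Any, num_rounds: int) -> Any:
    """Predict using all `num_rounds` boosting rounds, without DART dropout.

    Assumes num_parallel_tree == 1, so number of trees == number of rounds.
    """
    if num_rounds < 1:
        # ntree_limit=0 means "no limit", which in old DART re-enables dropout.
        raise ValueError("num_rounds must be positive")
    params = inspect.signature(booster.predict).parameters
    if "iteration_range" in params:
        # Newer XGBoost: half-open range [0, num_rounds) of boosting rounds.
        return booster.predict(dmatrix, iteration_range=(0, num_rounds))
    if "ntree_limit" in params:
        # Older XGBoost (e.g. 0.7x): a nonzero ntree_limit turns off dropout for DART.
        return booster.predict(dmatrix, ntree_limit=num_rounds)
    raise TypeError("booster.predict accepts neither iteration_range nor ntree_limit")
```

In the class above, `self.model.predict(dtest)` could be replaced by `predict_all_trees(self.model, dtest, num_rounds)`, where `num_rounds` has to come from the training configuration.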

#2

I think I may have found the key point: I was not passing the ntree_limit parameter when predicting with the DART booster. But how did the always-0.5 results happen?
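Assuming the model uses the `binary:logistic` objective with the default `base_score` of 0.5 (neither is stated in the thread), an output of exactly 0.5 means the raw margin was 0. The base margin is logit(0.5) = 0, so the tree contributions must also have summed to 0, as they would if no tree contributed. The sketch below only reproduces that arithmetic; it does not explain why the trees would contribute nothing.

```python
import math


def logit(p: float) -> float:
    """Margin corresponding to probability p under the logistic link."""
    return math.log(p / (1.0 - p))


def sigmoid(margin: float) -> float:
    return 1.0 / (1.0 + math.exp(-margin))


def binary_logistic_prediction(tree_outputs, base_score=0.5):
    """Probability = sigmoid(logit(base_score) + sum of per-tree leaf values)."""
    margin = logit(base_score) + sum(tree_outputs)
    return sigmoid(margin)


# No tree contributes: the margin is logit(0.5) == 0, so the output is exactly 0.5.
print(binary_logistic_prediction([]))  # 0.5
# Hypothetical leaf values: any non-zero total moves the output away from 0.5.
print(binary_logistic_prediction([-0.8, -0.5]))
```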


#3

If you have a reproducible script, consider filing an issue in the GitHub repo. We will look at it.