XGBClassifier un-pickling fails if scikit-learn is not installed

Hello xgb community!

This is a quick Python API-related question: I am helping a data scientist on my team pickle and then (in another context) un-pickle an XGBClassifier object. Un-pickling is done in a Python environment where only “pip install xgboost” has been run. The attempt to un-pickle the XGBClassifier then fails with:

    Traceback (most recent call last):
      File "unpickle.py", line 13, in <module>
        print(doit(sys.argv[1]))
      File "unpickle.py", line 8, in doit
        return pickle.load(fd)
    _pickle.UnpicklingError: NEWOBJ class argument isn't a type object

Installing scikit-learn fixed the problem (yes, my colleague has it installed in his environment on the pickling side).

Should scikit-learn be an install requirement for the Python PyPI package? Or am I missing something?
Your help is highly appreciated!

Thanks,
-Yassen

No, scikit-learn is required only when you use the xgboost.XGBClassifier object, since XGBClassifier interacts directly with the scikit-learn API. You will not have this problem if you use only xgboost.train(), which produces an xgboost.Booster object.

hcho3: thanks for your reply!

If adding scikit-learn to the install requirements is not appropriate, how about providing an XGBClassifier.__reduce__() which would invoke super().__reduce__() but catch an UnpicklingError and, in case it carries the above obscure message, re-raise it with an addition to the message, like: NEWOBJ class argument isn't a type object (do you have scikit-learn installed?)

This may save someone else tons of time if they end up in a situation like ours a couple of days ago. We lost lots of time trying to bring xgboost, the Python interpreters, and the OS to the very same versions to find the root cause of that error. A helpful hint like that would have saved our day.
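A minimal sketch of what such a hint could look like, done here as a wrapper on the loading side rather than inside XGBClassifier itself. The names HINT, add_hint and load_with_hint are hypothetical and not part of xgboost. It matches on the substring "NEWOBJ" instead of the full message, on the assumption that the exact wording may differ between Python versions.

```python
import io
import pickle

# Hypothetical hint text, taken from the wording suggested in this thread.
HINT = "(do you have scikit-learn installed?)"


def add_hint(exc):
    """Return exc unchanged, or a new UnpicklingError with the hint appended
    when the message looks like the NEWOBJ failure discussed above."""
    message = str(exc)
    if "NEWOBJ" in message and HINT not in message:
        return pickle.UnpicklingError(f"{message} {HINT}")
    return exc


def load_with_hint(fd):
    """Like pickle.load(fd), but with a friendlier message for the NEWOBJ error."""
    try:
        return pickle.load(fd)
    except pickle.UnpicklingError as exc:
        hinted = add_hint(exc)
        if hinted is exc:
            raise  # unrelated unpickling error: leave it untouched
        raise hinted from exc
```

In unpickle.py from the traceback above, this would mean calling load_with_hint(fd) where pickle.load(fd) is called now.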

I would propose a PR if that makes sense (I hope it does).
Thanks again!
-Y.

I agree that a better error message would benefit other users. Would you like to file a pull request?

Absolutely. It would be an honour to contribute (although it is so small a thing).
I should be able to within a couple of days. Cheers!
-Y.

I tried to find a pull request on GitHub related to this… was this never resolved?

This is actually a problem for cases like a web app deployed on Heroku, where the ‘slug size’ should be less than 300 MB (otherwise loading is slow). Adding scikit-learn to requirements.txt along with packages like xgboost itself, pandas, and dash/flask always pushes it over 300 MB.

Should this be discussed here or would it be appropriate to open an issue?

@pu239 It is not possible to unpickle XGBClassifier without installing scikit-learn. You can avoid installing scikit-learn by always using xgboost.train() instead of XGBClassifier.fit() when training the model.

I’m sorry if this is a dumb question, but when you all say “installing scikit-learn”, do you mean running pip install scikit-learn in the environment?