I have the following use case:
- Train the model with XGBoost4J-Spark to take advantage of parallel training.
- Save the model.
- Load it in Python.
- Predict on new data in Python.
For training, I use a Spark DataFrame.
For prediction in Python, what input format should I use? I believed a NumPy array was the right choice, but I get errors such as:
'numpy.ndarray' object has no attribute 'feature_names'
Even for a single record, what is the best practice for preparing the data so that I can make a prediction with the Python package?
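One possible approach, sketched here under assumptions: a model trained from a Spark `VectorAssembler` features column typically carries no feature names, so the Python booster sees features only by position. The prediction input then has to list features in the same order as the assembler's input columns. `FEATURE_COLUMNS` and `to_feature_matrix` below are hypothetical names for illustration; absent features become `NaN`, which XGBoost treats as missing by default.

```python
import numpy as np

# Hypothetical: the input columns given to the VectorAssembler, in the same order.
FEATURE_COLUMNS = ["age", "income", "score"]


def to_feature_matrix(records, columns=FEATURE_COLUMNS):
    """Build a 2-D float32 array of shape (n_records, n_features).

    Columns follow the training order; features absent from a record
    are filled with NaN so XGBoost treats them as missing.
    """
    if isinstance(records, dict):
        records = [records]  # a single record still needs a 2-D shape
    return np.array(
        [[record.get(col, np.nan) for col in columns] for record in records],
        dtype=np.float32,
    )


single = to_feature_matrix({"age": 42, "income": 55000.0, "score": 0.7})
batch = to_feature_matrix([
    {"age": 30, "score": 0.2},
    {"age": 51, "income": 72000.0, "score": 0.9},
])
print(single.shape, batch.shape)
```

The resulting array would then be wrapped as `xgb.DMatrix(single)` before calling `Booster.predict`. If the training features went through `VectorAssembler` as sparse vectors, it is worth checking how zeros versus missing values were treated during training, since the prediction input should encode them the same way.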
I am asking because I think this use case would help others as well, and if needed I can expand the documentation here: https://xgboost.readthedocs.io/en/latest/jvm/xgboost4j_spark_tutorial.html#interact-with-other-bindings-of-xgboost
@hcho3 (or anyone else), could you kindly assist?