Move trained xgboost classifier from PySpark EMR notebook to S3


I built a trained classifier in an AWS EMR notebook

bst = xgb.train(param, dtrain)

When I try to save this model to S3, I get the error

[17:44:03] /workspace/dmlc-core/src/ Please compile with DMLC_USE_S3=1 to use S3
Stack trace:

Is there any way I can upload a trained model inside of an EMR notebook to S3?


I think you can save the model to local disk first and then use boto3 to upload it to S3.


If I save using,


Then, when I go to load it back up, I get a NoneType object


Are you able to locate the model file on the local disk?


Yes, I can locate it using
loader = bst.load_model('path')
but then loader is of NoneType

Alternatively, I can locate it and push it into S3 using

s3_client.upload_file('home/hadoop/###//classifier.model', "###-data-science", "classifier.model")

but when I go to download it from S3 using
s3_client.download_file('###-data-science', 'classifier.model', 'classifier.model')

I get error:

[Errno 13] Permission denied: 'classifier.model.26B9A3Aa'
Traceback (most recent call last):

And I DO have permissions to read and write from S3


Can you ensure that you have full read/write access to the local disk? If not, using /tmp may be a solution.


Where would /tmp go?


Do you have access to the local disk? As for /tmp, see


Yes I have access to the local disk


I have the exact same error. I am running the prediction with deploy mode = 'cluster'. I am guessing there is some issue with sharing resources between the master and the workers. The model runs fine (on small data) in client deploy mode.