XGBoost prediction is much slower after deploying the model with Flask


I deployed an XGBoost model with Flask + gunicorn and published it as an API. After profiling, I found that the predict function in xgboost/ ( line 1338) takes only 2 ms when I run it locally on my computer, but 200 ms when it is called through my API.
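To make the two numbers comparable outside the profiler, the same call could be timed directly in both environments. The following is only a minimal sketch using the standard library: `time_calls`, `summarize`, and `fake_predict` are hypothetical names, and `fake_predict` stands in for the real prediction call, which is not shown here.

```python
import statistics
import time


def time_calls(fn, *args, repeats=50, **kwargs):
    """Call fn(*args, **kwargs) `repeats` times; return each latency in ms."""
    latencies_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms


def summarize(latencies_ms):
    """Return the min, median, and max of the measured latencies."""
    return {
        "min_ms": min(latencies_ms),
        "median_ms": statistics.median(latencies_ms),
        "max_ms": max(latencies_ms),
    }


if __name__ == "__main__":
    # Placeholder workload; replace with the real predict call when testing.
    def fake_predict():
        return sum(range(1000))

    print(summarize(time_calls(fake_predict, repeats=20)))
```

Running this once in a plain Python process and once inside the request handler would show whether the slowdown is in the prediction itself or somewhere around it.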

My XGBoost version is 0.81, and I used save_model() to save the model in binary format.

My computer: 4 CPUs, 8 GB memory
Server: 2 CPUs, 8 GB memory
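Since the CPU counts differ, it may be worth checking what each process actually sees. This is a rough sketch, assuming a thread setting could be relevant (XGBoost uses OpenMP for multithreading, but whether that explains the slowdown here is unconfirmed); `thread_environment` is a hypothetical helper name.

```python
import os


def thread_environment():
    """Collect CPU and thread settings visible to the current process."""
    info = {
        "cpu_count": os.cpu_count(),
        # None means the variable is not set for this process.
        "OMP_NUM_THREADS": os.environ.get("OMP_NUM_THREADS"),
    }
    # sched_getaffinity exists only on some platforms (e.g. Linux); it reflects
    # CPU affinity restrictions such as those applied with taskset or cpusets.
    if hasattr(os, "sched_getaffinity"):
        info["usable_cpus"] = len(os.sched_getaffinity(0))
    return info


if __name__ == "__main__":
    print(thread_environment())
```

Printing this from the local script and from inside a gunicorn worker would make it easier to compare the two setups side by side.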

The hardware does not seem to differ enough to explain a 100x slowdown.

Does anyone know why this happens?