I deployed an XGBoost model as an API using Flask and gunicorn. After profiling, I found that the predict function in xgboost/core.py (https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/core.py, line 1338) takes only about 2 ms when I run it locally on my computer, but about 200 ms when I call it through the API.
My XGBoost version is 0.81, and I used save_model() to save the model in binary format.
My computer: 4 CPUs, 8 GB memory
Server: 2 CPUs, 8 GB memory
So there is no extreme difference between the two hardware environments.
Does anyone know why this happens?