Hi! I’ve recently been surprised by some XGBoost inference behaviour, and any feedback or insights would be useful!
I had the same XGBClassifier model both dumped (pickled) and saved (in the more recent JSON format). After loading these two back, inference takes wildly different amounts of time between them (a 5-10x difference with the models I’ve tried briefly). Any reason why this might happen?
I did a quick dummy test with the Iris dataset, as below. I’ve done a quick model training and save, the code is in the next collapsed section, resulting in a pickle and a JSON of the same model.
Model training and export code
Here’s the code that I used to train and save the model:
```python
import pickle
import xgboost as xgb
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris["data"], iris["target"]

xgb_model = xgb.XGBClassifier(n_jobs=1).fit(X, y)

# Model export: dump
with open("xgb_iris.pkl", "wb") as f:
    pickle.dump(xgb_model, f)

# Model export: save
xgb_model.save_model("xgb_iris.json")
```
Then I ran a quick benchmark: inference a number of times over the whole dataset, printing the average inference time per item. The code in the collapsed section below does the pickled/dumped model first:
Pickled model loading and benchmarking code
```python
import pickle
import time
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris["data"], iris["target"]

# Model load from dump
with open("xgb_iris.pkl", "rb") as f:
    xgb_dumped = pickle.load(f)

repeats = 5
start = time.perf_counter()
for _ in range(repeats):
    for x_inference in X:
        xgb_dumped.predict_proba([x_inference])
stop = time.perf_counter()
print(f"Dumped model: Time per single inference: {(stop-start)*1000/len(X)/repeats:.3f} ms")
```
The result is pretty fast:
```
Dumped model: Time per single inference: 5.442 ms
```
Then I repeated the same thing for the JSON saved model:
JSON saved model loading and benchmarking code
```python
import time
import xgboost as xgb
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris["data"], iris["target"]

# Model load from save
xgb_saved = xgb.XGBClassifier()
xgb_saved.load_model("xgb_iris.json")

repeats = 5
start = time.perf_counter()
for _ in range(repeats):
    for x_inference in X:
        xgb_saved.predict_proba([x_inference])
stop = time.perf_counter()
print(f"Saved model: Time per single inference: {(stop-start)*1000/len(X)/repeats:.3f} ms")
```
```
Saved model: Time per single inference: 50.858 ms
```
This is a ~9x slowdown, even though the model does the same thing (I checked elsewhere that the results are the same).
Finally, if in the same script I alternate between the two objects to do inference, the dumped model slows down as well…
Combined/interleaved model benchmarking code
```python
import pickle
import time
import xgboost as xgb
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris["data"], iris["target"]

# Model load from dump
with open("xgb_iris.pkl", "rb") as f:
    xgb_dumped = pickle.load(f)

# Model load from save
xgb_saved = xgb.XGBClassifier()
xgb_saved.load_model("xgb_iris.json")

def run_timing(model, model_name, X, repeats=1):
    """Run timing and display results"""
    start = time.perf_counter()
    for _ in range(repeats):
        for x_inference in X:
            model.predict_proba([x_inference])
    stop = time.perf_counter()
    print(f"{model_name} model: Time per single inference: {(stop-start)*1000/len(X)/repeats:>7.3f} ms")

for _ in range(3):
    run_timing(xgb_dumped, "Dumped", X, repeats=5)
    run_timing(xgb_saved, "Saved ", X, repeats=5)
```
Here the first dumped-model run is fast, but the subsequent dumped-model runs are as slow as the JSON-loaded model:
```
Dumped model: Time per single inference:   5.156 ms
Saved  model: Time per single inference:  51.047 ms
Dumped model: Time per single inference:  47.337 ms
Saved  model: Time per single inference:  50.523 ms
Dumped model: Time per single inference:  50.274 ms
Saved  model: Time per single inference:  48.661 ms
```
Is there anything I’m missing? I expected both methods to be the same speed. I’m also wondering: if the pickled model can be fast, can the JSON-loaded version be sped up too?
Cheers!