RAM usage keeps increasing with iterative training of XGB models

thomas · January 8, 2020, 5:36pm

I am training XGBoost models iteratively: I train a few decision trees, pause, and then continue training. While investigating a memory leak in my program, I noticed that XGBoost itself uses quite a lot of memory. I’ve created a reproducible example below.

Data: https://www.openml.org/d/73 (roughly 128 MB in memory as a pandas DataFrame, ~360 MB after one-hot encoding)
Code: https://gist.github.com/thuijskens/ae0c608af41a833f6d110da1e24ed2f5 (point the data file to wherever you store it).

Memory profiling the above example gives:

You can see that memory usage spikes dramatically, at one point reaching around 3 GB in total.

Running the above example, but setting n_iterations to 1 (so that no iterative fitting is done) results in a memory profile that “only” uses around 800 MB at the fit call (I’d show an image but I’m only allowed to display one image).

Why is the memory increasing so much with iterative fitting? I was wondering if this was related to the following sentence on the GPU support page:

If you train xgboost in a loop you may notice xgboost is not freeing device memory after each training iteration. This is because memory is allocated over the lifetime of the booster object and does not get freed until the booster is freed. A workaround is to serialise the booster object after training. See demo/gpu_acceleration/memory.py for a simple example.

Does this hold true for CPU training too?

In any case I find the memory usage very high, and similar experiments with LightGBM on the same data set give me memory consumptions that are 100x lower. Has anybody else run into similar observations?
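For reference, here is a rough sketch of the serialisation workaround described in the quoted GPU note, using only the standard library. The helper names `to_bytes` and `from_bytes` and the dictionary stand-in are hypothetical; with XGBoost the object would be the trained booster, which can be pickled. Whether this reduces memory for CPU training is exactly the open question, so treat it as something to try and not as a confirmed fix.

```python
import gc
import pickle


def to_bytes(model):
    """Serialise a picklable model to bytes."""
    return pickle.dumps(model)


def from_bytes(blob):
    """Rebuild a model from bytes produced by to_bytes."""
    return pickle.loads(blob)


if __name__ == "__main__":
    # Stand-in for a trained booster (assumption: the real object is picklable).
    model = {"trees": list(range(50))}

    blob = to_bytes(model)
    del model      # drop the only reference so its memory can be released
    gc.collect()

    model = from_bytes(blob)  # restore it when the next training step needs it
    print(len(model["trees"]))
```

The `del` and the restore are done at the call site on purpose: a helper that takes the model and returns a reloaded copy would keep the old object alive until the caller rebinds its own variable.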

hcho3 · January 8, 2020, 11:38pm

@thomas Thanks for your report. I just tried running your script and got

Line #    Mem usage    Increment   Line Contents
================================================
     9    305.1 MiB    305.1 MiB   @profile
    10                             def iterative_train(n_iterations, X, y):
    11    305.1 MiB      0.0 MiB       model = XGBClassifier(n_estimators=5, n_jobs=-1)
    12
    13   1737.2 MiB      0.0 MiB       for _ in range(n_iterations):
    14   1737.2 MiB      0.0 MiB           try:
    15                                         # if the model is already fitted, then do another XGB_RESOURCE_UNIT rounds of boosting
    16   1050.5 MiB      0.0 MiB               booster = model.get_booster()
    17    305.1 MiB      0.0 MiB           except XGBoostError:
    18                                         # if the model hasn't been fitted before, do the first round of boosting
    19    305.1 MiB      0.0 MiB               booster = None
    20
    21   1737.2 MiB    718.0 MiB           model.fit(X, y, xgb_model=booster)
    22   1737.2 MiB      0.2 MiB           print(f"Number of trees: {len(model._Booster.trees_to_dataframe().Tree.unique())}")
    23
    24   1737.2 MiB      0.0 MiB       return model

On my machine, memory consumption peaks at 1.7 GB. Which XGBoost version are you using? I used XGBoost 0.90. At any rate, there appears to be a memory leak. I will investigate.

thomas wrote: “In any case I find the memory usage very high, and similar experiments with LightGBM on the same data set give me memory consumptions that are 100x lower. Has anybody else run into similar observations?”

LightGBM has a feature known as “Exclusive Feature Bundling” (EFB), which bundles mutually exclusive features (i.e., features that rarely take nonzero values simultaneously) to reduce the number of features. EFB is quite effective at reducing memory consumption for one-hot-encoded, high-dimensional data. See their paper for details.
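To make the idea concrete, here is a toy sketch of bundling for the strictly exclusive, binary case (such as one-hot columns). It is not LightGBM's implementation: as I understand the paper, the real algorithm works on histogram bins, tolerates a small rate of conflicts, and chooses bundles greedily. The function names `are_exclusive` and `bundle_exclusive` are made up for this illustration.

```python
def are_exclusive(rows, col_a, col_b):
    """True if the two columns are never nonzero in the same row."""
    return all(not (row[col_a] and row[col_b]) for row in rows)


def bundle_exclusive(rows, cols):
    """Merge mutually exclusive binary columns into one integer column.

    0 means every column in `cols` was zero for that row;
    k means cols[k - 1] was the nonzero one.
    """
    bundled = []
    for row in rows:
        value = 0
        for k, col in enumerate(cols, start=1):
            if row[col]:
                value = k
                break
        bundled.append(value)
    return bundled


if __name__ == "__main__":
    # Three one-hot columns encoding a single categorical feature.
    rows = [
        [1, 0, 0],
        [0, 1, 0],
        [0, 0, 1],
        [0, 1, 0],
    ]
    print(are_exclusive(rows, 0, 1))          # True
    print(bundle_exclusive(rows, [0, 1, 2]))  # [1, 2, 3, 2]
```

Three columns collapse into one without losing information, which is why the technique pays off so well on one-hot-encoded data.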

hcho3 · January 8, 2020, 11:37pm

Update: I re-wrote your script to use the “native” interface of XGBoost (one that does not integrate with scikit-learn):

import numpy as np
import pandas as pd

from memory_profiler import profile
import xgboost

@profile
def iterative_train(n_iterations, dtrain):
    params = {}
    booster = None

    for i in range(n_iterations):
        booster = xgboost.train(params, dtrain, num_boost_round=5, xgb_model=booster)
        print(f"Number of trees: {len(booster.trees_to_dataframe().Tree.unique())}")

    return booster

if __name__ == "__main__":
    data = pd.read_csv("./data.csv")

    X = pd.get_dummies(data.drop(columns=["class"])).values
    y = np.array([(0 if x == 'bad' else 1) for x in data["class"].values])
    dtrain = xgboost.DMatrix(X, label=y)

    iterative_train(10, dtrain)

The peak memory consumption is down to 1 GB:

Line #    Mem usage    Increment   Line Contents
================================================
     7    660.1 MiB    660.1 MiB   @profile
     8                             def iterative_train(n_iterations, dtrain):
     9    660.1 MiB      0.0 MiB       params = {}
    10    660.1 MiB      0.0 MiB       booster = None
    11
    12   1036.4 MiB      0.0 MiB       for i in range(n_iterations):
    13   1036.5 MiB    363.4 MiB           booster = xgboost.train(params, dtrain, num_boost_round=5, xgb_model=booster)
    14   1036.4 MiB      0.4 MiB           print(f"Number of trees: {len(booster.trees_to_dataframe().Tree.unique())}")
    15
    16   1036.4 MiB      0.0 MiB       return booster

The likely reason is that, in your original script, the matrix object gets allocated again every time XGBClassifier.fit() is called. In the new script, the matrix object is allocated only once.

There is an ongoing discussion about resolving the performance discrepancy between the “native” interface and the scikit-learn interface: https://github.com/dmlc/xgboost/issues/5152.

thomas · January 9, 2020, 9:17am

Hi @hcho3!

Thanks a lot for looking into this so quickly and for the tip regarding EFB in LightGBM! I am using XGBoost 0.90 as well. It is indeed clear that the sklearn API consumes more memory, so I will see if I can use the native API in my code.

~360 MB of RAM usage still seems like quite a lot to me, since we are only talking about a small ensemble (5 decision trees). Is this the amount of RAM usage that you would expect in this case?

Thanks again!

hcho3 · January 11, 2020, 5:02pm

In “iterative training,” you are adding new trees on every call, so you end up with 50 trees (5 trees per call * 10 calls), not 5.
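The arithmetic can be spelled out in a few lines. This is only a counting sketch, assuming one tree is added per boosting round (which I believe holds for the regression and binary-classification setups in this thread); `trees_after_each_call` is a hypothetical helper, not an XGBoost function.

```python
def trees_after_each_call(rounds_per_call, n_calls, trees_per_round=1):
    """Cumulative tree count after each continued-training call.

    trees_per_round=1 is an assumption that fits a single-output model;
    other configurations may add more trees per round.
    """
    return [rounds_per_call * trees_per_round * (call + 1) for call in range(n_calls)]


if __name__ == "__main__":
    counts = trees_after_each_call(rounds_per_call=5, n_calls=10)
    print(counts)      # [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
    print(counts[-1])  # 50
```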