Hi!
I’m training an XGBoost model in a loop. I tried deleting the DMatrix object after each iteration to reduce memory consumption. However, it didn’t work and the process crashed in the second iteration.
My environment:
Operating system: Debian 9.12
Python version: 3.5.3
XGBoost version: 1.0.2 (installed via pip)
I ran memory_profiler; the code and the memory results are below.
Please note that the “Increment” column doesn’t report large negative increments correctly (a memory_profiler bug), so I used a psutil Process object to compute the memory changes instead.
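The psutil-based bookkeeping is just the following (a small self-contained sketch; `proc` and `rss_mb` correspond to the `proc.memory_info().rss` calls in the profile below, and the 50 MiB buffer is only there to make the delta visible):

```python
import gc
import os

import psutil

proc = psutil.Process(os.getpid())

def rss_mb():
    """Resident set size of this process, in MiB."""
    return proc.memory_info().rss / 1024 ** 2

mem0 = rss_mb()
blob = b"\x00" * (50 * 1024 ** 2)  # allocate and touch ~50 MiB
mem1 = rss_mb()
del blob
gc.collect()
mem2 = rss_mb()  # RSS should drop again once the buffer is freed
```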
Line # Mem usage Increment Line Contents
29 2852.1 MiB 2852.1 MiB @profile
30 def xgb_score(params):
31 2852.1 MiB 0.0 MiB cv_scores = []
32 2852.1 MiB 0.0 MiB try:
33 21206.2 MiB 0.0 MiB for i in range(27, 29):
34 21206.2 MiB 0.0 MiB mem0 = proc.memory_info().rss
35 21206.2 MiB 0.0 MiB logging.info('!!!Mark!!! memory usage: {}'.format(mem0 / 1024**2))
36 23342.2 MiB 2136.0 MiB df_train = df_train_raw.loc[df_train_raw['date_block_num'] < i]
37 23403.8 MiB 84.1 MiB df_val = df_train_raw.loc[df_train_raw['date_block_num'] == i]
38 23403.8 MiB 0.0 MiB mem1 = proc.memory_info().rss
39 23403.8 MiB 0.0 MiB logging.info('!!!!Mark!!!! memory usage: {}, increased by {} MB'.format(mem1/1024**2, (mem1-mem0)/1024**2))
40 23325.6 MiB 0.0 MiB df_train.drop(['data_type', 'ID'], axis=1, inplace=True)
41 23323.2 MiB 0.0 MiB df_val.drop(['data_type', 'ID'], axis=1, inplace=True)
42 23323.2 MiB 0.0 MiB print("started to copy data.")
43 25094.9 MiB 9076.1 MiB dtrain = xgb.DMatrix(df_train[features], df_train['item_cnt_day']) ## CRASHED HERE!!!!!
44 14452.1 MiB 426.7 MiB dvalid = xgb.DMatrix(df_val[features], df_val['item_cnt_day'])
45 14452.1 MiB 0.0 MiB mem2 = proc.memory_info().rss
46 14452.1 MiB 0.0 MiB logging.info('!!!!Mark!!!! memory usage: {}, increased by {} MB'.format(mem2/1024**2, (mem2-mem1)/1024**2))
47 12453.4 MiB 0.0 MiB del df_train, df_val ### here the memory actually decreased by 1999 MB (per psutil)
48 12453.4 MiB 0.0 MiB gc.collect()
49 12453.4 MiB 0.0 MiB mem3 = proc.memory_info().rss
50 12453.4 MiB 0.0 MiB logging.info('!!!!Mark!!!! memory usage: {}, increased by {} MB'.format(mem3/1024**2, (mem3-mem2)/1024**2))
51 12453.4 MiB 0.0 MiB logging.info("cleared space!!!")
52 12453.4 MiB 0.0 MiB watchlist = [(dtrain, 'train'), (dvalid, 'eval')]
53 12453.4 MiB 0.0 MiB start = time.time()
54 12453.4 MiB 0.0 MiB num_boost_round = 4
55 12453.4 MiB 0.0 MiB early_stopping_rounds = 20
56 #callbacks = [log_evaluation(1, True)]
57 12453.4 MiB 0.0 MiB gbm = xgb.train(params, dtrain, num_boost_round, evals=watchlist,
58 21604.1 MiB 9150.6 MiB early_stopping_rounds=early_stopping_rounds, verbose_eval=True)
59 21604.1 MiB 0.0 MiB cv_scores.append(gbm.best_score)
60 21604.1 MiB 0.0 MiB mem4 = proc.memory_info().rss
61 21604.1 MiB 0.0 MiB logging.info('!!!!Mark!!!! memory usage: {}, increased by {} MB'.format(mem4/1024**2, (mem4-mem3)/1024**2))
62 21206.2 MiB 0.0 MiB del gbm, dtrain, dvalid ## but here the memory only decreased by ~400 MB. Something is wrong: dtrain & dvalid should be ~9 GB
63 21206.2 MiB 0.0 MiB gc.collect()
64 21206.2 MiB 0.0 MiB mem5 = proc.memory_info().rss
65 21206.2 MiB 0.0 MiB logging.info('!!!!Mark!!!! memory usage: {}, increased by {} MB'.format(mem5/1024**2, (mem5-mem4)/1024**2))
66 21206.2 MiB 0.0 MiB logging.info('Finished {}th iteration. Used time: {}\n'.format(i-27, time.time()-start))
67 25094.9 MiB 0.0 MiB except MemoryError as error:
68 23323.6 MiB 0.0 MiB logging.error("Some error happened!!")
69 23323.6 MiB 0.0 MiB logging.info('==============Finished one CV computation. Mean score: {}'.format(np.mean(cv_scores)))
70 23323.6 MiB 0.0 MiB return np.mean(cv_scores)
So, at line 43, after converting df_train & df_val into DMatrix objects, memory usage increased by about 9 GB.
However, at line 62, after deleting the model and the DMatrix objects, Python only released ~400 MB of memory.
That leads to a memory error in the second iteration, when another ~9 GB of DMatrix objects is created…
How can I solve this problem? Please help…