XGBoost: deleting a DMatrix object does not release its memory

Hi!!!

I’m training an XGBoost model in a loop. I tried to delete the DMatrix objects after each iteration to reduce memory consumption. However, it didn’t work and the process crashed in the second iteration.

My environment info:
Operating system: Debian 9.12
Python version: 3.5.3
XGBoost version: 1.0.2 (installed via pip)

I ran memory_profiler; the code and memory results are below.
Please note that the “Increment” column doesn’t report large negative increments correctly — a bug in memory_profiler. I used a Process object from the psutil package to measure memory changes instead.
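For reference, the psutil-based measurement I used looks roughly like this (a minimal sketch; the 50 MiB allocation is just a placeholder workload, not my real data):

```python
import os
import psutil

# Handle to the current process; .memory_info().rss returns the
# resident set size in bytes, which tracks decreases correctly too.
proc = psutil.Process(os.getpid())

def rss_mb():
    return proc.memory_info().rss / 1024**2

before = rss_mb()
buf = bytearray(50 * 1024**2)   # allocate ~50 MiB as a demo workload
after = rss_mb()
print('increased by {:.1f} MB'.format(after - before))
del buf
```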

Line # Mem usage Increment Line Contents

29   2852.1 MiB   2852.1 MiB   @profile
30                             def xgb_score(params):
31   2852.1 MiB      0.0 MiB       cv_scores = []
32   2852.1 MiB      0.0 MiB       try:
33  21206.2 MiB      0.0 MiB           for i in range(27, 29):
34  21206.2 MiB      0.0 MiB               mem0 = proc.memory_info().rss
35  21206.2 MiB      0.0 MiB               logging.info('!!!Mark!!! memory usage: {}'.format(mem0 / 1024**2))
36  23342.2 MiB   2136.0 MiB               df_train = df_train_raw.loc[df_train_raw['date_block_num'] < i]
37  23403.8 MiB     84.1 MiB               df_val = df_train_raw.loc[df_train_raw['date_block_num'] == i]
38  23403.8 MiB      0.0 MiB               mem1 = proc.memory_info().rss
39  23403.8 MiB      0.0 MiB               logging.info('!!!!Mark!!!! memory usage: {},  increased by {} MB'.format(mem1/1024**2, (mem1-mem0)/1024**2))
40  23325.6 MiB      0.0 MiB               df_train.drop(['data_type', 'ID'], axis=1, inplace=True)
41  23323.2 MiB      0.0 MiB               df_val.drop(['data_type', 'ID'], axis=1, inplace=True)
42  23323.2 MiB      0.0 MiB               print("started to copy data.")
43  25094.9 MiB   9076.1 MiB               dtrain = xgb.DMatrix(df_train[features],  df_train['item_cnt_day']) ## CRASHED HERE!!!!!
44  14452.1 MiB    426.7 MiB               dvalid = xgb.DMatrix(df_val[features], df_val['item_cnt_day'])
45  14452.1 MiB      0.0 MiB               mem2 = proc.memory_info().rss
46  14452.1 MiB      0.0 MiB               logging.info('!!!!Mark!!!! memory usage: {},  increased by {} MB'.format(mem2/1024**2, (mem2-mem1)/1024**2))
47  12453.4 MiB      0.0 MiB               del df_train, df_val  ### actually here the memory decreased by 1999 MB
48  12453.4 MiB      0.0 MiB               gc.collect()
49  12453.4 MiB      0.0 MiB               mem3 = proc.memory_info().rss
50  12453.4 MiB      0.0 MiB               logging.info('!!!!Mark!!!! memory usage: {}, increased by {} MB'.format(mem3/1024**2, (mem3-mem2)/1024**2))
51  12453.4 MiB      0.0 MiB               logging.info("cleared space!!!")
52  12453.4 MiB      0.0 MiB               watchlist = [(dtrain, 'train'), (dvalid, 'eval')]
53  12453.4 MiB      0.0 MiB               start = time.time()
54  12453.4 MiB      0.0 MiB               num_boost_round = 4
55  12453.4 MiB      0.0 MiB               early_stopping_rounds = 20
56                                         #callbacks = [log_evaluation(1, True)]
57  12453.4 MiB      0.0 MiB               gbm = xgb.train(params, dtrain, num_boost_round, evals=watchlist, 
58  21604.1 MiB   9150.6 MiB                         early_stopping_rounds=early_stopping_rounds, verbose_eval=True)
59  21604.1 MiB      0.0 MiB               cv_scores.append(gbm.best_score)
60  21604.1 MiB      0.0 MiB               mem4 = proc.memory_info().rss
61  21604.1 MiB      0.0 MiB               logging.info('!!!!Mark!!!! memory usage: {}, increased by {} MB'.format(mem4/1024**2, (mem4-mem3)/1024**2))
62  21206.2 MiB      0.0 MiB               del gbm, dtrain, dvalid ## but here the memory only decreased by ~300 MB; something is wrong, dtrain & dvalid should be ~9 GB
63  21206.2 MiB      0.0 MiB               gc.collect()
64  21206.2 MiB      0.0 MiB               mem5 = proc.memory_info().rss
65  21206.2 MiB      0.0 MiB               logging.info('!!!!Mark!!!! memory usage: {}, increased by {} MB'.format(mem5/1024**2, (mem5-mem4)/1024**2))
66  21206.2 MiB      0.0 MiB               logging.info('Finished {}th iteration. Used time: {}\n'.format(i-27, time.time()-start))
67  25094.9 MiB      0.0 MiB       except MemoryError as error:
68  23323.6 MiB      0.0 MiB           logging.error("Some error happened!!")
69  23323.6 MiB      0.0 MiB       logging.info('==============Finished one CV computation. Mean score: {}'.format(np.mean(cv_scores)))
70  23323.6 MiB      0.0 MiB       return np.mean(cv_scores)

So at line 43, after converting df_train and df_val to DMatrix objects, memory usage increases by ~9 GB.
However, at line 62, after deleting the model and the DMatrix objects, Python only releases ~400 MB of memory.
That leads to a MemoryError in the second iteration, when another ~9 GB of DMatrix objects is created…

How can I solve this problem? Please help…

Just found the answer… I have to call dtrain.__del__() to free it explicitly.