Hi!
I’m training an XGBoost model in a loop. I tried deleting the DMatrix object after each iteration to reduce memory consumption. However, it didn’t work and the process crashed in the second iteration.
My environment:
Operating system: Debian 9.12
Python version: 3.5.3
XGBoost version: 1.0.2 (installed via pip)
I ran memory_profiler; the code and the memory results are below.
Please note that the “Increment” column doesn’t report large negative increments correctly (a memory_profiler bug), so I used a psutil Process object to compute the memory changes instead.
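The psutil-based bookkeeping is just the following (a small self-contained sketch; `proc` and `rss_mb` correspond to the `proc.memory_info().rss` calls in the profile below, and the 50 MiB buffer is only there to make the delta visible):

```python
import gc
import os

import psutil

proc = psutil.Process(os.getpid())

def rss_mb():
    """Resident set size of this process, in MiB."""
    return proc.memory_info().rss / 1024 ** 2

mem0 = rss_mb()
blob = b"\x00" * (50 * 1024 ** 2)  # allocate and touch ~50 MiB
mem1 = rss_mb()
del blob
gc.collect()
mem2 = rss_mb()  # RSS should drop again once the buffer is freed
```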
Line # Mem usage Increment Line Contents
29 2852.1 MiB 2852.1 MiB @profile
30 def xgb_score(params):
31 2852.1 MiB 0.0 MiB cv_scores = []
32 2852.1 MiB 0.0 MiB try:
33 21206.2 MiB 0.0 MiB for i in range(27, 29):
34 21206.2 MiB 0.0 MiB mem0 = proc.memory_info().rss
35 21206.2 MiB 0.0 MiB logging.info('!!!Mark!!! memory usage: {}'.format(mem0 / 1024**2))
36 23342.2 MiB 2136.0 MiB df_train = df_train_raw.loc[df_train_raw['date_block_num'] < i]
37 23403.8 MiB 84.1 MiB df_val = df_train_raw.loc[df_train_raw['date_block_num'] == i]
38 23403.8 MiB 0.0 MiB mem1 = proc.memory_info().rss
39 23403.8 MiB 0.0 MiB logging.info('!!!!Mark!!!! memory usage: {}, increased by {} MB'.format(mem1/1024**2, (mem1-mem0)/1024**2))
40 23325.6 MiB 0.0 MiB df_train.drop(['data_type', 'ID'], axis=1, inplace=True)
41 23323.2 MiB 0.0 MiB df_val.drop(['data_type', 'ID'], axis=1, inplace=True)
42 23323.2 MiB 0.0 MiB print("started to copy data.")
43 25094.9 MiB 9076.1 MiB dtrain = xgb.DMatrix(df_train[features], df_train['item_cnt_day']) ## CRASHED HERE!!!!!
44 14452.1 MiB 426.7 MiB dvalid = xgb.DMatrix(df_val[features], df_val['item_cnt_day'])
45 14452.1 MiB 0.0 MiB mem2 = proc.memory_info().rss
46 14452.1 MiB 0.0 MiB logging.info('!!!!Mark!!!! memory usage: {}, increased by {} MB'.format(mem2/1024**2, (mem2-mem1)/1024**2))
47 12453.4 MiB 0.0 MiB del df_train, df_val ### here the memory actually decreased by 1999 MB (per psutil)
48 12453.4 MiB 0.0 MiB gc.collect()
49 12453.4 MiB 0.0 MiB mem3 = proc.memory_info().rss
50 12453.4 MiB 0.0 MiB logging.info('!!!!Mark!!!! memory usage: {}, increased by {} MB'.format(mem3/1024**2, (mem3-mem2)/1024**2))
51 12453.4 MiB 0.0 MiB logging.info("cleared space!!!")
52 12453.4 MiB 0.0 MiB watchlist = [(dtrain, 'train'), (dvalid, 'eval')]
53 12453.4 MiB 0.0 MiB start = time.time()
54 12453.4 MiB 0.0 MiB num_boost_round = 4
55 12453.4 MiB 0.0 MiB early_stopping_rounds = 20
56 #callbacks = [log_evaluation(1, True)]
57 12453.4 MiB 0.0 MiB gbm = xgb.train(params, dtrain, num_boost_round, evals=watchlist,
58 21604.1 MiB 9150.6 MiB early_stopping_rounds=early_stopping_rounds, verbose_eval=True)
59 21604.1 MiB 0.0 MiB cv_scores.append(gbm.best_score)
60 21604.1 MiB 0.0 MiB mem4 = proc.memory_info().rss
61 21604.1 MiB 0.0 MiB logging.info('!!!!Mark!!!! memory usage: {}, increased by {} MB'.format(mem4/1024**2, (mem4-mem3)/1024**2))
62 21206.2 MiB 0.0 MiB del gbm, dtrain, dvalid ## but here the memory only decreased by ~400 MB. Something is wrong: dtrain & dvalid should be ~9 GB
63 21206.2 MiB 0.0 MiB gc.collect()
64 21206.2 MiB 0.0 MiB mem5 = proc.memory_info().rss
65 21206.2 MiB 0.0 MiB logging.info('!!!!Mark!!!! memory usage: {}, increased by {} MB'.format(mem5/1024**2, (mem5-mem4)/1024**2))
66 21206.2 MiB 0.0 MiB logging.info('Finished {}th iteration. Used time: {}\n'.format(i-27, time.time()-start))
67 25094.9 MiB 0.0 MiB except MemoryError as error:
68 23323.6 MiB 0.0 MiB logging.error("Some error happened!!")
69 23323.6 MiB 0.0 MiB logging.info('==============Finished one CV computation. Mean score: {}'.format(np.mean(cv_scores)))
70 23323.6 MiB 0.0 MiB return np.mean(cv_scores)
So, at line 43, after converting df_train & df_val into DMatrix objects, memory usage increased by about 9 GB.
However, at line 62, after deleting the model and the DMatrix objects, Python only released ~400 MB of memory.
That leads to a memory error in the second iteration, when another ~9 GB of DMatrix objects is created…
How can I solve this problem? Please help…