Hello,
I am running XGBoost v0.82 on a large dataset, so I would like to use the external memory functionality. I've converted my input data to LIBSVM format and loaded it into a DMatrix following the example here. However, the error printed at each round does not change at all, and the final ROC AUC is 0.5. When I train without the #dtrain.cache suffix, training works normally.
Any ideas what's happening here? I've posted the relevant snippet of code below, but unfortunately I cannot share the data.
Thanks,
Jennet
import xgboost as xgb

# Note: n_estimators is a scikit-learn wrapper parameter and is ignored by
# xgb.train; the number of rounds (num_rounds) is passed to xgb.train below.
params = {'objective': 'binary:logistic',
          'nthread': 4,
          'max_depth': max_depth,
          'min_child_weight': min_child_weight,
          'gamma': gamma,
          'eta': eta,
          'subsample': subsample,
          'colsample_bytree': colsample_bytree}
# Matrices kept in EXTERNAL MEMORY
dtrain = xgb.DMatrix("train.dat#dtrain.cache")
dtest = xgb.DMatrix("test.dat#dtest.cache")

# Fit the model
watchlist = [(dtest, 'eval'), (dtrain, 'train')]
myboost = xgb.train(params, dtrain, num_rounds, watchlist)

# Predict on the training and test sets
dtrain_predictions = myboost.predict(dtrain)
dtest_predictions = myboost.predict(dtest)
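For what it's worth, an AUC of exactly 0.5 is what you get when the model outputs a constant (or otherwise uninformative) score for every row, which would be consistent with the unchanging per-round error. A minimal sanity check, assuming the AUC is computed with scikit-learn's roc_auc_score (my assumption, since the metric code isn't shown):

```python
# Sanity check: constant predictions always score ROC AUC = 0.5,
# regardless of the true labels (ties are split evenly).
from sklearn.metrics import roc_auc_score

y_true = [0, 1, 0, 1, 1]                 # toy labels (hypothetical)
constant_preds = [0.5] * len(y_true)     # what a model that learned nothing outputs
print(roc_auc_score(y_true, constant_preds))  # 0.5
```

So if roc_auc_score(y_train, dtrain_predictions) is 0.5, it may be worth printing a few entries of dtrain_predictions to confirm they are all identical, which would point at the cached DMatrix being read incorrectly rather than at the metric.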