Error message "terminate called after throwing an instance of 'dmlc::Error'"


#1

XGBoost 0.90
Python 3.6.3

When I train XGBoost on small data sets, it always works perfectly. With larger data sets (around 20 GB) I get the error message below. I've searched Google for this message and see others reporting the same pattern: small data sets work, but large ones trigger this error. The server has 1 TB of RAM, so memory should not be the limit.

terminate called after throwing an instance of 'dmlc::Error'
what(): [19:43:11] /workspace/src/tree/updater_histmaker.cc:311: fv=inf, hist.last=inf
Stack trace:
[bt] (0) /opt/rh/rh-python36/root/usr/xgboost/libxgboost.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x24) [0x7fa4befcecb4]
[bt] (1) /opt/rh/rh-python36/root/usr/xgboost/libxgboost.so(xgboost::tree::CQHistMaker::HistEntry::Add(float, xgboost::detail::GradientPairInternal)+0x3c8) [0x7fa4bf10af98]
[bt] (2) /opt/rh/rh-python36/root/usr/xgboost/libxgboost.so(xgboost::tree::CQHistMaker::UpdateHistCol(std::vector<xgboost::detail::GradientPairInternal, std::allocator<xgboost::detail::GradientPairInternal > > const&, xgboost::common::Span<xgboost::Entry const, -1l> const&, xgboost::MetaInfo const&, xgboost::RegTree const&, std::vector<unsigned int, std::allocator<unsigned int> > const&, unsigned int, std::vector<xgboost::tree::CQHistMaker::HistEntry, std::allocator<xgboost::tree::CQHistMaker::HistEntry> >*)+0x773) [0x7fa4bf10f923]
[bt] (3) /opt/rh/rh-python36/root/usr/xgboost/libxgboost.so(+0x242364) [0x7fa4bf110364]
[bt] (4) /lib64/libgomp.so.1(+0x163c5) [0x7fa4c79fc3c5]
[bt] (5) /lib64/libpthread.so.0(+0x7e65) [0x7fa586a99e65]
[bt] (6) /lib64/libc.so.6(clone+0x6d) [0x7fa5860b988d]


#2

Can you try installing the latest nightly build?

pip install https://s3-us-west-2.amazonaws.com/xgboost-nightly-builds/xgboost-1.0.0_SNAPSHOT%2Bf4e7b707c92ea5aec0a0611dc878937bc2855a63-py2.py3-none-manylinux1_x86_64.whl
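
It may also be worth checking the input itself. The "fv=inf" in the message reads as though an infinite feature value reached the histogram builder, although that is only my interpretation of the log. Here is a minimal sketch, assuming the features fit in a NumPy array. The helper names count_infinite and replace_infinite are made up for illustration and are not part of XGBoost. The cast to float32 is there because XGBoost stores feature values as 32-bit floats, so a float64 value too large for float32 would also end up as inf.

```python
import numpy as np


def count_infinite(X):
    """Return the number of +/-inf entries in each column after a float32 cast."""
    # Values beyond the float32 range overflow to inf in the cast;
    # silence the overflow warning because that is exactly what we want to detect.
    with np.errstate(over="ignore"):
        X32 = np.asarray(X).astype(np.float32)
    return np.isinf(X32).sum(axis=0)


def replace_infinite(X):
    """Return a float32 copy of X with +/-inf replaced by NaN."""
    with np.errstate(over="ignore"):
        X32 = np.asarray(X).astype(np.float32)  # astype returns a new array
    X32[np.isinf(X32)] = np.nan
    return X32


# Tiny made-up example: column 0 has one inf, column 1 has one -inf
# and one float64 value (1e39) that is too large for float32.
X_demo = np.array([
    [1.0, 2.0],
    [np.inf, 3.0],
    [4.0, -np.inf],
    [5.0, 1e39],
])

inf_counts = count_infinite(X_demo)
X_clean = replace_infinite(X_demo)
print("inf entries per column:", inf_counts)
```

With its default settings, DMatrix treats NaN as a missing value, so passing the cleaned array should sidestep infinite values. Whether treating them as missing is appropriate depends on your data.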