Learning with xgboost-0.90 vs 1.0.0

Hey Guys!

This is my first post in a long time so forgive my noobiness.

I am using xgboost with the following parameters:
params = {'objective': 'binary:logistic', 'colsample_bytree': 0.3, 'learning_rate': 0.1,
'max_depth': 5, 'alpha': 10, 'n_estimators': 10}
The task is to classify real from bogus events.
I have a small number of real events as the data is from a telescope that measures spectroscopic information of stars.
I have a large set of bogus events.
My test dataset has 43000 bogus events and 0 real events.
My training data is 206 real events and 164 bogus events.

While using v0.90, I got great results with a warning:
/home/astrolearner/anaconda3/lib/python3.7/site-packages/xgboost/core.py:587: FutureWarning: Series.base is deprecated and will be removed in a future version
if getattr(data, 'base', None) is not None and
/home/astrolearner/anaconda3/lib/python3.7/site-packages/xgboost/core.py:588: FutureWarning: Series.base is deprecated and will be removed in a future version
data.base is not None and isinstance(data, np.ndarray) \

xgboost was installed with pip on Ubuntu, and I am using it from Python.

I found that Series.base is being deprecated in pandas, and the corresponding xgboost issue is closed and fixed.
So I uninstalled v0.90, installed xgboost from the git repository, and enabled it in my Anaconda environment.
That solved the warning, but my classifier became much worse with no change to my code.

My test data, which was giving me 2254 false positives, is now giving me ~23000 FPs.
I have downgraded and can live with the warning, but I want to find out what is happening.
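To put those numbers in perspective, the jump can be expressed as a false-positive rate over the 43000 bogus test events (counts taken from the post; the ~23000 figure is approximate):

```python
n_bogus = 43000   # bogus events in the test set
fp_v090 = 2254    # false positives under v0.90
fp_new = 23000    # approximate false positives after upgrading

fpr_v090 = fp_v090 / n_bogus
fpr_new = fp_new / n_bogus
print(f"v0.90 FPR: {fpr_v090:.1%}, newer build FPR: {fpr_new:.1%}")
# 2254/43000 is about 5.2%; 23000/43000 is about 53.5%
```

So the upgrade moved the classifier from flagging roughly 1 in 20 bogus events to flagging more than half of them, which is far beyond normal run-to-run variation.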

Happy to provide more information on request,

Can you post your data?

Let me know if you have trouble accessing it.

https://drive.google.com/drive/folders/1FrDmaoDVZjVtqpdt7Dwc9x6xDL0FyhU5?usp=sharing

Thanks, I got it. I will take a look.


@astrolearner I created a new proposal for making it easy to debug accuracy problems: https://github.com/dmlc/xgboost/issues/4837

I’m hoping to prevent silent errors from making it to new releases.

Let me look at it this weekend. Thanks for your patience.