I trained an XGBRegressor (the scikit-learn API of XGBoost) with:
XGBRegressor(n_estimators=500, max_depth=50, n_jobs=-1, tree_method='hist',
             random_state=2, learning_rate=0.1, min_child_weight=1, seed=0,
             subsample=0.8, colsample_bytree=0.8, gamma=0, reg_alpha=0,
             reg_lambda=1, verbosity=0)
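For reference, here is a minimal, runnable sketch of the full single-machine training step; train_x and train_y are hypothetical placeholders for my pandas training data (they are not named above):

from xgboost import XGBRegressor

# Same hyperparameters as listed above; train_x / train_y are
# hypothetical placeholders for the pandas training data.
model = XGBRegressor(n_estimators=500, max_depth=50, n_jobs=-1,
                     tree_method='hist', random_state=2, learning_rate=0.1,
                     min_child_weight=1, seed=0, subsample=0.8,
                     colsample_bytree=0.8, gamma=0, reg_alpha=0,
                     reg_lambda=1, verbosity=0)
model.fit(train_x, train_y)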
I also trained a DaskXGBRegressor on a LocalCluster(n_workers=1, threads_per_worker=1) as:
xgboost.dask.DaskXGBRegressor(n_estimators=500, max_depth=50, n_jobs=-1, tree_method='hist',
                              random_state=2, learning_rate=0.1, min_child_weight=1, seed=0,
                              subsample=0.8, colsample_bytree=0.8, gamma=0, reg_alpha=0,
                              reg_lambda=1, verbosity=0, objective='reg:squarederror')
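The corresponding Dask setup, as a sketch under the same assumptions (train_x / train_y are hypothetical; the Dask client is attached to the estimator before fitting):

from dask.distributed import Client, LocalCluster
import dask.dataframe
import xgboost.dask

cluster = LocalCluster(n_workers=1, threads_per_worker=1)
client = Client(cluster)

# Single-partition Dask versions of the same pandas data.
dX = dask.dataframe.from_pandas(train_x, npartitions=1)
dy = dask.dataframe.from_pandas(train_y, npartitions=1)

dask_model = xgboost.dask.DaskXGBRegressor(
    n_estimators=500, max_depth=50, n_jobs=-1, tree_method='hist',
    random_state=2, learning_rate=0.1, min_child_weight=1, seed=0,
    subsample=0.8, colsample_bytree=0.8, gamma=0, reg_alpha=0,
    reg_lambda=1, verbosity=0, objective='reg:squarederror')
dask_model.client = client  # attach the Dask client before fitting
dask_model.fit(dX, dy)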
The data values fed to both models are identical, except that one model receives a pandas DataFrame while the other receives a Dask DataFrame. The Dask DataFrame is converted from the pandas DataFrame as:
x = dask.dataframe.from_pandas(train_x, npartitions=1)
However, I find that when I use the trained DaskXGBRegressor for prediction, its accuracy is much lower than that of the trained XGBRegressor.
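For example, the gap can be measured like this (test_x / test_y are hypothetical held-out data, and r2_score is just one possible metric; the Dask prediction is lazy and must be computed):

from sklearn.metrics import r2_score

dtest_x = dask.dataframe.from_pandas(test_x, npartitions=1)

pred_local = model.predict(test_x)
pred_dask = dask_model.predict(dtest_x).compute()  # materialize the lazy Dask result

print("XGBRegressor     R^2:", r2_score(test_y, pred_local))
print("DaskXGBRegressor R^2:", r2_score(test_y, pred_dask))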
Could this be a bug, or am I using DaskXGBRegressor incorrectly?