I have a book on XGBoost going to publication soon, and I just updated the software. I'm using it with scikit-learn. So far I'm getting worse results with gblinear, but everything else seems close to normal, until I get to sparse matrices.
As it stands, I cannot use sparse matrices in a pipeline with XGBRegressor. When I switched to LinearRegression, the same pipeline worked, so I know the issue is not with my code but with XGBoost's handling of sparse matrices. Furthermore, it worked with the previous version I had installed (roughly 8 months old).
I could really use help, as I have a deadline coming up. It would take a lot of code to get to the point of running the pipeline (I can provide it if essential), but here is the error:
TypeError Traceback (most recent call last)
in
1 model = XGBRegressor(max_depth=2, min_child_weight=3, subsample=0.9, colsample_bytree=0.8, gamma=2, objective='reg:squarederror')
----> 2 model.fit(X_train_transformed, y_train)
3 y_pred = model.predict(X_test_transformed)
4 rmse = MSE(y_pred, y_test)**0.5
5 rmse
~/opt/anaconda3/lib/python3.7/site-packages/xgboost/sklearn.py in fit(self, X, y, sample_weight, base_margin, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model, sample_weight_eval_set, callbacks)
505 base_margin=base_margin,
506 missing=self.missing,
--> 507 nthread=self.n_jobs)
508
509 evals_result = {}
~/opt/anaconda3/lib/python3.7/site-packages/xgboost/core.py in __init__(self, data, label, weight, base_margin, missing, silent, feature_names, feature_types, nthread)
436 threads=self.nthread,
437 feature_names=feature_names,
--> 438 feature_types=feature_types)
439 assert handle is not None
440 self.handle = handle
~/opt/anaconda3/lib/python3.7/site-packages/xgboost/data.py in dispatch_data_backend(data, missing, threads, feature_names, feature_types)
528 if _has_array_protocol(data):
529 pass
--> 530 raise TypeError('Not supported type for data.' + str(type(data)))
531
532
TypeError: Not supported type for data.<class 'scipy.sparse.coo.coo_matrix'>
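
The last line of the traceback says the rejected input is a SciPy COO matrix. One workaround that may be worth trying is converting the transformed matrices to CSR before fitting. This is only a minimal sketch, not a confirmed fix. It assumes the installed XGBoost version accepts CSR input, the helper name to_csr is made up for illustration, and the toy matrix stands in for X_train_transformed:

```python
import numpy as np
from scipy import sparse


def to_csr(X):
    """Return X as CSR if it is a SciPy sparse matrix, else return it unchanged.

    Hypothetical helper for illustration; not part of XGBoost or scikit-learn.
    """
    if sparse.issparse(X):
        return X.tocsr()
    return X


# Toy stand-in for X_train_transformed: a small matrix in COO format.
rows = np.array([0, 1, 2, 2])
cols = np.array([0, 2, 1, 3])
vals = np.array([1.0, 2.0, 3.0, 4.0])
X_coo = sparse.coo_matrix((vals, (rows, cols)), shape=(3, 4))

# Same values, different storage format.
X_csr = to_csr(X_coo)
print(X_coo.format, "->", X_csr.format)  # coo -> csr
```

With the real data, the calls in the traceback would become model.fit(to_csr(X_train_transformed), y_train) and model.predict(to_csr(X_test_transformed)). The conversion changes only the storage format, not the values.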