Sparse matrix no longer supported in scikit-learn

I have a book on XGBoost going to publication soon, and I just updated the software. I am using the scikit-learn API. So far I am getting worse results with gblinear, but everything else seems close to normal, until I get to sparse matrices.

As it stands, I cannot use sparse matrices as part of a pipeline with XGBRegressor. When I switched to LinearRegression, it worked, so I believe the issue is not with my code but with the sparse matrix handling. Furthermore, it worked with the previous versions (roughly 8 months old) that I had installed on this computer.

I could really use help as I have a deadline coming. It would take a lot of code to get to the point of running the pipeline (can provide if essential), but here is the error:


TypeError Traceback (most recent call last)
in
1 model = XGBRegressor(max_depth=2, min_child_weight=3, subsample=0.9, colsample_bytree=0.8, gamma=2, objective='reg:squarederror')
----> 2 model.fit(X_train_transformed, y_train)
3 y_pred = model.predict(X_test_transformed)
4 rmse = MSE(y_pred, y_test)**0.5
5 rmse

~/opt/anaconda3/lib/python3.7/site-packages/xgboost/sklearn.py in fit(self, X, y, sample_weight, base_margin, eval_set, eval_metric, early_stopping_rounds, verbose, xgb_model, sample_weight_eval_set, callbacks)
505 base_margin=base_margin,
506 missing=self.missing,
--> 507 nthread=self.n_jobs)
508
509 evals_result = {}

~/opt/anaconda3/lib/python3.7/site-packages/xgboost/core.py in __init__(self, data, label, weight, base_margin, missing, silent, feature_names, feature_types, nthread)
436 threads=self.nthread,
437 feature_names=feature_names,
--> 438 feature_types=feature_types)
439 assert handle is not None
440 self.handle = handle

~/opt/anaconda3/lib/python3.7/site-packages/xgboost/data.py in dispatch_data_backend(data, missing, threads, feature_names, feature_types)
528 if _has_array_protocol(data):
529 pass
--> 530 raise TypeError('Not supported type for data.' + str(type(data)))
531
532

TypeError: Not supported type for data.<class 'scipy.sparse.coo.coo_matrix'>

You should convert the sparse matrix into the CSR layout:

# x is coo_matrix
x_csr = x.tocsr()
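
For anyone who wants to check the conversion on its own, here is a minimal sketch using only NumPy and SciPy. The small `dense` array is a made-up stand-in for the transformed training data in the question, not the poster's actual data; the point is just that `tocsr()` changes the storage layout and leaves the values alone.

```python
import numpy as np
from scipy import sparse

# Made-up stand-in for the transformed feature matrix (assumption, not from the thread).
dense = np.array([[1.0, 0.0, 2.0],
                  [0.0, 0.0, 3.0],
                  [4.0, 5.0, 0.0]])

# A COO matrix, the type named in the TypeError above.
x = sparse.coo_matrix(dense)

# Convert to CSR: same values, different internal layout.
x_csr = x.tocsr()

print(x.format, "->", x_csr.format)

# The conversion should not change any entry.
values_unchanged = bool((x_csr.toarray() == dense).all())
print("values unchanged:", values_unchanged)
```

The converted `x_csr` is what would then be passed to `model.fit(...)` in place of the COO matrix.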

Is 1.2 already outdated? I just updated a couple of weeks ago.

No, 1.2 is the latest version.

In previous versions, a COO matrix was automatically converted into the CSR layout. In version 1.2.0 (the latest), the conversion is no longer automatic. I will file a small patch to fix it.
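
Until a fix is released, one possible workaround is to make the conversion an explicit pipeline step, so whatever sparse format the earlier transformers emit is turned into CSR before it reaches the model. The sketch below is only an illustration under assumptions: `to_csr` is a hypothetical helper, the data is random placeholder data, and `LinearRegression` stands in for the final estimator so the example runs without XGBoost installed. In the setting described above, the last step would be the `XGBRegressor` instead.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def to_csr(X):
    """Hypothetical helper: convert any SciPy sparse matrix to CSR, pass dense input through."""
    return X.tocsr() if sparse.issparse(X) else X


# Placeholder data (assumption): a sparse-ish 50x8 matrix stored in COO format.
rng = np.random.default_rng(0)
values = rng.random((50, 8))
mask = rng.random((50, 8)) < 0.3
X_coo = sparse.coo_matrix(values * mask)
y = rng.normal(size=50)

pipeline = Pipeline([
    # Explicit COO -> CSR conversion before the estimator sees the data.
    ("to_csr", FunctionTransformer(to_csr, accept_sparse=True)),
    # Stand-in estimator; swap in XGBRegressor(...) for the case in this thread.
    ("model", LinearRegression()),
])

pipeline.fit(X_coo, y)
y_pred = pipeline.predict(X_coo)
print(y_pred.shape)
```

Keeping the conversion inside the pipeline means the same step is applied at both fit and predict time, so the test matrix does not need to be converted separately.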

Great. Working on this now.

Thanks so much. That totally worked. I was thrown because what I was doing had worked before. I really appreciate it.