Text type of XGBoost

I am new to XGBoost and have a few basic questions I'd appreciate help with:

  1. Can I train a model directly on a pandas DataFrame, or do I have to convert it to a DMatrix first?
  2. Is LIBSVM better than CSV or other text formats for sparse data?
  3. Should every text format be converted to a DMatrix?

Thanks a lot,
Xin

  1. The DMatrix constructor accepts a pandas DataFrame directly as its argument (np and pd below are the usual numpy and pandas aliases):
df = pd.DataFrame(np.arange(12).reshape((4,3)), columns=['a', 'b', 'c'])
xgboost.DMatrix(df)
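  2. LIBSVM is more compact than CSV for sparse data, because it stores only the entries that are present, as index:value pairs, and leaves out everything else. CSV has to write a field for every column of every row.

  3. Yes, for the native API: xgboost.train only understands DMatrix, so data in any text format has to be loaded into one first. The scikit-learn style wrappers (e.g. XGBClassifier) accept DataFrames and arrays and build the DMatrix internally.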
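
To make the size difference in point 2 concrete, here is a rough sketch that writes the same toy data in both formats using only the standard library. The helper names to_libsvm and to_csv are made up for this illustration and are not part of XGBoost, and the 1-based feature indices follow the classic LIBSVM convention, which is an assumption (check which index base your loader expects).

```python
import csv
import io


def to_libsvm(rows, labels):
    """Hypothetical helper: one '<label> <index>:<value> ...' line per row.

    Zero entries are skipped entirely, which is where the space saving comes from.
    Indices are 1-based here (an assumption, see the note above).
    """
    lines = []
    for label, row in zip(labels, rows):
        feats = " ".join(f"{i + 1}:{v}" for i, v in enumerate(row) if v != 0)
        lines.append(f"{label} {feats}".rstrip())
    return "\n".join(lines) + "\n"


def to_csv(rows, labels):
    """Hypothetical helper: label followed by every feature, zeros included."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for label, row in zip(labels, rows):
        writer.writerow([label, *row])
    return buf.getvalue()


# Toy data: 3 rows, 10 features, mostly zeros.
rows = [
    [0, 0, 1.5, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 2.0, 0, 0, 0],
    [3.0, 0, 0, 0, 0, 0, 0, 0, 0, 4.5],
]
labels = [1, 0, 1]

libsvm_text = to_libsvm(rows, labels)
csv_text = to_csv(rows, labels)

print(libsvm_text)
print("LIBSVM characters:", len(libsvm_text))
print("CSV characters:   ", len(csv_text))
```

The gap grows with the number of columns: the CSV line length scales with the total feature count, while the LIBSVM line length scales only with the number of non-zero entries in that row.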
