In Python, XGBoost can be used through two APIs: the native API and the scikit-learn API.
- The native API requires you to convert your data into a DMatrix: “an internal data structure that is used by XGBoost, which is optimized for both memory efficiency and training speed”.
- The scikit-learn API accepts pandas DataFrames or NumPy arrays as input.
As far as I understand, the scikit-learn API is just a wrapper around xgb.train, and under the hood it converts the data to a DMatrix.
So there should be no difference in performance between the two, since both use DMatrix under the hood. Is my understanding correct?