Understanding the XGBoost LogisticRegression Gradients

nedStackAdapt · July 23, 2019, 3:32pm

I’m working through the math to see how the gradient and second-order gradient of the logistic regression loss function are derived, but my results don’t match what is in the XGBoost code. The code says:

struct LogisticRegression {
...
  static T FirstOrderGradient(T predt, T label) { return predt - label; }
  template <typename T>
  static T SecondOrderGradient(T predt, T label) {
    const T eps = T(1e-16f);
    return std::max(predt * (T(1.0f) - predt), eps);
  }
...
};

But I get something like grad = (pred - label)/(pred*(1-pred)) for the gradient, and a more complex expression for the second-order gradient. Also, when I differentiate the code’s first-order gradient with respect to predt, I don’t get the code’s second-order gradient…
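
One way to narrow this down is a numerical check. The sketch below rests on two assumptions that the excerpt above does not show: that predt is the sigmoid of a raw margin score, and that the derivatives in the code are taken with respect to that margin rather than with respect to predt. Sigmoid, LogLossOfMargin, NumericGradient, NumericHessian and GradientWrtProbability are hypothetical helper names for illustration, not XGBoost functions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Sigmoid link: maps a raw margin x to a probability in (0, 1).
double Sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Log loss written as a function of the raw margin x.
// Assumption: this is the variable the gradients are taken with respect to.
double LogLossOfMargin(double x, double label) {
  assert(label >= 0.0 && label <= 1.0);
  const double p = Sigmoid(x);
  return -(label * std::log(p) + (1.0 - label) * std::log(1.0 - p));
}

// Same formulas as the quoted snippet, specialised to double.
double FirstOrderGradient(double predt, double label) { return predt - label; }
double SecondOrderGradient(double predt, double /*label*/) {
  const double eps = 1e-16;
  return std::max(predt * (1.0 - predt), eps);
}

// Central finite difference of the loss with respect to the margin.
double NumericGradient(double x, double label, double h = 1e-4) {
  return (LogLossOfMargin(x + h, label) - LogLossOfMargin(x - h, label)) /
         (2.0 * h);
}

// Second-order central finite difference with respect to the margin.
double NumericHessian(double x, double label, double h = 1e-4) {
  return (LogLossOfMargin(x + h, label) - 2.0 * LogLossOfMargin(x, label) +
          LogLossOfMargin(x - h, label)) /
         (h * h);
}

// Gradient of the log loss with respect to the probability p itself,
// i.e. the expression obtained when differentiating in predt.
double GradientWrtProbability(double p, double label) {
  return (p - label) / (p * (1.0 - p));
}
```

Calling NumericGradient(x, label) and NumericHessian(x, label) for some margin x and comparing against FirstOrderGradient(Sigmoid(x), label) and SecondOrderGradient(Sigmoid(x), label) should agree up to finite-difference error. If the assumption holds, the chain rule would account for the mismatch: with p = sigmoid(x), dp/dx = p*(1-p), so dL/dx = (p - label)/(p*(1-p)) * p*(1-p) = p - label, and differentiating that again with respect to x (not p) gives p*(1-p).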

Would anyone be able to explain this a little more clearly?