Question about scaling learning rate in source code

I was trying to better understand some of the internals of XGBoost, and I noticed in the source code that when a tree is grown, the learning rate is scaled by the number of trees at that iteration.
I'm curious why this is. Does this mean that as the number of estimators increases, the learning rate used gets smaller?
See the source code below; I also see this in all of the updater classes.
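To make sure I'm describing the behaviour I think I'm seeing: if eta really is divided by the tree count at each iteration, the effective step size would decay like in the toy Python sketch below. This is just my own illustration of that reading (the names like `base_eta` are mine), not the actual C++ code from the updaters.

```python
# Toy illustration of the reading described above -- NOT the actual
# XGBoost C++ source, just what "learning rate scaled by the number of
# trees at that iteration" would mean numerically.
base_eta = 0.3  # the configured learning_rate / eta

for num_trees in range(1, 6):
    effective_eta = base_eta / num_trees
    print(f"trees: {num_trees}, effective eta: {effective_eta:.4f}")

# Prints 0.3000, 0.1500, 0.1000, 0.0750, 0.0600 -- i.e. the per-tree
# step shrinks as more trees are added, which is what I meant by
# "the learning rate used gets smaller".
```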


I have the same question. If the learning rate really is scaled down as more trees are added, should the documentation on the learning rate mention this schedule?