How does XGBoost achieve incremental learning?


#1

I read the paper but found nothing about how incremental learning is implemented.

Can someone share some basic or deep knowledge?

When new data comes in, how do you train incrementally?

How do you avoid catastrophic forgetting?


#2

Incremental learning can be achieved by producing a model, then, when new data becomes available, starting from the current model and continuing training.

This is valid under the assumption that the data are independent and identically distributed, an assumption often violated in the real world.


#3

When new data comes in, how is the split point determined?
Does the model save the previous split features, split values, and split samples, or something else?


#4

Every iteration, a new tree is grown. When you add new data, the new trees take into account whatever data you feed them for training.


#5

From what you said, new data is fit into new trees while the old trees are kept in the same model.

Originally, I thought that training on A incrementally and then on B incrementally is equal to training on (A+B) at once. But my experiments show it is not.

1.
In my first experiment, I had 12 months of data, e.g. M1, M2, ..., M12.
I trained incrementally from M1 through M12, and I expected it to match the performance of training on all 12 months at once.

But it didn't. Some patterns in M1 (and other earlier months) seem to have been forgotten.

2.
I changed to training on larger chunks, e.g. M123, M456, ..., M101112,
where M123 means M1, M2, and M3 combined into one file.

It was better, but a few patterns in M123 still seem to have been forgotten.

3.
Then I changed the training combinations again, this time so that each chunk overlaps the next,
e.g. M123, M234, M345, ..., M101112.
The model seems to remember the patterns in M123 and M234; this is better than the two methods above.

My experiments are regression tasks, so here is the MAE for the three experiments:

  1. 7532
  2. 4521
  3. 2100

So maybe incremental training isn't the same as training on the whole dataset at once.
It seems to train batch by batch, and each new round of training sweeps away some of the results of the previous training?