What does "leaf weight" mean?

I looked through Tianqi Chen’s presentation, but I’m struggling to understand the details of what the leaf weights are, and I would appreciate if someone could help confirm/clarify my understanding.

To put the equations into words on the slide “Put into context: Model and Parameters”, the predicted value/score (denoted as yhat) is equal to a sum of the K trees of the model, which each maps the attributes to scores. So far so good, I think.

Then in this slide shown below it gives a mock example of a decision tree that calculates the amount that someone likes a computer game X. (Aside: Isn’t this a weird example? Who likes a computer game X by an amount 2? What does that even mean? Why not choose an example that has a more concrete, relatable meaning?)


Now this is where I start to get lost. I can tabulate mock data for the mock example, assuming that the model is perfect so that the weights (w1, w2, w3) are equal to the true value. But even that seems weird: what’s the difference between the predicted value and the weights?

       x_i: "attributes"          y_i, the true score (not yhat_i, which is the predicted score)
                |                                 |
|¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯|¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯|
| Age <15 (x_0) | is male (x_1) | Amount likes the computer game X  |
| 1             | 1             | 2                                 | (Young boy)
| 1             | 0             | 0.1                               | (Young girl)
| 0             | 1             | -1                                | (Old man)
| 0             | 0             | -1                                | (Old woman)
| 0             | 0             | -1                                | (Young woman, older than 15)

My question is: Can someone please share what would be the ACTUAL function f? I assume it’s a vector/matrix, but what are the actual numbers? Then my bonus follow up is how would you compute the f for this mock example? I feel it’s such a simple question, but I can’t figure out the answer. If someone could break this down for me in nitty gritty detail it would be a huge help. Thanks!

The function f is a step function, i.e. a function that’s constant piece-wise. In the example, you’d have the step function of two variables

Screen Shot 2020-05-02 at 12.16.11 AM

The “leaf weight” is a fancy way of saying the real output associated with each leaf (exit) node.

Now how does this step function correspond to the decision tree? Let’s say we have a test data point (age=10, gender=female). To get the prediction for the data point, we traverse the
tree from the top to bottom, performing a series of tests. At each intermediate node, we compare a
feature against a threshold. Depending on the result of the comparison, we proceed to either left or
right child node. For (10, female), we’d first perform the test “age < 15” and proceed to the left branch, because “age < 15” is true. Then we perform the second test “gender = male?”, which evaluates to false, so we proceed to the right branch. We end up at the Leaf 2, whose output (leaf weight) is 0.1.

Now imagine doing this step for all values in 2D real space, and you get the step function
Screen Shot 2020-05-02 at 12.16.11 AM

I have put up my master’s thesis, and it has pseudocode for the XGBoost algorithm: https://drive.google.com/file/d/0B0c0MbnP6Nn-eUNRRkVOOGpkbFk/view. Hope it helps.

Thank you @hcho3.

I have a few followup questions:

  1. Given the 5 records of input data above, how would you actually calculate the model for this example? It would be very useful to me to see a worked-through example rather than pseudocode, to help build my understanding.
  2. Is there already an example of a simple worked-through xgboost example somewhere you could point me to?
  3. I understand that the theoretical model is a decision tree/step function, but I don’t think the model is saved as a bunch of if-statements; I assume it’s saved as a matrix that transforms a record to the score, right?

By the way, is “real output” another way of saying the model’s predicted output, or model score?

I mean that each output is a real number.

Unfortunately, I don’t have a worked through example on my hand. For now you should look at the algorithm description in my master’s thesis.

Edit. I found https://github.com/eriklindernoren/ML-From-Scratch/blob/master/mlfromscratch/supervised_learning/xgboost.py, which is a simplified implementation of the XGBoost algorithm. You can add print() statements in the middle of the program and see what’s going on.

The model is actually saved as a bunch of conditions. The decision tree is a tree graph, where some nodes contain a test condition. You can use either arrays or adjacency graph to represent tree graphs.