Why does min_child_weight use the sum of the Hessian rather than just the sum of the instance weights?

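I believe that for squared error loss the Hessian of each instance is directly proportional to its weight (the second derivative is a constant), so in that case the sum of the Hessian is just a rescaled sum of the weights, and I can see why it works. What is the justification for other loss functions, where the Hessian also depends on the current prediction?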
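To make the distinction concrete, here is a minimal sketch of the per-instance Hessians under two losses. It assumes the squared error is written as 0.5 * w * (pred - label)^2 and that the logistic loss is taken with respect to the raw margin; the function names and the toy data are mine, for illustration only, and this is not XGBoost's actual code.

```python
import math

def squared_error_hessian(pred, weight):
    # Assumed loss: 0.5 * w * (pred - label)^2.
    # Its second derivative w.r.t. pred is just w, independent of pred.
    return weight

def logistic_hessian(margin, weight):
    # Assumed loss: weighted log loss on the raw margin.
    # Its second derivative w.r.t. the margin is w * p * (1 - p).
    p = 1.0 / (1.0 + math.exp(-margin))
    return weight * p * (1.0 - p)

# Hypothetical toy data: instance weights and current raw predictions.
weights = [1.0, 2.0, 0.5]
margins = [0.0, 3.0, -4.0]

weight_sum = sum(weights)
se_hessian_sum = sum(squared_error_hessian(m, w) for m, w in zip(margins, weights))
logistic_hessian_sum = sum(logistic_hessian(m, w) for m, w in zip(margins, weights))

print("sum of weights:            ", weight_sum)
print("sum of Hessian (sq. error):", se_hessian_sum)
print("sum of Hessian (logistic): ", logistic_hessian_sum)
```

Under these assumptions the squared error Hessian sum equals the weight sum exactly, while the logistic one is smaller and shrinks further as predictions become more confident, since p * (1 - p) is at most 0.25. So for the logistic case the two quantities would put quite different thresholds on a leaf.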

Or is this simply a performance optimization, given that the sum of the (weighted) Hessian will already have been calculated as part of the split-finding procedure?