Theoretical Advantage of Information Gain-based Split over Random Split

I am thinking about the splitting stage of decision tree construction: historically, Information Gain (IG) is the metric most often presented. I would like to compare this IG-based split with a random split, in which I pick both the attribute and the split value at random when building the tree.
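To make the contrast concrete, here is a minimal sketch of the two strategies (my own toy example, with a single numeric attribute and made-up data): the IG-based split scans every candidate threshold and keeps the one maximizing information gain, while the random split draws a threshold without looking at the labels.

```python
import math
import random
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(x, y, threshold):
    """Information gain of splitting the numeric attribute x at `threshold`."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    n = len(y)
    conditional = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(y) - conditional

# Made-up toy data: one numeric attribute, binary class labels.
x = [0.10, 0.35, 0.40, 0.70, 0.80, 0.90]
y = [0, 0, 0, 1, 1, 1]

# Candidate thresholds: every observed value except the largest,
# so both sides of the split are always non-empty.
candidates = sorted(set(x))[:-1]

# IG-based split: evaluate every candidate and keep the best one.
best = max(candidates, key=lambda t: information_gain(x, y, t))

# Random split: draw the threshold uniformly, ignoring the labels.
rnd = random.choice(candidates)

print("IG-based threshold:", best, "gain:", round(information_gain(x, y, best), 3))
print("Random threshold:  ", rnd, "gain:", round(information_gain(x, y, rnd), 3))
```

With several attributes, the IG-based strategy would additionally maximize over the attributes, while the random strategy would also sample the attribute uniformly; the single-attribute case above is just to show the kind of comparison I have in mind.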

I am looking for articles that theoretically quantify the contribution of this IG-based splitting stage compared to a random split, with conclusions along the lines of:

“If the initial distribution has such-and-such a form, then IG-based splitting yields an improvement of XXX over a random split…”, or preferably,
“Given the initial distribution, IG-based splitting yields an improvement of XXX over a random split…”

Does anyone know of such an article? Thanks in advance for your responses.