You can take a look at Section 3.4 of the XGBoost paper [1]:
“In many real-world problems, it is quite common for the input x to be
sparse. There are multiple possible causes for sparsity: 1) presence of
missing values in the data; 2) frequent zero entries in the statistics;
and, 3) artifacts of feature engineering such as one-hot encoding. It is
important to make the algorithm aware of the sparsity pattern in the
data. In order to do so, we propose to add a default direction in each
tree node, which is shown in Fig. 4. When a value is missing in the
sparse matrix x, the instance is classified into the default direction.
There are two choices of default direction in each branch. The optimal
default directions are learnt from the data. The algorithm is shown in
Alg. 3. The key improvement is to only visit the non-missing entries
I_k. The presented algorithm treats the non-presence as a missing value
and learns the best direction to handle missing values. The same
algorithm can also be applied when the non-presence corresponds to a
user specified value by limiting the enumeration only to consistent
solutions.”
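
To make the idea in the quote concrete, here is a rough Python sketch of how a default direction could be learnt for a single feature. It is not XGBoost's actual code: the function name `best_split_with_default`, its arguments, and the choice of midpoints as thresholds are my own assumptions, and it only tries thresholds between distinct non-missing values. It uses the gain formula from the paper (without the gamma term) and scans the non-missing entries twice, once with the missing rows sent right and once with them sent left.

```python
import math

def best_split_with_default(values, grads, hess, lam=1.0):
    """Hypothetical helper: return (gain, threshold, default_direction)
    for one feature, where NaN in `values` marks a missing entry."""
    G, H = sum(grads), sum(hess)          # totals include the missing rows
    parent = G * G / (H + lam)
    # Only the non-missing entries are sorted and visited.
    present = sorted(
        (v, g, h) for v, g, h in zip(values, grads, hess) if not math.isnan(v)
    )
    best = (0.0, None, None)

    # Pass 1: missing rows go right. Accumulate the left side in ascending order.
    GL = HL = 0.0
    for i in range(len(present) - 1):
        v, g, h = present[i]
        GL += g
        HL += h
        nxt = present[i + 1][0]
        if nxt == v:
            continue                      # no threshold between equal values
        GR, HR = G - GL, H - HL           # right side absorbs the missing rows
        gain = GL * GL / (HL + lam) + GR * GR / (HR + lam) - parent
        if gain > best[0]:
            best = (gain, (v + nxt) / 2, "right")

    # Pass 2: missing rows go left. Accumulate the right side in descending order.
    GR = HR = 0.0
    for i in range(len(present) - 1, 0, -1):
        v, g, h = present[i]
        GR += g
        HR += h
        prv = present[i - 1][0]
        if prv == v:
            continue
        GL, HL = G - GR, H - HR           # left side absorbs the missing rows
        gain = GL * GL / (HL + lam) + GR * GR / (HR + lam) - parent
        if gain > best[0]:
            best = (gain, (prv + v) / 2, "left")

    return best

nan = float("nan")
# Missing rows have gradients similar to the row with value 2.0.
print(best_split_with_default([1.0, 2.0, nan, nan],
                              [-1.0, 1.0, 1.0, 1.0],
                              [1.0, 1.0, 1.0, 1.0]))
```

In this toy input the missing rows have the same gradient sign as the larger feature value, so the sketch picks "right" as the default direction; flipping the sign of their gradients makes it pick "left" instead.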
[1] https://arxiv.org/pdf/1603.02754.pdf