Do you want to collect all the features used in each subtree? Here is a short script that does this using the JSON dump and a little Python:
```python
import json

import xgboost


def annotate_tree(node):
    """Recursively store in node['features_used'] the features split on in this node's subtree."""
    if 'children' in node:
        annotate_tree(node['children'][0])
        annotate_tree(node['children'][1])
        # Features from both child subtrees, plus the feature this node splits on.
        node['features_used'] = (node['children'][0]['features_used']
                                 + node['children'][1]['features_used']
                                 + [node['split']])
    else:
        # Leaf node: there are no splits below it.
        node['features_used'] = []


def print_tree(node, depth=0):
    """Print the tree in the text-dump layout, with the collected features appended."""
    indent = ' ' * depth
    if 'children' in node:
        print(f'{indent}{node["nodeid"]}:[f{node["split"]}<{node["split_condition"]}] '
              f'yes={node["yes"]}, no={node["no"]}, missing={node["missing"]}, '
              f'features used in this subtree: {node["features_used"]}')
        print_tree(node['children'][0], depth=depth + 1)
        print_tree(node['children'][1], depth=depth + 1)
    else:
        print(f'{indent}{node["nodeid"]}:leaf={node["leaf"]}')


bst = xgboost.Booster(model_file='xgb.model')
for tree_id, tree_dump in enumerate(bst.get_dump(dump_format='json')):
    print(f'booster[{tree_id}]:')
    tree = json.loads(tree_dump)
    annotate_tree(tree)
    print_tree(tree)
```
Example output:
```
booster[0]:
0:[f29<-9.53674316e-07] yes=1, no=2, missing=1, features used in this subtree: [56, 109, 29]
 1:[f56<-9.53674316e-07] yes=3, no=4, missing=3, features used in this subtree: [56]
  3:leaf=-0.856615365
  4:leaf=0.853982329
 2:[f109<-9.53674316e-07] yes=5, no=6, missing=5, features used in this subtree: [109]
  5:leaf=0.971056461
  6:leaf=-0.963636339
booster[1]:
0:[f29<-9.53674316e-07] yes=1, no=2, missing=1, features used in this subtree: [56, 109, 29]
 1:[f56<-9.53674316e-07] yes=3, no=4, missing=3, features used in this subtree: [56]
  3:leaf=0.856615365
  4:leaf=-0.853982329
 2:[f109<-9.53674316e-07] yes=5, no=6, missing=5, features used in this subtree: [109]
  5:leaf=-0.971056461
  6:leaf=0.963636339
booster[2]:
0:[f60<-9.53674316e-07] yes=1, no=2, missing=1, features used in this subtree: [29, 60]
 1:[f29<-9.53674316e-07] yes=3, no=4, missing=3, features used in this subtree: [29]
  3:leaf=-0.393318802
  4:leaf=0.485989004
 2:leaf=3.19529176
booster[3]:
0:[f60<-9.53674316e-07] yes=1, no=2, missing=1, features used in this subtree: [29, 60]
 1:[f29<-9.53674316e-07] yes=3, no=4, missing=3, features used in this subtree: [29]
  3:leaf=0.393318832
  4:leaf=-0.485989004
 2:leaf=-3.19529128
```
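Note that the script above keeps duplicates: if a feature is split on more than once inside a subtree, it shows up more than once in `features_used`. If you only care about which features appear, a set-based variant is a small change. The sketch below is a hypothetical `annotate_unique` helper (not part of XGBoost); to run without a trained model it uses a hand-written tree that mirrors booster[2] from the output above, assuming `split` holds the integer feature index as it does there. Depending on the XGBoost version and whether feature names are set, `split` may be a string instead; the sketch just stores whatever value it finds.

```python
import json

# Hand-written stand-in for one entry of bst.get_dump(dump_format='json'),
# copied from booster[2] in the example output (illustrative only).
TREE_JSON = """
{"nodeid": 0, "depth": 0, "split": 60, "split_condition": -9.53674316e-07,
 "yes": 1, "no": 2, "missing": 1, "children": [
   {"nodeid": 1, "depth": 1, "split": 29, "split_condition": -9.53674316e-07,
    "yes": 3, "no": 4, "missing": 3, "children": [
      {"nodeid": 3, "leaf": -0.393318802},
      {"nodeid": 4, "leaf": 0.485989004}]},
   {"nodeid": 2, "leaf": 3.19529176}]}
"""


def annotate_unique(node):
    """Hypothetical helper: store the sorted, de-duplicated features of each
    subtree in node['features_used'] and return them as a set."""
    if 'children' in node:
        used = {node['split']}
        for child in node['children']:
            used |= annotate_unique(child)
    else:
        used = set()  # leaves contribute no splits
    node['features_used'] = sorted(used)
    return used


tree = json.loads(TREE_JSON)
annotate_unique(tree)
print(tree['features_used'])                 # [29, 60]
print(tree['children'][0]['features_used'])  # [29]
```

Sorting the set only makes the printed lists stable; it assumes all `split` values in a tree are of the same type.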
I don't think it is necessary to add a `record_selections` parameter, since the information you want can be obtained by post-processing the JSON dump.
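If you need to query arbitrary nodes rather than print every tree, the same post-processing can build a lookup table instead. This is a minimal sketch, assuming `dumps` is a list of JSON strings such as `bst.get_dump(dump_format='json')` returns; the `features_by_node` name and the hand-written dumps are hypothetical, for illustration only.

```python
import json


def features_by_node(dumps):
    """Hypothetical helper: map (tree_index, nodeid) -> frozenset of the
    features split on anywhere in that node's subtree."""
    table = {}

    def visit(tree_index, node):
        used = set()
        if 'children' in node:
            used.add(node['split'])  # this node's own split feature
            for child in node['children']:
                used |= visit(tree_index, child)
        table[(tree_index, node['nodeid'])] = frozenset(used)
        return used

    for tree_index, dump in enumerate(dumps):
        visit(tree_index, json.loads(dump))
    return table


# Two tiny hand-written dumps standing in for a trained model (illustrative only).
dumps = [
    '{"nodeid": 0, "split": 29, "children": [{"nodeid": 1, "leaf": 0.5}, {"nodeid": 2, "leaf": -0.5}]}',
    '{"nodeid": 0, "split": 60, "children": [{"nodeid": 1, "split": 29, "children": '
    '[{"nodeid": 3, "leaf": 0.1}, {"nodeid": 4, "leaf": 0.2}]}, {"nodeid": 2, "leaf": 3.2}]}',
]
table = features_by_node(dumps)
print(sorted(table[(1, 0)]))  # [29, 60]
print(sorted(table[(1, 2)]))  # []
```

Keying on `(tree_index, nodeid)` is needed because node ids restart at 0 in every tree.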