Hi,
I trained a xgb ranker in python, and I want to manually insert a tree to it. Is it possible?
thank you
You can save a JSON model and modify it.
Thanks for the answer!
But then can I load it back and do predictions with ‘predict’ in order to check its behavior? How do I load it?
You can use save_model and load_model with the json file extension. See the tutorial in the docs for a detailed explanation. As long as your modification complies with the model schema, XGBoost won't know about your modification.
But when I use save_model, the JSON is not readable, and the readable option (i.e. Booster.dump_model) can't be loaded back … Is there any tool to work with it? How can I read it, or how can I load a readable one?
What do you mean by JSON not being readable? Can you be more specific?
Yeah, when I save it with save_model I get some encoded output which I don't know how to read (or, more specifically, don't know how to add trees to):
7724 bf00 0000 0000 0000 003b 4b62 4540
whereas with Booster.dump_model I get something I can understand:
{ "nodeid": 0, "depth": 0, "split": "rel_log_rated_orders_by_stemmed_alphabetized_search_query", "split_condition": 0.822961092, "yes": 1, "no": 2, "missing": 1, "children": [
Did you specify the JSON extension when calling save_model? It looks like you got the binary format, not JSON format.
bst.save_model('model.json')
Also make sure you are using XGBoost version 1.0.0 or later.
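To make the edit-and-reload loop above concrete, here is a minimal sketch of parsing the saved JSON, navigating to the trees list, and serializing the result back. The dictionary is a toy stand-in, not the full model file: a real file produced by bst.save_model('model.json') carries many more required fields, and a hand-inserted tree only loads back if it satisfies the full model schema.

```python
import json

# Toy stand-in for the JSON produced by bst.save_model('model.json').
# A real model file contains many more required fields.
model_text = json.dumps({
    "learner": {
        "gradient_booster": {"model": {"trees": [{"id": 0}]}},
        "learner_model_param": {"base_score": "5E-1"},
    }
})

bst_json = json.loads(model_text)
trees = bst_json["learner"]["gradient_booster"]["model"]["trees"]

# Hypothetical hand-written tree: a real one must carry every field
# the schema requires, which this toy dict does not.
trees.append({"id": len(trees)})

edited_text = json.dumps(bst_json)
print(len(trees))  # 2
```

In practice you would write edited_text back to disk and call bst.load_model('model.json'); as noted above, the load only succeeds if the edited file still matches the schema.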
Thank you very much! Updating the version worked!
Follow-up question: I see that the trees (each tree in bst_json['learner']['gradient_booster']['model']['trees']) are described with a dictionary. After some trial and error, it seems the prediction for a sample is:

bst_json['learner']['learner_model_param']['base_score'] + constant * split_condition of the leaf my sample fell in

Where does this constant come from?

Edit: after some trial and error I realized the leaf value is not in the base_weights list but rather in the split_conditions list. I edited the message accordingly.
If the objective is reg:squarederror, the final score should be base_score + split_condition. If the objective is binary:logistic, the final score is sigmoid(base_score + split_condition).
You are right, I was using binary:logistic! Thank you a lot!