New Callback Format

I am trying to create a logger within my company that only uses the callback system to obtain the training params, evaluation metrics, and model artifact.

With the new callback format, what is the env variable equivalent to access the parameters used for training and the evaluation metrics?

Thanks!

What specific parameter are you trying to recover?

First, I’m trying to retrieve the original launch parameters for the XGBoost model config. Previously I believe I could access them through the env variable.

Second, I want to pull the metrics that are evaluated in EvaluationMonitor as a one-time pull after the training job is completed. There’s an evals log with every metric recorded against the evaluation set, but the example there shows it for every iteration.

Previously I believe I could access them through the env variable.

I don’t think so; these are the fields of the old env parameter:

    ["model",
     "cvfolds",
     "iteration",
     "begin_iteration",
     "end_iteration",
     "rank",
     "evaluation_result_list"]

First, I’m trying to retrieve the original launch parameters for the XGBoost model config.

Since you, as the user, are passing those parameters into XGBoost, I think you can obtain them without the callback storing them:

callback = MyCallBack(parameters)
booster = xgboost.train(parameters, ..., callbacks=[callback])
                        ~~~~~~~~~~

or

callback = MyCallBack(parameters)
clf = xgboost.XGBClassifier(**parameters)
                              ~~~~~~~~~~
clf.fit(X, y, callbacks=[callback])

In both cases, you have to know the parameters before you start training, right? If for some reason you have to get them from the callback itself, you can try booster.save_config() with the model passed into the callback.
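If you go that route, a minimal sketch might look like this (MyCallBack and the print call are hypothetical stand-ins for your logger; model is the Booster that xgboost.train passes in, and save_config() returns its configuration as a JSON string):

import json
import xgboost

class MyCallBack(xgboost.TrainingCallback):
    def __init__(self, parameters):
        super().__init__()
        # Parameters the user already has in hand.
        self.parameters = parameters

    def after_training(self, model):
        # model is the Booster; save_config() dumps its full internal
        # configuration (including training parameters) as a JSON string.
        config = json.loads(model.save_config())
        print(self.parameters, config)  # replace with your logger of choice
        return model  # after_training must return the model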

Second, I want to pull the metrics that are evaluated in EvaluationMonitor as a one-time pull after the training job is completed.

You can use the evals_result parameter of the xgboost.train function, or the xgboost.XGBRegressor.evals_result() method. If you need to do it in your callback (I can’t think of a case where evals_result is not sufficient; the example below just answers your question rather than suggesting an approach), you can create a callback that does nothing until the last iteration:

import xgboost

results = {}
num_boost_round = 10

class MyCallBack(xgboost.TrainingCallback):
    def after_iteration(self, model, epoch, evals_log):
        # Copy the accumulated evaluation history on the last round only.
        if num_boost_round - 1 == epoch:
            results.update(evals_log)
        return False  # returning True would stop training early

# parameters and Xy (a DMatrix) are assumed to be defined already; evals must
# be supplied for evals_log to be populated.
xgboost.train(parameters, Xy, num_boost_round=num_boost_round,
              evals=[(Xy, "train")], callbacks=[MyCallBack()])
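For comparison, the evals_result route mentioned above looks roughly like this (reusing parameters, Xy, and num_boost_round from the snippet above; Xy_valid is a hypothetical validation DMatrix):

evals_result = {}
xgboost.train(
    parameters,
    Xy,
    num_boost_round=num_boost_round,
    evals=[(Xy, "train"), (Xy_valid, "valid")],
    evals_result=evals_result,
)
# After training, evals_result holds the full history, e.g.
# {"train": {"rmse": [...]}, "valid": {"rmse": [...]}}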

Hi,
Thanks for your response!
For some context I’m a former PM at AWS SageMaker (also FB, Lyft, etc.) and I’m hacking on some open source tooling now.

I’ll try out the metrics emission.

On parameters, you’re right that this is all defined by the user, but logging the parameters is boilerplate code. A superior experience would be to assign a callback which can log all the values to their enterprise’s ML metadata manager of choice (MLFlow, W&B, or MLMD). If I integrate XGBoost with various HPO solutions, the parameters are fed in through the params argument anyway, so abstracting this would benefit the user experience.

Are you a core contributor? I’d love to discuss the metadata logging concept and how to reduce boilerplate code overall for ML deployments.

Yup, feel free to open an issue. It would be great if you can provide some examples for illustration so we can understand your use-case better.

Hi, would you be able to DM or email me so I can walk you through some of the logic? alex@socialg.tech

I can also open a GitHub issue, but I want to think about this from an industry perspective and get back to you with principles that could serve as a design for other training frameworks.

Thanks for reaching out. I sent you an e-mail.