# AFT Survival Model Probabilities

I created an Accelerated Failure Time model with Xgboost, but cannot figure out how to extract the model probabilities and it doesn’t seem to be covered in the documentation. predict_proba does not work.

Any ideas?

I have been struggling with this problem too. Here’s what I’ve tried so far. It looks pretty good for aft_loss_distribution = ‘normal’, but not so good for aft_loss_distribution = ‘extreme’. Would love it if an expert could chime in and mention what I could be doing wrong for aft_loss_distribution = ‘extreme’

According to the documentation,

log(Y) =T(x) + σ*Z

Here, Y is the time to death, T(x) is the output of the XGBoost algorithm, and Z is some probability distribution. Assume Z is the normal distribution. Then, in order to figure out the P(Y > Y_t), aka, the probability that the user will still be alive after time Y_t, all you have to do is the following:

``````    distribution = np.random.normal(0, σ, 10000) + np.log(T(x))
p_survival = 1- np.sum(distribution < np.log(Y_t))/len(distribution)
``````

As I mentioned earlier, I got pretty good results when using this for Z = normal distribution. When I say good results, I mean that the p_survival were well-calibrated for various values of Y_t. More specifically, when I bucket the data based on the magnitude of p_survival, and compute the predicted probability and the actual probability in each bucket, the predicted and actual values line up pretty well.

However, I tried the following for Z = extreme distribution, and things look pretty poorly calibrated. Any idea what I could be doing wrong?

``````    distribution = np.random.gumbel(0,σ,10000) +  np.log(T(x))
p_churn = 1-np.sum(distribution < np.log(Y_t))/len(distribution)
``````

Are you sure `np.random.gumbel(0,σ,10000)` is equivalent to `σ*np.random.gumbel(0,1,10000)` ? Otherwise, I can’t find any fault with your code. I’m not a statistician by training, hence I recommended XGBSE which seems to have more statistical rigor.