Privacy-preserving federated distributed XGBoost


#1

I’m trying to use XGBoost for secure federated learning, that is, training the model without having direct access to users’ data. To do that, I need to add wrappers around the communication between the server and the clients in distributed XGBoost.

I searched the source code but couldn’t find where this communication is invoked, so I don’t know where to add my wrapper. It would be very helpful if anyone could point me to where it happens. If it is possible to do this in a Python file, that would be ideal, but I’m fine with a C++ file if I have to.

The communications I need to change (based on my understanding of XGBoost from the original paper) are:

1. The clients send their candidate quantiles.
2. The server sends back all the candidate quantiles.
3. The clients send the w’s, r+’s, and r-’s for all the candidate quantiles.
4. The server sends the global quantiles.
5. The clients send the h’s and g’s for all buckets.
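To make the five rounds concrete, here is a pure-Python sketch of the protocol as I understand it. This is not XGBoost code; all names are illustrative, weights are fixed to 1, and the quantile selection is a simplified stand-in for the paper’s weighted quantile sketch.

```python
from bisect import bisect_right

def local_candidates(values, k=4):
    """Round 1: each client proposes k-1 candidate cut points."""
    s = sorted(values)
    return [s[i * (len(s) - 1) // k] for i in range(1, k)]

def merge_candidates(all_candidates):
    """Round 2: the server pools and redistributes every candidate."""
    return sorted(set(c for cand in all_candidates for c in cand))

def rank_stats(values, candidates):
    """Round 3: per candidate x, report r-(x) (count strictly below)
    and r+(x) (count at or below); weights w are 1 here."""
    return [(sum(v < c for v in values), sum(v <= c for v in values))
            for c in candidates]

def global_quantiles(stats_per_client, candidates, k=4):
    """Round 4: the server sums ranks across clients and keeps the
    candidates closest to the k-1 evenly spaced target ranks."""
    totals = [sum(s[i][1] for s in stats_per_client)
              for i in range(len(candidates))]
    n = max(totals)
    return [candidates[min(range(len(totals)),
                           key=lambda i: abs(totals[i] - n * q / k))]
            for q in range(1, k)]

def bucket_grad_stats(values, grads, hess, cuts):
    """Round 5: per bucket, each client sums its g's and h's."""
    gsum = [0.0] * (len(cuts) + 1)
    hsum = [0.0] * (len(cuts) + 1)
    for v, g, h in zip(values, grads, hess):
        b = bisect_right(cuts, v)
        gsum[b] += g
        hsum[b] += h
    return gsum, hsum
```

Each of these five functions corresponds to one message round, so a wrapper (encryption, masking, differential privacy) would sit on the return value of each client-side call before it reaches the server.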

Thanks,


#2

Not a direct answer to your question, but perhaps this could be of help to you (a repo with an implementation of federated XGBoost; they use Rabit):


#3

Thanks, I actually already knew about this project, but it doesn’t work for me. They are doing distributed XGBoost across multiple computers rather than truly federated XGBoost: everybody gets the model at the end, and the server learns the data distribution of all workers. It also doesn’t really help me find where the communication happens, because the only thing they do (other than the standard call to run XGBoost) is call xgb.rabit.init() and xgb.rabit.finalize() around the code.
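For the “server learns the data distribution” concern, one common wrapper is secure aggregation via pairwise additive masking. The sketch below is not XGBoost code and the names are made up; it only shows the idea that each client can mask the statistic it uploads so the server recovers the sum but never an individual value.

```python
import random

def make_masks(client_ids, rng=None):
    """Every ordered pair (i, j), i < j, shares a random mask m_ij.
    Client i adds +m_ij, client j adds -m_ij, so all masks cancel
    when the server sums the uploads."""
    rng = rng or random.Random(0)
    masks = {cid: 0.0 for cid in client_ids}
    ids = sorted(client_ids)
    for a in range(len(ids)):
        for b in range(a + 1, len(ids)):
            m = rng.uniform(-1e6, 1e6)
            masks[ids[a]] += m
            masks[ids[b]] -= m
    return masks

def masked_upload(value, mask):
    """What a client actually sends: its true statistic plus its net mask."""
    return value + mask

def server_aggregate(uploads):
    """The server only ever sees masked values; summing cancels the masks."""
    return sum(uploads)
```

In principle the same wrapping could be applied to each per-client statistic in the rounds above (rank stats, per-bucket g/h sums), since the server only needs their sums.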


#4

Look for rabit::Reducer::Allreduce() in XGBoost, e.g.
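To see what that call does conceptually, here is a pure-Python mock of a sum-Allreduce over per-worker gradient histograms, which is the communication pattern Rabit implements inside XGBoost’s histogram building (the real code is C++; this mock and its names are illustrative only).

```python
def allreduce_sum(per_worker_buffers):
    """Mock of a sum-Allreduce: sum element-wise across workers and
    give every worker a copy of the global result."""
    total = [sum(vals) for vals in zip(*per_worker_buffers)]
    return [list(total) for _ in per_worker_buffers]

# three workers, each holding a local 2-bucket gradient histogram
worker_hists = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
reduced = allreduce_sum(worker_hists)
# every worker now holds the same global histogram [9.0, 12.0]
```

A privacy wrapper would intercept each worker’s buffer on its way into this call, which is why locating the Allreduce call sites is the right starting point.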