Does XGBoost support data encryption over the wire in distributed mode?

I didn’t find any reference to this in the docs or code, and I’m not sure whether it is supported by some underlying framework.

It depends on which tracker you use. See the selection at https://github.com/dmlc/dmlc-core/tree/master/tracker/dmlc_tracker. In particular, at least SSH will encrypt your data over the wire.

In general, security is best achieved by setting up some kind of virtual firewall so that access from the outside world (0.0.0.0/0) is blocked. For instance, if you are using Amazon Web Services, you can set up Security Groups so that workers have unrestricted access to one another while the outside world has no access to any worker.
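As a rough illustration of the Security Group idea, here is a minimal Python sketch. It builds an ingress rule in the shape that boto3's `authorize_security_group_ingress` accepts for `IpPermissions`, allowing all traffic only from members of the same group, and checks that no rule is open to the world. The helper names (`worker_only_ingress`, `is_world_open`) and the group ID are made up for this example; the boto3 call is shown only in a comment because it needs credentials and a real group.

```python
from typing import Any, Dict, List

# CIDR blocks that mean "anyone on the internet".
WORLD_CIDRS = {"0.0.0.0/0", "::/0"}


def worker_only_ingress(group_id: str) -> List[Dict[str, Any]]:
    """Build an ingress rule allowing all traffic, but only from
    instances that belong to the same security group (hypothetical helper)."""
    return [
        {
            "IpProtocol": "-1",  # -1 means all protocols and ports
            "UserIdGroupPairs": [{"GroupId": group_id}],
        }
    ]


def is_world_open(permissions: List[Dict[str, Any]]) -> bool:
    """Return True if any rule admits traffic from every address."""
    for rule in permissions:
        for ip_range in rule.get("IpRanges", []):
            if ip_range.get("CidrIp") in WORLD_CIDRS:
                return True
        for ip_range in rule.get("Ipv6Ranges", []):
            if ip_range.get("CidrIpv6") in WORLD_CIDRS:
                return True
    return False


if __name__ == "__main__":
    rules = worker_only_ingress("sg-0123456789abcdef0")  # placeholder ID
    print("open to the world:", is_world_open(rules))
    # With boto3 installed and credentials configured, the rule could be
    # applied roughly like this (not executed here):
    #   import boto3
    #   boto3.client("ec2").authorize_security_group_ingress(
    #       GroupId="sg-0123456789abcdef0", IpPermissions=rules)
```

Note that this only restricts who can reach the workers; it does not encrypt the traffic between them.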

Does the SSH tracker cover data in transit during training? It seems the code only submits the jobs.

Actually, you may be right. So it is safe to assume that the data is not encrypted in transit.

It looks like XGBoost4J-Spark supports data encryption, thanks to Spark integration: https://github.com/dmlc/xgboost/issues/3647

No. That issue and its pull request didn’t add encryption support. Instead, they just raise an exception or a warning to tell people that XGBoost’s network traffic is not protected when spark.ssl is enabled.
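To make the behaviour concrete, here is a minimal Python sketch of that kind of guard. It is not the XGBoost4J-Spark code. It assumes the configuration is available as a plain mapping, uses Spark's `spark.ssl.enabled` key to detect SSL, and uses a hypothetical override key (`example.ignoreSsl`) to downgrade the error to a warning.

```python
import warnings
from typing import Mapping


class UnencryptedTrafficError(RuntimeError):
    """Raised when the user expects encryption that the trainer cannot give."""


def check_ssl_expectation(conf: Mapping[str, str],
                          ignore_key: str = "example.ignoreSsl") -> None:
    """Refuse to run (or warn) if SSL is enabled in the Spark conf.

    `ignore_key` is a hypothetical override flag used for illustration only.
    """
    ssl_enabled = conf.get("spark.ssl.enabled", "false").lower() == "true"
    if not ssl_enabled:
        return  # the user did not ask for encryption, nothing to flag

    message = ("spark.ssl.enabled is true, but the training traffic "
               "between workers is sent unencrypted")
    if conf.get(ignore_key, "false").lower() == "true":
        # The user explicitly accepted the risk: warn and continue.
        warnings.warn(message)
        return
    raise UnencryptedTrafficError(message)


if __name__ == "__main__":
    check_ssl_expectation({})  # passes silently
    try:
        check_ssl_expectation({"spark.ssl.enabled": "true"})
    except UnencryptedTrafficError as err:
        print("refused:", err)
```

The point of such a check is to avoid a false sense of security: enabling SSL for Spark does nothing for the traffic XGBoost sends between workers.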