XGBoost4J support on Windows

Is there any interest in adding a Windows DLL to the XGBoost4J jar that’s distributed on Maven Central? As far as I can tell it builds fine, and I’m willing to fix the XGBoost4J Windows build if it’s currently broken; I’ve contributed fixes for that before.

It would make it easier for downstream projects to consume XGBoost, as the Maven Central jar would then work across Linux, macOS and Windows x86_64 platforms. (As an aside, has there been any thought about supporting macOS aarch64 when the hardware for that lands later this year?)

Distributed training on Windows is currently not operational. See https://github.com/dmlc/xgboost/issues/6049 for example. We will need to run tests and fix bugs. https://github.com/dmlc/xgboost/pull/6105 has the potential to fix distributed training on Windows, but we’ll want to run tests.

EDIT. If you mean the Java/Scala package without Spark support, it might be more doable to support Windows.

The difficulty is that we don’t have access to a macOS aarch64 machine on which to test the JVM package.

If you would like to step in and help out with Windows and macOS support, let us know.

I’m currently interested in single-node Java usage. We use XGBoost as a backend in our Java ML library, Tribuo (tribuo.org). Internally I’ve built an XGBoost4J jar with Windows, macOS and Linux support, but in the open source release of Tribuo we now depend on the XGBoost4J version in Maven Central, which doesn’t have Windows.

I’m happy to help out with Windows issues for the Java package. With respect to macOS, I don’t have access to an aarch64 machine either, so that question was more speculative. At the moment I’m not sure what the status of OpenJDK is on that platform; I know some work is happening (http://openjdk.java.net/jeps/8251280) but I don’t know how complete it is. I think the native library loader will need to become a little more complicated to make it work, separating out the binaries by both OS and architecture.

I’ve got an Apple Silicon Mac now, and XGBoost4J builds just fine on it. I’m considering putting in a PR which changes the loader to allow macOS x64 and aarch64 binaries to co-exist, as it would be very useful to me from a deployment perspective to have a single jar with macOS x64, macOS aarch64, Win x64, Linux x64 and Linux aarch64 binaries in it. This would be along the same lines as the loader code in the Java interface to ONNX Runtime, since I wrote most of that and understand that loader best: https://github.com/microsoft/onnxruntime/blob/master/java/src/main/java/ai/onnxruntime/OnnxRuntime.java#L50.
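To make the loader idea concrete, here is a minimal sketch of how a binary could be selected by both OS and architecture. It is not the XGBoost4J or ONNX Runtime loader: the class name `NativeLibraryPath`, its method names, and the `lib/<os>/<arch>/` resource layout are all assumptions made for illustration. Only the file names `libxgboost4j.so` and `xgboost4j.dll` come from this thread; the `.dylib` name for macOS is assumed from the usual JNI naming convention.

```java
import java.util.Locale;

/**
 * Hypothetical helper that maps the os.name / os.arch system properties to a
 * resource path inside the jar. The "lib/<os>/<arch>/" layout is an assumed
 * convention for this sketch, not the layout XGBoost4J actually uses.
 */
final class NativeLibraryPath {
    private NativeLibraryPath() {}

    /** Normalises an os.name value to "macos", "windows" or "linux". */
    static String detectOs(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        // Check macOS first: "darwin" contains "win" and would otherwise match Windows.
        if (os.contains("mac") || os.contains("darwin")) return "macos";
        if (os.contains("win")) return "windows";
        if (os.contains("nux")) return "linux";
        throw new IllegalStateException("Unsupported OS: " + osName);
    }

    /** Normalises an os.arch value to "x86_64" or "aarch64". */
    static String detectArch(String osArch) {
        String arch = osArch.toLowerCase(Locale.ROOT);
        if (arch.equals("x86_64") || arch.equals("amd64")) return "x86_64";
        if (arch.equals("aarch64") || arch.equals("arm64")) return "aarch64";
        throw new IllegalStateException("Unsupported architecture: " + osArch);
    }

    /** Applies the platform's shared-library prefix and suffix to a base name. */
    static String libraryFileName(String os, String baseName) {
        switch (os) {
            case "windows": return baseName + ".dll";
            case "macos":   return "lib" + baseName + ".dylib";
            default:        return "lib" + baseName + ".so";
        }
    }

    /** Builds the resource path for the given platform, e.g. lib/linux/x86_64/libfoo.so. */
    static String resolve(String osName, String osArch, String baseName) {
        String os = detectOs(osName);
        return "lib/" + os + "/" + detectArch(osArch) + "/" + libraryFileName(os, baseName);
    }
}
```

A loader built on this would call `NativeLibraryPath.resolve(System.getProperty("os.name"), System.getProperty("os.arch"), "xgboost4j")`, copy that resource out of the jar to a temporary file, and pass the file's absolute path to `System.load`. Keeping the architecture as its own path component is what lets the macOS x64 and aarch64 binaries sit side by side in one jar.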

What kind of help would you want to get these binaries built? I can build them by hand and send them to you, but that’s not a great solution. I’m not sure when GitHub Actions will add support for macOS aarch64, but it already has Win x64 support, which we use to build TF Java and to test Tribuo on Windows platforms. I can ask my employer (Oracle) if they are interested too; maybe that could come under the sponsorship thing you’ve set up now?

That would be great! Thanks!

Currently, our CI pipeline builds the binary automatically for the x86_64 Linux platform (libxgboost4j.so). We build the macOS x86_64 binary by hand and then bundle it inside the JAR. If you’d like, please submit a pull request to set up GitHub Actions to build the binary for Windows (xgboost4j.dll). The CI can upload the artifact to an S3 bucket, and later we can bundle it into the JAR manually. As for Apple Silicon, you should build the binary manually for now and email it to me.

Note that I’m going on a one-month vacation starting on January 29, so responses will be delayed.

OK, I can look into setting up GitHub Actions to build it on Windows. I’m actually on paternity leave at the moment, so it’s likely to be February before I get to this and the Java loader work anyway.

It looks like the Linux JVM package is built in Jenkins. Is there anything special I should know about how that’s built? Also, are there docs somewhere on which S3 bucket to upload it to?

I can probably add a GitHub Action to build the macOS x86_64 JVM package as well as Windows if you’d like. macOS should be pretty straightforward, as the GitHub Actions macOS runner already has Homebrew (https://github.com/actions/virtual-environments/blob/main/images/macos/macos-10.15-Readme.md), so all that’s necessary is adding libomp and then building it.

The Linux build is special, in that it uses a CentOS 6 Docker container to ensure that the resulting binary libxgboost4j.so only uses an old version of GLIBC. This is important because many users want to use an old Spark cluster.

Currently, Jenkins uploads all artifacts to the S3 bucket xgboost-maven-repo. This actually doubles as a privately hosted Maven repository for all snapshots of XGBoost4J.

Thanks for your offer to help. I really appreciate it.