Understanding shared lib size diff: pip vs self-built

Hi all!

I am trying to understand why the size of the compiled shared library libxgboost.so differs so much between the one installed with pip (~200 MB) and the one I build myself (~4 MB).

To make the comparison fair, I cloned the xgboost repo and used python-package to build a Python package. When I compare it with the one from PyPI, the difference in the size of libxgboost.so is enormous. I tried different build parameters (USE_OPENMP etc.), but the resulting library is still under 5 MB.
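To make this kind of comparison repeatable, here is a minimal sketch that reports both library sizes and their ratio. The function names are hypothetical helpers for illustration, and the paths you pass in depend on where pip installed the wheel and where your own build placed the library.

```python
from __future__ import annotations

from pathlib import Path


def library_size_mb(path: str | Path) -> float:
    """Return the size of a file in megabytes (1 MB = 1024 * 1024 bytes)."""
    return Path(path).stat().st_size / (1024 * 1024)


def compare_libraries(pip_lib: str | Path, local_lib: str | Path) -> dict[str, float]:
    """Report both sizes in MB and how many times larger the pip copy is."""
    pip_mb = library_size_mb(pip_lib)
    local_mb = library_size_mb(local_lib)
    return {
        "pip_mb": pip_mb,
        "local_mb": local_mb,
        # Guard against an empty local file to avoid division by zero.
        "ratio": pip_mb / local_mb if local_mb else float("inf"),
    }
```

To find the pip copy, `os.path.dirname(xgboost.__file__)` gives the installed package directory; the shared library typically sits in a `lib/` subdirectory there, but check your installation.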

Does anyone know why the resulting sizes are so vastly different?

Alternatively, maybe someone can point me to how the PyPI wheels are built.

I think I found the reason: CUDA support. It seems the official wheels are built with CUDA, which is probably what adds the size.

Yes, CUDA increases the binary size. On Linux, we also support the use of multiple GPUs in training, and that requires linking with NCCL.
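For anyone who wants to reproduce both kinds of builds, here is a minimal sketch that assembles a CMake configure command with the GPU-related switches on or off. It assumes the XGBoost source tree exposes the USE_CUDA, USE_NCCL and USE_OPENMP CMake options; check CMakeLists.txt in your checkout for the exact names and defaults. The helper name is hypothetical.

```python
from __future__ import annotations


def cmake_configure_args(
    source_dir: str,
    build_dir: str,
    use_cuda: bool = False,
    use_nccl: bool = False,
    use_openmp: bool = True,
) -> list[str]:
    """Build a CMake configure command line for an XGBoost build.

    Assumption: NCCL only serves multi-GPU training on top of CUDA,
    so this sketch rejects NCCL without CUDA.
    """
    if use_nccl and not use_cuda:
        raise ValueError("USE_NCCL requires USE_CUDA")

    def flag(name: str, enabled: bool) -> str:
        # CMake boolean cache variables accept ON/OFF.
        return f"-D{name}={'ON' if enabled else 'OFF'}"

    return [
        "cmake",
        "-S", source_dir,   # source directory (CMake 3.13+)
        "-B", build_dir,    # build directory (CMake 3.13+)
        flag("USE_CUDA", use_cuda),
        flag("USE_NCCL", use_nccl),
        flag("USE_OPENMP", use_openmp),
    ]
```

For example, `" ".join(cmake_configure_args("xgboost", "build"))` yields `cmake -S xgboost -B build -DUSE_CUDA=OFF -DUSE_NCCL=OFF -DUSE_OPENMP=ON`, which corresponds to the slim CPU-only build; passing `use_cuda=True, use_nccl=True` gives the configuration closer to the large Linux wheels.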

Thanks for the clarification. I built XGBoost myself without CUDA and NCCL, and it is indeed much slimmer.