XGBoost with GPU Support on MacOS


#1

Hello,

I am new to the forum and I have run into a problem compiling XGBoost with GPU support on MacOS High Sierra 10.13.6. The compilation completes successfully and I am able to invoke XGBoost from Python 3, but only as a CPU build. The moment I set tree_method='gpu_hist' in Python, I get this error at run time:

XGBoostError: b'[23:00:00] src/learner.cc:186: XGBoost version not compiled with GPU support.\n\nStack trace returned 1 entries:\n[bt] (0) 0   libxgboost.dylib                    0x0000000113edff57 dmlc::StackTrace[abi:cxx11](unsigned long) + 119\n\n'

It complains that XGBoost was not compiled with GPU support. I tried all sorts of CMake configurations, with this being the most verbose:

cmake .. -DUSE_CUDA=ON -DCUDA_SDK_ROOT_DIR=/Developer/NVIDIA/CUDA-9.2 \
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DUSE_NCCL=ON \
  -DNCCL_INCLUDE_DIR=/usr/local/nccl_2.3.7-1+cuda10.0_x86_64/include \
  -DNCCL_LIBRARY=/usr/local/nccl_2.3.7-1+cuda10.0_x86_64/library -DGPU_COMPUTE_VER='35 61'

My eGPU is an NVIDIA 1080 Ti. I used a recursive git clone to download the latest XGBoost (0.81).

Can you please help me?


#2

Can you post the full log of running make after cmake?


#3

Here it is:

bash-3.2$ export CC=gcc-5
bash-3.2$ export CXX=g++-5
bash-3.2$ cmake .. -DUSE_CUDA=ON -DCUDA_SDK_ROOT_DIR=/Developer/NVIDIA/CUDA-9.2 \
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DUSE_NCCL=ON \
  -DNCCL_INCLUDE_DIR=/usr/local/nccl_2.3.7-1+cuda10.0_x86_64/include \
  -DNCCL_LIBRARY=/usr/local/nccl_2.3.7-1+cuda10.0_x86_64/library -DGPU_COMPUTE_VER='35 61'
-- The C compiler identification is GNU 5.5.0
-- The CXX compiler identification is GNU 5.5.0
-- Checking whether C compiler has -isysroot
-- Checking whether C compiler has -isysroot - yes
-- Checking whether C compiler supports OSX deployment target flag
-- Checking whether C compiler supports OSX deployment target flag - yes
-- Check for working C compiler: /usr/local/bin/gcc-5
-- Check for working C compiler: /usr/local/bin/gcc-5 -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Checking whether CXX compiler has -isysroot
-- Checking whether CXX compiler has -isysroot - yes
-- Checking whether CXX compiler supports OSX deployment target flag
-- Checking whether CXX compiler supports OSX deployment target flag - yes
-- Check for working CXX compiler: /usr/local/bin/g++-5
-- Check for working CXX compiler: /usr/local/bin/g++-5 -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found OpenMP_C: -fopenmp (found version "4.0") 
-- Found OpenMP_CXX: -fopenmp (found version "4.0") 
-- Found OpenMP: TRUE (found version "4.0")  
-- Setting build type to 'Release' as none was specified.
-- Looking for clock_gettime in rt
-- Looking for clock_gettime in rt - not found
-- Looking for fopen64
-- Looking for fopen64 - not found
-- Looking for C++ include cxxabi.h
-- Looking for C++ include cxxabi.h - found
-- Looking for execinfo.h
-- Looking for execinfo.h - found
-- Looking for nanosleep
-- Looking for nanosleep - found
-- Check if the system is big endian
-- Searching 16 bit integer
-- Looking for sys/types.h
-- Looking for sys/types.h - found
-- Looking for stdint.h
-- Looking for stdint.h - found
-- Looking for stddef.h
-- Looking for stddef.h - found
-- Check size of unsigned short
-- Check size of unsigned short - done
-- Using unsigned short
-- Check if the system is big endian - little endian
-- /Users/farismismar/xgboost/dmlc-core/cmake/build_config.h.in -> /Users/farismismar/xgboost/dmlc-core/include/dmlc/build_config.h
-- Performing Test SUPPORT_CXX11
-- Performing Test SUPPORT_CXX11 - Success
-- Performing Test SUPPORT_CXX0X
-- Performing Test SUPPORT_CXX0X - Success
-- Performing Test SUPPORT_MSSE2
-- Performing Test SUPPORT_MSSE2 - Success
-- Found OpenMP_C: -fopenmp (found version "4.0") 
-- Found OpenMP_CXX: -fopenmp (found version "4.0") 
-- Looking for pthread.h
-- Looking for pthread.h - found
-- Looking for pthread_create
-- Looking for pthread_create - found
-- Found Threads: TRUE  
-- Found CUDA: /usr/local/cuda (found suitable version "9.2", minimum required is "8.0") 
-- Found Nccl: /usr/local/nccl_2.3.7-1+cuda10.0_x86_64/include  
cuda architecture flags: -gencode arch=compute_35 61,code=sm_35 61;-gencode arch=compute_35 61,code=compute_35 61;
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/farismismar/xgboost/build

Thank you.


#4

Can you run make VERBOSE=1 and post the log too?


#5

It is on this link (the file size is 3.5 MB).


#6

I don’t see any reference to nvcc in the log. I suspect you may be using the Makefile in the root, not the Makefile generated by CMake. Make sure to run make inside the build/ directory.


#7

Thank you. That was indeed the reason! It is strange that all the guides I followed say to go back to the xgboost root directory, not the build/ directory. Now I am getting a compile error. With the VERBOSE=1 option, I see:

/Users/farismismar/xgboost/src/linear/updater_gpu_coordinate.cu(232): error: no instance of overloaded function "std::min" matches the argument list
            argument types are: (size_t, uint64_t)

For the cmake phase I used Xcode clang-800.0.42.1. I also tried explicitly exporting CC=clang and CXX=clang++ for both the cmake and make steps, but still no luck. What should I do now?


#8

Can you try using Homebrew GCC instead? Set CC=gcc-5 CXX=g++-5


#9

I must have forgotten to mention this when I edited my post! Yes, I tried export CC=gcc-5 and export CXX=g++-5 first, before attempting CC=clang, and got this error:

nvcc fatal : GNU C/C++ compiler is no longer supported as a host compiler on Mac OS X.

That is what prompted me to try export CC=clang and export CXX=clang++ in the first place.

What can I do now, given the nvcc error reported when using gcc-5?

I am editing the post to share my configuration:

$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2018 NVIDIA Corporation
Built on Tue_Jun_12_23:08:12_CDT_2018
Cuda compilation tools, release 9.2, V9.2.148

$ gcc-5 --version
gcc-5 (Homebrew GCC 5.5.0_2) 5.5.0
Copyright (C) 2015 Free Software Foundation, Inc.

CUDA is 9.2, cuDNN 7. I am using High Sierra 10.13.6. I also downgraded nccl to 2.3.7 for CUDA 9.2.


#10

I see. In that case you should use clang.

As for the compilation error, try changing this line

to:

bst_uint row_end = std::min(static_cast<int64_t>(row_begin + shard_size),
                            p_fmat->Info().num_row_);

#11

Thanks. The error persists, with a new complaint:

/Users/farismismar/xgboost/src/linear/updater_gpu_coordinate.cu(231): error: no instance of overloaded function "std::min" matches the argument list
            argument types are: (int64_t, uint64_t)

I cast both arguments of std::min to int64_t:

bst_uint row_end = std::min(static_cast<int64_t>(row_begin + shard_size),
                            static_cast<int64_t>(p_fmat->Info().num_row_));

and this seems to solve that error, but now I get another error from NCCL:

ld: library not found for -lnccl_static
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [../xgboost] Error 1
make[1]: *** [CMakeFiles/runxgboost.dir/all] Error 2
make: *** [all] Error 2

The latest cmake invocation is this:

cmake .. -DUSE_CUDA=ON -DCUDA_SDK_ROOT_DIR=/Developer/NVIDIA/CUDA-9.2 \
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DUSE_NCCL=ON \
  -DNCCL_INCLUDE_DIR=/usr/local/nccl_2.3.7-1+cuda9.2_x86_64/include \
  -DNCCL_LIBRARY=/usr/local/nccl_2.3.7-1+cuda9.2_x86_64/lib \
  -DGPU_COMPUTE_VER='61' \
  -DOpenMP_CXX_FLAGS="-Xpreprocessor -fopenmp -I/usr/local/opt/libomp/include" \
  -DOpenMP_CXX_LIB_NAMES="omp" -DOpenMP_omp_LIBRARY=/usr/local/opt/libomp/lib/libomp.a

The export variables are:

export CC=clang
export CXX=clang++
export LDFLAGS="-L/usr/local/opt/libomp/lib"
export CPPFLAGS="-I/usr/local/opt/libomp/include"
export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:/usr/local/opt/libomp/lib/:/usr/local/nccl_2.3.7-1+cuda9.2_x86_64/lib/
export LD_LIBRARY_PATH=$DYLD_LIBRARY_PATH

The nccl_static file is indeed under /usr/local/nccl_2.3.7-1+cuda9.2_x86_64/lib/libnccl_static.a


#12

In order to simplify this, I decided to compile without NCCL:

cmake .. -DUSE_CUDA=ON -DCUDA_SDK_ROOT_DIR=/Developer/NVIDIA/CUDA-9.2 \
  -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DGPU_COMPUTE_VER='61' \
  -DOpenMP_CXX_LIB_NAMES="omp" -DOpenMP_omp_LIBRARY=/usr/local/opt/libomp/lib/libomp.a \
  -DOpenMP_CXX_FLAGS="-Xpreprocessor -fopenmp -I/usr/local/opt/libomp/include"

A different error now shows up at the Linking CXX executable ../xgboost step:

 "_omp_set_num_threads", referenced from:
  _XGDMatrixCreateFromMat_omp in c_api.cc.o
  _XGDMatrixCreateFromDT in c_api.cc.o
  xgboost::LearnerImpl::Configure(std::__1::vector<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > > const&) in learner.cc.o
ld: symbol(s) not found for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make[2]: *** [../xgboost] Error 1
make[1]: *** [CMakeFiles/runxgboost.dir/all] Error 2
make: *** [all] Error 2

If we can solve this first, we can worry about the multi-GPU case with NCCL later. Thanks.


#13

The default Clang installed in Mac OSX doesn’t support OpenMP. You should set USE_OPENMP=OFF when running CMake. As for NCCL, try setting NCCL_ROOT.


#14

It worked, but only after I completely uninstalled libomp from the system using brew uninstall libomp. Then:

cmake -DUSE_OPENMP=OFF -DUSE_CUDA=ON -DCUDA_SDK_ROOT_DIR=/Developer/NVIDIA/CUDA-9.2 -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DGPU_COMPUTE_VER='61' ..

I deleted the build folder and recreated it before running cmake.

The long-sought lines finally appeared:

[ 98%] Built target runxgboost
Scanning dependencies of target xgboost
[100%] Linking CXX shared library ../lib/libxgboost.dylib
[100%] Built target xgboost

Thank you for your help. As for NCCL, I will try it later (it is not yet a necessity). I hope this thread serves as a reference for people running into the same problem!


#15

I decided to write a cleaned up procedure for installing XGBoost v0.81 on MacOS High Sierra 10.13.6.

Requirements:
MacOS 10.13.6, Python 3.6, CUDA 9.2, cuDNN 7, Xcode clang-800.0.42.1, and your NVIDIA GPU card's compute capability (mine is a 1080 Ti, so the compute version is 6.1).

If you have libomp, go ahead and uninstall it: brew uninstall libomp.
If you have a preinstalled version of xgboost, uninstall it too: sudo -H pip3 uninstall xgboost

Steps:

  1. Issue git clone --recursive https://github.com/dmlc/xgboost
  2. Empty these environment variables: CC, CXX, DYLD_LIBRARY_PATH, LD_LIBRARY_PATH, CPPFLAGS, LDFLAGS.
  3. Edit line 232 of xgboost/src/linear/updater_gpu_coordinate.cu to read:

bst_uint row_end = std::min(static_cast<int64_t>(row_begin + shard_size),
                            static_cast<int64_t>(p_fmat->Info().num_row_));

  4. Run the following:

export CC=clang
export CXX=clang++
cd xgboost
mkdir build
cd build
cmake -DUSE_OPENMP=OFF -DUSE_CUDA=ON -DCUDA_SDK_ROOT_DIR=/Developer/NVIDIA/CUDA-9.2 -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda -DGPU_COMPUTE_VER='61' ..

  5. Then, while still in the build directory, issue make -j.
  6. Finally, cd python-package; sudo -H python3 setup.py install

Now to test that things are working from Python (X_train and y_train being your own training data; note the regressor is assigned to a new name so it doesn't shadow the xgb module):

import xgboost as xgb
model = xgb.XGBRegressor(tree_method='gpu_hist')
model.fit(X_train, y_train)

Monitor your GPU activity.


#16

Fantastic! Thanks for the write-up.

P.S. Note that you’ll be using a single CPU thread, since OpenMP is disabled. Homebrew GCC would have supported OpenMP, but unfortunately the CUDA toolkit doesn’t work with it. Hence, Mac users are forced to choose between multi-core CPU training and GPU training.


#17

Thanks a lot for this input, and for all your patience and help over the last two days! Yes, I am much happier with a single GPU than with multiple CPU cores!


#18

It’s a shame that there’s no good way to test GPU algorithms on a Mac. My MacBook doesn’t have an NVIDIA GPU, and leading cloud services such as Amazon Web Services don’t offer Mac OSX in their cloud offerings. (Usually the choice is between Linux and Windows.) This is also the reason why we don’t publish Python wheels for Mac.


#19

Makes perfect sense. Hopefully this procedure will help those who want to run things locally on their own computers, especially when access to GCP, Azure, or AWS is not feasible. Take care :-).


#20

@farismismar I submitted a patch to fix the compilation error: https://github.com/dmlc/xgboost/pull/3917