No GPU usage when using "gpu_hist"


#1

Hello All,

Given I was having issues installing XGBoost w/ GPU support for R, I decided to just use the Python version for the time being. I’m having a weird issue where using “gpu_hist” is speeding up the XGBoost run time but without using the GPU at all.

Computer/Environment Info
CPU: i7 7820x
GPU: Nvidia RTX 2080
OS: Windows 10 Pro (64-bit)
Python: 3.7

I used the binaries posted on here when installing xgboost with GPU support.

Problem:
So, I ran a slightly modified version of the GPU demo script hosted on the xgboost GitHub. The changes were to the printout order, and I reduced the boosting iterations to 100 for the sake of speed.

Here is the printout after executing the script:

Notes:

  1. The GPU version did run faster than the CPU version (which is an OK sign)
  2. There was no printout information about memory allocation for the GPU (which is present in other folks’ printouts).
  3. Both the CPU version and the GPU version used all available CPU resources and didn’t touch the GPU according to the resource monitor (this behavior occurs regardless of the number of boosting rounds)

Resource usage with “gpu_hist” set as a parameter:

Note: Although GPU usage here was greater, it was due to applications other than the Python script.

Resource usage with “hist” (CPU only):

Am I actually using my GPU at all? It seems bizarre that the GPU usage would be so low.

Thanks for any suggestions. I think this will be more straightforward to figure out than the issues with the R package.

PS. Sorry for the imgur links, new users can only post 1 image as an upload.


#2

Can you try again with the official binary from PyPI? Just run pip install xgboost
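For anyone wanting to repeat the comparison after reinstalling, here is a minimal sketch of a CPU vs GPU timing run. It is not the demo script from the xgboost repo: the synthetic data, the `make_params` and `compare` helpers, and the default sizes are assumptions for illustration. Note also that newer XGBoost releases (2.0 and later) replace `gpu_hist` with `tree_method="hist"` plus `device="cuda"`, so adjust accordingly if you are on a recent version.

```python
import time


def make_params(tree_method, max_depth=6):
    """Build a minimal parameter dict; only tree_method differs between runs."""
    return {
        "objective": "binary:logistic",
        "tree_method": tree_method,  # "hist" = CPU, "gpu_hist" = GPU (as in this thread)
        "max_depth": max_depth,
    }


def compare(num_round=100, n_rows=200_000, n_cols=50):
    """Train once per tree method on synthetic data and return wall-clock seconds."""
    # Third-party imports are kept local so make_params() can be reused without them.
    import numpy as np
    import xgboost as xgb

    # Synthetic stand-in data (an assumption; the original demo uses a real dataset).
    rng = np.random.default_rng(0)
    X = rng.standard_normal((n_rows, n_cols)).astype(np.float32)
    y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(np.int32)
    dtrain = xgb.DMatrix(X, label=y)

    timings = {}
    for method in ("hist", "gpu_hist"):
        start = time.perf_counter()
        xgb.train(make_params(method), dtrain, num_boost_round=num_round)
        timings[method] = time.perf_counter() - start
    return timings
```

Calling `compare()` on a machine with a GPU-enabled build returns a dict of timings for the two methods. A faster `gpu_hist` time alone does not prove the GPU is in use, so it is worth watching a GPU monitor while it runs.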


#3

Well, mystery 1 solved. It seems the GPU is being used despite the lack of printout info. This is a screenshot from GPU-Z. I let the PC idle to get a baseline and then executed the GPU demo script. The box indicates the period of time the GPU variant was used. So clearly, XGBoost is accessing the GPU (Yay!):

Does XGBoost just always use a lot of CPU resources when using the GPU version? My hope was that I could run models on the GPU and use the CPU for some other less intensive work.

PS. Man, the pip installation process is incredibly smooth compared to building from source!


#4

@jkilbrid AFAIK the CPU is used to feed work into the GPU, which will likely keep the CPU occupied.
This is similar to how a TensorFlow script that works with the GPU will still keep at least one CPU core completely occupied.
I don’t think you can offload the computation completely.


#5

Sorry for the lack of response! I didn’t want this to end up being a thread where the OP never responds with what fixed their issue.

Installing via pip appears to have completely fixed the issue. Using some larger data sets, I can see a significant increase in the GPU’s memory usage and load in GPU-Z.

All appears right in the world! Thank you for the assistance!


#6

I just wanted to add some experiences I have had using XGBoost on Windows in case others stumble across this and find it useful.

First, maybe avoid relying on Windows’ own resource utilization graphs and reporting in Task Manager. As far as I know, only Memory is accurate enough to be useful. When using XGBoost, CPU load can often show as very high, even when using gpu_hist or similar. This is indeed because the CPU is used to send data to the GPU. However, Task Manager will often show 100% on all cores when this is clearly not true. For example, if you open another monitoring tool and check the temperature of your CPU, you can see that it stays near idle, which would be impossible if the CPU were running at 100% on all cores.

Second, the Windows GPU utilization reporting is even more useless than the CPU reporting. Instead, I suggest using the NVIDIA-SMI tool like so (found here: “C:\Program Files\NVIDIA Corporation\NVSMI”):

nvidia-smi --query-gpu=gpu_name,temperature.gpu,utilization.gpu,utilization.memory --format=csv --loop=5
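If you would rather capture these readings from Python (for example, to log them alongside a training run), a rough sketch along these lines should work. It takes a single sample rather than looping, and it assumes `nvidia-smi` is on your PATH or that you pass the full path to it. The helper names `parse_smi_csv` and `query_gpu` are made up for this example.

```python
import csv
import io
import subprocess

# Same fields as the command above.
QUERY_FIELDS = ["gpu_name", "temperature.gpu", "utilization.gpu", "utilization.memory"]


def parse_smi_csv(text):
    """Turn `nvidia-smi --format=csv` output into a list of dicts keyed by header."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [row for row in reader if row]  # drop blank lines
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


def query_gpu(executable="nvidia-smi"):
    """Run nvidia-smi once and return one dict per GPU."""
    cmd = [executable, "--query-gpu=" + ",".join(QUERY_FIELDS), "--format=csv"]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_smi_csv(result.stdout)
```

The dict keys are taken from whatever header line nvidia-smi prints, so the sketch does not hard-code column names. Calling `query_gpu()` before and during training gives a quick before/after comparison.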

I hope this is useful.


#7

I was using task manager to monitor CPU usage; GPU-Z was used to monitor GPU usage.