I am trying to speed up my online inference service by moving it from an Intel CPU to a GPU, but the result is much slower than on the CPU. Profiling shows the CPU is roughly 8x faster than the GPU at prediction; the CUDA prediction kernel (PredictKernel) alone takes up to 40 ms. My questions are:
- Under what circumstances is GPU inference faster than CPU inference?
- What should I do to speed up GPU inference?
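In case the measurement methodology matters: below is a simplified, framework-agnostic sketch of how I time a predict call (the model here is just a stand-in function, not my actual service code). I do a few warm-up calls first, since the first GPU invocations often pay one-time costs (CUDA context creation, kernel JIT, memory-pool growth) that shouldn't count toward steady-state latency.

```python
import time

def fake_predict(batch):
    # Stand-in for the real model call; replace with the actual predictor.
    return [x * 2 for x in batch]

def time_predict(predict, batch, warmup=5, iters=50):
    """Return average seconds per predict() call, after warm-up."""
    # Warm-up runs are excluded from the measurement.
    for _ in range(warmup):
        predict(batch)
    start = time.perf_counter()
    for _ in range(iters):
        predict(batch)
    return (time.perf_counter() - start) / iters

batch = list(range(256))
avg = time_predict(fake_predict, batch)
print(f"avg latency: {avg * 1e6:.1f} us per batch of {len(batch)}")
```

For real GPU timing the same structure applies, but the device must be synchronized before reading the clock (e.g. `cudaDeviceSynchronize()` or CUDA events), otherwise only the asynchronous kernel launch is measured.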