Summary: To work with CUDA, functions must be declared both __host__ and __device__, and GPU-only functions must be called only during the CUDA / HIP compile stage.
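One way this pattern is commonly written is sketched below. It is an illustration under assumptions, not the code from this change: the macro name HOST_DEVICE and the function scale are hypothetical, and __fmul_rn merely stands in for whichever GPU-only function is actually involved. The sketch guards the GPU call on the device compilation pass (__CUDA_ARCH__ or __HIP_DEVICE_COMPILE__), which is one reading of "the CUDA / HIP compile stage"; with a plain host compiler the qualifiers expand to nothing and the portable branch is used.

```cpp
// Hypothetical macro name (not from the source): expands to the CUDA/HIP
// qualifiers only when a CUDA or HIP compiler is processing this file.
#if defined(__CUDACC__) || defined(__HIPCC__)
  #define HOST_DEVICE __host__ __device__
#else
  #define HOST_DEVICE
#endif

// Hypothetical example function, callable from both host and device code.
HOST_DEVICE inline float scale(float x, float factor) {
#if defined(__CUDA_ARCH__) || (defined(__HIP_DEVICE_COMPILE__) && __HIP_DEVICE_COMPILE__)
  // Device compilation pass: GPU-only intrinsics may be used here.
  // __fmul_rn is a placeholder for the real GPU function.
  return __fmul_rn(x, factor);
#else
  // Host pass, or a non-CUDA/HIP compiler: portable fallback.
  return x * factor;
#endif
}
```

Keeping the guard inside the function body means the signature is identical in every compilation pass, so callers do not need their own conditionals.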