This removes a runtime dependency on the CUDA Toolkit path, instead of looking up the filesystem we use a version of libdevice embedded in the binary at build time.
This removes a runtime dependency on the CUDA Toolkit path, instead of looking up the filesystem we use a version of libdevice embedded in the binary at build time.