This change adds the option to lower to NvGpu dialect ops during the VectorToGPU conversion pass. Because this transformation reuses existing VectorToGPU logic, a separate VectorToNvGpu conversion pass is not created. Instead, the option `use-nvgpu` is added to the VectorToGPU pass. When it is true, the pass attempts to convert slices rooted at `vector.contract` operations into `nvgpu.mma.sync` ops, and converts `vector.transfer_read` ops to either `nvgpu.ldmatrix` or one or more `vector.load` operations.

The specific data each thread loads depends on its thread id within a subgroup (warp). These index calculations depend on the data type and shape of the MMA op, as defined by the downstream PTX specification. The code supporting these details is separated into `NvGpuSupport.cpp` and `NvGpuSupport.h`.

Differential Revision: https://reviews.llvm.org/D122940
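To make the per-thread index calculation above more concrete, here is a minimal sketch of one such mapping. It is not the code in `NvGpuSupport.cpp`. It reproduces the PTX ISA's documented layout of the A operand fragment for `mma.sync.aligned.m16n8k16` with `.f16` inputs, where each of the 32 lanes holds 8 elements (`a0`..`a7`). The names `Coord`, `aFragmentCoordsM16N8K16F16`, and `coversTileExactlyOnce` are hypothetical and chosen for illustration only.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper type: a (row, col) position inside the 16x16 A tile.
struct Coord {
  int64_t row;
  int64_t col;
};

// Sketch of the PTX-specified A-fragment layout for mma.sync m16n8k16 f16.
// Returns the tile coordinates of elements a0..a7 held by lane `laneId`.
std::array<Coord, 8> aFragmentCoordsM16N8K16F16(int64_t laneId) {
  assert(laneId >= 0 && laneId < 32 && "lane id must be within a 32-thread warp");
  const int64_t groupId = laneId >> 2;        // "groupID" in the PTX ISA
  const int64_t threadIdInGroup = laneId % 4; // "threadID_in_group" in the PTX ISA
  std::array<Coord, 8> coords{};
  for (int64_t i = 0; i < 8; ++i) {
    Coord &c = coords[static_cast<std::size_t>(i)];
    // a0,a1,a4,a5 live in rows 0-7; a2,a3,a6,a7 live in rows 8-15.
    c.row = groupId + ((i & 2) ? 8 : 0);
    // a0..a3 live in columns 0-7; a4..a7 live in columns 8-15.
    c.col = threadIdInGroup * 2 + (i & 1) + (i >= 4 ? 8 : 0);
  }
  return coords;
}

// Sanity check: across the whole warp, every element of the 16x16 tile
// is owned by exactly one (lane, element) pair.
bool coversTileExactlyOnce() {
  std::array<std::array<int, 16>, 16> hits{};
  for (int64_t lane = 0; lane < 32; ++lane)
    for (const Coord &c : aFragmentCoordsM16N8K16F16(lane))
      ++hits[static_cast<std::size_t>(c.row)][static_cast<std::size_t>(c.col)];
  for (const auto &row : hits)
    for (int h : row)
      if (h != 1)
        return false;
  return true;
}
```

For example, lane 5 has `groupID = 1` and `threadID_in_group = 1`, so its `a0` sits at (1, 2) and its `a6` at (9, 10). Other element types and shapes (for example the B and accumulator operands, or `tf32`/`i8` inputs) use different formulas, which is why the lowering keys these calculations on the MMA op's type and shape.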