If the thread count is known statically and is a constant, we can set the "amdgpu-flat-work-group-size" kernel attribute. Partially fixes https://github.com/llvm/llvm-project/issues/64816.
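
As a rough illustration of what setting the attribute involves: the AMDGPU backend reads "amdgpu-flat-work-group-size" as a string of the form "min,max". The sketch below only builds that string. The helper name `flatWorkGroupSizeValue` is hypothetical and not taken from the patch, and the choice of lower bound is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper (not part of the patch): formats the "min,max" value
// used by the "amdgpu-flat-work-group-size" function attribute.
std::string flatWorkGroupSizeValue(std::uint32_t MinThreads,
                                   std::uint32_t MaxThreads) {
  assert(MinThreads <= MaxThreads && "minimum must not exceed maximum");
  return std::to_string(MinThreads) + "," + std::to_string(MaxThreads);
}

// With LLVM available, the value could be attached to a kernel function
// roughly like this (Kernel is an llvm::Function *, N the constant thread count):
//   Kernel->addFnAttr("amdgpu-flat-work-group-size",
//                     flatWorkGroupSizeValue(1, N));
// Using 1 as the lower bound is an assumption, not something the patch states.
```

Passing the range as a plain string attribute keeps the frontend independent of backend-specific types; the backend parses the two bounds when it compiles the kernel.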