The new gfx950 MFMA instructions accept bf16 operands.
c0cc81cdc0/llvm/include/llvm/IR/IntrinsicsAMDGPU.td (L3434)
When running `amdgpu-to-rocdl`, the current logic always converts bf16
operands to i16, which fails to compile for the newer bf16 MFMAs, e.g.
`v_mfma_f32_16x16x32bf16`.
The backend expects those newer MFMAs to take bf16-typed operands. This
patch fixes the conversion so that bf16 is kept for them.
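
The intended behaviour can be sketched with a small standalone model. This is only an illustration, not the code in this patch: `ElemType`, `VectorType`, `convertMfmaOperand`, and the `allowBf16` flag are hypothetical names that stand in for the real MLIR types and lowering logic. It assumes the caller knows whether the selected MFMA takes bf16 operands natively.

```cpp
// Hypothetical, simplified stand-ins for MLIR element and vector types.
// None of these names come from the MLIR API.
enum class ElemType { F16, BF16, I16, F32 };

struct VectorType {
  ElemType elem;
  int numElems;
};

// Toy model of the operand conversion in the lowering.
// `allowBf16` is an assumed flag: true when the chosen MFMA intrinsic
// (for example one of the newer gfx950 variants) takes bf16 operands.
VectorType convertMfmaOperand(VectorType input, bool allowBf16) {
  // Older bf16 MFMAs: the operand is reinterpreted as i16 of the same width.
  if (input.elem == ElemType::BF16 && !allowBf16)
    return {ElemType::I16, input.numElems};
  // Newer bf16 MFMAs and all other element types: leave the type unchanged.
  return input;
}
```

In this model, an 8-element bf16 vector stays bf16 when `allowBf16` is true and becomes an 8-element i16 vector otherwise. Non-bf16 inputs are never touched.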
CC: @krzysz00 @dhernandez0 @giuseros @antiagainst @kuhar