clang-p2996

Files

Cullen Rhodes f75d46a7ec [mlir][ArmSME] Lower vector.outerproduct to FMOPA/BFMOPA (#65621)

This patch adds support for lowering vector.outerproduct to the ArmSME
MOPA intrinsic for the following types:

  vector<[8]xf16>,  vector<[8]xf16>  -> vector<[8]x[8]xf16>
  vector<[8]xbf16>, vector<[8]xbf16> -> vector<[8]x[8]xbf16>
  vector<[4]xf32>,  vector<[4]xf32>  -> vector<[4]x[4]xf32>
  vector<[2]xf64>,  vector<[2]xf64>  -> vector<[2]x[2]xf64>

The FP variants are lowered to FMOPA (non-widening) [1] and the BFloat16
variant to BFMOPA (non-widening) [2].
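
For readers unfamiliar with the operation, the following is a minimal Python sketch of the arithmetic behind an outer product and accumulate, and of how the lane counts in the type list follow from the element width. It is not the lowering itself. The helper names (`lanes`, `outer_product_accumulate`) are hypothetical, the 128-bit minimum vector length is the architectural minimum for scalable vectors, and Python floats are used throughout, so rounding in f16/bf16/f32 is not modelled.

```python
# Minimum scalable vector length in bits; hardware uses 128 * vscale.
MIN_VECTOR_BITS = 128

# Element widths for the types listed in the commit message.
ELEMENT_BITS = {"f16": 16, "bf16": 16, "f32": 32, "f64": 64}


def lanes(elem_type, vscale=1):
    """Element count of a scalable vector such as vector<[4]xf32>."""
    return (MIN_VECTOR_BITS // ELEMENT_BITS[elem_type]) * vscale


def outer_product_accumulate(acc, lhs, rhs):
    """Return acc + (lhs outer rhs) as a new 2-D list (illustrative helper)."""
    return [
        [acc[i][j] + lhs[i] * rhs[j] for j in range(len(rhs))]
        for i in range(len(lhs))
    ]


n = lanes("f32")  # 4 lanes at vscale = 1, matching vector<[4]xf32>
tile = [[0.0] * n for _ in range(n)]
tile = outer_product_accumulate(
    tile, [1.0, 2.0, 3.0, 4.0], [1.0, 0.5, 0.25, 0.125]
)
```

Each tile element `tile[i][j]` ends up holding `lhs[i] * rhs[j]` added to its previous value, which is why two `[4]` vectors produce a `[4]x[4]` result.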

Note that at the ISA level these variants are implemented by different
architecture features, listed below:

  FMOPA (non-widening)
    * half-precision   - +sme2p1,+sme-f16f16
    * single-precision - +sme
    * double-precision - +sme-f64f64
  BFMOPA (non-widening)
    * half-precision   - +sme2p1,+b16b16
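
The feature list above can be read as a lookup from instruction and element type to required target features. A small sketch, assuming the BFMOPA entry corresponds to the bf16 element type from the type list; the table and the `required_features` helper are hypothetical and exist only to illustrate the mapping, not any API in MLIR or LLVM:

```python
# Feature flags copied from the list above, keyed by (instruction, element type).
REQUIRED_FEATURES = {
    ("FMOPA", "f16"): ("+sme2p1", "+sme-f16f16"),
    ("FMOPA", "f32"): ("+sme",),
    ("FMOPA", "f64"): ("+sme-f64f64",),
    ("BFMOPA", "bf16"): ("+sme2p1", "+b16b16"),
}


def required_features(instruction, elem_type):
    """Return the comma-separated feature string for one variant."""
    return ",".join(REQUIRED_FEATURES[(instruction, elem_type)])
```

Only the f32 variant is covered by base `+sme`; every other variant needs at least one additional feature.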

There is currently no way to target different features when lowering to
ArmSME. Integration tests are added for F32 and F64 only. The integration
tests run under QEMU, which does not yet support SME2 (support is targeted
for QEMU 9.0), so integration tests for the F16 and BF16 variants are
excluded.

Masking is currently unsupported.

Depends on #65450.

[1] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/FMOPA--non-widening---Floating-point-outer-product-and-accumulate-
[2] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/BFMOPA--non-widening---BFloat16-floating-point-outer-product-and-accumulate-

2023-09-14 08:31:52 +01:00

lit.local.cfg

[NFC][Py Reformat] Reformat python files in mlir subdir

2023-05-26 08:05:40 +02:00

load-store-128-bit-tile.mlir

[mlir][ArmSME] Lower loads/stores of (.Q) 128-bit tiles to intrinsics

2023-08-23 09:16:20 +00:00

test-outerproduct-f32.mlir

[mlir][ArmSME] Lower vector.outerproduct to FMOPA/BFMOPA (#65621)

2023-09-14 08:31:52 +01:00

test-outerproduct-f64.mlir

[mlir][ArmSME] Lower vector.outerproduct to FMOPA/BFMOPA (#65621)