Commit Graph

162 Commits

Author SHA1 Message Date
Jakub Kuderski
560564f51c [mlir][vector][gpu] Align minf/maxf reduction kind names with arith (#75901)
This is to avoid confusion when dealing with reduction/combining kinds.
For example, see a recent PR comment:
https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175.

Previously, they were picked to mostly mirror the names of the llvm
vector reduction intrinsics:
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In
isolation, it was not clear if `<maxf>` has `arith.maxnumf` or
`arith.maximumf` semantics. The new reduction kind names map 1:1 to
arith ops, which makes it easier to tell/look up their semantics.

Because both the vector and the gpu dialect depend on the arith dialect,
it's more natural to align names with those in arith than with the
lowering to llvm intrinsics.
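For illustration, a hedged sketch of the rename on `vector.reduction` (assuming the new spellings are `maxnumf`/`minnumf`, matching `arith.maxnumf`/`arith.minnumf`):

```mlir
// Before: unclear whether this has maxnumf or maximumf semantics.
%r0 = vector.reduction <maxf>, %v : vector<4xf32> into f32
// After: the kind name maps 1:1 to an arith op.
%r1 = vector.reduction <maxnumf>, %v : vector<4xf32> into f32
```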

Issue: https://github.com/llvm/llvm-project/issues/72354
2023-12-20 00:14:43 -05:00
Benjamin Maxwell
9505cf457f [mlir][ArmSME][test] Use only-if-required-by-ops rather than enable_arm_streaming_ignore (NFC) (#75209)
This moves the fix out of the IR and into the pass description, which
seems nicer. It also works as an integration test for the
`only-if-required-by-ops` flag :)
2023-12-13 10:29:28 +00:00
Benjamin Maxwell
eaff02f28e [mlir][ArmSME] Switch to an attribute-based tile allocation scheme (#73253)
This reworks the ArmSME dialect to use attributes for tile allocation.
This has a number of advantages and corrects some issues with the
previous approach:

* Tile allocation can now be done ASAP (i.e. immediately after
`-convert-vector-to-arm-sme`)
* SSA form for control flow is now supported (e.g.`scf.for` loops that
yield tiles)
* ArmSME ops can be converted to intrinsics very late (i.e. after
lowering to control flow)
* Tests are simplified by removing constants and casts
* Avoids correctness issues with representing LLVM `immargs` as MLIR
values
- The tile ID on the SME intrinsics is an `immarg` (so is required to be
a compile-time constant), `immargs` should be mapped to MLIR attributes
(this is already the case for intrinsics in the LLVM dialect)
- Using MLIR values for `immargs` can lead to invalid LLVM IR being
generated (and passes such as -cse making incorrect optimizations)

As part of this patch we bid farewell to the following operations:

```mlir
arm_sme.get_tile_id : i32
arm_sme.cast_tile_to_vector : i32 to vector<[4]x[4]xi32>
arm_sme.cast_vector_to_tile : vector<[4]x[4]xi32> to i32
```

These are now replaced with:
```mlir
// Allocates a new tile with (indeterminate) state:
arm_sme.get_tile : vector<[4]x[4]xi32>
// A placeholder operation for lowering ArmSME ops to intrinsics:
arm_sme.materialize_ssa_tile : vector<[4]x[4]xi32>
```

The new tile allocation works via operations implementing the
`ArmSMETileOpInterface`. This interface says that an operation needs to
be assigned a tile ID, and may conditionally allocate a new SME tile.

Operations allocate a new tile by implementing...
```c++
std::optional<arm_sme::ArmSMETileType> getAllocatedTileType()
```
...and returning what type of tile the op allocates (ZAB, ZAH, etc).

Operations that don't allocate a tile return `std::nullopt` (which is
the default behaviour).

Currently the following ops are defined as allocating:
```mlir
arm_sme.get_tile
arm_sme.zero
arm_sme.tile_load
arm_sme.outerproduct // (if no accumulator is specified)
```

Allocating operations become the roots for the tile allocation pass,
which currently just (naively) assigns all transitive uses of a root
operation the same tile ID. However, this is enough to handle current
use cases.

Once tile IDs have been allocated subsequent rewrites can forward the
tile IDs to any newly created operations.
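As a rough sketch of the end state (the exact attribute spelling and placement are assumptions, not taken from this commit):

```mlir
// A root op allocates a tile; the chosen tile ID is recorded as an attribute.
%tile = arm_sme.get_tile {tile_id = 0 : i32} : vector<[4]x[4]xi32>
// Transitive uses of the root are assigned the same tile ID.
arm_sme.tile_store %tile, %dest[%c0, %c0] {tile_id = 0 : i32}
  : memref<?x?xi32>, vector<[4]x[4]xi32>
```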
2023-11-30 10:22:22 +00:00
Benjamin Maxwell
dff97c1e4c [mlir][ArmSME] Move ArmSME -> intrinsics lowerings to convert-arm-sme-to-llvm pass (#72890)
This gives more flexibility with when these lowerings are performed,
without also lowering unrelated vector ops.
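A hedged sketch of how a test might now invoke it (the surrounding pipeline is an assumption):

```mlir
// RUN: mlir-opt %s -convert-vector-to-arm-sme -convert-arm-sme-to-scf \
// RUN:   -convert-arm-sme-to-llvm | FileCheck %s
```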

This is a NFC (other than adding a new `-convert-arm-sme-to-llvm` pass)
2023-11-22 13:36:36 +00:00
Benjamin Maxwell
783ac3b6fb [mlir][ArmSME] Make use of backend function attributes for enabling ZA storage (#71044)
Previously, we were inserting za.enable/disable intrinsics for functions
with the "arm_za" attribute (at the MLIR level), rather than using the
backend attributes. This was done to avoid a dependency on the SME ABI
functions from compiler-rt (which have only recently been implemented).

Doing things this way did have correctness issues, for example, calling
a streaming-mode function from another streaming-mode function (both
with ZA enabled) would lead to ZA being disabled after returning to the
caller (where it should still be enabled). Fixing issues like this would
require re-doing the ABI work already done in the backend within MLIR.

Instead, this patch switches to use the "arm_new_za" (backend) attribute
for enabling ZA for an MLIR function. For the integration tests, this
requires some way of linking the SME ABI functions. This is done via the
`%arm_sme_abi_shlib` lit substitution. By default, this expands to a
stub implementation of the SME ABI functions, but this can be overridden
by providing the `ARM_SME_ABI_ROUTINES_SHLIB` CMake cache variable
(pointing it at an alternative implementation). For now, the ArmSME
integration tests pass with just stubs, as we don't make use of nested
ZA-enabled calls.
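As a rough sketch, a ZA-enabled function now only needs the attribute (the exact unit-attribute placement on `func.func` is an assumption here):

```mlir
// The backend inserts the SME ABI calls for enabling/disabling ZA based on
// this attribute; no za.enable/za.disable intrinsics appear in the IR.
func.func @uses_za() attributes {arm_new_za} {
  // ... ops that read/write ZA tiles ...
  return
}
```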

A future patch may add an option to compiler-rt to build the SME
builtins into a standalone shared library to allow easily
building/testing with the actual implementation.
2023-11-14 12:50:38 +00:00
Cullen Rhodes
4240b1790f [mlir][ArmSME] Lower transfer_write + transpose to vertical store (#71181)
This patch extends the lowering of vector.transfer_write in
VectorToArmSME to support in-flight transpose via SME vertical store.
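A hedged sketch of the rewrite this enables (the `arm_sme.tile_store` operand order and `layout` placement are approximated from neighbouring commits):

```mlir
// A transpose feeding a 2-D write of an SME tile...
%t = vector.transpose %tile, [1, 0] : vector<[4]x[4]xf32> to vector<[4]x[4]xf32>
vector.transfer_write %t, %dest[%c0, %c0] {in_bounds = [true, true]}
  : vector<[4]x[4]xf32>, memref<?x?xf32>
// ...can instead store the original tile in-flight with a vertical (column-wise) store:
arm_sme.tile_store %tile, %dest[%c0, %c0] layout<vertical>
  : memref<?x?xf32>, vector<[4]x[4]xf32>
```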
2023-11-10 07:51:06 +00:00
Cullen Rhodes
9783cf448a [mlir][ArmSME] Add support for lowering masked tile_load ops (#70915)
This patch extends ArmSMEToSCF to support lowering of masked tile_load
ops. Only masks created by 'vector.create_mask' are currently supported.
There are two lowerings depending on the pad.

For pad of constant zero, the tile is first zeroed, then only active
rows are loaded.

For non-zero pad, the scalar pad is broadcast to a 1-D vector and a
regular 'vector.masked_load' (will be lowered to SVE, not SME) loads
each slice, with padding specified as a passthru and the 2-D mask
combined into a 1-D mask. The resulting slice is then inserted into the
tile with 'arm_sme.move_vector_to_tile_slice'.
2023-11-08 09:02:09 +00:00
Cullen Rhodes
fbc70c5a9e [mlir][ArmSME] remove addressof ops to undefined symbols (NFC)
The string symbols were replaced with 'vector.print str' calls in
061d978043 (#68973) but the addressof ops weren't removed. This was
missed as the test is currently XFAIL'ed.
2023-11-06 11:43:43 +00:00
Cullen Rhodes
ed350bb3d8 [mlir][ArmSME] Add support for lowering masked tile_store ops (#71180)
This patch extends ArmSMEToSCF to support lowering of masked tile_store
ops. Only masks created by 'vector.create_mask' are currently supported.

Example:

  %mask = vector.create_mask %c3, %c2 : vector<[4]x[4]xi1>
  arm_sme.tile_store %tile, %dest[%c0, %c0], %mask : memref<?x?xi32>, vector<[4]x[4]xi32>

Produces:

  %num_rows = arith.constant 3 : index
  %num_cols = vector.create_mask %c2 : vector<[4]xi1>
  scf.for %slice_idx = %c0 to %num_rows step %c1
    arm_sme.store_tile_slice %tile, %slice_idx, %num_cols, %dest[%slice_idx, %c0]
      : memref<?x?xi32>, vector<[4]xi1>, vector<[4]x[4]xi32>
2023-11-06 11:18:57 +00:00
Christian Ulmann
52491c99fa [MLIR][LLVM] Remove typed pointer remnants from integration tests (#71208)
This commit removes all LLVM dialect typed pointers from the integration
tests. Typed pointers have been deprecated for a while now and it's
planned to soon remove them from the LLVM dialect.
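For reference, the kind of change this implies in the tests (a hedged before/after, not lifted from the patch):

```mlir
// Typed pointer (deprecated):
%p0 = llvm.alloca %c1 x i8 : (i64) -> !llvm.ptr<i8>
// Opaque pointer:
%p1 = llvm.alloca %c1 x i8 : (i64) -> !llvm.ptr
```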

Related PSA:
https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502
2023-11-03 21:21:25 +01:00
Benjamin Maxwell
e666295011 [mlir][ArmSME] Support lowering masked vector.outerproduct ops to SME (#69604)
This patch adds support for lowering masked outer products to SME. This
is done in two stages. First, vector.outerproducts (both masked and
non-masked) are rewritten to arm_sme.outerproducts. The
arm_sme.outerproduct op is close to vector.outerproduct, but supports
masking on the operands rather than the result. It also limits the cases
it handles to things that could be (directly) lowered to SME.

This currently requires that the source of the mask is a
vector.create_mask op. E.g.:

```mlir
%mask = vector.create_mask %dimA, %dimB : vector<[4]x[4]xi1>
%result = vector.mask %mask {
             vector.outerproduct %vecA, %vecB
              : vector<[4]xf32>, vector<[4]xf32>
          } : vector<[4]x[4]xi1> -> vector<[4]x[4]xf32>
```
Is rewritten to:
```
%maskA = vector.create_mask %dimA : vector<[4]xi1>
%maskB = vector.create_mask %dimB : vector<[4]xi1>
%result = arm_sme.outerproduct %vecA, %vecB masks(%maskA, %maskB)
              : vector<[4]xf32>, vector<[4]xf32>
```
(The same rewrite works for non-masked vector.outerproducts too)

The arm_sme.outerproduct can then be directly lowered to SME intrinsics.
2023-10-31 09:06:21 +00:00
Andrzej Warzyński
e9478b167f [mlir][SVE] Add more e2e test for vector.contract (#70367)
Adds basic integration tests for `vector.contract` for the dot product
and matvec operations. These tests exercise scalable vectors.

Depends on https://github.com/llvm/llvm-project/pull/69845
2023-10-27 15:00:01 +01:00
tyb0807
4d4f603793 [mlir][Vector] Fix integration test for vector.maskedload narrow type emulation (#70431)

Currently the expected CHECK values are not correct for
`fcst_maskedload` from
mlir/test/Integration/Dialect/Vector/CPU/test-rewrite-narrow-types.mlir
2023-10-27 11:26:20 +02:00
tyb0807
674261b203 [mlir][Vector] Add narrow type emulation pattern for vector.maskedload (#68443) 2023-10-27 10:49:58 +02:00
Andrzej Warzyński
f24c443e82 [mlir][SVE] Add an e2e test for vector.contract (#69845)
Adds an end-to-end test for `vector.contract` that targets SVE (i.e.
scalable vectors). Note that this requires lifting the restriction on
`vector.outerproduct` (to which `vector.contract` is lowered) that
would deem the following as invalid by the Op verifier (*):

```
vector.outerproduct %27, %28, %26 {kind = #vector.kind<add>} : vector<3xf32>, vector<[2]xf32>
```

(*) This is indeed valid as the end-to-end test demonstrates (at least when
compiling for SVE).
2023-10-26 20:57:49 +01:00
Benjamin Maxwell
96e040acee [mlir][ArmSVE] Add -arm-sve-legalize-vector-storage pass (#68794)
This patch adds a pass that ensures that loads, stores, and allocations
of SVE vector types will be legal in the LLVM backend. It does this at
the memref level, so this pass must be applied before lowering all the
way to LLVM.

This pass currently fixes two issues.

## Loading and storing predicate types

It is only legal to load/store predicate types equal to (or greater
than) a full predicate register, which in MLIR is `vector<[16]xi1>`.
Smaller predicate types (`vector<[1|2|4|8]xi1>`) must be converted
to/from a full predicate type (referred to as a `svbool`) before and
after storing and loading respectively. This pass does this by widening
allocations and inserting conversion intrinsics.

For example:


```mlir
%alloca = memref.alloca() : memref<vector<[4]xi1>>
%mask = vector.constant_mask [4] : vector<[4]xi1>
memref.store %mask, %alloca[] : memref<vector<[4]xi1>>
%reload = memref.load %alloca[] : memref<vector<[4]xi1>>
```
Becomes:
```mlir
%alloca = memref.alloca() {alignment = 1 : i64} : memref<vector<[16]xi1>>
%mask = vector.constant_mask [4] : vector<[4]xi1>
%svbool = arm_sve.convert_to_svbool %mask : vector<[4]xi1>
memref.store %svbool, %alloca[] : memref<vector<[16]xi1>>
%reload_svbool = memref.load %alloca[] : memref<vector<[16]xi1>>
%reload = arm_sve.convert_from_svbool %reload_svbool : vector<[4]xi1>
```

## Relax alignments for SVE vector allocas

The storage for SVE vector types only needs to have an alignment that
matches the element type (for example 4 byte alignment for `f32`s).
However, the LLVM backend currently defaults to aligning to `base size x
element size` bytes. For non-legal vector types like `vector<[8]xf32>`
this results in 8 x 4 = 32-byte alignment, but the backend only supports
up to 16-byte alignment for SVE vectors on the stack. Explicitly setting
a smaller alignment prevents this issue.
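A hedged sketch of the fix for this case (mirroring the alignment attribute shown in the example above):

```mlir
// Explicit element-type alignment (4 bytes for f32) instead of the backend
// default of 32 bytes, which exceeds the 16-byte SVE stack alignment limit.
%alloca = memref.alloca() {alignment = 4 : i64} : memref<vector<[8]xf32>>
```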

Depends on: #68586 and #68695 (for testing)
2023-10-26 12:18:58 +01:00
Benjamin Maxwell
061d978043 [mlir][test] Update tests to use vector.print str (NFC) (#68973)
This cuts down on a fair amount of boilerplate.

Depends on: #68695
2023-10-25 10:14:34 +01:00
Oleksandr "Alex" Zinenko
e4384149b5 [mlir] use transform-interpreter in test passes (#70040)
Update most test passes to use the transform-interpreter pass instead of
the test-transform-dialect-interpreter-pass. The new "main" interpreter
pass has a named entry point instead of looking up the top-level op with
`PossibleTopLevelOpTrait`, which is arguably a more understandable
interface. The change is mechanical, rewriting an unnamed sequence into
a named one and wrapping the transform IR into a module when necessary.
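For reference, a minimal sketch of the named entry point the new interpreter pass looks up (assuming the `__transform_main` convention):

```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
      %root: !transform.any_op {transform.readonly}) {
    // Transform IR operating on %root goes here.
    transform.yield
  }
}
```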

Add an option to the transform-interpreter pass to target a tagged
payload op instead of the root anchor op, which is also useful for repro
generation.

Only the test in the transform dialect proper and the examples have not
been updated yet. These will be updated separately after a more careful
consideration of testing coverage of the transform interpreter logic.
2023-10-24 16:12:34 +02:00
Benjamin Maxwell
3be3883e6d [mlir][VectorOps] Support string literals in vector.print (#68695)
Printing strings within integration tests is currently quite annoyingly
verbose, and can't be tucked into shared helpers as the types depend on
the length of the string:

```
llvm.mlir.global internal constant @hello_world("Hello, World!\0")

func.func @entry() {
  %0 = llvm.mlir.addressof @hello_world : !llvm.ptr<array<14 x i8>>
  %1 = llvm.mlir.constant(0 : index) : i64
  %2 = llvm.getelementptr %0[%1, %1]
    : (!llvm.ptr<array<14 x i8>>, i64, i64) -> !llvm.ptr<i8>
  llvm.call @printCString(%2) : (!llvm.ptr<i8>) -> ()
  return
}
```

So this patch adds a simple extension to `vector.print` to simplify
this:
```
func.func @entry() {
   // Print a vector of characters ;)
   vector.print str "Hello, World!"
   return
}
```

Most of the logic for this is now shared with `cf.assert` which already
does something similar.

Depends on #68694
2023-10-24 09:34:14 +01:00
Cullen Rhodes
d86047cb66 [mlir][ArmSME] Update tile slice layout syntax (#69151)
This patch prefixes tile slice layout with `layout` in the
assemblyFormat:

  - `<vertical>`   -> `layout<vertical>`
  - `<horizontal>` -> `layout<horizontal>`

The reason for this change is the current format doesn't play nicely
with additional optional operands, required to support padding and
masking (#69148), as it becomes ambiguous.

This affects the following ops:

  - arm_sme.tile_load
  - arm_sme.tile_store
  - arm_sme.load_tile_slice
  - arm_sme.store_tile_slice
2023-10-16 10:55:30 +01:00
Cullen Rhodes
9816edc9f3 [mlir][vector] add result type to vector.extract assembly format (#66499)
The vector.extract assembly format currently only contains the source
type, for example:

  %1 = vector.extract %0[1] : vector<3x7x8xf32>

it's not immediately obvious if this is the source or result type. This
patch improves the assembly format to make this clearer, so the above
becomes:

  %1 = vector.extract %0[1] : vector<7x8xf32> from vector<3x7x8xf32>
2023-09-28 11:11:16 +01:00
Benjamin Maxwell
174cd6145b [mlir][ArmSME] Add custom vector.print lowering for SME tiles (#66691)
This adds a custom lowering for SME that loops over each row of the
tile, extracting it via an SME MOVA, then printing with a normal 1D
vector.print.

This makes writing SME integration tests easier and less verbose.

Depends on: #66910, #66911
2023-09-26 17:09:57 +01:00
Cullen Rhodes
eaf15900ff [mlir][ArmSME] Add support for vector.transpose (#66760)
This patch adds support for lowering vector.transpose to ArmSME. It's
implemented by storing the input tile of the transpose to memory and
reloading vertically, building on top of the tile slice layout support.

Transposing via memory is obviously expensive; the current intention is
to avoid the transpose if possible. This is therefore intended as a
fallback and to provide base support for Vector ops. If it turns out
transposes can't be avoided then this should be replaced with a more
optimal implementation, perhaps with tile <-> vector (MOVA) ops.
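A hedged example of an op this now handles (types chosen arbitrarily):

```mlir
// Lowered by storing the tile to memory and reloading it with vertical
// (column-wise) tile slice loads.
%1 = vector.transpose %0, [1, 0] : vector<[4]x[4]xf32> to vector<[4]x[4]xf32>
```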

Depends on https://github.com/llvm/llvm-project/pull/66758.
2023-09-25 12:15:12 +01:00
Cullen Rhodes
75a71c27c1 [mlir][ArmSME] Support vertical layout in load and store ops (#66758)
In SME a ZA tile slice is a one-dimensional set of horizontally or
vertically contiguous elements within a ZA tile. Currently the load and
store ops only support horizontal tile slices. This patch adds a tile
slice layout attribute to the load and store ops to support both
horizontal and vertical tile slices.

When lowering from Vector dialect horizontal layout is the default.
2023-09-25 09:34:23 +01:00
Nicolas Vasilache
04ba475e85 [mlir][Vector] Add a rewrite pattern for better low-precision ext(bitcast) expansion (#66648)

This revision adds a rewrite for sequences of vector `ext(bitcast)` to
use a more efficient sequence of vector operations comprising `shuffle`
and `bitwise` ops.

Such patterns appear naturally when writing quantization /
dequantization functionality with the vector dialect.

The rewrite performs a simple enumeration of each of the bits in the
result vector and determines its provenance in the source vector. The
enumeration is used to generate the proper sequence of `shuffle`,
`andi`, `ori` with shifts.

The rewrite currently only applies to 1-D non-scalable vectors and bails
out if the final vector element type is not a multiple of 8. This is a
failsafe heuristic determined empirically: if the resulting type is not
an even number of bytes, further complexities arise that are not
improved by this pattern: the heavy lifting still needs to be done by
LLVM.
2023-09-18 19:02:46 +02:00
Nicolas Vasilache
bf7c490ab7 [mlir][Vector] Add a rewrite pattern for better low-precision bitcast(trunci) expansion (#66387)

This revision adds a rewrite for sequences of vector `bitcast(trunci)`
to use a more efficient sequence of vector operations comprising
`shuffle` and `bitwise` ops.

Such patterns appear naturally when writing quantization /
dequantization functionality with the vector dialect.

The rewrite performs a simple enumeration of each of the bits in the
result vector and determines its provenance in the pre-trunci vector.
The enumeration is used to generate the proper sequence of `shuffle`,
`andi`, `ori` followed by an optional final `trunci`/`extui`.

The rewrite currently only applies to 1-D non-scalable vectors and bails
out if the final vector element type is not a multiple of 8. This is a
failsafe heuristic determined empirically: if the resulting type is not
an even number of bytes, further complexities arise that are not
improved by this pattern: the heavy lifting still needs to be done by
LLVM.
2023-09-18 15:08:18 +02:00
Cullen Rhodes
f75d46a7ec [mlir][ArmSME] Lower vector.outerproduct to FMOPA/BFMOPA (#65621)
This patch adds support for lowering vector.outerproduct to the ArmSME
MOPA intrinsic for the following types:

  vector<[8]xf16>,  vector<[8]xf16>  -> vector<[8]x[8]xf16>
  vector<[8]xbf16>, vector<[8]xbf16> -> vector<[8]x[8]xbf16>
  vector<[4]xf32>,  vector<[4]xf32>  -> vector<[4]x[4]xf32>
  vector<[2]xf64>,  vector<[2]xf64>  -> vector<[2]x[2]xf64>

The FP variants are lowered to FMOPA (non-widening) [1] and BFloat to
BFMOPA (non-widening) [2].

Note at the ISA level these variants are implemented by different
architecture features, these are listed below:

  FMOPA (non-widening)
    * half-precision   - +sme2p1,+sme-f16f16
    * single-precision - +sme
    * double-precision - +sme-f64f64
  BFMOPA (non-widening)
    * half-precision   - +sme2p1,+b16b16

There's currently no way to target different features when lowering to
ArmSME. Integration tests are added for F32 and F64. We use QEMU to run
the integration tests, but SME2 support isn't available yet (it's
targeted for QEMU 9.0), so integration tests for these variants are excluded.

Masking is currently unsupported.
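For example, a single-precision outer product that can now be lowered (a hedged sketch; the accumulator operand is optional):

```mlir
// Lowers to the FMOPA (non-widening) intrinsic, accumulating into a ZA tile.
%0 = vector.outerproduct %lhs, %rhs, %acc {kind = #vector.kind<add>}
  : vector<[4]xf32>, vector<[4]xf32>
```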

Depends on #65450.

[1] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/FMOPA--non-widening---Floating-point-outer-product-and-accumulate-
[2] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/BFMOPA--non-widening---BFloat16-floating-point-outer-product-and-accumulate-
2023-09-14 08:31:52 +01:00
Daniil Dudkin
709b27427b [mlir][vector] Bring back maxf/minf reductions
This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.

In line with the mentioned RFC, this patch tackles tasks 2.3 and 2.4.
It adds LLVM conversions for the `maxf`/`minf` reductions to the non-NaN-propagating LLVM intrinsics.
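A hedged sketch of the conversion (the intrinsic op spelling is an assumption based on the LLVM dialect's `llvm.intr.*` naming):

```mlir
// %r = vector.reduction <maxf>, %v : vector<4xf32> into f32
// converts to roughly:
%r = llvm.intr.vector.reduce.fmax(%v) : (vector<4xf32>) -> f32
```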

Depends on D158618

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D158659
2023-09-13 22:49:07 +00:00
Daniil Dudkin
4a831250b8 [mlir][vector] Rename vector reductions: maxf → maximumf, minf → minimumf
This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.

Here, we are addressing task 2.1 from the plan, which involves renaming the vector reductions to align with the semantics of the corresponding LLVM intrinsics.

Reviewed By: dcaballe

Differential Revision: https://reviews.llvm.org/D158618
2023-09-13 22:49:07 +00:00
Martin Erhart
8037deb7af [mlir][memref] Add pass to expand realloc operations, simplify lowering to LLVM
There are two motivations for this change:
1. It considerably simplifies adding support for the realloc operation to the
   new buffer deallocation pass by lowering the realloc such that no
   deallocation operation is inserted and the deallocation pass itself can
   insert that dealloc
2. The lowering is expressed on a higher level and thus easier to understand,
   and the lowerings of the memref operations it is composed of don't have to
   be duplicated in the MemRefToLLVM lowering (also see discussion in
   https://reviews.llvm.org/D133424)
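Roughly, the expansion turns a realloc into an alloc-and-copy guarded by a size check, with no dealloc emitted (a hedged sketch, not the exact generated IR):

```mlir
// %new = memref.realloc %old(%new_size) : memref<?xf32> to memref<?xf32>
// expands conceptually to:
%c0 = arith.constant 0 : index
%cur_size = memref.dim %old, %c0 : memref<?xf32>
%grow = arith.cmpi ugt, %new_size, %cur_size : index
%new = scf.if %grow -> (memref<?xf32>) {
  %fresh = memref.alloc(%new_size) : memref<?xf32>
  // The real lowering copies only the old elements (e.g. via subviews).
  memref.copy %old, %fresh : memref<?xf32> to memref<?xf32>
  scf.yield %fresh : memref<?xf32>
} else {
  scf.yield %old : memref<?xf32>
}
```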

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D159430
2023-09-05 08:58:40 +00:00
Cullen Rhodes
3b4b6cbba5 [mlir][ArmSME] Add move vector to tile slice op and lowerings
This adds a 'move_vector_to_tile_slice' op to the ArmSME dialect that
moves a 1-D scalable vector to a slice of a 2-D tile at a given index.

This is lowered to the 'llvm.aarch64.sme.write.horiz' intrinsic that
maps to the MOVA (vector to tile, single) SME instruction [1] when
lowering to LLVM. Like the SME load and store instructions this operates
on ZA tile slices, which are 1D vectors of horizontally or vertically
contiguous elements within a ZA tile.
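A hedged example of the new op (assembly format approximated):

```mlir
// Insert a 1-D scalable vector into tile slice (row) %row of a 2-D SME tile.
%new_tile = arm_sme.move_vector_to_tile_slice %vec, %tile, %row
  : vector<[4]xi32> into vector<[4]x[4]xi32>
```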

This patch extends the lowering of 'arith.constant' to SME to support
non-zero constants using this new op.  This requires materializing a
loop that broadcasts the constant to each tile slice with the
'vector_to_tile_slice' op. Unlike load and store, this is done during
conversion from Vector to ArmSME, rather than ArmSME to SCF. The latter
would require a higher-level custom op in the ArmSME dialect like
'tile_load' and 'tile_store', which isn't necessary. We may also
remove the load and store ops in the future in favour of lowering
straight from Vector, at which point this would converge.

Currently only horizontal tile slices are supported. A future patch will
extend this mechanism to support 'vector.broadcast'.

Depends on D156980 D157004

[1] https://developer.arm.com/documentation/ddi0602

Reviewed By: awarzynski, dcaballe

Differential Revision: https://reviews.llvm.org/D157005
2023-08-29 09:29:22 +00:00
Andrzej Warzynski
3f7f1bca38 [mlir] Update AArch64 integration tests to use mlir-cpu-runner
For consistency with other tests and to simplify the `RUN` lines, switch
to using `mlir-cpu-runner` instead of `lli` in integration tests
targeting SSVE and SME.

Differential Revision: https://reviews.llvm.org/D158719
2023-08-24 09:44:11 +00:00
Benjamin Maxwell
97da414182 [mlir][ArmSME] Lower loads/stores of (.Q) 128-bit tiles to intrinsics
This follows from D155306.

Loads and stores of 128-bit tiles have been confirmed to work in the
`load-store-128-bit-tile.mlir` integration test. However, there is
currently a bug in QEMU (see: https://gitlab.com/qemu-project/qemu/-/issues/1833)
which means this test produces incorrect results (a patch for this issue
is available but not yet in any released version of QEMU). Until a
fixed version of QEMU is available the integration test is expected to fail.

Reviewed By: c-rhodes, awarzynski

Differential Revision: https://reviews.llvm.org/D158418
2023-08-23 09:16:20 +00:00
Benjamin Maxwell
07d135e16a [mlir][ArmSME][test] Cleanup printing scalable vectors in vector-load-store.mlir (NFC)
This replaces the manual print loops with `vector.print` which now supports
scalable vectors.

Reviewed By: awarzynski

Differential Revision: https://reviews.llvm.org/D157978
2023-08-16 09:51:34 +00:00
Benjamin Maxwell
f36e909da0 [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors
Reland of the original patch after updating the Python binding tests,
a few CUDA/GPU MLIR tests, and ensuring the assembly format is
round-trippable.

This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.

The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.

To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:

  vector.print punctuation <comma>

lowers to

  llvm.call @printComma() : () -> ()

The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).

Reviewed By: awarzynski, c-rhodes, aartbik

Differential Revision: https://reviews.llvm.org/D156519
2023-08-11 09:29:54 +00:00
Mehdi Amini
1b272d21c8 Revert "[mlir][VectorOps] Use SCF for vector.print and allow scalable vectors"
This reverts commit 490dae26cb.

Bot is broken, seems like there is a problem of ambiguity in the parser.
2023-08-09 19:37:01 -07:00
Benjamin Maxwell
490dae26cb [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors
Reland of the original patch after updating the Python binding tests and
a few CUDA/GPU MLIR tests.

This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.

The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.

To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:

  vector.print <comma>

lowers to

  llvm.call @printComma() : () -> ()

The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).

Reviewed By: awarzynski, c-rhodes, aartbik

Differential Revision: https://reviews.llvm.org/D156519
2023-08-09 11:47:18 +00:00
Benjamin Maxwell
b160442dd2 Revert "[mlir][VectorOps] Use SCF for vector.print and allow scalable vectors"
This reverts commit 3875804a07.

This caused some test failures for the MLIR python bindings. Reverting
until those are addressed.
2023-08-09 09:54:05 +00:00
Benjamin Maxwell
3875804a07 [mlir][VectorOps] Use SCF for vector.print and allow scalable vectors
This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.

The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.

To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:

  vector.print <comma>

lowers to

  llvm.call @printComma() : () -> ()

The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).

Reviewed By: awarzynski, c-rhodes, aartbik

Differential Revision: https://reviews.llvm.org/D156519
2023-08-09 09:38:05 +00:00
Cullen Rhodes
65a6be5de9 [mlir][ArmSME] Use memref indices for load and store
This patch extends the ArmSME load and store op lowering to use the
memref indices. An integration test that loads two 32-bit element ZA
tiles from memory and stores them back to memory in reverse order to
verify this is added.

Depends on D156467 D156558

Reviewed By: awarzynski, dcaballe

Differential Revision: https://reviews.llvm.org/D156689
2023-08-03 08:50:12 +00:00
Matthias Springer
4e9eaa2e52 [mlir][vector] Allow out-of-bounds starting position for vector transfer ops
The starting indices of all vector dimensions are allowed to be out-of-bounds.

E.g.:
```
// %j is allowed to be out-of-bounds (but not %i).
%0 = vector.transfer_read %m[%i, %j] ... {in_bounds = [false]} : memref<?x?xf32>, vector<5xf32>
```

This revision just updates the op documentation and adds extra test cases. Out-of-bounds starting points are already supported by the respective lowerings:
* 2D and higher-dimensional transfers are lowered to 1D transfers by `VectorToScf`. These patterns generate an `scf.if` check for every (potentially unrolled) loop iteration if the dimension is `in_bounds = false`, including the first loop iteration.
* 1D out-of-bounds transfers are lowered to in-bounds transfers by `MaterializeTransferMask`, which adds a mask to the op. The mask is defined by `vector.create_mask (dim-size) - (index)`. In case of an out-of-bounds starting point, the operand of the `vector.create_mask` op is 0 or negative. Negative operands are treated like 0 according to the documentation of `vector.create_mask`.

Differential Revision: https://reviews.llvm.org/D155719
2023-08-02 15:31:09 +02:00
Cullen Rhodes
6081f562ec [mlir][ArmSME] Use vector.reduction add in zero test
The inner 1-D vector row can be summed with the vector.reduction op. The
earlier mul reduction can't be updated similarly, as it currently crashes
in the backend with:

  LLVM ERROR: Expanding reductions for scalable vectors is undefined.

Reviewed By: awarzynski, dcaballe

Differential Revision: https://reviews.llvm.org/D156701
2023-08-01 08:22:51 +00:00
Cullen Rhodes
9e1b825321 [mlir][ArmSME] Add conversion from ArmSME to SCF to materialize loops
Currently a loop is materialized when lowering ArmSME loads and stores
to intrinsics. This patch introduces two new ops to the ArmSME dialect
that map 1-1 with intrinsics:

  1. arm_sme.load_tile_slice  - Loads a 1D tile slice from
     memory into a 2D SME "virtual tile".
  2. arm_sme.store_tile_slice - Stores a 1D tile slice from a 2D SME
     "virtual tile" into memory.

As well as a new conversion pass '-convert-arm-sme-to-scf' that
materializes loops with these ops. The existing load/store lowering to
intrinsics is updated to use these ops.

Depends on D156517

Discourse thread:
https://discourse.llvm.org/t/loop-materialization-in-armsme/72354

Reviewed By: awarzynski, dcaballe, WanderAway

Differential Revision: https://reviews.llvm.org/D156467
2023-08-01 08:20:02 +00:00
Cullen Rhodes
ca9a3354d0 [mlir][ArmSME] Add tile load op and extend tile store tile size support
This extends the existing 'arm_sme.tile_store' op to support all tile
sizes and adds a new op 'arm_sme.tile_load', as well as lowerings from
vector -> custom ops and custom ops -> intrinsics. Currently there's no
lowering for i128.

Depends on D154867

Reviewed By: awarzynski, dcaballe

Differential Revision: https://reviews.llvm.org/D155306
2023-07-25 08:28:36 +00:00
Andrzej Warzynski
e62f366b01 [mlir] Update SVE integration tests to use mlir-cpu-runner
With the recent addition of "-mattr" and "-march" to the list of options
supported by mlir-cpu-runner [1], the SVE integration
tests can be updated to use mlir-cpu-runner instead of lli. This will
allow better code re-use and more consistency.

This patch updates 2 tests to demonstrate the new logic. The remaining
tests will be updated in the follow-up patches.
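A hedged sketch of what an updated RUN line looks like (the lowering pipeline and shared-library substitution here are placeholders, not taken from this patch):

```mlir
// RUN: mlir-opt %s -test-lower-to-llvm | \
// RUN: mlir-cpu-runner -e entry -entry-point-result=void \
// RUN:   -march=aarch64 -mattr="+sve" \
// RUN:   -shared-libs=%mlir_runner_utils | FileCheck %s
```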

[1] https://reviews.llvm.org/D146917

Depends on D155403

Differential Revision: https://reviews.llvm.org/D155405
2023-07-19 08:29:17 +00:00
Cullen Rhodes
fb54fec726 [mlir][ArmSME] Implement tile allocation
This patch adds a pass '-allocate-sme-tiles' to the ArmSME dialect that
implements allocation of SME ZA tiles.

It does this at the 'func.func' op level by replacing
'arm_sme.get_tile_id' ops with 'arith.constant' ops that represent the
tile number. The tiles in use in a given function are tracked by an
integer function attribute 'arm_sme.tiles_in_use' that is a 16-bit tile
mask with a bit for each 128-bit element tile (ZA0.Q-ZA15.Q), the
smallest ZA tile granule. This is initialized on the first
'arm_sme.get_tile_id' rewrite and updated on each subsequent rewrite.
Mixing of different element tile types is supported.
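A hedged before/after sketch of the rewrite:

```mlir
// Before tile allocation:
%tile_id = arm_sme.get_tile_id : i32

// After -allocate-sme-tiles (the pass picks the tile number):
%tile_id = arith.constant 0 : i32
```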

Section B2.3.2 of the SME spec [1] describes how the 128-bit element
tiles overlap with other element tiles.

Depends on D154941

[1] https://developer.arm.com/documentation/ddi0616/aa

Reviewed By: awarzynski

Differential Revision: https://reviews.llvm.org/D154955
2023-07-18 08:46:40 +00:00
Andrzej Warzynski
447bb5bee4 [mlir][ArmSME] Introduce new lowering layer (Vector -> ArmSME)
At the moment, the lowering from the Vector dialect to SME looks like
this:

  * Vector --> SME LLVM IR intrinsics

This patch introduces a new lowering layer between the Vector dialect
and the Arm SME extension:

  * Vector --> ArmSME dialect (custom Ops) --> SME LLVM IR intrinsics.

This is motivated by 2 considerations:
1. Storing `ZA` to memory (e.g. `vector.transfer_write`) requires an
   `scf.for` loop over all rows of `ZA`. Similar logic will apply to
   "load to ZA from memory". This is a rather complex transformation and
   a custom Op seems justified.
2. As discussed in [1], we need to prevent the LLVM type converter from
   having to convert types unsupported in LLVM, e.g.
   `vector<[16]x[16]xi8>`. A dedicated abstraction layer with custom Ops
   opens a path to some fine tuning (e.g. custom type converters) that
   will allow us to avoid this.

To facilitate this change, two new custom SME Ops are introduced:

  * `TileStoreOp`, and
  * `ZeroOp`.

Note that no new functionality is added - these Ops merely model what's
already supported. In particular, the following tile size is assumed
(dimension and element size are fixed):

  * `vector<[16]x[16]xi8>`

The new lowering layer is introduced via a conversion pass between the
Vector and the SME dialects. You can use the `-convert-vector-to-sme`
flag to run it. The following function:
```
func.func @example(%arg0 : memref<?x?xi8>) {
  // (...)
  %cst = arith.constant dense<0> : vector<[16]x[16]xi8>
  vector.transfer_write %cst, %arg0 : vector<[16]x[16]xi8>, memref<?x?xi8>
  return
}
```
would be lowered to:
```
  func.func @example(%arg0: memref<?x?xi8>) {
    // (...)
    %0 = arm_sme.zero : vector<[16]x[16]xi8>
    arm_sme.tile_store %arg0[%c0, %c0], %0 : memref<?x?xi8>, vector<[16]x[16]xi8>
    return
  }
```

Later, a mechanism will be introduced to guarantee that `arm_sme.zero`
and `arm_sme.tile_store` operate on the same virtual tile. For `i8`
elements this is not required as there is only one tile.

In order to lower the above output to LLVM, use
  * `-convert-vector-to-llvm="enable-arm-sme"`.

[1] https://github.com/openxla/iree/issues/14294

Reviewed By: WanderAway

Differential Revision: https://reviews.llvm.org/D154867
2023-07-18 08:04:59 +00:00
Matthias Springer
1b0bdffbd3 [mlir][vector] Fix test case
There was an invalid test case in `test-transfer-read-1d.mlir`. A read was going out-of-bounds, but the dimension was marked as in-bounds.

Differential Revision: https://reviews.llvm.org/D154855
2023-07-10 18:04:25 +02:00
Cullen Rhodes
564713c471 [mlir][ArmSME] Add basic lowering of vector.transfer_write to zero
This patch adds support for lowering a 'vector.transfer_write' of zeroes
and type 'vector<[16]x[16]xi8>' to the SME 'zero {za}' instruction [1],
which zeroes the entire accumulator, and then writing it out to memory
with the 'str' instruction [2].

This contributes to supporting a path from 'linalg.fill' to SME.

[1] https://developer.arm.com/documentation/ddi0602/2022-06/SME-Instructions/ZERO--Zero-a-list-of-64-bit-element-ZA-tiles-
[2] https://developer.arm.com/documentation/ddi0602/2022-06/SME-Instructions/STR--Store-vector-from-ZA-array-

Reviewed By: awarzynski, dcaballe, WanderAway

Differential Revision: https://reviews.llvm.org/D152508
2023-07-03 10:18:43 +00:00
Matthias Springer
1826c728cf [mlir][transform] SequenceOp: Top-level operations can be used as matchers
As a convenience to the user, top-level sequence ops can optionally be used as matchers: the op type is specified by the type of the block argument.

This is similar to how pass pipeline targets can be specified on the command line (`-pass-pipeline='builtin.module(func.func(...))`).
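A hedged sketch (the matcher type syntax is assumed) of a top-level sequence that only matches `func.func` payload ops:

```mlir
transform.sequence failures(propagate) {
^bb0(%func: !transform.op<"func.func">):
  // %func is bound to func.func ops in the payload IR.
  transform.yield
}
```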

Differential Revision: https://reviews.llvm.org/D153121
2023-06-19 09:06:18 +02:00