Transform interfaces are implemented, directly or via extensions, in
libraries belonging to multiple other dialects. Those dialects don't
need to depend on the non-interface part of the transform dialect, which
includes the growing number of ops and their transitive dependency
footprint. Split the interfaces out into a separate library. This in
turn requires flipping the dependency of the interfaces on the dialect,
which had crept in because both co-existed in one library; the
interfaces shouldn't depend on the transform dialect either.
As a consequence of splitting, the capability of the interpreter to
automatically walk the payload IR to identify payload ops of a certain
kind based on the type used for the entry point symbol argument is
disabled. This is a good move by itself as it simplifies the interpreter
logic. This functionality can be trivially replaced by a
`transform.structured.match` operation.
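For illustration, the explicit match looks roughly like this (a sketch;
`%root` stands for the entry point's payload argument and `func.func` is
just an example payload op name):
```mlir
%funcs = transform.structured.match ops{["func.func"]} in %root
  : (!transform.any_op) -> !transform.any_op
```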
Since `vector.print str` provides no punctuation control, it is
slightly more flexible to let the client of this operation decide
whether there should be a trailing newline. This allows printing
like
vector.print str "nse = "
vector.print %nse : index
as
nse = 42
The ArmSME compilation pipeline has evolved significantly and is now
sufficiently complex that it warrants a proper lowering pipeline
that encapsulates the various passes and orderings. Currently the
pipeline is loosely defined in our integration tests, but these have
diverged and are not using the same passes or ordering everywhere.
This patch introduces a test-lower-to-arm-sme pipeline, mirroring
test-lower-to-llvm, that provides some sanity when running e2e examples
and can be used as a reference for targeting ArmSME in MLIR.
All the integration tests are updated to use this pipeline. The
intention is to productize the pipeline once it becomes more mature.
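As a rough sketch (not the exact RUN lines used by the tests, which also
pass shared runtime libraries and emulator-specific flags), an e2e test
can now be driven as:
```
// RUN: mlir-opt %s -test-lower-to-arm-sme -test-lower-to-llvm | \
// RUN:   mlir-cpu-runner -e entry -entry-point-result=void
```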
This patch introduces support for 4-way widening outer products. This
enables the fusion of 4 'arm_sme.outerproduct' operations that are
chained via the accumulator into single widened operations.
Changes:
- Adds the following operations:
  - smopa_4way, smops_4way
  - umopa_4way, umops_4way
  - sumopa_4way, sumops_4way
  - usmopa_4way, usmops_4way
- Implements conversions for the above ops to intrinsics in ArmSMEToLLVM.
- Extends the 'arm-sme-outer-product-fusion' pass.
For a detailed description of these operations see the
'arm_sme.smopa_4way' description.
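For reference, the chain being fused looks roughly like this (a sketch
of the signed i8 -> i32 case; value names are illustrative): four
'arm_sme.outerproduct' ops over sign-extended operands, linked through
the accumulator, which the fusion pass can replace with a single
'arm_sme.smopa_4way' on the original i8 operands.
```mlir
%a0_ext = arith.extsi %a0 : vector<[4]xi8> to vector<[4]xi32>
%b0_ext = arith.extsi %b0 : vector<[4]xi8> to vector<[4]xi32>
%a1_ext = arith.extsi %a1 : vector<[4]xi8> to vector<[4]xi32>
%b1_ext = arith.extsi %b1 : vector<[4]xi8> to vector<[4]xi32>
%a2_ext = arith.extsi %a2 : vector<[4]xi8> to vector<[4]xi32>
%b2_ext = arith.extsi %b2 : vector<[4]xi8> to vector<[4]xi32>
%a3_ext = arith.extsi %a3 : vector<[4]xi8> to vector<[4]xi32>
%b3_ext = arith.extsi %b3 : vector<[4]xi8> to vector<[4]xi32>
%0 = arm_sme.outerproduct %a0_ext, %b0_ext : vector<[4]xi32>, vector<[4]xi32>
%1 = arm_sme.outerproduct %a1_ext, %b1_ext acc(%0) : vector<[4]xi32>, vector<[4]xi32>
%2 = arm_sme.outerproduct %a2_ext, %b2_ext acc(%1) : vector<[4]xi32>, vector<[4]xi32>
%3 = arm_sme.outerproduct %a3_ext, %b3_ext acc(%2) : vector<[4]xi32>, vector<[4]xi32>
```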
This adds a new pass (`-arm-sme-vector-legalization`) which legalizes
vector operations so that they can be lowered to ArmSME. This initial
patch adds decomposition for `vector.outerproduct`,
`vector.transfer_read`, and `vector.transfer_write` when they operate on
vector types larger than a single SME tile. For example, a [8]x[8]xf32
outer product would be decomposed into four [4]x[4]xf32 outer products,
which could then be lowered to ArmSME. These three ops have been picked
as supporting them alone allows lowering matmuls that use all ZA
accumulators to ArmSME.
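As a sketch of the decomposition (value names are illustrative):
```mlir
// A single outer product on a type four times the size of one SME tile.
%0 = vector.outerproduct %lhs, %rhs, %acc : vector<[8]xf32>, vector<[8]xf32>
// After -arm-sme-vector-legalization this becomes four vector<[4]x[4]xf32>
// outer products, one per [4]x[4] quadrant of the original result, each
// operating on the corresponding [4]-element halves of %lhs and %rhs.
```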
For a vector type to be legalizable, its shape has to be a multiple
of an SME tile size, but other than that any shape can be used. E.g.
`vector<[8]x[8]xf32>`, `vector<[4]x[16]xf32>`, `vector<[16]x[4]xf32>`
can all be lowered to four `vector<[4]x[4]xf32>` operations.
In future, this pass will be extended with more SME-specific rewrites to
legalize unrolling the reduction dimension of matmuls (which is not
type-decomposition), which is why the pass has quite a general name.
This patch introduces support for 2-way widening outer products. This
enables the fusion of 2 'arm_sme.outerproduct' operations that are
chained via the accumulator into a 2-way widening outer product
operation.
Changes:
- Adds 'llvm.aarch64.sme.[us]mop[as].za32' intrinsics for the 2-way
  variants. These map to instruction variants added in SME2 and use
  different intrinsics from the SME1 widening variants, which are
  already implemented.
- Adds the following operations:
  - fmopa_2way, fmops_2way
  - smopa_2way, smops_2way
  - umopa_2way, umops_2way
- Implements conversions for the above ops to intrinsics in
  ArmSMEToLLVM.
- Adds a pass 'arm-sme-outer-product-fusion' that fuses
  'arm_sme.outerproduct' operations.
For a detailed description of these operations see the
'arm_sme.fmopa_2way' description.
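As a sketch of the pattern being matched (the f16 -> f32 case; value
names are illustrative), two 'arm_sme.outerproduct' ops over extended
operands chained via the accumulator can be replaced with a single
'arm_sme.fmopa_2way' on the original f16 operands:
```mlir
%a0_ext = arith.extf %a0 : vector<[4]xf16> to vector<[4]xf32>
%b0_ext = arith.extf %b0 : vector<[4]xf16> to vector<[4]xf32>
%a1_ext = arith.extf %a1 : vector<[4]xf16> to vector<[4]xf32>
%b1_ext = arith.extf %b1 : vector<[4]xf16> to vector<[4]xf32>
%0 = arm_sme.outerproduct %a0_ext, %b0_ext : vector<[4]xf32>, vector<[4]xf32>
%1 = arm_sme.outerproduct %a1_ext, %b1_ext acc(%0) : vector<[4]xf32>, vector<[4]xf32>
```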
The reason for introducing many operations rather than one is that the
signed/unsigned variants can't be distinguished by types (e.g., ui16,
si16), since 'arith.extui' and 'arith.extsi' only support signless
integers. A single operation would require this information, and an
attribute (for example) for the sign doesn't feel right when
floating-point types are also supported, where this wouldn't apply.
Furthermore, the SME FP8 extensions (FEAT_SME_F8F16, FEAT_SME_F8F32)
introduce FMOPA 2-way (FP8 to FP16) and 4-way (FP8 to FP32) variants but
no subtract variant. Whilst these are not supported in this patch, it
felt simpler to have separate ops for add/subtract given this.
This adds a new `mlir_arm_runner_utils` library that contains utils
specific to Arm/AArch64. This is for use in MLIR integration tests.
This initial patch adds `setArmVLBits()` and `setArmSVLBits()`, which
allow changing the vector length or streaming vector length at runtime
(or setting it to a known minimum, i.e. 128 bits).
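Usage from a test looks roughly like this (a sketch; it assumes the
function takes a 32-bit integer and is declared as an external func):
```mlir
func.func private @setArmVLBits(i32)

func.func @entry() {
  %c256 = arith.constant 256 : i32
  func.call @setArmVLBits(%c256) : (i32) -> ()
  // The rest of the test now runs with a 256-bit SVE vector length.
  return
}
```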
The existing 'arith::ConstantOp' conversion and tests are moved from
VectorToArmSME. Only a single op is converted at the moment, but this
will grow in the future as things like in-tile add
are implemented. Also, 'createLoopOverTileSlices' is moved to ArmSME
utils since it's relevant for both conversions.
This is to avoid confusion when dealing with reduction/combining kinds.
For example, see a recent PR comment:
https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175.
Previously, they were picked to mostly mirror the names of the llvm
vector reduction intrinsics:
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In
isolation, it was not clear if `<maxf>` has `arith.maxnumf` or
`arith.maximumf` semantics. The new reduction kind names map 1:1 to
arith ops, which makes it easier to tell/look up their semantics.
Because both the vector and the gpu dialect depend on the arith dialect,
it's more natural to align names with those in arith than with the
lowering to llvm intrinsics.
Issue: https://github.com/llvm/llvm-project/issues/72354
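For example (assuming `maxnumf` is one of the renamed kinds), the
semantics can now be read directly off the op:
```mlir
// Combines elements with 'arith.maxnumf' (not 'arith.maximumf') semantics.
%max = vector.reduction <maxnumf>, %v : vector<4xf32> into f32
```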
This moves the fix out of the IR and into the pass description, which
seems nicer. It also works as an integration test for the
`only-if-required-by-ops` flag :)
This reworks the ArmSME dialect to use attributes for tile allocation.
This has a number of advantages and corrects some issues with the
previous approach:
* Tile allocation can now be done ASAP (i.e. immediately after
`-convert-vector-to-arm-sme`)
* SSA form for control flow is now supported (e.g.`scf.for` loops that
yield tiles)
* ArmSME ops can be converted to intrinsics very late (i.e. after
lowering to control flow)
* Tests are simplified by removing constants and casts
* Avoids correctness issues with representing LLVM `immargs` as MLIR
values
- The tile ID on the SME intrinsics is an `immarg` (so it is required to
  be a compile-time constant); `immargs` should be mapped to MLIR
  attributes (this is already the case for intrinsics in the LLVM dialect)
- Using MLIR values for `immargs` can lead to invalid LLVM IR being
generated (and passes such as -cse making incorrect optimizations)
As part of this patch we bid farewell to the following operations:
```mlir
arm_sme.get_tile_id : i32
arm_sme.cast_tile_to_vector : i32 to vector<[4]x[4]xi32>
arm_sme.cast_vector_to_tile : vector<[4]x[4]xi32> to i32
```
These are now replaced with:
```mlir
// Allocates a new tile with (indeterminate) state:
arm_sme.get_tile : vector<[4]x[4]xi32>
// A placeholder operation for lowering ArmSME ops to intrinsics:
arm_sme.materialize_ssa_tile : vector<[4]x[4]xi32>
```
The new tile allocation works by operations implementing the
`ArmSMETileOpInterface`. This interface says that an operation needs to
be assigned a tile ID, and may conditionally allocate a new SME tile.
Operations allocate a new tile by implementing...
```c++
std::optional<arm_sme::ArmSMETileType> getAllocatedTileType()
```
...and returning what type of tile the op allocates (ZAB, ZAH, etc).
Operations that don't allocate a tile return `std::nullopt` (which is
the default behaviour).
Currently the following ops are defined as allocating:
```mlir
arm_sme.get_tile
arm_sme.zero
arm_sme.tile_load
arm_sme.outerproduct // (if no accumulator is specified)
```
Allocating operations become the roots for the tile allocation pass,
which currently just (naively) assigns all transitive uses of a root
operation the same tile ID. However, this is enough to handle current
use cases.
Once tile IDs have been allocated subsequent rewrites can forward the
tile IDs to any newly created operations.
This gives more flexibility in when these lowerings are performed,
without also lowering unrelated vector ops.
This is an NFC (other than adding a new `-convert-arm-sme-to-llvm` pass).
Previously, we were inserting za.enable/disable intrinsics for functions
with the "arm_za" attribute (at the MLIR level), rather than using the
backend attributes. This was done to avoid a dependency on the SME ABI
functions from compiler-rt (which have only recently been implemented).
Doing things this way did have correctness issues, for example, calling
a streaming-mode function from another streaming-mode function (both
with ZA enabled) would lead to ZA being disabled after returning to the
caller (where it should still be enabled). Fixing issues like this would
require re-doing the ABI work already done in the backend within MLIR.
Instead, this patch switches to use the "arm_new_za" (backend) attribute
for enabling ZA for an MLIR function. For the integration tests, this
requires some way of linking the SME ABI functions. This is done via the
`%arm_sme_abi_shlib` lit substitution. By default, this expands to a
stub implementation of the SME ABI functions, but this can be overridden
by providing the `ARM_SME_ABI_ROUTINES_SHLIB` CMake cache variable
(pointing it at an alternative implementation). For now, the ArmSME
integration tests pass with just stubs, as we don't make use of nested
ZA-enabled calls.
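As a sketch (assuming the unit attribute is attached directly to the
`func.func`, as with the streaming-mode attributes; the exact attachment
may differ):
```mlir
// ZA is enabled by tagging the function, not by inserting za.enable/disable
// intrinsics around its body.
func.func @za_kernel() attributes { arm_new_za } {
  // ... operations that use ZA ...
  return
}
```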
A future patch may add an option to compiler-rt to build the SME
builtins into a standalone shared library to allow easily
building/testing with the actual implementation.
This patch extends ArmSMEToSCF to support lowering of masked tile_load
ops. Only masks created by 'vector.create_mask' are currently supported.
There are two lowerings depending on the pad.
For pad of constant zero, the tile is first zeroed, then only active
rows are loaded.
For a non-zero pad, the scalar pad is broadcast to a 1-D vector and a
regular 'vector.masked_load' (which will be lowered to SVE, not SME) loads
each slice, with padding specified as a passthru and the 2-D mask
combined into a 1-D mask. The resulting slice is then inserted into the
tile with 'arm_sme.move_vector_to_tile_slice'.
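For reference, the kind of op this lowering handles looks roughly like
this (a sketch; `%mem`, `%c0`, `%rows` and `%cols` are assumed to be
defined earlier):
```mlir
%pad = arith.constant 0.0 : f32
%mask = vector.create_mask %rows, %cols : vector<[4]x[4]xi1>
%tile = arm_sme.tile_load %mem[%c0, %c0], %pad, %mask
  : memref<?x?xf32>, vector<[4]x[4]xf32>
```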
The string symbols were replaced with 'vector.print str' calls in
061d978043 (#68973) but the addressof ops weren't removed. This was
missed as the test is currently XFAIL'ed.
This patch adds support for lowering masked outer products to SME. This
is done in two stages. First, vector.outerproducts (both masked and
non-masked) are rewritten to arm_sme.outerproducts. The
arm_sme.outerproduct op is close to vector.outerproduct, but supports
masking on the operands rather than the result. It also limits the cases
it handles to things that could be (directly) lowered to SME.
This currently requires that the source of the mask is a
vector.create_mask op. E.g.:
```mlir
%mask = vector.create_mask %dimA, %dimB : vector<[4]x[4]xi1>
%result = vector.mask %mask {
vector.outerproduct %vecA, %vecB
: vector<[4]xf32>, vector<[4]xf32>
} : vector<[4]x[4]xi1> -> vector<[4]x[4]xf32>
```
This is rewritten to:
```mlir
%maskA = vector.create_mask %dimA : vector<[4]xi1>
%maskB = vector.create_mask %dimB : vector<[4]xi1>
%result = arm_sme.outerproduct %vecA, %vecB masks(%maskA, %maskB)
: vector<[4]xf32>, vector<[4]xf32>
```
(The same rewrite works for non-masked vector.outerproducts too)
The arm_sme.outerproduct can then be directly lowered to SME intrinsics.
Adds basic integration tests for `vector.contract` for the dot product
and matvec operations. These tests exercise scalable vectors.
Depends on https://github.com/llvm/llvm-project/pull/69845
… emulation
Currently the expected CHECK values are not correct for
`fcst_maskedload` from
mlir/test/Integration/Dialect/Vector/CPU/test-rewrite-narrow-types.mlir
Adds an end-to-end test for `vector.contract` that targets SVE (i.e.
scalable vectors). Note that this requires lifting the restriction on
`vector.outerproduct` (to which `vector.contract` is lowered) that
would deem the following invalid by the op verifier:
```
vector.outerproduct %27, %28, %26 {kind = #vector.kind<add>} : vector<3xf32>, vector<[2]xf32>
```
This is indeed valid as the end-to-end test demonstrates (at least when
compiling for SVE).
This patch adds a pass that ensures that loads, stores, and allocations
of SVE vector types will be legal in the LLVM backend. It does this at
the memref level, so this pass must be applied before lowering all the
way to LLVM.
This pass currently fixes two issues.
## Loading and storing predicate types
It is only legal to load/store predicate types equal to (or greater
than) a full predicate register, which in MLIR is `vector<[16]xi1>`.
Smaller predicate types (`vector<[1|2|4|8]xi1>`) must be converted
to/from a full predicate type (referred to as a `svbool`) before and
after storing and loading respectively. This pass does this by widening
allocations and inserting conversion intrinsics.
For example:
```mlir
%alloca = memref.alloca() : memref<vector<[4]xi1>>
%mask = vector.constant_mask [4] : vector<[4]xi1>
memref.store %mask, %alloca[] : memref<vector<[4]xi1>>
%reload = memref.load %alloca[] : memref<vector<[4]xi1>>
```
Becomes:
```mlir
%alloca = memref.alloca() {alignment = 1 : i64} : memref<vector<[16]xi1>>
%mask = vector.constant_mask [4] : vector<[4]xi1>
%svbool = arm_sve.convert_to_svbool %mask : vector<[4]xi1>
memref.store %svbool, %alloca[] : memref<vector<[16]xi1>>
%reload_svbool = memref.load %alloca[] : memref<vector<[16]xi1>>
%reload = arm_sve.convert_from_svbool %reload_svbool : vector<[4]xi1>
```
## Relax alignments for SVE vector allocas
The storage for SVE vector types only needs to have an alignment that
matches the element type (for example 4 byte alignment for `f32`s).
However, the LLVM backend currently defaults to aligning to `base size x
element size` bytes. For non-legal vector types like `vector<[8]xf32>`
this results in 8 x 4 = 32-byte alignment, but the backend only supports
up to 16-byte alignment for SVE vectors on the stack. Explicitly setting
a smaller alignment prevents this issue.
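A sketch of the result, assuming the pass expresses this via the
alloca's alignment attribute:
```mlir
// Element-aligned (4 bytes for f32) instead of the backend's 32-byte default.
%alloca = memref.alloca() {alignment = 4 : i64} : memref<vector<[8]xf32>>
```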
Depends on: #68586 and #68695 (for testing)
Update most tests to use the transform-interpreter pass instead of
the test-transform-dialect-interpreter pass. The new "main" interpreter
pass has a named entry point instead of looking up the top-level op with
`PossibleTopLevelOpTrait`, which is arguably a more understandable
interface. The change is mechanical, rewriting an unnamed sequence into
a named one and wrapping the transform IR into a module when necessary.
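Roughly, the rewrite looks like this (a sketch with the transform ops
elided):
```mlir
// Before: unnamed top-level sequence run by test-transform-dialect-interpreter.
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  // ... transform ops ...
  transform.yield
}
// After: named entry point, wrapped in a module, run by transform-interpreter.
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
      %arg0: !transform.any_op {transform.readonly}) {
    // ... transform ops ...
    transform.yield
  }
}
```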
Add an option to the transform-interpreter pass to target a tagged
payload op instead of the root anchor op, which is also useful for repro
generation.
Only the tests in the transform dialect proper and the examples have not
been updated yet. These will be updated separately after a more careful
consideration of testing coverage of the transform interpreter logic.
Printing strings within integration tests is currently quite annoyingly
verbose, and can't be tucked into shared helpers as the types depend on
the length of the string:
```
llvm.mlir.global internal constant @hello_world("Hello, World!\0")
func.func @entry() {
%0 = llvm.mlir.addressof @hello_world : !llvm.ptr<array<14 x i8>>
%1 = llvm.mlir.constant(0 : index) : i64
%2 = llvm.getelementptr %0[%1, %1]
: (!llvm.ptr<array<14 x i8>>, i64, i64) -> !llvm.ptr<i8>
llvm.call @printCString(%2) : (!llvm.ptr<i8>) -> ()
return
}
```
So this patch adds a simple extension to `vector.print` to simplify
this:
```
func.func @entry() {
// Print a vector of characters ;)
vector.print str "Hello, World!"
return
}
```
Most of the logic for this is now shared with `cf.assert` which already
does something similar.
Depends on #68694
This patch prefixes tile slice layout with `layout` in the
assemblyFormat:
- `<vertical>` -> `layout<vertical>`
- `<horizontal>` -> `layout<horizontal>`
The reason for this change is that the current format doesn't play nicely
with additional optional operands, required to support padding and
masking (#69148), as it becomes ambiguous.
This affects the following ops:
- arm_sme.tile_load
- arm_sme.tile_store
- arm_sme.load_tile_slice
- arm_sme.store_tile_slice
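For example, on arm_sme.tile_load the vertical layout is now spelled as
follows (a sketch of the new format):
```mlir
%tile = arm_sme.tile_load %src[%c0, %c0] layout<vertical>
  : memref<?x?xi32>, vector<[4]x[4]xi32>
```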
The vector.extract assembly format currently only contains the source
type, for example:
%1 = vector.extract %0[1] : vector<3x7x8xf32>
It's not immediately obvious whether this is the source or the result
type. This patch improves the assembly format to make this clearer, so
the above becomes:
%1 = vector.extract %0[1] : vector<7x8xf32> from vector<3x7x8xf32>
This adds a custom lowering for SME that loops over each row of the
tile, extracting it via an SME MOVA, then printing with a normal 1D
vector.print.
This makes writing SME integration tests easier and less verbose.
Depends on: #66910, #66911
This patch adds support for lowering vector.transpose to ArmSME. It's
implemented by storing the input tile of the transpose to memory and
reloading it vertically, building on top of the tile slice layout
support. Transposing via memory is obviously expensive; the current
intention is to avoid the transpose if possible, so this is intended as
a fallback and to provide base support for Vector ops. If it turns out
transposes can't be avoided then this should be replaced with a more
optimal implementation, perhaps with tile <-> vector (MOVA) ops.
Depends on https://github.com/llvm/llvm-project/pull/66758.
In SME a ZA tile slice is a one-dimensional set of horizontally or
vertically contiguous elements within a ZA tile. Currently the load and
store ops only support horizontal tile slices. This patch adds a tile
slice layout attribute to the load and store ops to support both
horizontal and vertical tile slices.
When lowering from the Vector dialect, the horizontal layout is the default.
…cast) expansion
This revision adds a rewrite for sequences of vector `ext(bitcast)` to
use a more efficient sequence of vector operations comprising `shuffle`
and `bitwise` ops.
Such patterns appear naturally when writing quantization /
dequantization functionality with the vector dialect.
The rewrite performs a simple enumeration of each of the bits in the
result vector and determines its provenance in the source vector. The
enumeration is used to generate the proper sequence of `shuffle`,
`andi` and `ori` with shifts.
The rewrite currently only applies to 1-D non-scalable vectors and bails
out if the bitwidth of the final vector element type is not a multiple
of 8. This is a
failsafe heuristic determined empirically: if the resulting type is not
an even number of bytes, further complexities arise that are not
improved by this pattern: the heavy lifting still needs to be done by
LLVM.
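For example (a sketch of an assumed i4 unpacking case), the rewrite
targets sequences such as:
```mlir
%unpacked = vector.bitcast %packed : vector<4xi8> to vector<8xi4>
%extended = arith.extui %unpacked : vector<8xi4> to vector<8xi32>
```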
…(trunci) expansion
This revision adds a rewrite for sequences of vector `bitcast(trunci)`
to use a more efficient sequence of vector operations comprising
`shuffle` and `bitwise` ops.
Such patterns appear naturally when writing quantization /
dequantization functionality with the vector dialect.
The rewrite performs a simple enumeration of each of the bits in the
result vector and determines its provenance in the pre-trunci vector.
The enumeration is used to generate the proper sequence of `shuffle`,
`andi`, `ori` followed by an optional final `trunci`/`extui`.
The rewrite currently only applies to 1-D non-scalable vectors and bails
out if the bitwidth of the final vector element type is not a multiple
of 8. This is a
failsafe heuristic determined empirically: if the resulting type is not
an even number of bytes, further complexities arise that are not
improved by this pattern: the heavy lifting still needs to be done by
LLVM.
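For example (a sketch of the mirrored i4 packing case), the rewrite
targets sequences such as:
```mlir
%truncated = arith.trunci %input : vector<8xi32> to vector<8xi4>
%packed = vector.bitcast %truncated : vector<8xi4> to vector<4xi8>
```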
This patch adds support for lowering vector.outerproduct to the ArmSME
MOPA intrinsic for the following types:
vector<[8]xf16>, vector<[8]xf16> -> vector<[8]x[8]xf16>
vector<[8]xbf16>, vector<[8]xbf16> -> vector<[8]x[8]xbf16>
vector<[4]xf32>, vector<[4]xf32> -> vector<[4]x[4]xf32>
vector<[2]xf64>, vector<[2]xf64> -> vector<[2]x[2]xf64>
The FP variants are lowered to FMOPA (non-widening) [1] and the BFloat
variant to BFMOPA (non-widening) [2].
Note that at the ISA level these variants are implemented by different
architecture features; these are listed below:
FMOPA (non-widening)
* half-precision - +sme2p1,+sme-f16f16
* single-precision - +sme
* double-precision - +sme-f64f64
BFMOPA (non-widening)
* half-precision - +sme2p1,+b16b16
There's currently no way to target different features when lowering to
ArmSME. Integration tests are added for F32 and F64. We use QEMU to run
the integration tests, but SME2 support isn't available yet (it's
targeted for QEMU 9.0), so integration tests for these variants are
excluded. Masking is currently unsupported.
Depends on #65450.
[1] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/FMOPA--non-widening---Floating-point-outer-product-and-accumulate-
[2] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/BFMOPA--non-widening---BFloat16-floating-point-outer-product-and-accumulate-
There are two motivations for this change:
1. It considerably simplifies adding support for the realloc operation to the
new buffer deallocation pass by lowering the realloc such that no
deallocation operation is inserted and the deallocation pass itself can
insert that dealloc
2. The lowering is expressed on a higher level and thus easier to understand,
and the lowerings of the memref operations it is composed of don't have to
be duplicated in the MemRefToLLVM lowering (also see discussion in
https://reviews.llvm.org/D133424)
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D159430
This adds a 'move_vector_to_tile_slice' op to the ArmSME dialect that
moves a 1-D scalable vector to a slice of a 2-D tile at a given index.
This is lowered to the 'llvm.aarch64.sme.write.horiz' intrinsic that
maps to the MOVA (vector to tile, single) SME instruction [1] when
lowering to LLVM. Like the SME load and store instructions this operates
on ZA tile slices, which are 1D vectors of horizontally or vertically
contiguous elements within a ZA tile.
This patch extends the lowering of 'arith.constant' to SME to support
non-zero constants using this new op. This requires materializing a
loop that broadcasts the constant to each tile slice with the
'move_vector_to_tile_slice' op.
conversion from Vector to ArmSME, rather than ArmSME to SCF. The latter
would require a higher-level custom op in the ArmSME dialect like
'tile_load' and 'tile_store' and this isn't necessary. We may also
remove the load and store ops in the future in favour of lowering
straight from Vector, at which point this would converge.
Currently only horizontal tile slices are supported. A future patch will
extend this mechanism to support 'vector.broadcast'.
Depends on D156980 D157004
[1] https://developer.arm.com/documentation/ddi0602
Reviewed By: awarzynski, dcaballe
Differential Revision: https://reviews.llvm.org/D157005
For consistency with other tests and to simplify the `RUN` lines, switch
to using `mlir-cpu-runner` instead of `lli` in integration tests
targeting SSVE and SME.
Differential Revision: https://reviews.llvm.org/D158719
This follows from D155306.
Loads and stores of 128-bit tiles have been confirmed to work in the
`load-store-128-bit-tile.mlir` integration test. However, there is
currently a bug in QEMU (see: https://gitlab.com/qemu-project/qemu/-/issues/1833)
which means this test produces incorrect results (a patch for this issue
is available but not yet in any released version of QEMU). Until a
fixed version of QEMU is available the integration test is expected to fail.
Reviewed By: c-rhodes, awarzynski
Differential Revision: https://reviews.llvm.org/D158418
This replaces the manual print loops with `vector.print` which now supports
scalable vectors.
Reviewed By: awarzynski
Differential Revision: https://reviews.llvm.org/D157978
Reland of the original patch after updating the Python binding tests,
a few CUDA/GPU MLIR tests, and ensuring the assembly format is
round-trippable.
This patch splits the lowering of vector.print into first converting
an n-D print into a loop of scalar prints of the elements, then a second
pass that converts those scalar prints into the runtime calls. The
former is done in VectorToSCF and the latter in VectorToLLVM.
The main reason for this is to allow printing scalable vector types,
which are not possible to fully unroll at compile time, though this
also avoids fully unrolling very large vectors.
To allow VectorToSCF to add the necessary punctuation between vectors
and elements, a "punctuation" attribute has been added to vector.print.
This abstracts calling the runtime functions such as printNewline(),
without leaking the LLVM details into the higher abstraction levels.
For example:
vector.print punctuation <comma>
lowers to
llvm.call @printComma() : () -> ()
The output format and runtime functions remain the same, which avoids
the need to alter a large number of tests (aside from the pipelines).
Reviewed By: awarzynski, c-rhodes, aartbik
Differential Revision: https://reviews.llvm.org/D156519