NOTE: This is a follow-up for #97049 in which the `in_bounds` attribute
was made mandatory.
This PR updates the semantics of the `in_bounds` attribute so that
broadcast dimensions are no longer required to be "in bounds".
Specifically, these xfer_read/xfer_write Ops become valid after this
change:
```mlir
%read = vector.transfer_read %A[%base1, %base2], %pad
  {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
  : memref<?x?xf32>, vector<9xf32>
vector.transfer_write %vec, %A[%base1, %base2]
  {in_bounds = [false], permutation_map = affine_map<(d0, d1) -> (0)>}
  : vector<9xf32>, memref<?x?xf32>
```
Note that the value `false` merely means "may run out-of-bounds", i.e.,
the corresponding access can still be "in bounds". In fact, the folder
for xfer Ops is also updated (*) and will update the attribute value
corresponding to broadcast dims to `true` if all non-broadcast dims
are marked as "in bounds".
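For illustration, a sketch of the folder's effect on a 2-D example (shapes and
map mirror the examples above):
```mlir
// Before folding: the broadcast dim (d0 -> 0) is conservatively marked as
// "may run out-of-bounds", while the non-broadcast dim is "in bounds".
%read = vector.transfer_read %A[%base1, %base2], %pad
  {in_bounds = [false, true], permutation_map = affine_map<(d0, d1) -> (0, d1)>}
  : memref<?x?xf32>, vector<4x9xf32>
// After folding: since all non-broadcast dims are "in bounds", the broadcast
// dim is updated as well, i.e. in_bounds = [true, true].
```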
Note that this PR doesn't change any of the lowerings. The changes in
"SuperVectorize.cpp", "Vectorization.cpp" and "AffineMap.cpp" are simple
reverts of recent changes in #97049. Those were only meant to facilitate
making `in_bounds` mandatory and to work around the extra requirements
for broadcast dims (those requirements were removed in this PR). All
changes in tests are also reverts of changes from #97049.
For context, here's a PR in which "broadcast" dims were forced to
always be "in-bounds":
* https://reviews.llvm.org/D102566
(*) See `foldTransferInBoundsAttribute`.
This patch moves the stepvector intrinsic out of the experimental
namespace.
This intrinsic has existed in LLVM for several years and is widely used.
To run the integration tests using qemu-aarch64 on an x64 host, the
following flags are added to the cmake command when building mlir/llvm:
-DMLIR_INCLUDE_INTEGRATION_TESTS=ON \
-DMLIR_RUN_ARM_SVE_TESTS=ON \
-DMLIR_RUN_ARM_SME_TESTS=ON \
-DARM_EMULATOR_EXECUTABLE="<...>/qemu-aarch64" \
-DARM_EMULATOR_OPTIONS="-L /usr/aarch64-linux-gnu" \
-DARM_EMULATOR_MLIR_CPU_RUNNER_EXECUTABLE="<llvm_arm64_build_top>/bin/mlir-cpu-runner-arm64" \
-DARM_EMULATOR_LLI_EXECUTABLE="<llvm_arm64_build_top>/bin/lli" \
-DARM_EMULATOR_UTILS_LIB_DIR="<llvm_arm64_build_top>/lib"
The last three above are prebuilt on, or cross-built for, an aarch64
host.
This patch introduces substitutions of "%native_mlir_runner_utils" etc. and uses
them in SVE/SME integration tests. When configured to run using qemu-aarch64,
the MLIR runtime util libs will be loaded from ARM_EMULATOR_UTILS_LIB_DIR, if set.
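As a rough sketch (the opt pipeline and runner invocation are placeholders, not
the exact test contents), a RUN line can then refer to the substitution instead
of a hard-coded library path:
```
// RUN: mlir-opt %s -test-lower-to-llvm | \
// RUN:   mlir-cpu-runner -e entry -entry-point-result=void \
// RUN:     -shared-libs=%native_mlir_runner_utils
```
When the emulator is configured, the substitution resolves to the runner utils
library under ARM_EMULATOR_UTILS_LIB_DIR; otherwise it points at the host
build's library.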
Some tests marked with 'UNSUPPORTED: target=aarch64{{.*}}' are still run
when ARM_EMULATOR_EXECUTABLE is configured and the default target is
not aarch64. A lit config feature 'mlir_arm_emulator' is added in
mlir/test/lit.site.cfg.py.in and added to the UNSUPPORTED list of such
tests.
At the moment, the in_bounds attribute has two confusing/contradicting
properties:
1. It is both optional _and_ has an effective default value.
2. The default value is "out-of-bounds" for non-broadcast dims, and
"in-bounds" for broadcast dims.
(see the `isDimInBounds` vector interface method for an example of this
"default" behaviour [1]).
This PR aims to clarify the logic surrounding the `in_bounds` attribute
by:
* making the attribute mandatory (i.e. it is always present),
* always setting the default value to "out of bounds" (that's
consistent with the current behaviour for the most common cases).
#### Broadcast dimensions in tests
As per [2], broadcast dimensions require the corresponding
`in_bounds` attribute to be `true`:
```
vector.transfer_read op requires broadcast dimensions to be in-bounds
```
The changes in this PR mean that we can no longer rely on the
default value in cases like the following (dim 0 is a broadcast dim):
```mlir
%read = vector.transfer_read %A[%base1, %base2], %f, %mask
{permutation_map = affine_map<(d0, d1) -> (0, d1)>} :
memref<?x?xf32>, vector<4x9xf32>
```
Instead, the broadcast dimension has to be explicitly marked as "in
bounds":
```mlir
%read = vector.transfer_read %A[%base1, %base2], %f, %mask
{in_bounds = [true, false], permutation_map = affine_map<(d0, d1) -> (0, d1)>} :
memref<?x?xf32>, vector<4x9xf32>
```
All tests with broadcast dims are updated accordingly.
#### Changes in "SuperVectorize.cpp" and "Vectorization.cpp"
The following patterns in "Vectorization.cpp" are updated to explicitly
set the `in_bounds` attribute to `false`:
* `LinalgCopyVTRForwardingPattern` and `LinalgCopyVTWForwardingPattern`
Also, `vectorizeAffineLoad` (from "SuperVectorize.cpp") and
`vectorizeAsLinalgGeneric` (from "Vectorization.cpp") are updated to
make sure that xfer Ops created by these hooks set the dimension
corresponding to broadcast dims as "in bounds". Otherwise, the Op
verifier would complain.
Note that there is no mechanism to verify whether the corresponding
memory accesses are indeed in bounds. Still, this is consistent with the
current behaviour where the broadcast dim would be implicitly assumed
to be "in bounds".
[1]
4145ad2bac/mlir/include/mlir/Interfaces/VectorInterfaces.td (L243-L246)
[2]
https://mlir.llvm.org/docs/Dialects/Vector/#vectortransfer_read-vectortransferreadop
The "Emulated" sub-directories under "ArmSVE" and
"ArmSME" have been removed. Associated tests
have been moved up a directory and now include
the "REQUIRES" constraint for the arm-emulator.
To keep the test filenames consistent, this patch:
* removes "test-" from file names (there used to be a mix of
"test-feature-1.mlir" and "feature-2.mlir"),
* replaces "_" with "-" (there used to be a mix of "feature-3.mlir"
and "feature_4.mlir").
Only files under test/Integration/Dialect/Vector/CPU are updated.
This patch implements the lowering of vector.deinterleave
for 1D vectors.
For fixed vector types, the operation is lowered to two llvm
shufflevector operations: one for the even-indexed elements and one for
the odd-indexed elements. A poison value is used as the second operand
of each shufflevector.
For scalable vectors, the llvm vector.deinterleave2 intrinsic is used
for the lowering; the two results are then extracted from the result
struct returned by the intrinsic.
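A small sketch of the fixed-width case (types chosen arbitrarily; the
shufflevector lowering is only indicated in comments):
```mlir
// Split an 8-element vector into its even- and odd-indexed elements.
%even, %odd = vector.deinterleave %src : vector<8xi32> -> vector<4xi32>
// Conceptually, after lowering to the LLVM dialect this becomes:
//   %even = shufflevector %src, poison [0, 2, 4, 6]
//   %odd  = shufflevector %src, poison [1, 3, 5, 7]
```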
These passes have been deprecated for a long time and replaced by
one-shot bufferization. These passes are also unsafe because they do not
check for read-after-write conflicts.
Relands https://github.com/llvm/llvm-project/pull/93488 which failed on
buildbot. Fixes the failure by updating integration tests to use
one-shot-bufferize instead.
This is to make it more obvious what the result type is, especially
with some less trivial cases like 0-d inputs resulting in 1-d results or
the interaction with scalable vector types. Note that `vector.deinterleave`
uses the same format with explicit result type.
Also improve examples and clean up surrounding code.
This patch rewrites the ArmSME tile allocator to use liveness
information to make better tile allocation decisions and improve the
correctness of the ArmSME dialect. The algorithm used here is a linear
scan over live ranges, where live ranges are assigned to tiles as they
appear in the program (chronologically), and release their assigned tile
ID once the current program point is past their end. A greedy algorithm
was chosen mainly to keep the implementation relatively straightforward,
and because it seems to be sufficient for most kernels (e.g. matmuls)
that use ArmSME. The general steps are roughly based on
https://link.springer.com/content/pdf/10.1007/3-540-45937-5_17.pdf,
though there have been a few simplifications and assumptions made for
our use case.
Hopefully, the only changes needed for users of the ArmSME dialect are:
- `-allocate-arm-sme-tiles` will no longer be a standalone pass
- `-test-arm-sme-tile-allocation` is only for unit tests
- `-convert-arm-sme-to-llvm` must happen after `-convert-scf-to-cf` (see the pipeline sketch after this list)
- SME tile allocation is now part of the LLVM conversion
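To illustrate the new ordering, a rough sketch of the relevant part of a
lowering pipeline (the pass list is illustrative and incomplete; tile
allocation now happens inside `-convert-arm-sme-to-llvm`):
```
mlir-opt ... \
  -convert-vector-to-arm-sme \
  -convert-arm-sme-to-scf \
  -convert-scf-to-cf \
  -convert-arm-sme-to-llvm \
  ...
```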
By integrating this into the `ArmSME -> LLVM` conversion we can allow
high-level (value-based) ArmSME operations to be side-effect-free, as we
can guarantee nothing will rearrange ArmSME operations before we emit
intrinsics (which could invalidate the tile allocation).
The hope is for ArmSME operations to have no hidden state/side effects
and allow easily lowering dialects such as `vector` and `arith` to SME,
without making assumptions about how the input IR looks, as the
semantics of the operations will be the same. That is, no (new) side
effects and the IR follows the rules of SSA (a value will never change).
The aim is correctness, so we have a base for working on optimizations.
Transform interfaces are implemented, directly or via extensions, in
libraries belonging to multiple other dialects. Those dialects don't
need to depend on the non-interface part of the transform dialect, which
includes the growing number of ops and transitive dependency footprint.
Split out the interfaces into a separate library. This in turn requires
flipping the dependency of the interface on the dialect, which had crept
in because both co-existed in one library: the interface shouldn't
depend on the transform dialect either.
As a consequence of splitting, the capability of the interpreter to
automatically walk the payload IR to identify payload ops of a certain
kind based on the type used for the entry point symbol argument is
disabled. This is a good move by itself as it simplifies the interpreter
logic. This functionality can be trivially replaced by a
`transform.structured.match` operation.
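For instance, a hedged sketch of how an entry point can spell out the matching
explicitly (the payload op kind `linalg.matmul` is just an example):
```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
      %root: !transform.any_op {transform.readonly}) {
    // Explicitly match the payload ops that the interpreter previously
    // found by walking the payload IR based on the argument type.
    %matmuls = transform.structured.match ops{["linalg.matmul"]} in %root
        : (!transform.any_op) -> !transform.any_op
    transform.yield
  }
}
```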
Since `vector.print str` provides no punctuation control, it is
slightly more flexible to let the client of this operation decide
whether there should be a trailing newline. This allows for printing
like
vector.print str "nse = "
vector.print %nse : index
as
nse = 42
The ArmSME compilation pipeline has evolved significantly and is now
sufficiently complex that it warrants a proper lowering pipeline that
encapsulates the various passes and orderings. Currently the
pipeline is loosely defined in our integration tests, but these have
diverged and are not using the same passes or ordering everywhere.
This patch introduces a test-lower-to-arm-sme pipeline mirroring
test-lower-to-llvm that provides some sanity when running e2e examples
and can be used as a reference for targeting ArmSME in MLIR.
All the integration tests are updated to use this pipeline. The
intention is to productize the pipeline once it becomes more mature.
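For illustration (exact RUN lines vary per test), a test can now invoke the
pipeline instead of spelling out the full pass list:
```
// RUN: mlir-opt %s -test-lower-to-arm-sme -test-lower-to-llvm | ...
```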
This patch introduces support for 4-way widening outer products. This
enables the fusion of 4 'arm_sme.outerproduct' operations that are
chained via the accumulator into a single widened operation.
Changes:
- Adds the following operations:
- smopa_4way, smops_4way
- umopa_4way, umops_4way
- sumopa_4way, sumops_4way
- usmopa_4way, usmops_4way
- Implements conversions for the above ops to intrinsics in ArmSMEToLLVM.
- Extends the 'arm-sme-outer-product-fusion' pass.
For a detailed description of these operations see the
'arm_sme.smopa_4way' description.
This adds a new pass (`-arm-sme-vector-legalization`) which legalizes
vector operations so that they can be lowered to ArmSME. This initial
patch adds decomposition for `vector.outerproduct`,
`vector.transfer_read`, and `vector.transfer_write` when they operate on
vector types larger than a single SME tile. For example, a [8]x[8]xf32
outer product would be decomposed into four [4]x[4]xf32 outer products,
which could then be lowered to ArmSME. These three ops have been picked
as supporting them alone allows lowering matmuls that use all ZA
accumulators to ArmSME.
For it to be possible to legalize a vector type it has to be a multiple
of an SME tile size, but other than that any shape can be used. E.g.
`vector<[8]x[8]xf32>`, `vector<[4]x[16]xf32>`, `vector<[16]x[4]xf32>`
can all be lowered to four `vector<[4]x[4]xf32>` operations.
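For example, a sketch of a too-large outer product before legalization (the
decomposition itself is only described in comments):
```mlir
// One [8]x[8]xf32 outer product covers four [4]x[4]xf32 SME tiles' worth
// of data, so it cannot be mapped to a single ZA tile directly.
%op = vector.outerproduct %lhs, %rhs, %acc
  : vector<[8]xf32>, vector<[8]xf32>
// The pass decomposes this into four vector.outerproduct ops on
// vector<[4]xf32> operands (with the matching quarters of %acc), each of
// which maps onto a single [4]x[4]xf32 SME tile.
```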
In future, this pass will be extended with more SME-specific rewrites to
legalize unrolling the reduction dimension of matmuls (which is not
type-decomposition), which is why the pass has quite a general name.
This patch introduces support for 2-way widening outer products. This
enables the fusion of 2 'arm_sme.outerproduct' operations that are
chained via the accumulator into a 2-way widening outer product
operation.
Changes:
- Add 'llvm.aarch64.sme.[us]mop[as].za32' intrinsics for 2-way variants.
These map to instruction variants added in SME2 and use different
intrinsics. Intrinsics are already implemented for widening variants
from SME1.
- Adds the following operations:
- fmopa_2way, fmops_2way
- smopa_2way, smops_2way
- umopa_2way, umops_2way
- Implements conversions for the above ops to intrinsics in
ArmSMEToLLVM.
- Adds a pass 'arm-sme-outer-product-fusion' that fuses
'arm_sme.outerproduct' operations.
For a detailed description of these operations see the
'arm_sme.fmopa_2way' description.
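As a rough sketch of the pattern the fusion looks for (names and shapes are
illustrative; the fused form is only indicated in a comment):
```mlir
// Two outer products over f32 operands widened from f16, chained via the
// accumulator:
%lhs0 = arith.extf %a0 : vector<[4]xf16> to vector<[4]xf32>
%rhs0 = arith.extf %b0 : vector<[4]xf16> to vector<[4]xf32>
%lhs1 = arith.extf %a1 : vector<[4]xf16> to vector<[4]xf32>
%rhs1 = arith.extf %b1 : vector<[4]xf16> to vector<[4]xf32>
%acc1 = arm_sme.outerproduct %lhs0, %rhs0 acc(%acc0) : vector<[4]xf32>, vector<[4]xf32>
%res  = arm_sme.outerproduct %lhs1, %rhs1 acc(%acc1) : vector<[4]xf32>, vector<[4]xf32>
// After fusion, the pair becomes a single 2-way widening op on the combined
// (interleaved) f16 operands, roughly:
//   arm_sme.fmopa_2way %a, %b acc(%acc0)
//       : vector<[8]xf16>, vector<[8]xf16> into vector<[4]x[4]xf32>
```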
The reason for introducing many operations rather than one is that the
signed/unsigned variants can't be distinguished with types (e.g., ui16,
si16), since 'arith.extui' and 'arith.extsi' only support signless
integers. A single operation would therefore need this information some
other way, and an attribute for the sign (for example) doesn't feel
right when floating-point types, to which it wouldn't apply, are also
supported.
Furthermore, the SME FP8 extensions (FEAT_SME_F8F16, FEAT_SME_F8F32)
introduce FMOPA 2-way (FP8 to FP16) and 4-way (FP8 to FP32) variants but
no subtract variant. Whilst these are not supported in this patch, it
felt simpler to have separate ops for add/subtract given this.
This adds a new `mlir_arm_runner_utils` library that contains utils
specific to Arm/AArch64. This is for use in MLIR integration tests.
This initial patch adds `setArmVLBits()` and `setArmSVLBits()`. This
allows changing vector length or streaming vector length at runtime (or
setting it to a known minimum, i.e. 128-bits).
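A hedged usage sketch (the exact declaration and call site in a test may
differ):
```mlir
// Declare the helper provided by mlir_arm_runner_utils.
func.func private @setArmVLBits(i32)

func.func @entry() {
  // Pin the vector length to 128 bits so results don't depend on the
  // hardware's (or emulator's) actual vector length.
  %c128 = arith.constant 128 : i32
  func.call @setArmVLBits(%c128) : (i32) -> ()
  // ... run the test body ...
  return
}
```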
The existing 'arith::ConstantOp' conversion and tests are moved from
VectorToArmSME. Only a single op is converted at the moment, but this
will grow in the future as things like in-tile add are implemented.
Also, 'createLoopOverTileSlices' is moved to ArmSME utils since it's
relevant for both conversions.
This PR improves the documentation for the `gpu-lower-to-nvvm-pipeline`
(as it was the remaining item for #75775):
- Renames the pipeline `gpu-lower-to-nvvm` -> `gpu-lower-to-nvvm-pipeline`
- Adds a section to the GPU Dialect page on the website, clarifying the
pipeline's functionality in lowering the primary dialects to NVVM targets.
This is to avoid confusion when dealing with reduction/combining kinds.
For example, see a recent PR comment:
https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175.
Previously, they were picked to mostly mirror the names of the llvm
vector reduction intrinsics:
https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In
isolation, it was not clear if `<maxf>` has `arith.maxnumf` or
`arith.maximumf` semantics. The new reduction kind names map 1:1 to
arith ops, which makes it easier to tell/look up their semantics.
Because both the vector and the gpu dialect depend on the arith dialect,
it's more natural to align names with those in arith than with the
lowering to llvm intrinsics.
Issue: https://github.com/llvm/llvm-project/issues/72354
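For example, with the new names the semantics are unambiguous at a glance
(illustrative ops):
```mlir
// Old: vector.reduction <maxf>, %v -- maxnumf or maximumf semantics?
// New: the kind maps 1:1 to the corresponding arith op.
%0 = vector.reduction <maxnumf>, %v : vector<4xf32> into f32
%1 = vector.reduction <maximumf>, %v : vector<4xf32> into f32
```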
The `test-lower-to-nvvm` pipeline serves as the common and proper
pipeline for nvvm+host compilation, and it's used across our CUDA
integration tests.
This PR renames the `test-lower-to-nvvm` pipeline to `gpu-lower-to-nvvm`
and moves it into `InitAllPasses.h`. The aim is to make it callable from
Python and to have a standardized compilation process for nvvm.
This moves the fix out of the IR and into the pass description, which
seems nicer. It also works as an integration test for the
`only-if-required-by-ops` flag :)
This reworks the ArmSME dialect to use attributes for tile allocation.
This has a number of advantages and corrects some issues with the
previous approach:
* Tile allocation can now be done ASAP (i.e. immediately after
`-convert-vector-to-arm-sme`)
* SSA form for control flow is now supported (e.g. `scf.for` loops that
yield tiles)
* ArmSME ops can be converted to intrinsics very late (i.e. after
lowering to control flow)
* Tests are simplified by removing constants and casts
* Avoids correctness issues with representing LLVM `immargs` as MLIR
values
- The tile ID on the SME intrinsics is an `immarg` (so is required to be
a compile-time constant); `immargs` should be mapped to MLIR attributes
(this is already the case for intrinsics in the LLVM dialect)
- Using MLIR values for `immargs` can lead to invalid LLVM IR being
generated (and passes such as -cse making incorrect optimizations)
As part of this patch we bid farewell to the following operations:
```mlir
arm_sme.get_tile_id : i32
arm_sme.cast_tile_to_vector : i32 to vector<[4]x[4]xi32>
arm_sme.cast_vector_to_tile : vector<[4]x[4]xi32> to i32
```
These are now replaced with:
```mlir
// Allocates a new tile with (indeterminate) state:
arm_sme.get_tile : vector<[4]x[4]xi32>
// A placeholder operation for lowering ArmSME ops to intrinsics:
arm_sme.materialize_ssa_tile : vector<[4]x[4]xi32>
```
The new tile allocation works by operations implementing the
`ArmSMETileOpInterface`. This interface says that an operation needs to
be assigned a tile ID, and may conditionally allocate a new SME tile.
Operations allocate a new tile by implementing...
```c++
std::optional<arm_sme::ArmSMETileType> getAllocatedTileType()
```
...and returning what type of tile the op allocates (ZAB, ZAH, etc).
Operations that don't allocate a tile return `std::nullopt` (which is
the default behaviour).
Currently the following ops are defined as allocating:
```mlir
arm_sme.get_tile
arm_sme.zero
arm_sme.tile_load
arm_sme.outerproduct // (if no accumulator is specified)
```
Allocating operations become the roots for the tile allocation pass,
which currently just (naively) assigns all transitive uses of a root
operation the same tile ID. However, this is enough to handle current
use cases.
Once tile IDs have been allocated subsequent rewrites can forward the
tile IDs to any newly created operations.
This gives more flexibility over when these lowerings are performed,
without also lowering unrelated vector ops.
This is an NFC (other than adding a new `-convert-arm-sme-to-llvm` pass).
Previously, we were inserting za.enable/disable intrinsics for functions
with the "arm_za" attribute (at the MLIR level), rather than using the
backend attributes. This was done to avoid a dependency on the SME ABI
functions from compiler-rt (which have only recently been implemented).
Doing things this way did have correctness issues, for example, calling
a streaming-mode function from another streaming-mode function (both
with ZA enabled) would lead to ZA being disabled after returning to the
caller (where it should still be enabled). Fixing issues like this would
require re-doing the ABI work already done in the backend within MLIR.
Instead, this patch switches to use the "arm_new_za" (backend) attribute
for enabling ZA for an MLIR function. For the integration tests, this
requires some way of linking the SME ABI functions. This is done via the
`%arm_sme_abi_shlib` lit substitution. By default, this expands to a
stub implementation of the SME ABI functions, but this can be overridden
by providing the `ARM_SME_ABI_ROUTINES_SHLIB` CMake cache variable
(pointing it at an alternative implementation). For now, the ArmSME
integration tests pass with just stubs, as we don't make use of nested
ZA-enabled calls.
A future patch may add an option to compiler-rt to build the SME
builtins into a standalone shared library to allow easily
building/testing with the actual implementation.
This patch extends ArmSMEToSCF to support lowering of masked tile_load
ops. Only masks created by 'vector.create_mask' are currently supported.
There are two lowerings depending on the pad.
For pad of constant zero, the tile is first zeroed, then only active
rows are loaded.
For non-zero pad, the scalar pad is broadcast to a 1-D vector and a
regular 'vector.masked_load' (which will be lowered to SVE, not SME)
loads each slice, with padding specified as a passthru and the 2-D mask
combined into a 1-D mask. The resulting slice is then inserted into the
tile with 'arm_sme.move_vector_to_tile_slice'.
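A hedged sketch of the kind of masked load that becomes supported (shapes and
the exact assembly are illustrative):
```mlir
%pad  = arith.constant 0.0 : f32
%mask = vector.create_mask %rows, %cols : vector<[4]x[4]xi1>
%tile = arm_sme.tile_load %mem[%c0, %c0], %pad, %mask
  : memref<?x?xf32>, vector<[4]x[4]xf32>
```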
The string symbols were replaced with 'vector.print str' calls in
061d978043 (#68973) but the addressof ops weren't removed. This was
missed as the test is currently XFAIL'ed.
This patch adds support for lowering masked outer products to SME. This
is done in two stages. First, vector.outerproducts (both masked and
non-masked) are rewritten to arm_sme.outerproducts. The
arm_sme.outerproduct op is close to vector.outerproduct, but supports
masking on the operands rather than the result. It also limits the cases
it handles to things that could be (directly) lowered to SME.
This currently requires that the source of the mask is a
vector.create_mask op. E.g.:
```mlir
%mask = vector.create_mask %dimA, %dimB : vector<[4]x[4]xi1>
%result = vector.mask %mask {
vector.outerproduct %vecA, %vecB
: vector<[4]xf32>, vector<[4]xf32>
} : vector<[4]x[4]xi1> -> vector<[4]x[4]xf32>
```
Is rewritten to:
```
%maskA = vector.create_mask %dimA : vector<[4]xi1>
%maskB = vector.create_mask %dimB : vector<[4]xi1>
%result = arm_sme.outerproduct %vecA, %vecB masks(%maskA, %maskB)
: vector<[4]xf32>, vector<[4]xf32>
```
(The same rewrite works for non-masked vector.outerproducts too)
The arm_sme.outerproduct can then be directly lowered to SME intrinsics.
Adds basic integration tests for `vector.contract` for the dot product
and matvec operations. These tests exercise scalable vectors.
Depends on https://github.com/llvm/llvm-project/pull/69845
Currently the expected CHECK values are not correct for
`fcst_maskedload` from
mlir/test/Integration/Dialect/Vector/CPU/test-rewrite-narrow-types.mlir
Adds an end-to-end test for `vector.contract` that targets SVE (i.e.
scalable vectors). Note that this requires lifting the restriction on
`vector.outerproduct` (to which `vector.contract` is lowered) that
would deem the following as invalid by the Op verifier (*):
```
vector.outerproduct %27, %28, %26 {kind = #vector.kind<add>} : vector<3xf32>, vector<[2]xf32>
```
This is indeed valid as the end-to-end test demonstrates (at least when
compiling for SVE).
This patch adds a pass that ensures that loads, stores, and allocations
of SVE vector types will be legal in the LLVM backend. It does this at
the memref level, so this pass must be applied before lowering all the
way to LLVM.
This pass currently fixes two issues.
## Loading and storing predicate types
It is only legal to load/store predicate types equal to (or greater
than) a full predicate register, which in MLIR is `vector<[16]xi1>`.
Smaller predicate types (`vector<[1|2|4|8]xi1>`) must be converted
to/from a full predicate type (referred to as a `svbool`) before and
after storing and loading respectively. This pass does this by widening
allocations and inserting conversion intrinsics.
For example:
```mlir
%alloca = memref.alloca() : memref<vector<[4]xi1>>
%mask = vector.constant_mask [4] : vector<[4]xi1>
memref.store %mask, %alloca[] : memref<vector<[4]xi1>>
%reload = memref.load %alloca[] : memref<vector<[4]xi1>>
```
Becomes:
```mlir
%alloca = memref.alloca() {alignment = 1 : i64} : memref<vector<[16]xi1>>
%mask = vector.constant_mask [4] : vector<[4]xi1>
%svbool = arm_sve.convert_to_svbool %mask : vector<[4]xi1>
memref.store %svbool, %alloca[] : memref<vector<[16]xi1>>
%reload_svbool = memref.load %alloca[] : memref<vector<[16]xi1>>
%reload = arm_sve.convert_from_svbool %reload_svbool : vector<[4]xi1>
```
## Relax alignments for SVE vector allocas
The storage for SVE vector types only needs to have an alignment that
matches the element type (for example 4 byte alignment for `f32`s).
However, the LLVM backend currently defaults to aligning to `base size x
element size` bytes. For non-legal vector types like `vector<[8]xf32>`
this results in 8 x 4 = 32-byte alignment, but the backend only supports
up to 16-byte alignment for SVE vectors on the stack. Explicitly setting
a smaller alignment prevents this issue.
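For instance, a sketch of the resulting alloca (the exact alignment value
follows the element type, 4 bytes for `f32` here):
```mlir
// 4-byte (element-type) alignment instead of the backend's 32-byte default.
%alloca = memref.alloca() {alignment = 4 : i64} : memref<vector<[8]xf32>>
```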
Depends on: #68586 and #68695 (for testing)
Update most test passes to use the transform-interpreter pass instead of
the test-transform-dialect-interpreter pass. The new "main" interpreter
pass has a named entry point instead of looking up the top-level op with
`PossibleTopLevelOpTrait`, which is arguably a more understandable
interface. The change is mechanical, rewriting an unnamed sequence into
a named one and wrapping the transform IR into a module when necessary.
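A sketch of the mechanical rewrite (transform ops elided; exact attributes may
differ):
```mlir
// Before: top-level unnamed sequence, run by test-transform-dialect-interpreter.
transform.sequence failures(propagate) {
^bb0(%arg0: !transform.any_op):
  // ... transform ops ...
  transform.yield
}

// After: named entry point, run by transform-interpreter.
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(
      %arg0: !transform.any_op {transform.readonly}) {
    // ... transform ops ...
    transform.yield
  }
}
```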
Add an option to the transform-interpreter pass to target a tagged
payload op instead of the root anchor op, which is also useful for repro
generation.
Only the tests in the transform dialect proper and the examples have not
been updated yet. These will be updated separately after a more careful
consideration of testing coverage of the transform interpreter logic.
Printing strings within integration tests is currently quite annoyingly
verbose, and can't be tucked into shared helpers as the types depend on
the length of the string:
```
llvm.mlir.global internal constant @hello_world("Hello, World!\0")
func.func @entry() {
%0 = llvm.mlir.addressof @hello_world : !llvm.ptr<array<14 x i8>>
%1 = llvm.mlir.constant(0 : index) : i64
%2 = llvm.getelementptr %0[%1, %1]
: (!llvm.ptr<array<14 x i8>>, i64, i64) -> !llvm.ptr<i8>
llvm.call @printCString(%2) : (!llvm.ptr<i8>) -> ()
return
}
```
So this patch adds a simple extension to `vector.print` to simplify
this:
```
func.func @entry() {
// Print a vector of characters ;)
vector.print str "Hello, World!"
return
}
```
Most of the logic for this is now shared with `cf.assert` which already
does something similar.
Depends on #68694
This patch prefixes tile slice layout with `layout` in the
assemblyFormat:
- `<vertical>` -> `layout<vertical>`
- `<horizontal>` -> `layout<horizontal>`
The reason for this change is the current format doesn't play nicely
with additional optional operands, required to support padding and
masking (#69148), as it becomes ambiguous.
This affects the following ops:
- arm_sme.tile_load
- arm_sme.tile_store
- arm_sme.load_tile_slice
- arm_sme.store_tile_slice
The vector.extract assembly format currently only contains the source
type, for example:
%1 = vector.extract %0[1] : vector<3x7x8xf32>
it's not immediately obvious if this is the source or result type. This
patch improves the assembly format to make this clearer, so the above
becomes:
%1 = vector.extract %0[1] : vector<7x8xf32> from vector<3x7x8xf32>