clang-p2996

Author	SHA1	Message	Date
Kojo Acquah	04bf1a4090	Update `LowerContractionToSMMLAPattern` to ignore matvec (#88288 ) Patterns in `LowerContractionToSMMLAPattern` are designed to handle vector-to-matrix multiplication but not matrix-to-vector. This leads to the following error when processing `rhs` with rank < 2: ``` iree-compile: /usr/local/google/home/kooljblack/code/iree-build/llvm-project/tools/mlir/include/mlir/IR/BuiltinTypeInterfaces.h.inc:268: int64_t mlir::detail::ShapedTypeTrait<mlir::VectorType>::getDimSize(unsigned int) const [ConcreteType = mlir::VectorType]: Assertion `idx < getRank() && "invalid index for shaped type"' failed. ``` Updates the patterns to explicitly check the rhs rank and fail on cases that cannot be processed.	2024-04-10 13:18:47 -04:00
Aart Bik	f388a3a446	[mlir][sparse] update doc and examples of the [dis]assemble operations (#88213 ) The doc and examples of the [dis]assemble operations did not reflect all the recent changes on order of the operands. Also clarified some of the text.	2024-04-10 09:42:12 -07:00
Mehdi Amini	43b2b2ebce	Revert "Fix complex log1p accuracy with large abs values." (#88290 ) Reverts llvm/llvm-project#88260 The test fails on the GCC7 buildbot.	2024-04-10 18:25:16 +02:00
Johannes Reifferscheid	49ef12a08c	Fix complex log1p accuracy with large abs values. (#88260 ) This ports https://github.com/openxla/xla/pull/10503 by @pearu. The new implementation matches mpmath's results for most inputs, see caveats in the linked pull request. In addition to the filecheck test here, the accuracy was tested with XLA's complex_unary_op_test and its MLIR emitters.	2024-04-10 14:55:56 +02:00
Raghu Maddhipatla	eec41d2f8d	Revert "[Flang] [OpenMP] [Semantics] [MLIR] [Lowering] Add lowering support for IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses on OMP TARGET directive." (#88198 ) Reverts llvm/llvm-project#74187	2024-04-09 16:18:56 -05:00
srcarroll	b79db39659	[mlir][linalg] Support `ParamType` in `vector_sizes` option of `VectorizeOp` transform (#87557 )	2024-04-09 15:52:40 -05:00
Joseph Huber	470aefb240	[Offload][NFC] Remove `omp_` prefix from offloading entries (#88071 ) Summary: These entries are generic for offloading with the new driver now. Having the `omp` prefix was a historical artifact and is confusing when used for CUDA. This patch just renames them for now; future patches will rework the binary format to make it more common.	2024-04-09 15:50:15 -05:00
Raghu Maddhipatla	9d9560facb	[Flang] [OpenMP] [Semantics] [MLIR] [Lowering] Add lowering support for IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses on OMP TARGET directive. (#74187 ) Added lowering support for IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses for OMP TARGET directive and added related tests for these changes. IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses apply to OMP TARGET directive OpenMP spec states `The is_device_ptr clause indicates that its list items are device pointers.` `The has_device_addr clause indicates that its list items already have device addresses and therefore they may be directly accessed from a target device.` Whereas USE_DEVICE_PTR and USE_DEVICE_ADDR clauses apply to OMP TARGET DATA directive and OpenMP spec for them states `Each list item in the use_device_ptr clause results in a new list item that is a device pointer that refers to a device address` `Each list item in a use_device_addr clause that is present in the device data environment is treated as if it is implicitly mapped by a map clause on the construct with a map-type of alloc`	2024-04-09 14:59:20 -05:00
xiaoleis-nv	8d6469b0e0	[mlir][vector] Add lower-vector-multi-reduction pass (#87333 ) This MR adds the `lower-vector-multi-reduction` pass to lower the vector.multi_reduction operation. While the Transform Dialect includes an operation, `transform.apply_patterns.vector.lower_multi_reduction`, intended for a similar purpose, its utility is limited to projects that have adopted the Transform Dialect. Recognizing that not all projects are equipped to integrate this dialect, the proposed pass serves as a vital standalone alternative. It ensures that projects solely dependent on the traditional pass infrastructure can also benefit from the optimized lowering of `multi_reduction` operation. --------- Co-authored-by: Xiaolei Shi <xiaoleis@nvidia.com>	2024-04-09 10:04:25 -07:00
Billy Zhu	6f6336858e	[MLIR][LLVM] Add DebugNameTableKind to DICompileUnit (#87974 ) Add the DebugNameTableKind field to DICompileUnit, along with its importer & exporter.	2024-04-09 06:18:07 -07:00
Kai Sasaki	51089e360e	[mlir][complex] Support fast math flag for complex.tan op (#87919 ) See https://discourse.llvm.org/t/rfc-fastmath-flags-support-in-complex-dialect/71981	2024-04-09 15:22:43 +09:00
Uday Bondhugula	0e5a53cc01	[MLIR] Fix typo bug in AffineExprVisitor for WalkResult return case (#86138 ) Fix typo bug in AffineExprVisitor for the WalkResult return case. This didn't show up immediately because most walks in the tree didn't use walk result.	2024-04-09 08:37:57 +05:30
Matthias Braun	4a812b5912	Verify threadlocal_address constraints (#87841 ) Check invariants for `llvm.threadlocal.address` intrinsic in IR Verifier.	2024-04-08 17:47:57 -07:00
Corentin Ferry	50b937331f	[mlir] Add missing libm member operations to MathToLibm (#87981 ) This PR adds support for lowering the following Math operations to `libm` calls: * `math.absf` -> `fabsf, fabs` * `math.exp` -> `expf, exp` * `math.exp2` -> `exp2f, exp2` * `math.fma` -> `fmaf, fma` * `math.log` -> `logf, log` * `math.log2` -> `log2f, log2` * `math.log10` -> `log10f, log10` * `math.powf` -> `powf, pow` * `math.sqrt` -> `sqrtf, sqrt` These operations are direct members of `libm`, and do not seem to require any special manipulations on their operands.	2024-04-09 00:41:12 +02:00
Andrzej Warzyński	e276dcec17	[mlir][arith] Refine the verifier for arith.constant (#87999 ) Disallows initialization of scalable vectors with an attribute of arbitrary values, e.g.: ```mlir %c = arith.constant dense<[0, 1]> : vector<[2] x i32> ``` Initialization using vector splats remains allowed (i.e. when all the init values are identical): ```mlir %c = arith.constant dense<[1, 1]> : vector<[2] x i32> ``` Note: This is a re-upload of #86178	2024-04-08 21:22:00 +01:00
Andrzej Warzynski	40327a628a	Revert "[mlir][arith] Refine the verifier for arith.constant (#86178 )" This reverts commit `662c62609e`. Broken bot: * https://lab.llvm.org/buildbot/#/builders/61/builds/56565	2024-04-08 14:39:20 +01:00
Adrian Kuegel	a4c84d6ac1	[mlir] Only inline if properties are used. This is a followup to `0f52f4ddd9` It breaks dialects that don't use properties yet.	2024-04-08 13:13:57 +00:00
Andrzej Warzyński	662c62609e	[mlir][arith] Refine the verifier for arith.constant (#86178 ) Disallows initialization of scalable vectors with an attribute of arbitrary values, e.g.: ```mlir %c = arith.constant dense<[0, 1]> : vector<[2] x i32> ``` Initialization using vector splats remains allowed (i.e. when all the init values are identical): ```mlir %c = arith.constant dense<[1, 1]> : vector<[2] x i32> ```	2024-04-08 13:59:27 +01:00
Billy Zhu	81a7b6454e	[MLIR][LLVM] Recursion importer handle repeated self-references (#87295 ) Followup to this discussion: https://github.com/llvm/llvm-project/pull/80251#discussion_r1535599920. The previous debug importer was correct but inefficient. For cases with mutual recursion that contain more than one back-edge, each back-edge would result in a new translated instance. This is because the previous implementation never caches any translated result with unbounded self-references. This means all translation inside a recursive context is performed from scratch, which will incur repeated run-time cost as well as repeated attribute sub-trees in the translated IR (differing only in their `recId`s). This PR refactors the importer to handle caching inside a recursive context. - In the presence of unbound self-refs, the translation result is cached in a separate cache that keeps track of the set of dependent unbound self-refs. - A dependent cache entry is valid only when all the unbound self-refs are in scope. Whenever a cached entry goes out of scope, it will be removed the next time it is looked up.	2024-04-08 01:09:54 -07:00
Fabian Mora	a2c4b7c8e2	[mlir] Add `convertInstruction` and `getSupportedInstructions` to `LLVMImportInterface` (#86799 ) This patch adds the `convertInstruction` and `getSupportedInstructions` to `LLVMImportInterface`, allowing any non-LLVM dialect to specify how to import LLVM IR instructions and overriding the default import of LLVM instructions.	2024-04-07 08:46:21 +02:00
Matthias Springer	c459a366d3	[mlir][Arith] `ValueBoundsOpInterface`: Support `arith.select` (#87870 ) This commit adds a `ValueBoundsOpInterface` implementation for `arith.select`. The implementation is almost identical to `scf.if` (#85895), but there is one special case: if the condition is a shaped value, the selection is applied element-wise and the result shape can be inferred from either operand. Note: This is a re-upload of #86383.	2024-04-07 09:36:28 +09:00
Kai Sasaki	a522dbbd62	[mlir][complex] Support fast math flag for complex.sign op (#87148 ) We are going to support the fast math flag given in `complex.sign` op in the conversion to standard dialect. See: https://discourse.llvm.org/t/rfc-fastmath-flags-support-in-complex-dialect/71981	2024-04-06 15:35:10 +09:00
Matthias Springer	76435f2dca	[mlir][SCF] `ValueBoundsConstraintSet`: Support `scf.if` (branches) (#87860 ) This commit adds support for `scf.if` to `ValueBoundsConstraintSet`. Example: ``` %0 = scf.if ... -> index { scf.yield %a : index } else { scf.yield %b : index } ``` The following constraints hold for %0: * %0 >= min(%a, %b) * %0 <= max(%a, %b) Such constraints cannot be added to the constraint set; min/max is not supported by `IntegerRelation`. However, if we know which one of %a and %b is larger, we can add constraints for %0. E.g., if %a <= %b: * %0 >= %a * %0 <= %b This commit required a few minor changes to the `ValueBoundsConstraintSet` infrastructure, so that values can be compared while we are still in the process of traversing the IR/adding constraints. Note: This is a re-upload of #85895, which was reverted. The bug that caused the failure was fixed in #87859.	2024-04-06 13:04:49 +09:00
Jeff Niu	0f52f4ddd9	[mlir][ods] Emit "trivial" ODS getter/setters inline (#87741 ) Emitting trivial getters that amount to `(*this)->getOperand(1)` out-of-line or `getProperties().foo` is a pretty significant performance hit on these basic MLIR APIs for manipulating ops (3-4x). Emit them inline (without adding additional dependencies to header files).	2024-04-06 04:01:37 +02:00
Diego Caballero	42a6ad7bad	[mlir][Vector] Fix n-D vector.extract/insert lowering to LLVM (#87591 ) The lowering of n-D vector.extract/insert ops to LLVM is not supported but if one of these accidentally reaches the vector-to-llvm conversion patterns, we end up with a kind of puzzling crash. This PR fixes that crash and gracefully bails out in those cases.	2024-04-05 15:01:20 -07:00
Christian Ulmann	541962306d	[MLIR][LLVM] Remove bitcast pattern from type consistency pass (#87755 ) This commit removes the no longer required bitcast inserting pattern in LLVM dialect's type consistency pattern. This was previously required to enable Mem2Reg and SROA to promote accesses that had different types. Recent changes to both passes added direct support for this feature to them, so the pattern has no further use.	2024-04-05 15:47:16 +02:00
Jan Leyonberg	9708d09003	[MLIR][OpenMP] Skip host omp ops when compiling for the target device (#85239 ) This patch separates the lowering dispatch for host and target devices. For the target device, if the current operation is not a top-level operation (e.g. omp.target) or is inside a target device code region it will be ignored, since it belongs to the host code. This is an alternative approach to #84611, the new test in this PR was taken from there.	2024-04-05 09:25:28 -04:00
Mehdi Amini	8487e05967	Revert "[mlir][SCF] `ValueBoundsConstraintSet`: Support `scf.if` (branches) (#85895 )" This reverts commit `6b30ffef28`. gcc7 bot is broken	2024-04-05 03:00:35 -07:00
Mehdi Amini	f2d8218efa	Revert "[mlir][Arith] `ValueBoundsOpInterface`: Support `arith.select` (#86383 )" This reverts commit `62b58d3418`. gcc7 bot is broken.	2024-04-05 03:00:02 -07:00
Benjamin Maxwell	0b7362c257	[mlir][arith] Add result pretty printing for constant vscale values (#83565 ) In scalable code it is very common to have constant multiples of vscale, e.g. `4 * vscale`. This updates `arith.muli` to pretty print the result name in cases like this, so `4 * vscale` would be `%c4_vscale`. This makes reading IR dumps of scalable code a little nicer.	2024-04-05 10:48:16 +01:00
Andrzej Warzynski	5ed60ffd79	[mlir][test] Extend CMake logic for e2e tests Adds two new CMake functions to query the host system: * `check_hwcap`, * `check_emulator`. Together, these functions are used to check whether a given set of MLIR integration tests require an emulator. If yes, then the corresponding CMake var that defines the required emulator executable is also checked. `check_hwcap` relies on ELF_HWCAP for discovering CPU features from userspace on Linux systems. This is the recommended approach for Arm CPUs running on Linux as outlined in this blog post: * https://community.arm.com/arm-community-blogs/b/operating-systems-blog/posts/runtime-detection-of-cpu-features-on-an-armv8-a-cpu Other operating systems (e.g. Android) and CPU architectures will most likely require some other approach. Right now these new hooks are only used for SVE and SME integration tests. This relands #86489 with the following changes: * Replaced: `set(hwcap_test_file ${CMAKE_BINARY_DIR}/${CMAKE_FILES_DIRECTORY}/hwcap_check.c)` with: `set(hwcap_test_file ${CMAKE_BINARY_DIR}/temp/hwcap_check.c)` The former would trigger an infinite loop when running `ninja` (after the initial CMake configuration). * Fixed commit msg. Previous one was taken from the initial GH PR commit rather than the final re-worked solution (missed this when merging via GH UI). * A couple more NFCs/tweaks.	2024-04-05 08:43:37 +00:00
Christian Ulmann	ef8322f41d	[MLIR][LLVM] Improve bit- and addrspacecast folders (#87745 ) This commit extends the folders of chainable casts (bitcast and addrspacecast) to ensure that they fold a chain of the same casts into a single cast. Additionally cleans up the canonicalization test file, as this used some outdated constructs.	2024-04-05 09:14:13 +02:00
Christian Ulmann	974f1ee58d	[MLIR][LLVM][Mem2Reg] Relax type equality requirement for load and store (#87637 ) This commit relaxes Mem2Reg's type equality requirement for the LLVM dialect's load and store operations. For now, we only allow loads to be promoted if the reaching definition can be casted into a value of the target type. For stores, the same conversion casting check is applied and we ensure that their result is properly casted to the type of the memory slot. This is necessary to satisfy assumptions of the general mem2reg pass, as it creates block arguments with the types of the memory slot. This relands https://github.com/llvm/llvm-project/pull/87504	2024-04-05 08:25:36 +02:00
Matthias Springer	62b58d3418	[mlir][Arith] `ValueBoundsOpInterface`: Support `arith.select` (#86383 ) This commit adds a `ValueBoundsOpInterface` implementation for `arith.select`. The implementation is almost identical to `scf.if` (#85895), but there is one special case: if the condition is a shaped value, the selection is applied element-wise and the result shape can be inferred from either operand.	2024-04-05 13:39:14 +09:00
Matthias Springer	6b30ffef28	[mlir][SCF] `ValueBoundsConstraintSet`: Support `scf.if` (branches) (#85895 ) This commit adds support for `scf.if` to `ValueBoundsConstraintSet`. Example: ``` %0 = scf.if ... -> index { scf.yield %a : index } else { scf.yield %b : index } ``` The following constraints hold for %0: * %0 >= min(%a, %b) * %0 <= max(%a, %b) Such constraints cannot be added to the constraint set; min/max is not supported by `IntegerRelation`. However, if we know which one of %a and %b is larger, we can add constraints for %0. E.g., if %a <= %b: * %0 >= %a * %0 <= %b This commit required a few minor changes to the `ValueBoundsConstraintSet` infrastructure, so that values can be compared while we are still in the process of traversing the IR/adding constraints.	2024-04-05 13:14:00 +09:00
MaheshRavishankar	5aeb604c7c	[mlir][SCF] Modernize `coalesceLoops` method to handle `scf.for` loops with iter_args (#87019 ) As part of this extension this change also does some general cleanup 1) Make all the methods take `RewriterBase` as arguments instead of creating their own builders that tend to crash when used within pattern rewrites 2) Split `coalesePerfectlyNestedLoops` into two separate methods, one for `scf.for` and the other for `affine.for`. The templatization didn't seem to be buying much there. Also general cleanup of tests.	2024-04-04 13:44:24 -07:00
Jeff Niu	dad065dc6e	[mlir][ods] Fix attribute setter gen when properties are on (#87688 ) ODS was still generating the old `Operation::setAttr` hooks for ODS methods for setting attributes, when the backing implementation of the attributes was changed to properties. No idea how this wasn't noticed until now.	2024-04-04 21:39:07 +02:00
Keyi Zhang	7e87d03b45	[MLIR][CF] Fix cf.switch parsing with result numbers (#87658 ) This PR should fix the parsing bug reported in https://github.com/llvm/llvm-project/issues/87430. It allows using result number as the `cf.switch` operand.	2024-04-04 21:32:47 +02:00
Fabian Mora	220cdf940e	[mlir] Add `requiresReplacedValues` and `visitReplacedValues` to `PromotableOpInterface` (#86792 ) Add `requiresReplacedValues` and `visitReplacedValues` methods to `PromotableOpInterface`. These methods allow `PromotableOpInterface` ops to transform definitions mutated by a `store`. This change is necessary to correctly handle the promotion of `LLVM_DbgDeclareOp`. --------- Co-authored-by: Théo Degioanni <30992420+Moxinilian@users.noreply.github.com>	2024-04-04 13:34:46 -04:00
Andrzej Warzynski	d3fe2b538d	Revert "[mlir][test] Make SME e2e tests require an emulator (#86489 )" This reverts commit `7b5255297d`. Broken bot: * https://lab.llvm.org/buildbot/#/builders/179/builds/9794	2024-04-04 17:12:37 +01:00
Christian Ulmann	e0e615efac	Revert "[MLIR][LLVM][Mem2Reg] Relax type equality requirement for load and store (#87504 )" (#87631 ) This reverts commit `d6e4582198` as it violates an assumption of Mem2Reg's block argument creation. Mem2Reg strongly assumes that all involved values have the same type as the alloca, which was relaxed by this PR. Therefore, branches got created that jumped to basic blocks with differently typed block arguments.	2024-04-04 15:07:18 +02:00
Andrzej Warzyński	7b5255297d	[mlir][test] Make SME e2e tests require an emulator (#86489 ) Integration tests for ArmSME require an emulator (there's no hardware available). Make sure that CMake complains if `MLIR_RUN_ARM_SME_TESTS` is set while `ARM_EMULATOR_EXECUTABLE` is empty. I'm also adding a note in the docs for future reference.	2024-04-04 13:40:08 +01:00
Philip Lassen	608a663c8e	[MLIR] Clean up pass options for test-loop-fusion and affine-super-vectorizer-test (#87606 ) Before the change `test-loop-fusion` and `affine-super-vectorizer-test` options were in their own category. This was because they used the standard llvm command line parsing with `llvm::cl::opt`. This PR moves them over to the mlir `Pass::Option` class. Before the change ``` $ mlir-opt --help ... General options: ... Compiler passes to run Passes: ... Pass Pipelines: ... Generic Options: .... affine-super-vectorizer-test options: --backward-slicing ... --vectorize-affine-loop-nest test-loop-fusion options: --test-loop-fusion-dependence-check ... --test-loop-fusion-transformation ``` After the change ``` $ mlir-opt --help ... General options: ... Compiler passes to run Passes: ... --affine-super-vectorizer-test --backward-slicing ... --vectorize-affine-loop-nest ... --test-loop-fusion options: --test-loop-fusion-dependence-check ... --test-loop-fusion-transformation ... Pass Pipelines: ... Generic Options: ... ``` --------- Signed-off-by: philass <plassen@groq.com>	2024-04-04 12:26:33 +02:00
Tom Eccles	cc34ad91f0	[MLIR][OpenMP] Add cleanup region to omp.declare_reduction (#87377 ) Currently, by-ref reductions will allocate the per-thread reduction variable in the initialization region. Adding a cleanup region allows that allocation to be undone. This will allow flang to support reduction of arrays stored on the heap. This conflation of allocation and initialization in the initialization should be fixed in the future to better match the OpenMP standard, but that is beyond the scope of this patch.	2024-04-04 11:19:42 +01:00
Tom Eccles	099ecdf1ec	[mlir][OpenMP] map argument to reduction initialization region (#86979 ) The argument to the initialization region of reduction declarations was never mapped. This meant that if this argument was accessed inside the initialization region, that mlir operation would be translated to an llvm operation with a null argument (failing verification). Adding the mapping ensures that the right LLVM value can be found when inlining and converting the initialization region. We have to separately establish and clean up these mappings for each use of the reduction declaration because repeated usage of the same declaration will inline it using a different concrete value for the block argument. This argument was never used previously because for most cases the initialized value depends only upon the type of the reduction, not on the original variable. It is needed now so that we can read the array extents for the local copy from the mold. Flang support for reductions on assumed shape arrays patch 2/3	2024-04-04 10:55:42 +01:00
Matthias Springer	5e4a44380e	[mlir][Interfaces][NFC] `ValueBoundsConstraintSet`: Pass stop condition in the constructor (#86099 ) This commit changes the API of `ValueBoundsConstraintSet`: the stop condition is now passed to the constructor instead of `processWorklist`. That makes it easier to add items to the worklist multiple times and process them in a consistent manner. The current `ValueBoundsConstraintSet` is passed as a reference to the stop function, so that the stop function can be defined before the `ValueBoundsConstraintSet` is constructed. This change is in preparation for adding support for branches.	2024-04-04 17:05:47 +09:00
Christian Ulmann	d6e4582198	[MLIR][LLVM][Mem2Reg] Relax type equality requirement for load and store (#87504 ) This commit relaxes Mem2Reg's type equality requirement for the LLVM dialect's load and store operations. For now, we only allow loads to be promoted if the reaching definition can be casted into a value of the target type. For stores, all type checks are removed, as a non-volatile store that does not write out the alloca's pointer can always be deleted.	2024-04-04 09:34:37 +02:00
Matthias Springer	a4c470555b	[mlir][linalg] Fix builder API usage in `RegionBuilderHelper` (#87451 ) Operations must be created with the supplied builder. Otherwise, the dialect conversion / greedy pattern rewrite driver can break. This commit fixes a crash in the dialect conversion: ``` within split at llvm-project/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-invalid.mlir:1 offset :8:8: error: failed to legalize operation 'tosa.add' %0 = tosa.add %1, %arg2 : (tensor<10x10xf32>, tensor<xf32>) -> tensor<xf32> ^ within split at llvm-project/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-invalid.mlir:1 offset :8:8: note: see current operation: %9 = "tosa.add"(%8, %arg2) : (tensor<10x10xf32>, tensor<xf32>) -> tensor<xf32> mlir-opt: llvm-project/mlir/include/mlir/IR/UseDefLists.h:198: mlir::IRObjectWithUseList<mlir::OpOperand>::~IRObjectWithUseList() [OperandType = mlir::OpOperand]: Assertion `use_empty() && "Cannot destroy a value that still has uses!"' failed. ``` This commit is the proper fix for #87297 (which was reverted).	2024-04-04 11:17:59 +09:00
Han-Chung Wang	ef5a710911	[mlir][vector] Skip 0D vectors in vector linearization. (#87577 )	2024-04-03 17:00:56 -07:00
Kojo Acquah	66fed33db0	[mlir][vector] Update `castAwayContractionLeadingOneDim` to omit transposes solely on leading unit dims. (#85694 ) Updates `castAwayContractionLeadingOneDim` to check for leading unit dimensions before inserting `vector.transpose` ops. Currently `castAwayContractionLeadingOneDim` removes all leading unit dims based on the accumulator and transpose any subsequent operands to match the accumulator indexing. This does not take into account if the transpose is strictly necessary, for instance when given this vector-matrix contract: ```mlir %result = vector.contract {indexing_maps = [affine_map<(d0, d1, d2, d3) -> (d0, d1, d3)>, affine_map<(d0, d1, d2, d3) -> (d0, d2, d3)>, affine_map<(d0, d1, d2, d3) -> (d1, d2)>], iterator_types = ["parallel", "parallel", "parallel", "reduction"], kind = #vector.kind<add>} %lhs, %rhs, %acc : vector<1x1x8xi32>, vector<1x8x8xi32> into vector<1x8xi32> ``` Passing this through `castAwayContractionLeadingOneDim` pattern produces the following: ```mlir %0 = vector.transpose %arg0, [1, 0, 2] : vector<1x1x8xi32> to vector<1x1x8xi32> %1 = vector.extract %0[0] : vector<1x8xi32> from vector<1x1x8xi32> %2 = vector.extract %arg2[0] : vector<8xi32> from vector<1x8xi32> %3 = vector.contract {indexing_maps = [affine_map<(d0, d1, d2) -> (d0, d2)>, affine_map<(d0, d1, d2) -> (d0, d1, d2)>, affine_map<(d0, d1, d2) -> (d1)>], iterator_types = ["parallel", "parallel", "reduction"], kind = #vector.kind<add>} %1, %arg1, %2 : vector<1x8xi32>, vector<1x8x8xi32> into vector<8xi32> %4 = vector.broadcast %3 : vector<8xi32> to vector<1x8xi32> ``` The `vector.transpose` introduced does not affect the underlying data layout (effectively a no op), but it cannot be folded automatically. This change avoids inserting transposes when only leading unit dimensions are involved. Fixes #85691	2024-04-03 19:27:01 -04:00


10934 Commits