clang-p2996

Author	SHA1	Message	Date
Nicolas Vasilache	04ba475e85	[mlir][Vector] Add a rewrite pattern for better low-precision ext(bitcast) expansion (#66648) This revision adds a rewrite for sequences of vector `ext(bitcast)` to use a more efficient sequence of vector operations comprising `shuffle` and `bitwise` ops. Such patterns appear naturally when writing quantization / dequantization functionality with the vector dialect. The rewrite performs a simple enumeration of each of the bits in the result vector and determines its provenance in the source vector. The enumeration is used to generate the proper sequence of `shuffle`, `andi`, `ori` with shifts. The rewrite currently only applies to 1-D non-scalable vectors and bails out if the final vector element type is not a multiple of 8. This is a failsafe heuristic determined empirically: if the resulting type is not an even number of bytes, further complexities arise that are not improved by this pattern: the heavy lifting still needs to be done by LLVM.	2023-09-18 19:02:46 +02:00
Jie Fu	dd6dde1166	[mlir][Vector] Fix -Wunused-function in VectorEmulateNarrowType.cpp (NFC) /data/llvm-project/mlir/lib/Dialect/Vector/Transforms/VectorEmulateNarrowType.cpp:229:21: error: unused function 'operator<<' [-Werror,-Wunused-function] static raw_ostream &operator<<(raw_ostream &os, ^ 1 error generated.	2023-09-18 21:47:33 +08:00
frgossen	06f9ffa050	Fix unused variable (#66644)	2023-09-18 09:35:20 -04:00
Nicolas Vasilache	bf7c490ab7	[mlir][Vector] Add a rewrite pattern for better low-precision bitcast(trunci) expansion (#66387) This revision adds a rewrite for sequences of vector `bitcast(trunci)` to use a more efficient sequence of vector operations comprising `shuffle` and `bitwise` ops. Such patterns appear naturally when writing quantization / dequantization functionality with the vector dialect. The rewrite performs a simple enumeration of each of the bits in the result vector and determines its provenance in the pre-trunci vector. The enumeration is used to generate the proper sequence of `shuffle`, `andi`, `ori` followed by an optional final `trunci`/`extui`. The rewrite currently only applies to 1-D non-scalable vectors and bails out if the final vector element type is not a multiple of 8. This is a failsafe heuristic determined empirically: if the resulting type is not an even number of bytes, further complexities arise that are not improved by this pattern: the heavy lifting still needs to be done by LLVM.	2023-09-18 15:08:18 +02:00
Matthias Springer	5cf714bb2f	[mlir][SCF] scf.for: Consistent API around `initArgs` (#66512) * Always use the auto-generated `getInitArgs` function. Remove the hand-written `getInitOperands` duplicate. * Remove `hasIterOperands` and `getNumIterOperands`. The names were inconsistent because the "arg" is called `initArgs` in TableGen. Use `getInitArgs().size()` instead. * Fix verification around ops with no results.	2023-09-18 09:13:43 +02:00
Andrzej Warzyński	57cf6896cd	[mlir][vector] Fix vector.broadcast lowering for scalable vectors (#66344 ) This patch makes sure that the following case is lowered correctly ("duplication"): ``` func.func @broadcast_scalable_duplication(%arg0: vector<[32]xf32>) -> vector<1x[32]xf32> { %res = vector.broadcast %arg0 : vector<[32]xf32> to vector<1x[32]xf32> return %res : vector<1x[32]xf32> } ```	2023-09-15 16:35:47 +01:00
Christopher Bate	831041be79	[mlir][vector] Cleanup VectorUnroll and create a generic tile iteration utility This change refactors some of the utilities used to unroll larger vector computations into smaller vector computations. In fact, the indexing computations used here are rather generic and are useful in other dialects or downstream projects. Therefore, a utility for iterating over all possible tile offsets for a particular pair of static (shape, tiled shape) is introduced in IndexingUtils and replaces the existing computations in the vector unrolling transformations. This builds off of the refactoring of IndexingUtils introduced in `203fad476b`. Reviewed By: nicolasvasilache Differential Revision: https://reviews.llvm.org/D150000	2023-09-14 20:34:44 -06:00
Cullen Rhodes	f75d46a7ec	[mlir][ArmSME] Lower vector.outerproduct to FMOPA/BFMOPA (#65621 ) This patch adds support for lowering vector.outerproduct to the ArmSME MOPA intrinsic for the following types: vector<[8]xf16>, vector<[8]xf16> -> vector<[8]x[8]xf16> vector<[8]xbf16>, vector<[8]xbf16> -> vector<[8]x[8]xbf16> vector<[4]xf32>, vector<[4]xf32> -> vector<[4]x[4]xf32> vector<[2]xf64>, vector<[2]xf64> -> vector<[2]x[2]xf64> The FP variants are lowered to FMOPA (non-widening) [1] and BFloat to BFMOPA (non-widening) [2]. Note at the ISA level these variants are implemented by different architecture features, these are listed below: FMOPA (non-widening) * half-precision - +sme2p1,+sme-f16f16 * single-precision - +sme * double-precision - +sme-f64f64 BFMOPA (non-widening) * half-precision - +sme2p1,+b16b16 There's currently no way to target different features when lowering to ArmSME. Integration tests are added for F32 and F64. We use QEMU to run the integration tests but SME2 support isn't available yet, it's targeted for 9.0, so integration tests for these variants excluded. Masking is currently unsupported. Depends on #65450. [1] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/FMOPA--non-widening---Floating-point-outer-product-and-accumulate- [2] https://developer.arm.com/documentation/ddi0602/2023-06/SME-Instructions/BFMOPA--non-widening---BFloat16-floating-point-outer-product-and-accumulate-	2023-09-14 08:31:52 +01:00
Daniil Dudkin	4a831250b8	[mlir][vector] Rename vector reductions: `maxf` → `maximumf`, `minf` → `minimumf` This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671. Here, we are addressing task 2.1 from the plan, which involves renaming the vector reductions to align with the semantics of the corresponding LLVM intrinsics. Reviewed By: dcaballe Differential Revision: https://reviews.llvm.org/D158618	2023-09-13 22:49:07 +00:00
Andrzej Warzyński	22f96ab6fb	[mlir][vector] Refine vector.transfer_read hoisting/forwarding (#65770) Make sure that when analysing a `vector.transfer_read` that's a candidate for either hoisting or store-to-load forwarding, `memref.collapse_shape` Ops are correctly included in the alias analysis. This is done by either * making sure that relevant users are taken into account, or * source Ops are correctly identified.	2023-09-12 10:33:58 +01:00
Daniil Dudkin	8a6e54c9b3	[mlir][arith] Rename operations: `maxf` → `maximumf`, `minf` → `minimumf` (#65800) This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671. This commit addresses Task 1.2 of the mentioned RFC. By renaming these operations, we align their names with LLVM intrinsics that have corresponding semantics.	2023-09-11 22:02:19 -07:00
Benjamin Maxwell	ccef726d09	[mlir][VectorOps] Don't drop scalable dims when lowering transfer_reads/writes (in VectorToLLVM) This is a follow-on to D158753, and allows the lowering of a transfer read/write of n-D vectors with a single trailing scalable dimension to primitive vector ops. The final conversion to LLVM depends on D158517 and D158752, without these patches type conversion will fail (or an assert is hit in the LLVM backend) if the final IR contains an array of scalable vectors. This patch adds `transform.apply_patterns.vector.lower_create_mask` which allows the lowering of vector.create_mask/constant_mask to be tested independently of --convert-vector-to-llvm. Reviewed By: c-rhodes, awarzynski, dcaballe Differential Revision: https://reviews.llvm.org/D159482	2023-09-11 16:47:51 +00:00
Benjamin Maxwell	8dffb71cba	[mlir][VectorOps] Add lowering for vector.shape_cast of scalable vectors This adds a lowering similar to the general shape_cast lowering, but instead moves elements a (scalable) subvector at a time via vector.scalable.extract/insert. It is restricted to the case where both the source and result vector types have a single trailing scalable dimension (due to limitations of the insert/extract ops). The current lowerings are now disabled for scalable vectors, as they produce incorrect results at runtime (due to assuming a fixed number of elements). Examples of casts that now work: // Flattening: %v = vector.shape_cast %arg0 : vector<4x[8]xi8> to vector<[32]xi8> // Un-flattening: %v = vector.shape_cast %arg0 : vector<[8]xi32> to vector<2x1x[4]xi32> Reviewed By: awarzynski, nicolasvasilache Differential Revision: https://reviews.llvm.org/D159217	2023-09-07 15:58:44 +00:00
Cullen Rhodes	067bd7d051	[mlir][vector] Use optional for outerproduct accumulator instead of variadic This was introduced before the Optional directive and uses Variadic, but it's really optional. Reviewed By: nicolasvasilache, benmxwl-arm, dcaballe Differential Revision: https://reviews.llvm.org/D159259	2023-09-01 05:50:01 +00:00
Benjamin Maxwell	296d5cb60c	[mlir][BuiltinTypes] Return VectorType from VectorType::Builder conversion operator 0-D vectors are now supported, so the special case of returning just the element type can now be removed. A few callers that relied on the old behaviour have been updated. Reviewed By: awarzynski, nicolasvasilache Differential Revision: https://reviews.llvm.org/D159122	2023-08-30 13:47:06 +00:00
yzhang93	f4bef787bc	Add narrow type emulation pattern for vector.transfer_read Reviewed By: mravishankar, hanchung Differential Revision: https://reviews.llvm.org/D158757	2023-08-29 13:15:19 -07:00
Lei Zhang	d243378722	[mlir][vector] Use dyn_cast in if conditions Reviewed By: dcaballe Differential Revision: https://reviews.llvm.org/D158336	2023-08-22 08:27:40 -07:00
Andrzej Warzynski	f9070b2dfb	[mlir][vector] Enable CastAwayElementwiseLeadingOneDim for scalable vec This patch effectively enables the CastAwayElementwiseLeadingOneDim rewrite pattern for scalable vectors. To this end, `ExtractOp::inferReturnTypes` is updated so that scalable dimensions are correctly recognised. The change to ExtractOp will likely make also other conversion patterns valid for scalable vectors, but this patch focuses on just one case. Other conversion patterns will be enabled in the forthcoming patches. Depends on D157993 Differential Revision: https://reviews.llvm.org/D158335	2023-08-22 11:40:46 +00:00
Andrzej Warzynski	576b184d6e	[mlir][vector] Add support for scalable vectors in `trimLeadingOneDims` This patch updates one specific hook in "VectorDropLeadUnitDim.cpp" to make sure that "scalable dims" are handled correctly. While this change affects multiple patterns, I am only adding one regression test that captures one specific case that affects me right now. I am also adding Vector dialect to the list of dependencies of `-test-vector-to-vector-lowering`. Otherwise my test case won't work as a standalone test. Differential Revision: https://reviews.llvm.org/D157993	2023-08-22 08:45:59 +00:00
Lei Zhang	199442ea2c	[mlir][vector] Fix uniform transfer_read distribution If the original shape and the distributed shape are the same, we don't distribute at all--every thread is handling the whole. Reviewed By: hanchung Differential Revision: https://reviews.llvm.org/D158235	2023-08-17 17:38:55 -07:00
Mahesh Ravishankar	0f8bab8d59	[mlir] Revamp implementation of sub-byte load/store emulation. When handling sub-byte emulation, the sizes of the converted `memref`s also need to be updated (this was not done in the current implementation). This adds the additional complexity of having to linearize the `memref`s as well. Consider a `memref<3x3xi4>` where the `i4` elements are packed. This has an overall size of 5 bytes (rounded up to number of bytes). This can only be represented by a `memref<5xi8>`. A `memref<3x2xi8>` would imply an implicit padding of 4 bits at the end of each row. So incorporate linearization into the sub-byte load-store emulation. This patch also updates some of the utility functions to make better use of statically available information using `OpFoldResult` and `makeComposedFoldedAffineApplyOps`. Reviewed By: hanchung, yzhang93 Differential Revision: https://reviews.llvm.org/D158125	2023-08-17 20:27:53 +00:00
Lei Zhang	73ddc4474b	[mlir][vector] Enable distribution over multiple dimensions This commit starts enabling vector distribution over multiple dimensions. It requires delinearizing the lane ID to match the expected rank. shape_cast and transfer_read now can properly handle multiple dimensions. Reviewed By: hanchung Differential Revision: https://reviews.llvm.org/D157931	2023-08-16 12:08:43 -07:00
Matthias Springer	a02ad6c177	[mlir][bufferization] Generalize getAliasingOpResults to getAliasingValues This revision is needed to support bufferization of `cf.br`/`cf.cond_br`. It will also be useful for better analysis of loop ops. This revision generalizes `getAliasingOpResults` to `getAliasingValues`. An OpOperand can now not only alias with OpResults but also with BlockArguments. In the case of `cf.br` (will be added in a later revision): a `cf.br` operand will alias with the corresponding argument of the destination block. If an op does not implement the `BufferizableOpInterface`, the analysis is conservative. It previously assumed that an OpOperand may alias with each OpResult. It now assumes that an OpOperand may alias with each OpResult and each BlockArgument of the entry block. Differential Revision: https://reviews.llvm.org/D157957	2023-08-15 15:02:47 +02:00
Andrzej Warzynski	12b4951866	[mlir][vector] Add missing support for scalable vectors This patch adds the missing logic so that the `TransferReadPermutationLowering` can be used for scalable vectors. To this end: * TransferOp custom C++ builder is updated to support scalable vectors, * `TransferOpReduceRank` is also updated to support scalable vectors. This pattern is relevant when lowering `linalg.matmul` via `vector_multi_reduction` for scalable vectors. I've also updated relevant code in `TransferOpReduceRank` not to use `llvm::to_vector` for constructing `SmallVector` from `ArrayRef`. That hook doesn't work for `ArrayRef<bool>` (*), so for consistency I switched to an explicit constructor (so that both `newShape` and `newScalableDim` are constructed in a similar fashion). (*) IIUC, that's due to how implicit narrowing conversions between `bool` and `bool` work. Note that these narrowing conversions change when using initializer lists, see https://en.cppreference.com/w/cpp/language/list_initialization. Depends on D157092 Differential Revision: https://reviews.llvm.org/D157268	2023-08-10 09:08:30 +00:00
Diego Caballero	15a08cf27c	[mlir][Vector] Fold selects of single-element i1 vectors This patch adds a folding to select operation between an all-true and all-false vector. For now, only single element vectors (i.e., vector<1xi1>) are supported. Multi-element cases are caught by InstCombine. Reviewed By: awarzynski Differential Revision: https://reviews.llvm.org/D154682	2023-08-09 18:57:36 +00:00
Andrzej Warzynski	5c581720b9	[mlir][Vector] Add support for scalable vectors in multi_reduction Support for scalable vectors in vector.multi_reduction is added by simply updating MultiDimReductionOp::verify. Also, the conversion pattern for reducing n-D vector.multi_reduction to 2D vector.multi_reduction is updated. Differential Revision: https://reviews.llvm.org/D157092	2023-08-08 17:01:59 +00:00
Matthias Springer	16b75cd2bb	[mlir][vector] Use DenseI64ArrayAttr for ExtractOp/InsertOp positions `DenseI64ArrayAttr` provides a better API than `I64ArrayAttr`. E.g., accessors returning `ArrayRef<int64_t>` (instead of `ArrayAttr`) are generated. Differential Revision: https://reviews.llvm.org/D156684	2023-07-31 15:25:37 +02:00
Matthias Springer	b1d2687501	[mlir][IR] Remove duplicate `isLastMemrefDimUnitStride` functions This function is duplicated in various dialects. Differential Revision: https://reviews.llvm.org/D155462	2023-07-17 16:31:04 +02:00
Matthias Springer	fd5cda3393	[mlir][vector][NFC] Minor VectorTransferOpInterface cleanup * Rename functions with underscore to camel case. * Return C++ bools of "in_bounds" values instead of an `ArrayAttr`. Differential Revision: https://reviews.llvm.org/D155277	2023-07-14 15:41:21 +02:00
Matthias Springer	6040044f2f	[mlir][vector] VectorToSCF: Omit redundant out-of-bounds check There was a bug in `TransferWriteNonPermutationLowering`, a pattern that extends the permutation map of a TransferWriteOp with leading transfer dimensions of size ones. These newly added transfer dimensions are always in-bounds, because the starting point of any dimension is in-bounds. VectorToSCF inserts out-of-bounds checks based on the "in_bounds" attribute and dims that are marked as out-of-bounds but that are actually always in-bounds lead to unnecessary "scf.if" ops. Differential Revision: https://reviews.llvm.org/D155196	2023-07-14 09:50:37 +02:00
Hanhan Wang	8fc433f055	[mlir][MemRef] Move narrow type emulation common methods to MemRefUtils. It also unifies the computation of StridedLayoutAttr. If the stride is a statically known value, we can just use it. Differential Revision: https://reviews.llvm.org/D155017	2023-07-13 14:43:21 -07:00
Quinn Dawkins	5b6b2caf3c	[mlir][vector] Handle memory space conflicts in VectorTransferSplit patterns Currently the transfer splitting patterns will generate an invalid cast when the source memref for a transfer op has a non-default memory space. This is handled by first introducing a `memref.memory_space_cast` in such cases. Differential Revision: https://reviews.llvm.org/D154515	2023-07-11 22:58:23 -04:00
yzhang93	9a7677d8ee	[mlir] Narrow bitwidth emulation for vector.load This patch is a follow-up to the previous patch https://reviews.llvm.org/D151519. With this patch, vector.load op with narrow bitwidth (e.g., i4) can be converted to supported wider bitwidth (e.g., i8). Reviewed By: hanchung Differential Revision: https://reviews.llvm.org/D154178	2023-07-11 13:38:15 -07:00
Matthias Springer	867afe5e53	[mlir][vector] Remove duplicate tensor subset <-> vector transfer patterns Remove patterns that fold tensor subset ops into vector transfer ops from the vector dialect. These patterns already exist in the tensor dialect. Differential Revision: https://reviews.llvm.org/D154932	2023-07-11 11:12:29 +02:00
Matthias Springer	a7a5641bdc	[mlir][vector] Fix bug in `TransferWriteNonPermutationLowering` This pattern expands the rank of the vector. However, the rank of the mask was not expanded. Differential Revision: https://reviews.llvm.org/D154849	2023-07-10 17:21:03 +02:00
Matthias Springer	cb7bda2ace	[mlir][NFC] Use `getConstantIntValue` instead of casting to `ConstantIndexOp` `getConstantIntValue` extracts constant values from all constant-like ops, not just `arith::ConstantIndexOp`. Differential Revision: https://reviews.llvm.org/D154356	2023-07-04 14:08:37 +02:00
Matthias Springer	030b18fe14	[mlir][vector] Clean up some dimension size checks * Add `memref::getMixedSize` (same as in the tensor dialect). * Simplify in-bounds check in `VectorTransferSplitRewritePatterns.cpp` and fix off-by-one error in the static in-bounds check. * Use "memref::DimOp" instead of `createOrFoldDimOp` when possible. Differential Revision: https://reviews.llvm.org/D154218	2023-07-03 09:10:00 +02:00
Andrzej Warzynski	f22af204ed	[mlir][VectorType] Remove `numScalableDims` from the vector type This is a follow-up of https://reviews.llvm.org/D153372 in which `numScalableDims` (single integer) was effectively replaced with `isScalableDim` bitmask. This change is a part of a larger effort to enable scalable vectorisation in Linalg. See this RFC for more context: * https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/ Differential Revision: https://reviews.llvm.org/D153412	2023-06-28 13:53:45 +01:00
Matthias Springer	efc290ce9c	[mlir][affine] More efficient `makeComposedFolded...` helpers The old code used to materialize constants as ops, immediately folded them into the resulting affine map and then deleted the constant ops again. Instead, directly fold the attributes into the affine map. Furthermore, all helpers accept `OpFoldResult` instead of `Value` now. This makes the code at call sites more efficient, because it is no longer necessary to materialize a `Value`, just to be able to use these helper functions. Note: The API has changed (accepts OpFoldResult instead of Value), otherwise this change is NFC. Differential Revision: https://reviews.llvm.org/D153324	2023-06-22 10:47:38 +02:00
Andrzej Warzynski	4d339ec91e	[mlir][Vector] Add pattern to reorder elementwise and broadcast ops The new pattern will replace elementwise(broadcast) with broadcast(elementwise) when safe. This change affects tests for vectorising nD-extract. In one case ("vectorize_nd_tensor_extract_with_tensor_extract") I just trimmed the test and only preserved the key parts (scalar and contiguous load from the original Op). We could do the same with some other tests if that helps maintainability. Differential Revision: https://reviews.llvm.org/D152812	2023-06-15 10:13:41 +01:00
Cullen Rhodes	1e41a29d73	Revert "[mlir][ArmSME] Add initial dialect with basic lowering of vector.transfer write to zero" Apologies, I shouldn't have committed this; need to wait until the planned MLIR ODM: https://discourse.llvm.org/t/rfc-creating-a-armsme-dialect/67208/76 This reverts commit `a48fe89885`.	2023-06-14 09:03:10 +00:00
Cullen Rhodes	a48fe89885	[mlir][ArmSME] Add initial dialect with basic lowering of vector.transfer write to zero This patch adds support for lowering a `vector.transfer_write` of zeroes and type `vector<[16x16]xi8>` to the SME `zero {za}` instruction [1], which zeroes the entire accumulator. This contributes to supporting a path from `linalg.fill` to SME. [1] https://developer.arm.com/documentation/ddi0602/2022-06/SME-Instructions/ZERO--Zero-a-list-of-64-bit-element-ZA-tiles- Reviewed By: awarzynski, dcaballe Differential Revision: https://reviews.llvm.org/D152508	2023-06-14 08:46:53 +00:00
Matthias Springer	80853a1673	[mlir][vector][bufferize] Better analysis for vector.transfer_write The destination operand does not bufferize to a memory read if it is completely overwritten. Differential Revision: https://reviews.llvm.org/D152823	2023-06-14 09:38:51 +02:00
Nicolas Vasilache	e35ff2605f	[mlir][vector] NFC - Add debug information to vector unrolling patterns	2023-06-08 08:06:47 +00:00
Quentin Colombet	1dd00d3903	[mlir][Vector] Fix a propagation bug with broadcast In the vector distribute patterns, we used to move `vector.broadcast`s out of `vector.warp_execute_on_lane0`s irrespective of how they were defined. This could create broadcast operations with invalid semantics. E.g., ``` %r = warpop ...[32] ... -> vector<1x2xf32> { %val = broadcast %in : vector<64xf32> to vector<1x64xf32> vector.yield %val : vector<1x64xf32> } ``` => ``` %r = warpop ...[32] ... -> vector<64xf32> { vector.yield %in : vector<64xf32> } // Broadcasting to a narrower type! broadcast %r : vector<64xf32> to vector<1x2xf32> ``` The root issue is we are trying to broadcast something that is not the same for each thread, so there is actually nothing to propagate here. The fix checks that the broadcast we want to create actually makes sense. Differential Revision: https://reviews.llvm.org/D152154	2023-06-06 16:40:15 +02:00
Manish Gupta	9a795f0c59	[mlir][Vector] Adds a pattern to fold `arith.extf` into `vector.contract` Consider mixed precision data type, i.e., F16 input lhs, F16 input rhs, F32 accumulation, and F32 output. This is typically written as F32 <= F16F16 + F32. During vectorization from linalg to vector for mixed precision data type (F32 <= F16F16 + F32), linalg.matmul introduces arith.extf on input lhs and rhs operands. "linalg.matmul"(%lhs, %rhs, %acc) ({ ^bb0(%arg1: f16, %arg2: f16, %arg3: f32): %lhs_f32 = "arith.extf"(%arg1) : (f16) -> f32 %rhs_f32 = "arith.extf"(%arg2) : (f16) -> f32 %mul = "arith.mulf"(%lhs_f32, %rhs_f32) : (f32, f32) -> f32 %acc = "arith.addf"(%arg3, %mul) : (f32, f32) -> f32 "linalg.yield"(%acc) : (f32) -> () }) There are backends that natively support mixed-precision data types and do not need the arith.extf. For example, NVIDIA A100 GPU has mma.sync.aligned.*.f32.f16.f16.f32 that can support mixed-precision data type. However, the presence of arith.extf in the IR introduces unnecessary casting, targeting F32 Tensor Cores instead of F16 Tensor Cores for the NVIDIA backend. This patch adds a folding pattern to fold arith.extf into vector.contract Differential Revision: https://reviews.llvm.org/D151918	2023-06-05 23:22:20 +00:00
Quentin Colombet	018d8ac974	[mlir][Vector] Fix a propagation bug with transfer_read In the vector distribute patterns, we used to move `vector.transfer_read`s out of `vector.warp_execute_on_lane0`s irrespective of how they were defined. This could create transfer_read operations that would read values from within the warpOp's body from outside of the body. E.g., ``` warpop { %defined_in_body %read = transfer_read %defined_in_body vector.yield %read } ``` => ``` warpop { %defined_in_body vector.yield ... } // %defined_in_body is referenced outside of its scope. %read = transfer_read %defined_in_body ``` The fix consists in checking that all the values feeding the new `transfer_read` are defined outside of warpOp's body. Note: We could do this check before creating any operation, but that would mean knowing what `affine::makeComposedAffineApply` actually does. So the current fix is a trade-off of coupling the implementations of this propagation and `makeComposedAffineApply` versus compile time. Differential Revision: https://reviews.llvm.org/D152149	2023-06-05 15:52:26 +02:00
Matthias Springer	01128d4baf	[mlir][vector][NFC] Clean up headers Certain functions were declared in `VectorOps.h` instead of `VectorTransforms.h` or `VectorRewritePatterns.h`. Differential Revision: https://reviews.llvm.org/D152146	2023-06-05 15:16:20 +02:00
Diego Caballero	834fcfed24	Reland "[mlir][Vector] Extend xfer drop unit dim patterns" This reverts commit `76d71f3792`.	2023-06-01 22:22:16 +00:00
Diego Caballero	d3e1398bef	[mlir][Vector] Prevent vector-to-scalar xfer patterns from triggering on sub-vectors Patterns that convert extract(transfer_read) into a scalar load were incorrectly triggering for cases where a sub-vector instead of a scalar was extracted. Reviewed By: nicolasvasilache, hanchung, awarzynski Differential Revision: https://reviews.llvm.org/D151862	2023-06-01 22:22:16 +00:00
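
The size arithmetic in commit 0f8bab8d59 (sub-byte load/store emulation) can be checked with a minimal Python sketch. This is not the MLIR implementation: the helper names `packed_size_bytes`, `row_padded_size_bytes` and `locate` are hypothetical, and a row-major layout with tightly packed elements is assumed.

```python
import math


def packed_size_bytes(shape, elem_bits):
    """Bytes needed when all elements are packed back to back (linearized)."""
    total_bits = math.prod(shape) * elem_bits
    return (total_bits + 7) // 8  # round up to whole bytes


def row_padded_size_bytes(shape, elem_bits):
    """Bytes needed when each innermost row is rounded up to whole bytes."""
    *outer, inner = shape
    row_bytes = (inner * elem_bits + 7) // 8
    return math.prod(outer) * row_bytes


def locate(indices, shape, elem_bits):
    """Map an n-D index to (byte offset, bit offset) in the packed buffer.

    Assumes row-major linearization (an assumption of this sketch).
    """
    linear = 0
    for idx, dim in zip(indices, shape):
        linear = linear * dim + idx
    bit = linear * elem_bits
    return bit // 8, bit % 8


if __name__ == "__main__":
    # memref<3x3xi4>: 9 elements * 4 bits = 36 bits, i.e. 5 bytes (memref<5xi8>).
    print(packed_size_bytes((3, 3), 4))
    # A memref<3x2xi8> view pads each 12-bit row to 2 bytes, i.e. 6 bytes.
    print(row_padded_size_bytes((3, 3), 4))
    # Element [1, 0] is linear index 3, which starts at bit 12.
    print(locate((1, 0), (3, 3), 4))
```

The one-byte difference between the two sizes is the 4 bits of per-row padding described in the commit message, which is why the emulation linearizes the memref rather than only shrinking the innermost dimension.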


258 Commits