When fusing two `linalg.generic` ops where the producer has index
semantics, invalid `affine.apply` ops can be generated: the number of
indices does not match the number of loops in the fused op.
This patch fixes the issue by directly using the number of loops from
the generated fused op.
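For illustration, a hypothetical producer with index semantics (shapes and names invented for this sketch):
```mlir
#map = affine_map<(d0, d1) -> (d0, d1)>
%0 = linalg.generic
    {indexing_maps = [#map], iterator_types = ["parallel", "parallel"]}
    outs(%init : tensor<4x8xindex>) {
^bb0(%out: index):
  // Index semantics: the body reads a loop index, which fusion must remap
  // via affine.apply onto the fused op's loops.
  %i = linalg.index 0 : index
  linalg.yield %i : index
} -> tensor<4x8xindex>
```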
This patch removes the `test-linalg-to-vector-patterns` option from the
`-test-linalg-transform-patterns=` test flag. It was only used in one
test, where a more specialized transform dialect op can be used instead:
* `transform.apply_patterns.linalg.pad_vectorization`
While we could preserve `test-linalg-to-vector-patterns`, it's better to
rely on finer-grained transformations — this way, we know exactly what
is being run and tested. Now that its only use has been removed, it
feels natural to delete `test-linalg-to-vector-patterns`.
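For reference, a minimal transform script using the specialized op (the surrounding named-sequence boilerplate is assumed, not taken from the affected test):
```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%root: !transform.any_op {transform.readonly}) {
    %f = transform.structured.match ops{["func.func"]} in %root
        : (!transform.any_op) -> !transform.any_op
    // Apply only the pad-vectorization patterns, nothing else.
    transform.apply_patterns to %f {
      transform.apply_patterns.linalg.pad_vectorization
    } : !transform.any_op
    transform.yield
  }
}
```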
…WriteToBroadcast
The pattern would previously apply in spurious cases and generate
incorrect IR.
In the process, we disable the application of this pattern in the case
where there is no broadcast; this should be handled separately and may
more easily support masking.
The case {no-broadcast, yes-transpose} was previously caught by this
pattern and arguably could also generate incorrect IR (and was also
untested): this case does not apply anymore.
The last case {yes-broadcast, yes-transpose} continues to apply but
should arguably be removed in the future because creating transposes
as part of canonicalization feels dangerous.
There are other patterns that move permutation logic:
- either into the transfer, or
- outside of the transfer
Ideally, this would be target-dependent and not a canonicalization (i.e.
does your DMA HW allow transpose on the fly or not) but this is beyond
the scope of this PR.
Co-authored-by: Nicolas Vasilache <nicolasvasilache@users.noreply.github.com>
Restores the mistakenly removed AMX interface, which ensures that the
custom tile type is converted to its LLVM equivalent within other
operations such as control flow.
Fix after #140559
Changes `findCollapsingReassociation` to return nullopt in all cases
where the source shape has `>=2` dynamic dims. `expand(collapse)` can
reshape to any valid output shape, but a collapse can only collapse
contiguous dimensions. When there are `>=2` dynamic dimensions, it is
impossible to determine whether it can be simplified to a collapse or
whether it is performing a more advanced reassociation.
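As an illustrative sketch (shapes invented), consider a collapse followed by an expand where the source has two dynamic dims:
```mlir
%c = tensor.collapse_shape %src [[0, 1, 2]]
    : tensor<?x4x?xf32> into tensor<?xf32>
%e = tensor.expand_shape %c [[0, 1]] output_shape [%d0, %d1]
    : tensor<?xf32> into tensor<?x?xf32>
// Statically, this could correspond to collapsing [[0], [1, 2]] or
// [[0, 1], [2]] of the source (or to neither), so no single collapse
// can be picked.
```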
This problem was uncovered by
https://github.com/llvm/llvm-project/pull/137963
---------
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
The OpenACC specification for `acc loop` describes that a loop's
parallelism determination mode is either auto, independent, or seq. The
rules are as follows.
- As per OpenACC 3.3 standard section 2.9.6 independent clause: A loop
construct with no auto or seq clause is treated as if it has the
independent clause when it is an orphaned loop construct or its parent
compute construct is a parallel construct.
- As per OpenACC 3.3 standard section 2.9.7 auto clause: When the parent
compute construct is a kernels construct, a loop construct with no
independent or seq clause is treated as if it has the auto clause.
- Additionally, loops marked with gang, worker, or vector are not
guaranteed to be parallel. As specifically noted in section 2.9.7 auto
clause: If not, or if it is unable to make a determination, it must
treat the auto clause as if it is a seq clause, and it must ignore any
gang, worker, or vector clauses on the loop construct.
The verifier for `acc.loop` was updated to enforce this marking because
the context in which a loop appears is not trivially determined once IR
transformations begin. For example, orphaned loops are implicitly
`independent`, but after inlining into an `acc.kernels` region they
would be implicitly considered `auto`. Thus the verifier now requires
that a frontend explicitly generate acc dialect IR with this marking,
since the frontend knows the context.
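For example, a frontend would now emit something along these lines for an orphaned loop (a rough sketch; the exact attribute spelling is defined by the acc dialect):
```mlir
acc.loop control(%i : i32) = (%lb : i32) to (%ub : i32) step (%step : i32) {
  // loop body
  acc.yield
} attributes {independent = [#acc.device_type<none>]}
```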
Improve ApplyRegisteredPassOp's support for taking options by taking
them as a dict (vs a list of string-valued key-value pairs).
Values of options are provided as either static attributes or as params
(which pass in attributes at interpreter runtime). In either case, the
keys and value attributes are converted to strings and a single
options-string, in the format used on the commandline, is constructed to
pass to the `addToPipeline`-pass API.
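A rough sketch of the dict form (consult the op's documentation for the exact syntax; `%max_iter` is a hypothetical param computed earlier in the schedule):
```mlir
%2 = transform.apply_registered_pass "canonicalize"
    with options = { "top-down" = true, "max-iterations" = %max_iter }
    to %1 : (!transform.any_param, !transform.any_op) -> !transform.any_op
```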
This change was triggered by encountering the following error:
```
<unknown>:0: error: 'spirv.Load' op result #0 must be void
or bool or 8/16/32/64-bit integer or 16/32/64-bit float or
vector of bool or 8/16/32/64-bit integer or 16/32/64-bit
float values of length 2/3/4/8/16 or any SPIR-V pointer type
or any SPIR-V array type or any SPIR-V run time array type
or any SPIR-V struct type or any SPIR-V cooperative matrix
type or any SPIR-V matrix type or any SPIR-V sampled image
type, but got '!spirv.image<f32, Dim2D, NoDepth, NonArrayed,
SingleSampled, NoSampler, Rgba8>'<unknown>:0: note: see current
operation:
%126 = "spirv.Load"(%125) {relaxed_precision} : (!spirv.ptr<!spirv.image<f32, Dim2D, NoDepth, NonArrayed, SingleSampled, NoSampler, Rgba8>, UniformConstant>) -> !spirv.image<f32, Dim2D, NoDepth, NonArrayed, SingleSampled, NoSampler, Rgba8>
```
Add Math to SPIRV lowering for tan, asin, acos, sinh, cosh, asinh, acosh
and atanh. This completes the lowering of all trigonometric and
hyperbolic functions from math to SPIRV.
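For instance, assuming the GLSL extended instruction set is targeted, the lowering maps ops along these lines:
```mlir
%0 = math.tan %arg0 : f32
// lowers to something like:
//   %0 = spirv.GL.Tan %arg0 : f32
```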
This fixes an issue with `TransferWriteOp`'s implementation of the
`MemoryEffectOpInterface` where the write effect was attached to the
stored value rather than the base.
As a result, querying the memory effects on the input memref buffer
via `getEffectsOnValue(...)` would return no effects, since the effect
had been attached to the stored value rather than the input buffer.
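For a write like the following (a minimal sketch), the effect now correctly reports a write to the base buffer `%mem` rather than to `%v`:
```mlir
%c0 = arith.constant 0 : index
vector.transfer_write %v, %mem[%c0] : vector<4xf32>, memref<8xf32>
```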
In #120115 the replacements for the tiled operations were wrapped within
the `MergeResult` object. That obfuscates things: it is not immediately
obvious where to get the replacements after tiling. This changes
`SCFTilingResult` to have an explicit `replacements` field (as it was
before that change).
`mergeOps` is added as a separate field of `SCFTilingResult`, which is
empty when the reduction type is `FullReduction`.
This is an API-breaking change. All uses of `mergeResult.replacements`
should be replaced with `replacements`.
There was also an implicit assumption that
`PartialReductionTilingInterface` is derived from `TilingInterface`, so
all ops that implemented `PartialReductionTilingInterface` were
expected to implement `TilingInterface` as well. This pre-dated support
for interface inheritance. Make `PartialReductionTilingInterface`
derive from `TilingInterface`.
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
This PR adds blocking support for vector dialect operations (`reduce`,
`broadcast`, and `transpose`) in the XeGPU-based IR. It simply assigns
the shape specified by `inst_data` as the target shape of the unrolling
to implement the blocking. It is based on
https://github.com/llvm/llvm-project/pull/140163.
This patch adds support for the -mrecip command line option. The parsing
of this option is equivalent to Clang's, and it is implemented by
setting the "reciprocal-estimates" function attribute.
Also move the ParseMRecip(...) function to CommonArgs, so that Flang is
able to make use of it as well.
---------
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
Reintroduce a TODO for linear clause translation until corner issues
(like linear variables being entities other than `alloca`, and support
for linear variables of types other than integer) are solved.
Currently, only the values defined outside the ForOp but inside the
original WarpOp are considered "escaping values". However, this does not
hold if the ForOp has unused results: in that case, the corresponding
IterArgs must also be yielded by the original WarpOp. This PR adds the
required code changes to achieve this.
This PR adds `arith.scaling_truncf` and `arith.scaling_extf` operations,
which support block quantization following the OCP MXFP specs listed
here:
https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf
The OCP MXFP spec comes with a reference implementation here:
https://github.com/microsoft/microxcaling/tree/main
An interesting piece of reference code is the method `_quantize_mx` in
`mx/mx_ops.py` (commit `7bc41952de`, line 173).
Both `arith.scaling_truncf` and `arith.scaling_extf` are designed to be
elementwise operations. Please see their descriptions in the
`ArithOps.td` file for more details.
Internally, `arith.scaling_truncf` computes
`arith.truncf(arith.divf(input, 2^scale))`. `scale` should have the
necessary broadcast, clamping, normalization, and NaN propagation done
before calling into `arith.scaling_truncf`.
`arith.scaling_extf` computes `arith.mulf(2^scale, input)` after taking
care of the necessary data type conversions.
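A hypothetical usage sketch (types illustrative; see `ArithOps.td` for the authoritative syntax):
```mlir
// Quantize f32 inputs to MXFP4 using an f8E8M0FNU (exponent-only) scale,
// then dequantize back.
%q = arith.scaling_truncf %in, %scale : f32, f8E8M0FNU to f4E2M1FN
%d = arith.scaling_extf %q, %scale : f4E2M1FN, f8E8M0FNU to f32
```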
CC: @krzysz00 @dhernandez0 @bjacob @pashu123 @MaheshRavishankar
@tgymnich
---------
Co-authored-by: Prashant Kumar <pk5561@gmail.com>
Co-authored-by: Krzysztof Drewniak <Krzysztof.Drewniak@amd.com>
This is to enable `CooperativeMatrixType` to be used with
`DenseElementsAttr`, so that a `spirv.Constant` can be easily built from
`OpConstantComposite`. For example:
```mlir
%cst = spirv.Constant dense<0.000000e+00> : !spirv.coopmatrix<1x1xf32, Subgroup, MatrixAcc>
```
Constraints of arithmetic operations are changed, as
`SameOperandsAndResultType` can no longer fully verify CoopMatrices.
This is because for shaped types the verifier only checks element type
and shapes, whereas for any other arbitrary type it looks for an exact
match.
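For instance, an elementwise op that the relaxed constraints still verify (a sketch, shape illustrative):
```mlir
%sum = spirv.FAdd %a, %b : !spirv.coopmatrix<8x16xf32, Subgroup, MatrixAcc>
```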
This patch does not enable the actual deserialization. This will be
done in a subsequent PR.
This PR makes `dump-pass-pipeline` pretty-print the dumped pipeline. For
large pipelines the current behavior produces a wall of text that is
hard to visually navigate.
For the command
```bash
mlir-opt --pass-pipeline="builtin.module(flatten-memref, expand-strided-metadata,func.func(arith-expand,func.func(affine-scalrep)))" --dump-pass-pipeline
```
Before:
```bash
Pass Manager with 3 passes:
builtin.module(flatten-memref,expand-strided-metadata,func.func(arith-expand{include-bf16=false include-f8e8m0=false},func.func(affine-scalrep)))
```
After:
```bash
Pass Manager with 3 passes:
builtin.module(
flatten-memref,
expand-strided-metadata,
func.func(
arith-expand{include-bf16=false include-f8e8m0=false},
func.func(
affine-scalrep
)
)
)
```
Another nice feature of this is that the pretty-printed string can still
be copy/pasted into `-pass-pipeline` using a quote:
```bash
$ bin/mlir-opt --dump-pass-pipeline test.mlir --pass-pipeline='
builtin.module(
flatten-memref,
expand-strided-metadata,
func.func(
arith-expand{include-bf16=false include-f8e8m0=false},
func.func(
affine-scalrep
)
)
)'
```
---------
Co-authored-by: Jeremy Kun <j2kun@users.noreply.github.com>
This patch refactors two vectorization hooks in Vectorization.cpp:
* `createWriteOrMaskedWrite` gains a new parameter for write indices,
aligning it with its counterpart `createReadOrMaskedRead`.
* `vectorizeAsInsertSliceOp` is updated to reuse both of the above
hooks, rather than re-implementing similar logic.
CONTEXT
-------
This is effectively a refactoring of the logic for vectorizing
`tensor.insert_slice`. Recent updates added masking support:
* https://github.com/llvm/llvm-project/pull/122927
* https://github.com/llvm/llvm-project/pull/123031
At the time, reuse of the shared `create*` hooks wasn't feasible due to
missing parameters and overly rigid assumptions. This patch resolves
that and moves us closer to a more maintainable structure.
CHANGES IN `createWriteOrMaskedWrite`
-------------------------------------
* Introduces a clear distinction between the destination tensor and the
vector to store, via named variables like `destType`/`vecToStoreType`,
`destShape`/`vecToStoreShape`, etc.
* Ensures the correct rank and shape are used for attributes like
`in_bounds`. For example, the size of the `in_bounds` attr now matches
the source vector rank, not the tensor rank.
* Drops the assumption that `vecToStoreRank == destRank` - this doesn't
hold in many real examples.
* Deduces mask dimensions from `vecToStoreShape` (vector) instead of
`destShape` (tensor). (Eventually we should not require
`inputVecSizesForLeadingDims` at all - mask shape should be inferred.)
NEW HELPER: `isMaskTriviallyFoldable`
-------------------------------------
Adds a utility to detect when masking is unnecessary. This avoids
inserting redundant masks and reduces the burden on canonicalization to
clean them up later.
Example where masking is provably unnecessary:
```mlir
%2 = vector.mask %1 {
vector.transfer_write %0, %arg1[%c0, %c0, %c0, %c0, %c0, %c0]
{in_bounds = [true, true, true]}
: vector<1x2x3xf32>, tensor<9x8x7x1x2x3xf32>
} : vector<1x2x3xi1> -> tensor<9x8x7x1x2x3xf32>
```
Also, without this hook, tests are more complicated and require more
matching.
VECTORIZATION BEHAVIOUR
-----------------------
This patch preserves the current behaviour around masking and the use
of the `in_bounds` attribute. Specifically:
* `useInBoundsInsteadOfMasking` is set when no input vector sizes are
available.
* The vectorizer continues to infer vector sizes where needed.
Note: the computation of the `in_bounds` attribute is not always
correct. That issue is tracked here:
* https://github.com/llvm/llvm-project/issues/142107
This will be addressed separately.
TEST CHANGES
------------
Only affects vectorization of:
* `tensor.insert_slice` (now refactored to use shared hooks)
Test diffs involve additional `arith.constant` Ops due to increased
reuse of shared helpers (which generate their own constants). This will
be cleaned up via constant caching (see #138265).
NOTE FOR REVIEWERS
------------------
This is a fairly substantial rewrite. You may find it easier to review
`createWriteOrMaskedWrite` as a new method rather than diffing
line-by-line.
TODOs (future PRs)
------------------
Further alignment of `createWriteOrMaskedWrite` and
`createReadOrMaskedRead`:
* Move `createWriteOrMaskedWrite` next to `createReadOrMaskedRead` (in
VectorUtils.cpp)
* Make `createReadOrMaskedRead` leverage `isMaskTriviallyFoldable`.
* Extend `isMaskTriviallyFoldable` with value-bounds-analysis. See the
updated test in transform-vector.mlir for an example that would
benefit from this.
* Address #142107
(*) This method will eventually be moved out of Vectorization.cpp, which
isn't the right long-term home for it.
Makes it possible to pass a pass's options around inside a schedule.
The refactoring also makes it so that the pass manager and pass are
constructed only once per `apply()` of the transform op, rather than
once for each target payload given to the op's `apply()`.
This PR adds `UnrollBroadcastPattern` to `VectorUnroll` transform.
To support this, it also extends the `BroadcastOp` definition with
`VectorUnrollOpInterface`.
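Sketch of the effect (shapes and target shape invented for illustration): with a target shape of `[1, 4]`, the unrolling splits a broadcast into per-tile broadcasts assembled with `vector.insert_strided_slice`:
```mlir
%b = vector.broadcast %src : vector<4xf32> to vector<2x4xf32>
// becomes, roughly:
//   %t0 = vector.broadcast %src : vector<4xf32> to vector<1x4xf32>
//   %r0 = vector.insert_strided_slice %t0, %acc0
//           {offsets = [0, 0], strides = [1, 1]} : vector<1x4xf32> into vector<2x4xf32>
//   %t1 = vector.broadcast %src : vector<4xf32> to vector<1x4xf32>
//   %r1 = vector.insert_strided_slice %t1, %r0
//           {offsets = [1, 0], strides = [1, 1]} : vector<1x4xf32> into vector<2x4xf32>
```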
Use the spellings in the generated clause parser. The functions
`get<lang>ClauseKind` and `get<lang>ClauseName` are not yet updated.
The definitions of both clauses and directives now take a list of
"Spelling"s instead of a single string. For example
```
def ACCC_Copyin : Clause<[Spelling<"copyin">,
Spelling<"present_or_copyin">,
Spelling<"pcopyin">]> { ... }
```
A "Spelling" is a versioned string, defaulting to "all versions".
For background information see
https://discourse.llvm.org/t/rfc-alternative-spellings-of-openmp-directives/85507
Before this PR, a rank-m -> rank-n vector.shape_cast with m,n>1 was
lowered to extracts/inserts of single elements, so that a shape_cast on
a vector with N elements would always require N extracts/inserts. While
this is necessary in the worst-case scenario, it is sometimes possible
to use fewer, larger extracts/inserts. Specifically, the largest common
suffix of the shapes of the source and result can be extracted/inserted.
For example:
```mlir
%0 = vector.shape_cast %arg0 : vector<10x2x3xf32> to vector<2x5x2x3xf32>
```
has a common suffix of shape `2x3`. Before this PR, this would be lowered
to 60 extract/insert pairs with extracts of the form
`vector.extract %arg0 [a, b, c] : f32 from vector<10x2x3xf32>`. With
this PR it is 10 extract/insert pairs with extracts of the form
`vector.extract %arg0 [a] : vector<2x3xf32> from vector<10x2x3xf32>`.
This adds verifier checks for the scatter op to make sure the shapes of
inputs and output are consistent with respect to the spec.
Signed-off-by: Tai Ly <tai.ly@arm.com>
The previous verifier checks did not correctly handle unranked operands.
For example, they could incorrectly assume the number of
`rankedOperandTypes` would be >= 2, which isn't the case when both `a`
and `b` are unranked.
This change simplifies these checks so that they only operate on the
intended `a` and `b` operands, as opposed to the shift operand as well.
Bug description: Global variables and functions created during
gpu.printf conversion to NVVM may contain debug info metadata from the
function containing the gpu.printf, which cannot be used outside of
that function.
Add `RuntimeVerifiableOpInterface` implementations for the following
ops. These were mostly copied from the respective memref
implementations. Only the part that deals with offsets and strides was
removed.
* `tensor.cast`: `memref.cast`
* `tensor.dim`: `memref.dim`
* `tensor.extract`: `memref.load`
* `tensor.insert`: `memref.store`
* `tensor.extract_slice`: `memref.subview`
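For example, for `tensor.extract` the generated checks look roughly like this (a sketch, modeled on the memref.load verification):
```mlir
// Original access:
%v = tensor.extract %t[%i] : tensor<?xf32>
// Generated guard (roughly):
//   %c0  = arith.constant 0 : index
//   %dim = tensor.dim %t, %c0 : tensor<?xf32>
//   %ok  = arith.cmpi ult, %i, %dim : index
//   cf.assert %ok, "runtime verification failed: out-of-bounds access"
```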