clang-p2996

Author	SHA1	Message	Date
Matthias Springer	c99670ba51	[mlir][vector] `LoadOp`/`StoreOp`: Allow 0-D vectors (#76134 ) Similar to `vector.transfer_read`/`vector.transfer_write`, allow 0-D vectors. This commit fixes `mlir/test/Dialect/Vector/vector-transfer-to-vector-load-store.mlir` when verifying the IR after each pattern (#74270). That test produces a temporary 0-D load/store op.	2023-12-22 11:12:58 +09:00
Billy Zhu	34a65980d7	[MLIR] Erase location of folded constants (#75415 ) Follow up to the discussion from #75258, and serves as an alternate solution for #74670. Set the location to Unknown for deduplicated / moved / materialized constants by OperationFolder. This makes sure that the folded constants don't end up with an arbitrary location of one of the original ops that became it, and that hoisted ops don't confuse the stepping order.	2023-12-21 09:54:48 -08:00
Finn Plummer	88151dd428	[mlir][spirv] Add folding for SNegate, [Logical]Not (#74992 ) Add missing constant propogation folder for SNegate, [Logical]Not. Implement additional folding when !(!x) for all ops. This helps for readability of lowered code into SPIR-V. Part of work for #70704	2023-12-21 18:24:01 +01:00
Oleksandr "Alex" Zinenko	11140cc238	[mlir] mark ChangeResult as nodiscard (#76147 ) This enum is used by dataflow analyses to indicate whether further propagation is necessary to reach the fix point. Accidentally discarding such a value will likely lead to propagation stopping early, leading to incomplete or incorrect results. The most egregious example is the duality between `join` on the analysis class, which triggers propagation internally, and `join` on the lattice class that does not and expects the caller to trigger it depending on the returned `ChangeResult`.	2023-12-21 17:58:53 +01:00
Jakub Kuderski	72003adf6b	[mlir][gpu] Allow subgroup reductions over 1-d vector types (#76015 ) Each vector element is reduced independently, which is a form of multi-reduction. The plan is to allow for gradual lowering of multi-reduction that results in fewer `gpu.shuffle` ops at the end: 1d `vector.multi_reduction` --> 1d `gpu.subgroup_reduce` --> smaller 1d `gpu.subgroup_reduce` --> packed `gpu.shuffle` over i32 For example we can perform 2 independent f16 reductions with a series of `gpu.shuffles` over i32, reducing the final number of `gpu.shuffles` by 2x.	2023-12-21 11:55:43 -05:00
Valentin Clement	a25da1a921	[mlir][openacc] Add device_type support for compute operations (#75864 ) Re-land PR after being reverted because of buildbot failures. This patch adds representation for `device_type` clause information on compute construct (parallel, kernels, serial). The `device_type` clause on compute construct impacts clauses that appear after it. The values impacted by `device_type` are now tied with an attribute array that represent the device_type associated with them. `DeviceType::None` is used to represent the value produced by a clause before any `device_type`. The operands and the attribute information are parser/printed together. This is an example with `vector_length` clause. The first value (64) is not impacted by `device_type` so it will be represented with DeviceType::None. None is not printed. The second value (128) is tied with the `device_type(multicore)` clause. ``` !$acc parallel vector_length(64) device_type(multicore) vector_length(256) ``` ``` acc.parallel vector_length(%c64 : i32, %c128 : i32 [#acc.device_type<multicore>]) { } ``` When multiple values can be produced for a single clause like `num_gangs` and `wait`, an extra attribute describe the number of values belonging to each `device_type`. Values and attributes are parsed/printed together. ``` acc.parallel num_gangs({%c2 : i32, %c4 : i32}, {%c4 : i32} [#acc.device_type<nvidia>]) ``` While preparing this patch I noticed that the wait devnum is not part of the operations and is not lowered. It will be added in a follow up patch.	2023-12-20 20:36:09 -08:00
Valentin Clement	553748356c	Revert "[mlir][openacc] Add device_type support for compute operations (#75864 )" This reverts commit `8b885eb90f`.	2023-12-20 16:08:10 -08:00
Valentin Clement (バレンタインクレメン)	8b885eb90f	[mlir][openacc] Add device_type support for compute operations (#75864 ) This patch adds representation for `device_type` clause information on compute construct (parallel, kernels, serial). The `device_type` clause on compute construct impacts clauses that appear after it. The values impacted by `device_type` are now tied with an attribute array that represent the device_type associated with them. `DeviceType::None` is used to represent the value produced by a clause before any `device_type`. The operands and the attribute information are parser/printed together. This is an example with `vector_length` clause. The first value (64) is not impacted by `device_type` so it will be represented with DeviceType::None. None is not printed. The second value (128) is tied with the `device_type(multicore)` clause. ``` !$acc parallel vector_length(64) device_type(multicore) vector_length(256) ``` ``` acc.parallel vector_length(%c64 : i32, %c128 : i32 [#acc.device_type<multicore>]) { } ``` When multiple values can be produced for a single clause like `num_gangs` and `wait`, an extra attribute describe the number of values belonging to each `device_type`. Values and attributes are parsed/printed together. ``` acc.parallel num_gangs({%c2 : i32, %c4 : i32}, {%c4 : i32} [#acc.device_type<nvidia>]) ``` While preparing this patch I noticed that the wait devnum is not part of the operations and is not lowered. It will be added in a follow up patch.	2023-12-20 13:45:47 -08:00
Alex Beloi	d84c640143	[mlir] Remove "Syntax:" parser where it's already provided by `assemblyFormat` (#76002 ) See #73359 Types using `assemblyFormat` to define parsing don't need an additional handwritten parser. So we should remove the handwritten parsers where one provided by an `assemblyFormat` already exists to avoid confusion and de-syncing.	2023-12-20 14:58:51 -05:00
Razvan Lupusoru	a711b042fd	[acc] Initial implementation of MemoryEffects on `acc` operations (#75970 ) The `acc` dialect operations now implement MemoryEffects interfaces in the following ways: - Data entry operations which may read host memory via `varPtr` are now marked as so. The majority of them do NOT actually read the host memory. For example, `acc.present` works on the basis of presence of pointer and not necessarily what the data points to - so they are not marked as reading the host memory. They still use `varPtr` though but this dependency is reflected through ssa. - Data clause operations which may mutate the data pointed to by `accPtr` are marked as doing so. - Data clause operations which update required structured or dynamic runtime counters are marked as reading and writing the newly defined `RuntimeCounters` resource. Some operations, like `acc.getdeviceptr` do not actually use the runtime counters - but are marked as reading them since the address obtained depends on the mapping operations which do update the runtime counters. Namely, `acc.getdeviceptr` cannot be moved across other mapping operations. - Constructs are marked as writing to the `ConstructResource`. This may be too strict but is needed for the following reasons: 1) Structured constructs may not use `accPtr` and instead use `varPtr` - when this is the case, data actions may be removed even when used. 2) Unstructured constructs are currently used to aggregate multiple data actions. We do not want such constructs removed or moved for now. - Terminators are marked as `Pure` as in other dialects. The current approach has the following limitations which may require further improvements: - Subsequent `acc.copyin` operations on same data do not actually read host memory pointed to by `varPtr` but are still marked as so. - Two `acc.delete` operations on same data may not mutate `accPtr` until the runtime counters are zero (but are still marked as mutating). - The `varPtrPtr` argument, when present, points to the address of location of `varPtr`. When mapping to target device, an `accPtrPtr` needs computed and this memory is mutated. This effect is not captured since the current operations do not produce `accPtrPtr`. - Runtime counter effects are imprecise since two operations with differing `varPtr` increment/decrement different counters. Additionally, operations with `varPtrPtr` mutate attachment counters. - The `ConstructResource` is too strict and likely can be relaxed with better modeling.	2023-12-20 07:11:19 -08:00
Gil Rapaport	d9803841f2	[mlir][emitc] Add op modelling C expressions (#71631 ) Add an emitc.expression operation that models C expressions, and provide transforms to form and fold expressions. The translator emits the body of emitc.expression ops as a single C expression. This expression is emitted by default as the RHS of an EmitC SSA value, but if possible, expressions with a single use that is not another expression are instead inlined. Specific expression's inlining can be fine tuned by lowering passes and transforms.	2023-12-20 15:04:46 +02:00
Finn Plummer	4c83c27c91	[mlir][spirv] Add folding for [I\|Logical][Not]Equal (#74194 )	2023-12-20 11:00:28 +01:00
Matthias Springer	f7096428b4	[mlir][GPU] Add `RecursiveMemoryEffects` to `gpu.launch` (#75315 ) Infer the side effects of `gpu.launch` from its body.	2023-12-20 15:25:25 +09:00
Matthias Springer	c4457e10fe	[mlir][IR] Change block/region walkers to enumerate `this` block/region (#75020 ) This change makes block/region walkers consistent with operation walkers. An operation walk enumerates the current operation. Similarly, block/region walks should enumerate the current block/region. Example: ``` // Current behavior: op1->walk([](Operation op2) { / op1 is enumerated / }); block1->walk([](Block block2) { /* block1 is NOT enumerated / }); region1->walk([](Block block) { /* blocks of region1 are NOT enumerated / }); region1->walk([](Region region2) { /* region1 is NOT enumerated }); // New behavior: op1->walk([](Operation op2) { / op1 is enumerated / }); block1->walk([](Block block2) { /* block1 IS enumerated / }); region1->walk([](Block block) { /* blocks of region1 ARE enumerated / }); region1->walk([](Region region2) { /* region1 IS enumerated }); ```	2023-12-20 14:51:45 +09:00
Jakub Kuderski	560564f51c	[mlir][vector][gpu] Align minf/maxf reduction kind names with arith (#75901 ) This is to avoid confusion when dealing with reduction/combining kinds. For example, see a recent PR comment: https://github.com/llvm/llvm-project/pull/75846#discussion_r1430722175. Previously, they were picked to mostly mirror the names of the llvm vector reduction intrinsics: https://llvm.org/docs/LangRef.html#llvm-vector-reduce-fmin-intrinsic. In isolation, it was not clear if `<maxf>` has `arith.maxnumf` or `arith.maximumf` semantics. The new reduction kind names map 1:1 to arith ops, which makes it easier to tell/look up their semantics. Because both the vector and the gpu dialect depend on the arith dialect, it's more natural to align names with those in arith than with the lowering to llvm intrinsics. Issue: https://github.com/llvm/llvm-project/issues/72354	2023-12-20 00:14:43 -05:00
Matthias Springer	10056c821a	[mlir][SCF] `scf.parallel`: Make reductions part of the terminator (#75314 ) This commit makes reductions part of the terminator. Instead of `scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops. `scf.reduce` may contain an arbitrary number of reductions, with one region per reduction. Example: ```mlir %init = arith.constant 0.0 : f32 %r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init) -> f32, f32 { %elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32> %elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32> scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) { ^bb0(%lhs : f32, %rhs: f32): %res = arith.addf %lhs, %rhs : f32 scf.reduce.return %res : f32 }, { ^bb0(%lhs : f32, %rhs: f32): %res = arith.mulf %lhs, %rhs : f32 scf.reduce.return %res : f32 } } ``` `scf.reduce` operations can no longer be interleaved with other ops in the body of `scf.parallel`. This simplifies the op and makes it possible to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was not possible before because the op was not a terminator, causing the op to be DCE'd.)	2023-12-20 11:06:27 +09:00
Abhinav271828	cfd51fbadd	[MLIR][Presburger] Add LLL basis reduction (#75565 ) Add a method for LLL basis reduction to the FracMatrix class. This needs an abs() method for Fractions, which is added to Fraction.h.	2023-12-19 17:31:38 +01:00
Ivan Butygin	c0d2ea9d42	[mlir][scf] Improve `scf.parallel` fusion pass (#75852 ) Abort fusion if memref load may alias write, but not the exact alias. Add alias check hook to `naivelyFuseParallelOps`, so user can customize alias checking. Use builtin alias analysis in `ParallelLoopFusion` pass.	2023-12-19 18:07:46 +03:00
Oleksandr "Alex" Zinenko	9519e3ecbf	[mlir] support dialect attribute translation to LLVM IR (#75309 ) Extend the `amendOperation` mechanism for translating dialect attributes attached to operations from another dialect when translating MLIR to LLVM IR. Previously, this mechanism would have no knowledge of the LLVM IR instructions created for the given operation, making it impossible for it to perform local modifications such as attaching operation-level metadata. Collect instructions inserted by the LLVM IR builder and pass them to `amendOperation`.	2023-12-19 14:18:16 +01:00
Guray Ozen	5caae72d1a	[mlir][gpu] Productize `test-lower-to-nvvm` as `gpu-lower-to-nvvm` (#75775 ) The `test-lower-to-nvvm` pipeline serves as the common and proper pipeline for nvvm+host compilation, and it's used across our CUDA integration tests. This PR updates the `test-lower-to-nvvm` pipeline to `gpu-lower-to-nvvm` and moves it within `InitAllPasses.h`. The aim is to call it from Python, also having a standardize compilation process for nvvm.	2023-12-19 08:40:46 +01:00
Jakub Kuderski	07677113ff	[mlir][vector] Add pattern to break down reductions into arith ops (#75727 ) The number of vector elements considered 'small' enough to extract is parameterized. This is to avoid going into specialized reduction lowering when a single/couple of arith ops can do. Targets without dedicated reduction intrinsics can use that as an emulation path too. Depends on https://github.com/llvm/llvm-project/pull/75846.	2023-12-18 17:54:54 -05:00
Jakub Kuderski	a528cee224	[mlir][vector] Improve `makeArithReduction` expansion (#75846 ) Propagate fast math flags. Distinguish `minf`/`maxf` and `minimumf`/`maximumf`. Required for future patterns in https://github.com/llvm/llvm-project/pull/75727.	2023-12-18 17:47:46 -05:00
Oleksandr "Alex" Zinenko	32a4e3fcca	[mlir] support non-interprocedural dataflow analyses (#75583 ) The core implementation of the dataflow anlysis framework is interpocedural by design. While this offers better analysis precision, it also comes with additional cost as it takes longer for the analysis to reach the fixpoint state. Add a configuration mechanism to the dataflow solver to control whether it operates inteprocedurally or not to offer clients a choice. As a positive side effect, this change also adds hooks for explicitly processing external/opaque function calls in the dataflow analyses, e.g., based off of attributes present in the the function declaration or call operation such as alias scopes and modref available in the LLVM dialect. This change should not affect existing analyses and the default solver configuration remains interprocedural. Co-authored-by: Jacob Peng <jacobmpeng@gmail.com>	2023-12-18 14:16:52 +01:00
Jakub Kuderski	2c668fddad	[mlir][gpu] Trim trailing whitespace in GPUOps.td. NFC.	2023-12-17 21:34:29 -05:00
Jakub Kuderski	dd45be028d	[mlir][gpu] Trim trailing whitespace in dialect docs. NFC.	2023-12-17 21:00:06 -05:00
Rob Suderman	aa165edca8	[mlir][math] Added `math.sinh` with expansions to `math.exp` (#75517 ) Includes end-to-end tests for the cpu running, folders using `libm` and lowerings to the corresponding `libm` operations.	2023-12-15 11:35:40 -08:00
Boian Petkantchin	5e29112719	[mlir][mesh] Add verification and canonicalization for some collectives (#74905 ) Add verification and canonicalization for broadcast, gather, recv, reduce, scatter, send and shift. The canonicalizations only remove trivial collectives with empty mesh_axes attrubutes.	2023-12-15 06:41:10 -08:00
Rafael Ubal	214d32ccd2	Support for dynamic dimensions in 'tensor.splat' (#74626 ) This feature had been marked as `TODO` in the `tensor.splat` documentation for a while. This MR includes: - Support for dynamically shaped tensors in the return type of `tensor.splat` with the syntax suggested in the `TODO` comment. - Updated op documentation. - Bufferization support. - Updates in op folders affected by the new feature. - Unit tests for valid/invalid syntax, valid/invalid folding, and lowering through bufferization. - Additional op builders resembling those available in `tensor.empty`.	2023-12-15 13:54:45 +00:00
martin-luecke	681eacc1b6	[MLIR][transform][python] add sugared python abstractions for transform dialect (#75073 ) This adds Python abstractions for the different handle types of the transform dialect The abstractions allow for straightforward chaining of transforms by calling their member functions. As an initial PR for this infrastructure, only a single transform is included: `transform.structured.match`. With a future `tile` transform abstraction an example of the usage is: ```Python def script(module: OpHandle): module.match_ops(MatchInterfaceEnum.TilingInterface).tile(tile_sizes=[32,32]) ``` to generate the following IR: ```mlir %0 = transform.structured.match interface{TilingInterface} in %arg0 %tiled_op, %loops = transform.structured.tile_using_for %0 [32, 32] ``` These abstractions are intended to enhance the usability and flexibility of the transform dialect by providing an accessible interface that allows for easy assembly of complex transformation chains.	2023-12-15 13:04:43 +01:00
Hsiangkai Wang	f643eec892	[mlir][vector] Add emulation patterns for vector masked load/store (#74834 ) In this patch, it will convert ``` vector.maskedload %base[%idx_0, %idx_1], %mask, %pass_thru ``` to ``` %ivalue = %pass_thru %m = vector.extract %mask[0] %result0 = scf.if %m { %v = memref.load %base[%idx_0, %idx_1] %combined = vector.insert %v, %ivalue[0] scf.yield %combined } else { scf.yield %ivalue } %m = vector.extract %mask[1] %result1 = scf.if %m { %v = memref.load %base[%idx_0, %idx_1 + 1] %combined = vector.insert %v, %result0[1] scf.yield %combined } else { scf.yield %result0 } ... ``` It will convert ``` vector.maskedstore %base[%idx_0, %idx_1], %mask, %value ``` to ``` %m = vector.extract %mask[0] scf.if %m { %extracted = vector.extract %value[0] memref.store %extracted, %base[%idx_0, %idx_1] } %m = vector.extract %mask[1] scf.if %m { %extracted = vector.extract %value[1] memref.store %extracted, %base[%idx_0, %idx_1 + 1] } ... ```	2023-12-15 11:35:48 +00:00
Felix Schneider	8190369e83	[mlir][tosa] Add verifier for `tosa.transpose` (#75376 ) This patch adds a verifier to `tosa.transpose` which fixes a crash. Related: https://github.com/llvm/llvm-project/pull/74367 Fix https://github.com/llvm/llvm-project/issues/74479	2023-12-15 07:22:32 +01:00
Vivian	bd6a2452ae	[mlir][SCF] Add support for peeling the first iteration out of the loop (#74015 ) There is a use case that we need to peel the first iteration out of the for loop so that the peeled forOp can be canonicalized away and the fillOp can be fused into the inner forall loop. For example, we have nested loops as below ``` linalg.fill ins(...) outs(...) scf.for %arg = %lb to %ub step %step scf.forall ... ``` After the peeling transform, it is expected to be ``` scf.forall ... linalg.fill ins(...) outs(...) scf.for %arg = %(lb + step) to %ub step %step scf.forall ... ``` This patch makes the most use of the existing peeling functions and adds support for peeling the first iteration out of the loop.	2023-12-14 17:03:52 -08:00
Jacques Pienaar	ee2deb4cf7	[mlir] Handle simple commutative cases in CSE. Tried to keep this simple while handling obvious CSE instances. For more complicated cases the expectation is still that the sorting pass would run before. While simple, this case did turn up in a real deployed instance where it had a large (>10% e2e) impact. This can of course be refined.	2023-12-14 16:09:05 -08:00
Yinying Li	7bc6c4abe8	[mlir][print]Add functions for printing memref f16/bf16/i16 (#75094 ) 1. Added functions for printMemrefI16/f16/bf16. 2. Added a new integration test for all the printMemref functions.	2023-12-14 13:06:25 -05:00
Tobias Gysi	25d942403c	[mlir][llvm] Add invariant intrinsics (#75354 ) This commit implements the LLVM IR invariant intrinsics in LLVM dialect. These intrinsics can be used to mark a program regions in which the contents of a specific memory object will not change. The LLVM dialect implementation also implements the PromotableOpInterface to ensure Mem2Reg & SROA are able to promote pointers that are marked using the invariant intrinsics.	2023-12-14 14:58:45 +01:00
Kareem Ergawy	2ab926d959	[flang][MLIR][OpenMP] Add support for `target update` directive. (#75047 ) Add an op in the OMP dialect to model the `target update` direcive. This change reuses the `MapInfoOp` used by other device directive to model `map` clauses but verifies that the restrictions imposed by the `target update` directive are respected.	2023-12-14 12:48:45 +01:00
Cullen Rhodes	0e06694235	[mlir][ArmSME][NFC] Remove arm_sme::populateVectorTransferLoweringPatterns decl (#75442 ) Unused since D154867.	2023-12-14 10:51:28 +00:00
Pablo Antonio Martinez	7f4f75c144	[MLIR][SCFToOpenMP] Add num-threads option (#74854 ) Add `num-threads` option to the `-convert-scf-to-openmp` pass, allowing to set the number of threads to be used in the `omp.parallel` to a fixed value.	2023-12-14 09:07:17 +00:00
Prathamesh Tagore	2255795f28	[mlir] [tensor] Fix typo in tensor.pack documentation (#74922 )	2023-12-14 11:21:10 +05:30
Fangrui Song	2a9d8caf29	Revert "[MLIR] Fuse locations of merged constants (#74670 )" This reverts commit `87e2e89019`. and its follow-ups `0d1490f09f` (#75218) and `6fe3cd5467` (#75312). We observed significant OOM/timeout issues due to #74670 to quite a few services including google-research/swirl-lm. The follow-up #75218 and #75312 do not address the issue. Perhaps this is worth more investigation.	2023-12-13 13:49:03 -08:00
Andrzej Warzyński	c02d07fdf0	[mlir][vector] Add pattern to drop unit dim from elementwise(a, b)) (#74817 ) For vectors with either leading or trailing unit dim, replaces: elementwise(a, b) with: sc_a = shape_cast(a) sc_b = shape_cast(b) res = elementwise(sc_a, sc_b) return shape_cast(res) The newly inserted shape_cast Ops fold (before elementwise Op) and then restore (after elementwise Op) the unit dim. Vectors `a` and `b` are required to be rank > 1. Example: ```mlir %mul = arith.mulf %B_row, %A_row : vector<1x[4]xf32> %cast = vector.shape_cast %mul : vector<1x[4]xf32> to vector<[4]xf32> ``` gets converted to: ```mlir %B_row_sc = vector.shape_cast %B_row : vector<1x[4]xf32> to vector<[4]xf32> %A_row_sc = vector.shape_cast %A_row : vector<1x[4]xf32> to vector<[4]xf32> %mul = arith.mulf %B_row_sc, %A_row_sc : vector<[4]xf32> %mul_sc = vector.shape_cast %mul : vector<[4]xf32> to vector<1x[4]xf32> %cast = vector.shape_cast %mul_sc : vector<1x[4]xf32> to vector<[4]xf32> ``` In practice, the bottom 2 shape_cast(s) will be folded away.	2023-12-13 20:29:12 +00:00
Tom Eccles	79524ba527	[mlir][ArmSME] Add sve streaming compatible attribute (#75222 ) Following the same path already used for ArmStreaming and ArmLocallyStreaming. This should correspond to clang's __arm_streaming_compatible attribute.	2023-12-13 13:53:01 +00:00
Sungsoon Cho	762964e97f	Add cosh op to the math dialect. (#75153 )	2023-12-13 12:25:37 +01:00
Georgios Pinitas	92433285d7	[mlir][ArmSME] Add missing dependencies in ArmSME transforms (#75269 ) Inject missing dependency between generated files that could cause build issues. Signed-off-by: Georgios Pinitas <georgios.pinitas@arm.com>	2023-12-13 10:28:16 +00:00
Christian Ulmann	eab62971cd	[MLIR][LLVM] Support nameless and scopeless global constants (#75307 ) This commit ensures that we model DI information for global constants correctly. These constructs can lack scopes, names, and linkage names, so these parameters were made optional for the DIGlobalVariable attribute.	2023-12-13 10:47:59 +01:00
Abhinav271828	84ab06ba2f	[MLIR][Presburger] Add Gram-Schmidt (#70843 ) Implement Gram-Schmidt orthogonalisation for the FracMatrix class. This requires dotProduct, which has been added as a util.	2023-12-13 08:28:47 +00:00
Boian Petkantchin	4b3446771f	[mlir][mesh] Add endomorphism simplification for all-reduce (#73150 ) Does transformations like all_reduce(x) + all_reduce(y) -> all_reduce(x + y) max(all_reduce(x), all_reduce(y)) -> all_reduce(max(x, y)) when the all_reduce element-wise op is max. Added general rewrite pattern HomomorphismSimplification and EndomorphismSimplification that encapsulate the general algorithm. Made specialization for all-reduce with respect to addf, addi, minsi, maxsi, minimumf and maximumf in the Arithmetic dialect.	2023-12-12 10:21:52 -08:00
Jakub Kuderski	8063622721	[mlir][vector] Allow vector distribution with multiple written elements (#75122 ) Add a configuration option to allow vector distribution with multiple elements written by a single lane. This is so that we can perform vector multi-reduction with multiple results per workgroup.	2023-12-12 13:15:17 -05:00
Ivan R. Ivanov	d5fb4c0f11	[MLIR][NVVM] Enable nvvm intrinsics import to LLVMIR (#68843 ) Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com> Co-authored-by: Christian Ulmann <christianulmann@gmail.com>	2023-12-12 13:31:55 +09:00
Billy Zhu	87e2e89019	[MLIR] Fuse locations of merged constants (#74670 ) When merging constants by the operation folder, the location of the op that remains should be updated to track the new meaning of this op. This way we do not lose track of all possible source locations that the constant op came from, and the final location of the op is less reliant on the order of folding. This will also help debuggers understand how to step these instructions. This PR introduces a helper for operation folder to fuse another location into the location of an op. When an op is deduplicated, fuse the location of the op to be removed into the op that is retained. The retained op now represents both original ops. The FusedLoc will have a string metadata to help understand the reason for the location fusion (motivated by the [example](`71be8f3c23/mlir/include/mlir/IR/BuiltinLocationAttributes.td (L130)`) in the docstring of FusedLoc).	2023-12-11 19:31:54 -08:00

1 2 3 4 5 ...

9651 Commits