Commit Graph

274 Commits

Author SHA1 Message Date
Guray Ozen
5caae72d1a [mlir][gpu] Productize test-lower-to-nvvm as gpu-lower-to-nvvm (#75775)
The `test-lower-to-nvvm` pipeline serves as the common and proper
pipeline for nvvm+host compilation, and it's used across our CUDA
integration tests.

This PR renames the `test-lower-to-nvvm` pipeline to `gpu-lower-to-nvvm`
and moves its registration into `InitAllPasses.h`. The aim is to make it
callable from Python and to provide a standardized compilation process for
NVVM.
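
As an illustration of the intended use, here is a minimal C++ sketch of
invoking a registered pipeline by its textual name; the exact pipeline name
and the surrounding setup are assumptions, not part of the patch:

```
// Minimal sketch: run a registered pipeline by name on a module.
// Assumes the pipeline has been registered (e.g. via InitAllPasses.h).
#include "mlir/IR/BuiltinOps.h"
#include "mlir/Pass/PassManager.h"
#include "mlir/Pass/PassRegistry.h"

mlir::LogicalResult runNvvmPipeline(mlir::ModuleOp module) {
  mlir::PassManager pm(module.getContext());
  // Parse the textual pipeline name into the pass manager.
  if (mlir::failed(mlir::parsePassPipeline("gpu-lower-to-nvvm", pm)))
    return mlir::failure();
  return pm.run(module);
}
```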
2023-12-19 08:40:46 +01:00
Stella Laurenzo
8eff570482 Add missing dep on MLIRToLLVMIRTranslationRegistration to mlir-opt. (#75111)
I was not able to fully triage why this just started failing on one of
our bots, as it seems that the use was added 4 months ago. I would assume
that it was accidentally coming in transitively in some way, as the
dependency was definitely missing.

For context, this started failing in [our
byo_llvm](https://github.com/openxla/iree/blob/main/build_tools/llvm/byo_llvm.sh)
build on a stock build of MLIR on top of an existing LLVM. We were
getting:

```
ld.lld: error: undefined symbol: mlir::registerSPIRVDialectTranslation(mlir::DialectRegistry&)                                                        >>> referenced by mlir-opt.cpp
>>>               tools/mlir-opt/CMakeFiles/mlir-opt.dir/mlir-opt.cpp.o:(main)
```
2023-12-12 14:10:06 -08:00
Boian Petkantchin
4b3446771f [mlir][mesh] Add endomorphism simplification for all-reduce (#73150)
Performs transformations like
all_reduce(x) + all_reduce(y) -> all_reduce(x + y)

max(all_reduce(x), all_reduce(y)) -> all_reduce(max(x, y))
when the element-wise reduction op of the all_reduce is max.

Added general rewrite patterns HomomorphismSimplification and
EndomorphismSimplification that encapsulate the general algorithm.
Made a specialization for all-reduce with respect to
addf, addi, minsi, maxsi, minimumf, and maximumf
in the Arithmetic dialect.
2023-12-12 10:21:52 -08:00
Matteo Franciolini
7ad9e9dcf5 [mlir][bytecode] Implements back deployment capability for MLIR dialects (#70724)
When emitting bytecode, clients can specify a target dialect version to
emit in `BytecodeWriterConfig`. This exposes a target dialect version to
the DialectBytecodeWriter, which can be queried by name and used to
back-deploy attributes, types, and properties.
2023-10-31 15:41:29 -07:00
Fabian Mora
1828deb752 [mlir][gpu] Deprecate gpu::Serialization* passes. (#65857)
Deprecate the `gpu-to-cubin` & `gpu-to-hsaco` passes in favor of the
`TargetAttr` workflow. This patch removes remaining upstream uses of the
aforementioned passes, including the option to use them in `mlir-opt`. A
future patch will remove these passes entirely.

The passes can be re-enabled in `mlir-opt` by adding the CMake flag: `-DMLIR_ENABLE_DEPRECATED_GPU_SERIALIZATION=1`.
2023-09-11 16:32:15 -04:00
Will Dietz
08ed557714 [mlir] mlir-opt: Fix linking after 7c4e8c6a27 .
Without this, there are undefined references to the LLVMIR translations:
```
ld: mlir-opt.cpp:(.text.startup.main+0x49): undefined reference to `mlir::registerAMXDialectTranslation(mlir::DialectRegistry&)'
ld: mlir-opt.cpp:(.text.startup.main+0x51): undefined reference to `mlir::registerArmSMEDialectTranslation(mlir::DialectRegistry&)'
ld: mlir-opt.cpp:(.text.startup.main+0x59): undefined reference to `mlir::registerArmSVEDialectTranslation(mlir::DialectRegistry&)'
ld: mlir-opt.cpp:(.text.startup.main+0x81): undefined reference to `mlir::registerOpenACCDialectTranslation(mlir::DialectRegistry&)'
ld: mlir-opt.cpp:(.text.startup.main+0x89): undefined reference to `mlir::registerOpenMPDialectTranslation(mlir::DialectRegistry&)'
ld: mlir-opt.cpp:(.text.startup.main+0x99): undefined reference to `mlir::registerX86VectorDialectTranslation(mlir::DialectRegistry&)'
```
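
For reference, a hedged sketch of how these translation interfaces are
typically pulled into a tool; `registerAllToLLVMIRTranslations` from
`mlir/Target/LLVMIR/Dialect/All.h` is a real helper, but whether mlir-opt
uses exactly this one is an assumption:

```
// Sketch: register the per-dialect LLVM IR translation interfaces whose
// symbols appear in the linker errors above. The corresponding libraries
// still have to be linked into the tool.
#include "mlir/IR/DialectRegistry.h"
#include "mlir/Target/LLVMIR/Dialect/All.h"

void registerTranslationInterfaces(mlir::DialectRegistry &registry) {
  mlir::registerAllToLLVMIRTranslations(registry);
}
```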

Reviewed By: stellaraccident

Differential Revision: https://reviews.llvm.org/D158606
2023-08-25 20:28:27 -05:00
Nicolas Vasilache
7c4e8c6a27 [mlir] Disentangle dialect and extension registrations.
This revision avoids the registration of dialect extensions in Pass::getDependentDialects.

Such registration of extensions can be dangerous because `DialectRegistry::isSubsetOf` is
always guaranteed to return false for extensions (i.e. there is no mechanism to track
whether a lambda is already in the list of registered extensions).
When the context is already in a multi-threaded mode, this is guaranteed to assert.

Arguably a more structured registration mechanism for extensions with a unique ExtensionID
could be envisioned in the future.

In the process of cleaning this up, multiple usage inconsistencies surfaced around the
registration of translation extensions that this revision also cleans up.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D157703
2023-08-22 00:40:09 +00:00
Matteo Franciolini
bff6a4292f Expose callbacks for encoding of types/attributes
[mlir] Expose a mechanism to provide a callback for encoding types and attributes in MLIR bytecode.

Two callbacks are exposed, respectively, to the BytecodeWriterConfig and to the ParserConfig. At bytecode parsing/printing, clients have the ability to specify a callback to be used to optionally read/write the encoding. On failure, the fallback path will execute the default parsers and printers for the dialect.

Testing shows how to leverage this functionality to support back-deployment and backward-compatibility use cases when round-tripping to bytecode a client dialect with type/attribute dependencies on upstream.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D153383
2023-07-28 16:45:42 -07:00
Mehdi Amini
b86a13211f Revert "Expose callbacks for encoding of types/attributes"
This reverts commit b299ec1666.

The authorship information was incorrect.
2023-07-28 16:45:42 -07:00
Mehdi Amini
b299ec1666 Expose callbacks for encoding of types/attributes
[mlir] Expose a mechanism to provide a callback for encoding types and attributes in MLIR bytecode.

Two callbacks are exposed, respectively, to the BytecodeWriterConfig and to the ParserConfig. At bytecode parsing/printing, clients have the ability to specify a callback to be used to optionally read/write the encoding. On failure, the fallback path will execute the default parsers and printers for the dialect.

Testing shows how to leverage this functionality to support back-deployment and backward-compatibility use cases when round-tripping to bytecode a client dialect with type/attribute dependencies on upstream.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D153383
2023-07-28 10:44:02 -07:00
Srishti Srivastava
de826ea35d [MLIR][ANALYSIS] Add liveness analysis utility
This commit adds a utility to implement liveness analysis using the
sparse backward data-flow analysis framework. Theoretically, liveness
analysis assigns liveness to each (value, program point) pair in the
program and it is thus a dense analysis. However, since values are
immutable in MLIR, a sparse analysis, which will assign liveness to
each value in the program, suffices here.

Liveness analysis has many applications. It can be used to avoid the
computation of extraneous operations that have no effect on the memory
or the final output of a program. It can also be used to optimize
register allocation. Both of these applications help achieve one very
important goal: reducing runtime.

A value is considered "live" iff it:
  (1) has memory effects OR
  (2) is returned by a public function OR
  (3) is used to compute a value of type (1) or (2).
It is also to be noted that a value could be of multiple types (1/2/3) at
the same time.

A value "has memory effects" iff it:
  (1.a) is an operand of an op with memory effects OR
  (1.b) is a non-forwarded branch operand and a block to which its op
  could transfer control has an op with memory effects.

A value `A` is said to be "used to compute" value `B` iff `B` cannot be
computed in the absence of `A`. Thus, in this implementation, we say that
value `A` is used to compute value `B` iff:
  (3.a) `B` is a result of an op with operand `A` OR
  (3.b) `A` is used to compute some value `C` and `C` is used to compute
  `B`.

---

It is important to note that there already exists an MLIR liveness
utility here: llvm-project/mlir/include/mlir/Analysis/Liveness.h. So,
what is the need for this new liveness analysis utility being added by
this commit? That need is explained as follows:

The similarity between these two utilities is that both use the
fixpoint iteration method to converge to the final result of liveness,
and both have the same theoretical understanding of liveness.

However, the main difference between (a) the existing utility and (b)
the added utility is the "scope of the analysis". (a) is restricted to
analysing each block independently while (b) analyses blocks together,
i.e., it looks at how the control flows from one block to the other,
how a caller calls a callee, etc. The restriction in the former implies
that some potentially non-live values could be marked live and thus the
full potential of liveness analysis will not be realised.

This can be understood using the example below:

```
1 func.func private @private_dead_return_value_removal_0() -> (i32, i32) {
2   %0 = arith.constant 0 : i32
3   %1 = arith.addi %0, %0 : i32
4   return %0, %1 : i32, i32
5 }
6 func.func @public_dead_return_value_removal_0() -> (i32) {
7   %0:2 = func.call @private_dead_return_value_removal_0() : () -> (i32, i32)
8   return %0#0 : i32
9 }
```

Here, if we just restrict our analysis to a per-block basis like (a), we
will say that %1 on line 3 is live because it is computed and then
returned outside its block by the function. But, if we perform a
backward data-flow analysis like (b) does, we will say that %0#1 of line
7 is not live because it isn't returned by the public function and thus,
%1 of line 3 is also not live. So, while (a) will be unable to suggest
any IR optimizations, (b) can enable this IR to convert to:

```
1 func.func private @private_dead_return_value_removal_0() -> i32 {
2   %0 = arith.constant 0 : i32
3   return %0 : i32
4 }
5 func.func @public_dead_return_value_removal_0() -> i32 {
6   %0 = call @private_dead_return_value_removal_0() : () -> i32
7   return %0 : i32
8 }
```

One operation was removed, one unnecessary return value of the function
was removed, and the function signature was modified accordingly. This is
an optimization that (b) can enable but (a) cannot. Such optimizations
can help remove a lot of extraneous computations that are currently being
done.
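
As a usage illustration, here is a hedged sketch of querying the new
utility through the data-flow framework; the class names (LivenessAnalysis,
Liveness), the header path, the constructor arguments, and the lattice
field are assumptions based on this description:

```
// Sketch: run liveness as a sparse backward data-flow analysis and query it.
#include "mlir/Analysis/DataFlow/DeadCodeAnalysis.h"
#include "mlir/Analysis/DataFlow/LivenessAnalysis.h"
#include "mlir/Analysis/DataFlowFramework.h"
#include "mlir/IR/SymbolTable.h"

// Returns true if `value` is live, conservatively treating it as live when
// the solver fails or has no state for it.
bool isLive(mlir::Operation *top, mlir::Value value) {
  mlir::SymbolTableCollection symbolTables;
  mlir::DataFlowSolver solver;
  solver.load<mlir::dataflow::DeadCodeAnalysis>();
  // Assumed constructor shape for a sparse backward analysis.
  solver.load<mlir::dataflow::LivenessAnalysis>(symbolTables);
  if (mlir::failed(solver.initializeAndRun(top)))
    return true;
  const auto *state = solver.lookupState<mlir::dataflow::Liveness>(value);
  return !state || state->isLive; // the `isLive` field name is an assumption
}
```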

Signed-off-by: Srishti Srivastava <srishtisrivastava.ai@gmail.com>

Reviewed By: matthiaskramm, jcai19

Differential Revision: https://reviews.llvm.org/D153779
2023-07-21 13:29:14 -07:00
Mahesh Ravishankar
67399932c7 [mlir][Linalg] Cleanup the drop unit dims pass in Linalg.
TL;DR the following API functions have been merged

```
void populateFoldUnitExtentDimsViaReshapesPatterns(RewritePatternSet &patterns);
void populateFoldUnitExtentDimsViaSlicesPatterns(RewritePatternSet &patterns);
```

into

```
void populateFoldUnitExtentDimsPatterns(RewritePatternSet &patterns,
                                        ControlDropUnitDims &options);
```

To use the previous functionality use

```
ControlDropUnitDims options;
// By default options.rankReductionStrategy is
// ControlDropUnitDims::RankReductionStrategy::ReassociativeReshape.
populateFoldUnitExtentDimsPatterns(patterns, options);
```

and

```
ControlDropUnitDims options;
options.rankReductionStrategy = ControlDropUnitDims::RankReductionStrategy::ExtractInsertSlice;
populateFoldUnitExtentDimsPatterns(patterns, options);

```

This pass is quite old and needed to be updated based on the current
approach to transformations in Linalg.

- Instead of two patterns, one to just remove loop dimensions that are
  unit extent (using 0 in the indexing maps) and another to drop the
  unit extents in the operand shapes, combine them into a single
  transformation. This avoids creating an intermediate step with
  indexing maps having 0's in the domain expressions.

- Expose the core transformation as a utility function and add a
  pattern that calls this transformation.

This is a mostly NFC change, apart from the API change and dropping
the patterns/tests that only dropped the loops that are unit extents.
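
For completeness, a hedged sketch of driving the merged entry point from a
pass; the `linalg` namespace, the header path, and the greedy-driver usage
are assumptions, while the populate call mirrors the snippets above:

```
// Sketch: populate the unit-dim folding patterns and apply them greedily.
#include "mlir/Dialect/Linalg/Transforms/Transforms.h"
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

void foldUnitExtentDims(mlir::Operation *root) {
  // Defaults to the reassociative-reshape strategy, as noted above.
  mlir::linalg::ControlDropUnitDims options;
  mlir::RewritePatternSet patterns(root->getContext());
  mlir::linalg::populateFoldUnitExtentDimsPatterns(patterns, options);
  (void)mlir::applyPatternsAndFoldGreedily(root, std::move(patterns));
}
```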

Differential Revision: https://reviews.llvm.org/D155518
2023-07-19 17:47:18 +00:00
Nicolas Vasilache
7e78ecfe10 [mlir][cuda] Add a test-lower-to-nvvm catchall passpipeline.
This mirrors the test-lower-to-llvm pass pipeline that provides some sanity when running e2e examples.

One peculiarity of the GPU pipeline is that we want to allow 32b indexing in kernels.
This is currently not straightforward as there are dependencies between passes.
This new test pass orders passes in a way that connects end-to-end.

Differential Revision: https://reviews.llvm.org/D155463
2023-07-17 15:18:33 +00:00
Alex Zinenko
8a918c54bb [mlir] add backward dense dataflow analysis
This is the counterpart to the forward dense dataflow analysis and
integrates into the dataflow framework. The implementation follows the
structure of existing dataflow analyses.

Reviewed By: Mogball, phisiart

Differential Revision: https://reviews.llvm.org/D154713
2023-07-11 16:47:53 +00:00
Tobias Gysi
728a8d5a81 [mlir] Add a builtin distinct attribute
A distinct attribute associates a referenced attribute with a unique
identifier. Every call to its create function allocates a new
distinct attribute instance. The address of the attribute instance
temporarily serves as its unique identifier. Similar to the names
of SSA values, the final unique identifiers are generated during
pretty printing.

Examples:
 #distinct = distinct[0]<42.0 : f32>
 #distinct1 = distinct[1]<42.0 : f32>
 #distinct2 = distinct[2]<array<i32: 10, 42>>

This mechanism is meant to generate attributes with a unique
identifier, which can be used to mark groups of operations
that share common properties, such as whether they are aliasing.

The design of the distinct attribute ensures a minimal memory
footprint per distinct attribute, since it only contains a reference
to another attribute. All distinct attributes are stored outside of
the storage uniquer in a thread-local store that is part of the
context. It uses one bump-pointer allocator per thread to ensure
distinct attributes can be created in parallel.
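
A small sketch of the corresponding C++ API, assuming it mirrors the
printed form above; `DistinctAttr::create` and the headers used here are
assumptions:

```
// Sketch: two distinct attributes wrapping the same referenced attribute.
#include <cassert>

#include "mlir/IR/Builders.h"
#include "mlir/IR/BuiltinAttributes.h"
#include "mlir/IR/MLIRContext.h"

void makeDistinct(mlir::MLIRContext &ctx) {
  mlir::Builder b(&ctx);
  mlir::FloatAttr value = b.getF32FloatAttr(42.0f);
  // Each create call allocates a new identity, so d1 != d2 even though both
  // reference 42.0 : f32 (printed as distinct[0]<...> and distinct[1]<...>).
  mlir::DistinctAttr d1 = mlir::DistinctAttr::create(value);
  mlir::DistinctAttr d2 = mlir::DistinctAttr::create(value);
  assert(d1 != d2 && "each call yields a fresh distinct attribute");
}
```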

Reviewed By: rriddle, Dinistro, zero9178

Differential Revision: https://reviews.llvm.org/D153360
2023-07-11 07:33:16 +00:00
yzhang93
5a1cdcbd86 [mlir] Narrow bitwidth emulation for MemRef load
This patch adds support for narrow bitwidth storage emulation. The goal is to support sub-byte type
codegen for LLVM CPU. Specifically, a type converter is added to convert memrefs of narrow bitwidth
(e.g., i4) into a supported wider bitwidth (e.g., i8). Another focus of this patch is to populate the
pattern for int4 memref.load; the memref.store pattern should be added in a separate patch.

Reviewed By: hanchung, mravishankar

Differential Revision: https://reviews.llvm.org/D151519
2023-06-26 14:18:30 -07:00
River Riddle
a5ef51d786 [mlir] Add support for "promised" interfaces
Promised interfaces allow for a dialect to "promise" the implementation of an interface, i.e.
declare that it supports an interface, but have the interface defined in an extension in a library
separate from the dialect itself. A promised interface is powerful in that it alerts the user when
an attempt is made to use the interface (e.g. via cast/dyn_cast/etc.) before the implementation has
been provided. This makes the system much more robust against misconfiguration,
and ensures that we do not lose the benefit we currently have of defining the interface in
the dialect library.

Differential Revision: https://reviews.llvm.org/D120368
2023-06-09 11:30:13 -07:00
Matteo Franciolini
612781918f Preserve use-list orders in mlir bytecode
This patch implements a mechanism to read/write use-list orders from/to the mlir bytecode format. When producing bytecode, use-list orders are appended to each value of the IR. When reading bytecode, use-list orders are loaded into memory and used at the end of parsing to sort the existing use-list chains.

Reviewed By: mehdi_amini

Differential Revision: https://reviews.llvm.org/D149755
2023-05-21 16:48:12 -07:00
Mehdi Amini
3128b3105d Add support for Lazyloading to the MLIR bytecode
IsolatedRegions are emitted in sections in order for the reader to be
able to skip over them. A new class is exposed to manage the state and
allow the readers to load these IsolatedRegions on-demand.

Differential Revision: https://reviews.llvm.org/D149515
2023-05-20 15:24:33 -07:00
Mehdi Amini
9c8db444bc Remove deprecated preloadDialectInContext flag for MlirOptMain that has been deprecated for 2 years
See https://discourse.llvm.org/t/psa-preloaddialectincontext-has-been-deprecated-for-1y-and-will-be-removed/68992

Differential Revision: https://reviews.llvm.org/D149039
2023-04-24 14:37:31 -07:00
Mahesh Ravishankar
da784e77da [mlir] Add a utility function to make a region isolated from above.
The utility function takes a region and makes it isolated from above
by appending, to the entry block, arguments that represent the captured
values and replacing all uses of the captured values within the region
with the newly added arguments. The captured values are returned.

The utility function also takes an optional callback that allows
cloning operations that define the captured values into the region
during the process of making it isolated from above. A cloned value
is no longer a captured value; the operands of the cloned operation
then become the captured values. This is applied transitively, allowing
a DAG of operations to be cloned into the region based on the callback.
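
A hedged usage sketch; the header (assumed to be RegionUtils.h), the return
type, and the callback shape are assumptions based on the description above:

```
// Sketch: isolate a region, cloning constant-like defining ops into it
// instead of capturing their results as new entry block arguments.
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/RegionUtils.h"
#include "llvm/ADT/SmallVector.h"

void isolateRegion(mlir::RewriterBase &rewriter, mlir::Region &region) {
  auto cloneIntoRegion = [](mlir::Operation *op) {
    return op->hasTrait<mlir::OpTrait::ConstantLike>();
  };
  llvm::SmallVector<mlir::Value> captured =
      mlir::makeRegionIsolatedFromAbove(rewriter, region, cloneIntoRegion);
  // `captured` now holds the values that became entry block arguments.
  (void)captured;
}
```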

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D148684
2023-04-20 16:40:25 +00:00
Matthias Springer
8c885658ed [mlir][Interfaces] Add ValueBoundsOpInterface
Ops can implement this interface to specify lower/upper bounds for their result values and block arguments. Bounds can be specified for:
* Index-type values
* Dimension sizes of shaped values

The bounds are added to a constraint set. Users can query this constraint set to compute bounds with respect to a user-specified set of values. Only EQ bounds are supported at the moment.

This revision also contains interface implementations for various tensor dialect ops, which illustrates how to implement this interface.
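
A hedged sketch of a typical query; the entry point
(ValueBoundsConstraintSet::computeConstantBound) and its exact signature are
assumptions based on this description, and only EQ bounds are expected to
succeed at this point:

```
// Sketch: try to prove that dimension `dim` of a shaped value has a constant
// size by querying the interface-backed constraint set.
#include "mlir/Interfaces/ValueBoundsOpInterface.h"

mlir::FailureOr<int64_t> getConstantDimSize(mlir::Value shapedValue,
                                            int64_t dim) {
  return mlir::ValueBoundsConstraintSet::computeConstantBound(
      mlir::presburger::BoundType::EQ, shapedValue, dim);
}
```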

Differential Revision: https://reviews.llvm.org/D145681
2023-04-06 02:57:14 +02:00
Christian Ulmann
1ef51e0452 [mlir][Analysis] Introduce LoopInfo in mlir
This commit introduces an instantiation of LLVM's LoopInfo for CFGs in
MLIR. To test the LoopInfo, a test pass is added that checks the analysis
results for a set of CFGs.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D147323
2023-04-05 12:57:16 +00:00
Ingo Müller
0ceb7a12db [mlir] Implement pass utils for 1:N type conversions.
The current dialect conversion does not support 1:N type conversions.
This commit implements a (poor-man's) dialect conversion pass that does
just that. To keep the pass independent of the "real" dialect conversion
infrastructure, it provides a specialization of the TypeConverter class
that allows for N:1 target materializations, a specialization of the
RewritePattern and PatternRewriter classes that automatically add
appropriate unrealized casts supporting 1:N type conversions and provide
converted operands for implementing subclasses, and a conversion driver
that applies the provided patterns and replaces the unrealized casts
that haven't folded away with user-provided materializations.

The current pass is powerful enough to express many existing manual
solutions for 1:N type conversions or extend transforms that previously
didn't support them, out of which this patch implements call graph type
decomposition (which is currently implemented with a ValueDecomposer
that is only used there).

The goal of this pass is to illustrate the effect that 1:N type
conversions could have, gain experience in how patterns should be
written that achieve that effect, and get feedback on how the APIs of
the dialect conversion should be extended or changed to support such
patterns. The hope is that the "real" dialect conversion eventually
supports such patterns, at which point, this pass could be removed
again.
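
A hedged sketch of how such a driver might be invoked; the
OneToNTypeConverter and applyPartialOneToNConversion names are assumptions
based on this description, and the tuple-splitting rule is purely
illustrative:

```
// Sketch: register a 1:N type conversion rule and run the driver.
#include <optional>

#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/OneToNTypeConversion.h"
#include "llvm/ADT/STLExtras.h"

mlir::LogicalResult decomposeTuples(mlir::Operation *op) {
  mlir::OneToNTypeConverter converter;
  // Illustrative 1:N rule: a tuple type converts to the list of its elements.
  converter.addConversion(
      [](mlir::TupleType tuple, llvm::SmallVectorImpl<mlir::Type> &results)
          -> std::optional<mlir::LogicalResult> {
        llvm::append_range(results, tuple.getTypes());
        return mlir::success();
      });
  mlir::RewritePatternSet patterns(op->getContext());
  // ...populate 1:N patterns for the ops of interest here...
  return mlir::applyPartialOneToNConversion(op, converter,
                                            std::move(patterns));
}
```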

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D144469
2023-03-27 16:04:26 +00:00
Ingo Müller
a8416e3c04 Revert "[mlir] Implement pass utils for 1:N type conversions."
This reverts commit 9c4611f9c7.
2023-03-27 09:23:57 +00:00
Ingo Müller
9c4611f9c7 [mlir] Implement pass utils for 1:N type conversions.
The current dialect conversion does not support 1:N type conversions.
This commit implements a (poor-man's) dialect conversion pass that does
just that. To keep the pass independent of the "real" dialect conversion
infrastructure, it provides a specialization of the TypeConverter class
that allows for N:1 target materializations, a specialization of the
RewritePattern and PatternRewriter classes that automatically add
appropriate unrealized casts supporting 1:N type conversions and provide
converted operands for implementing subclasses, and a conversion driver
that applies the provided patterns and replaces the unrealized casts
that haven't folded away with user-provided materializations.

The current pass is powerful enough to express many existing manual
solutions for 1:N type conversions or extend transforms that previously
didn't support them, out of which this patch implements call graph type
decomposition (which is currently implemented with a ValueDecomposer
that is only used there).

The goal of this pass is to illustrate the effect that 1:N type
conversions could have, gain experience in how patterns should be
written that achieve that effect, and get feedback on how the APIs of
the dialect conversion should be extended or changed to support such
patterns. The hope is that the "real" dialect conversion eventually
supports such patterns, at which point, this pass could be removed
again.

Reviewed By: springerm

Differential Revision: https://reviews.llvm.org/D144469
2023-03-27 09:02:28 +00:00
Nicolas Vasilache
0fa20ecafe [mlir][Affine] Add helper functions to allow reordering affine.apply operands and decompose the ops into smaller components
Care is taken to order operands from least hoistable to most hoistable and to process subexpressions in the same
order.

This allows exposing more opportunities for licm, cse and strength reduction.

Such a step should typically be applied while we still have loops in the IR and just before lowering affine ops to arith.
This is because the affine.apply canonicalization currently tries to maximally compose chains of affine.apply operations
and could undo the effects of these decompositions.

Depends on: D145784

Differential Revision: https://reviews.llvm.org/D145685
2023-03-14 04:07:32 -07:00
Jakub Kuderski
b194ef692c [mlir][spirv][vector] Add pattern to convert reduction to SPIR-V dot prod
This converts a specific form of `vector.reduction` to SPIR-V integer
dot product ops.

Add a new test pass to exercise this outside of the main vector to
spirv conversion pass.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D145760
2023-03-10 13:54:16 -05:00
Nicolas Vasilache
c624027633 [mlir][linalg][TransformOps] Connect hoistRedundantVectorTransfers
Connect the hoistRedundantVectorTransfers functionality to the transform
dialect.

Authored-by: Quentin Colombet <quentin.colombet@gmail.com>

Differential Revision: https://reviews.llvm.org/D144260
2023-02-20 01:50:29 -08:00
Tom Eccles
81a79ee446 [mlir] Add function for checking if a block is inside a loop
This function returns whether a block is nested inside of a loop. There
can be three kinds of loop:
  1) The block is nested inside of a LoopLikeOpInterface
  2) The block is nested inside another block which is in a loop
  3) There is a cycle in the control flow graph

This will be useful for Flang's stack arrays pass, which moves array
allocations from the heap to the stack. Special handling is needed when
allocations occur inside of loops to ensure additional stack space is
not allocated on each loop iteration.
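
A tiny usage sketch, assuming the helper lands as a static member of
LoopLikeOpInterface (the exact location and name are assumptions):

```
// Sketch: decide whether an allocation in `block` may execute repeatedly.
#include "mlir/IR/Block.h"
#include "mlir/Interfaces/LoopLikeInterface.h"

bool mayExecuteRepeatedly(mlir::Block *block) {
  // Covers all three cases above: an enclosing LoopLikeOpInterface, nesting
  // inside a block that is itself in a loop, or a CFG cycle.
  return mlir::LoopLikeOpInterface::blockIsInLoop(block);
}
```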

Differential Revision: https://reviews.llvm.org/D141401
2023-02-10 16:14:17 +00:00
Kiran Chandramohan
bacf1aa3c0 Revert "[mlir] Add function for checking if a block is inside a loop"
Reverting since the shared library builds are failing.

This reverts commit dcee187522.
2023-02-09 18:36:28 +00:00
Tom Eccles
dcee187522 [mlir] Add function for checking if a block is inside a loop
This function returns whether a block is nested inside of a loop. There
can be three kinds of loop:
  1) The block is nested inside of a LoopLikeOpInterface
  2) The block is nested inside another block which is in a loop
  3) There is a cycle in the control flow graph

This will be useful for Flang's stack arrays pass, which moves array
allocations from the heap to the stack. Special handling is needed when
allocations occur inside of loops to ensure additional stack space is
not allocated on each loop iteration.

Differential Revision: https://reviews.llvm.org/D141401
2023-02-09 15:18:54 +00:00
Ingo Müller
b716bf84ea [mlir][scf] Fix builder of WhileOp with region builder arguments.
The overload of WhileOp::build with arguments for builder functions for
the regions of the op was broken: it did not correctly compute the types
(and locations) of the region arguments, which led to failed assertions
when the result types were different from the operand types.
Specifically, it used the result types (and operand locations) for *both*
regions, instead of the operand types (and locations) for the 'before'
region and the result types (and locations) for the 'after' region.
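
To make the fixed behavior concrete, here is a hedged sketch that builds an
scf.while whose result type (f32) differs from its operand type (i32) using
the region-builder overload; the arith/scf builder signatures used below are
the commonly seen ones and are otherwise assumptions:

```
// Sketch: iterate an i32 counter while it is < 100, yielding it as f32.
#include "mlir/Dialect/Arith/IR/Arith.h"
#include "mlir/Dialect/SCF/IR/SCF.h"
#include "mlir/IR/Builders.h"

mlir::scf::WhileOp buildWhile(mlir::OpBuilder &b, mlir::Location loc,
                              mlir::Value init /* i32 */) {
  mlir::Type f32Ty = b.getF32Type();
  return b.create<mlir::scf::WhileOp>(
      loc, mlir::TypeRange{f32Ty}, mlir::ValueRange{init},
      // Before region: receives the *operand* types and locations (i32).
      [&](mlir::OpBuilder &nb, mlir::Location nl, mlir::ValueRange args) {
        mlir::Value limit = nb.create<mlir::arith::ConstantIntOp>(nl, 100, 32);
        mlir::Value cond = nb.create<mlir::arith::CmpIOp>(
            nl, mlir::arith::CmpIPredicate::slt, args[0], limit);
        mlir::Value asF32 =
            nb.create<mlir::arith::SIToFPOp>(nl, f32Ty, args[0]);
        nb.create<mlir::scf::ConditionOp>(nl, cond, mlir::ValueRange{asF32});
      },
      // After region: receives the *result* types and locations (f32).
      [&](mlir::OpBuilder &nb, mlir::Location nl, mlir::ValueRange args) {
        mlir::Value asI32 = nb.create<mlir::arith::FPToSIOp>(
            nl, nb.getI32Type(), args[0]);
        mlir::Value one = nb.create<mlir::arith::ConstantIntOp>(nl, 1, 32);
        mlir::Value next = nb.create<mlir::arith::AddIOp>(nl, asI32, one);
        nb.create<mlir::scf::YieldOp>(nl, mlir::ValueRange{next});
      });
}
```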

Reviewed By: Mogball, mehdi_amini

Differential Revision: https://reviews.llvm.org/D142952
2023-02-07 13:40:54 +00:00
Matthias Springer
325b58d59f [mlir][cf] Print message in cf.assert to LLVM lowering
The assert message was previously ignored. The lowered IR now calls `puts` on it in case of a failed assertion.

Differential Revision: https://reviews.llvm.org/D138647
2022-12-15 17:45:34 +01:00
Matthias Kramm
4e98d611ef [mlir] Implement backward dataflow.
This enables interprocedural liveness analysis, very busy expression
analysis, etc.

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D138935
2022-12-13 18:35:27 +01:00
Hanhan Wang
0f297cad4d [mlir][tensor][linalg] Introduce DataLayoutPropagation pass.
It introduces a pattern that swaps `linalg.generic + tensor.pack` to
`tensor.pack + linalg.generic`. It requires all iteration types to be
parallel and the indexing map of the output operand to be the identity.
These restrictions can be relaxed in the future.

The user can decide whether the propagation should be applied or not by
passing a control function.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D138882
2022-12-06 15:00:07 -08:00
Matthias Springer
c1fef4e88a [mlir][bufferization] Make TensorCopyInsertionPass a test pass
TensorCopyInsertion should not have been exposed as a pass. This was a flaw in the original design. It is a preparation step for bufferization and certain transforms (that would otherwise be legal) are illegal between TensorCopyInsertion and actual rewrite to MemRef ops. Therefore, even if broken down as two separate steps internally, they should be exposed as a single pass.

This change affects the sparse compiler, which uses `TensorCopyInsertionPass`. A new `SparsificationAndBufferizationPass` is added to replace all passes in the sparse tensor pipeline from `TensorCopyInsertionPass` until the actual bufferization (rewrite to memref/non-tensor). It is generally unsafe to run arbitrary passes in-between, in particular passes that hoist tensor ops out of loops or change SSA use-def chains along tensor ops.

Differential Revision: https://reviews.llvm.org/D138915
2022-12-02 15:38:02 +01:00
Nicolas Vasilache
6e92d3fead [mlir][Test] Add a test pass to act as a sink towards LLVM conversion
This allows writing simple e2e tests where we can check for the proper materialization
of specific LLVM IR (e.g. `llvm.intr.fmuladd`).

Differential Revision: https://reviews.llvm.org/D138776
2022-11-28 00:59:55 -08:00
River Riddle
8c66344ee9 [mlir:PDL] Add support for DialectConversion with pattern configurations
Up until now PDL(L) has not supported dialect conversion because we had no
way of remapping values or integrating with type conversions. This commit
rectifies that by adding a new "pattern configuration" concept to PDL. This
essentially allows for attaching external configurations to patterns, which
can hook into pattern events (for now just the scope of a rewrite, but we
could also pass configs to native rewrites as well). This allows for injecting
the type converter into the conversion pattern rewriter.

Differential Revision: https://reviews.llvm.org/D133142
2022-11-08 01:57:57 -08:00
Nicolas Vasilache
44cfea0279 [mlir][Linalg] Retire LinalgStrategyTilePass and filter-based pattern.
Context: https://discourse.llvm.org/t/psa-retire-linalg-filter-based-patterns/63785

Uses of `LinalgTilingPattern::returningMatchAndRewrite` are replaced by a top-level `tileWithLinalgTilingOptions` function that is marked obsolete and serves
as a temporary means to transition away from `LinalgTilingOptions`-based tiling.
LinalgTilingOptions supports too many options that have been orthogonalized with the use of the transform dialect.

Additionally, the revision introduces a `transform.structured.tile_to_scf_for` structured transform operation that is needed to properly tile `tensor.pad`
via the TilingInterface. Uses of `transform.structured.tile` will be deprecated and replaced by this new op.
This will achieve the deprecation of `linalg::tileLinalgOp`.
Context: https://discourse.llvm.org/t/psa-retire-tileandfuselinalgops-method/63850

In the process of transitioning, tests that were performing tile and distribute on tensors are retired: transformations should be orthogonalized better in the future.
In particular, tiling to specific loop types and tileAndDistribute behavior are not available via the transform ops.
The behavior is still available as part of the `tileWithLinalgTilingOptions` method to allow downstream clients to transition without breakages but is meant to be retired soon.

As more tests are ported to the transform dialect, it became necessary to introduce a test-transform-dialect-erase-schedule-pass to discard the transform specification
once applied so that e2e lowering and execution is possible.

Lastly, a number of redundant tests that were testing composition of patterns are retired as they are available with a better mechanism via the transform dialect.

Differential Revision: https://reviews.llvm.org/D135573
2022-10-11 02:42:56 -07:00
Yuanqiang Liu
9f77909a5e [mlir][shape] add outline-shape-computation pass
Add the outline-shape-computation pass. This pass outlines the
shape computation part in high-level IR by adding shape.func ops and
populates the corresponding mapping information into ShapeMappingAnalysis.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D131810
2022-10-02 20:24:49 -07:00
Jakub Kuderski
abc362a107 [mlir][arith] Change dialect name from Arithmetic to Arith
Suggested by @lattner in https://discourse.llvm.org/t/rfc-define-precise-arith-semantics/65507/22.

Tested with:
`ninja check-mlir check-mlir-integration check-mlir-mlir-spirv-cpu-runner check-mlir-mlir-vulkan-runner check-mlir-examples`

and `bazel build --config=generic_clang @llvm-project//mlir:all`.

Reviewed By: lattner, Mogball, rriddle, jpienaar, mehdi_amini

Differential Revision: https://reviews.llvm.org/D134762
2022-09-29 11:23:28 -04:00
Jakub Kuderski
242d558658 [mlir][arith] Add test pass for wide integer emulation
The new test pass allows for running wide integer emulation conversion
within specified functions only.

I intend to use it in integration tests in a way that allows me to print both
original and emulated results in the same format, or even compare both results
at runtime and print on mismatch only.

Reviewed By: antiagainst

Differential Revision: https://reviews.llvm.org/D134120
2022-09-20 11:22:28 -04:00
Mathieu Fehr
ba8424a251 [mlir] Add Dynamic Dialects
Dynamic dialects are dialects that can be defined at runtime and are
extensible with new operations, types, and attributes at runtime.

Reviewed By: rriddle

Differential Revision: https://reviews.llvm.org/D125201
2022-09-19 09:58:18 -07:00
Matthias Springer
31fbdab376 [mlir][transforms] Add topological sort analysis
This change adds a helper function for computing a topological sorting of a list of ops. E.g. this can be useful in transforms where a subset of ops should be cloned without dominance errors.

The analysis reuses the existing implementation in TopologicalSortUtils.cpp.
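
A hedged usage sketch; the helper's name (computeTopologicalSorting), its
header location, and the in-place signature are assumptions based on this
description:

```
// Sketch: sort a set of ops so producers precede consumers, then clone them
// without introducing dominance errors.
#include "mlir/IR/Builders.h"
#include "mlir/Transforms/TopologicalSortUtils.h"
#include "llvm/ADT/SmallVector.h"

void cloneSorted(mlir::OpBuilder &builder,
                 llvm::SmallVector<mlir::Operation *> toClone) {
  // Assumed to sort `toClone` in place.
  (void)mlir::computeTopologicalSorting(toClone);
  for (mlir::Operation *op : toClone)
    builder.clone(*op);
}
```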

Differential Revision: https://reviews.llvm.org/D131669
2022-08-15 21:09:18 +02:00
Manish Gupta
14d79afeae [mlir][NVGPU] nvgpu.mmasync on F32 through TF32
Adds an optional attribute to support tensor cores on the F32 datatype by lowering to `mma.sync` with TF32 operands. Since TF32 is not a native datatype in LLVM, we are adding `tf32Enabled` as an attribute to make the IR aware of the `MmaSyncOp` datatype. Additionally, this patch adds placeholders for an nvgpu-to-nvgpu transformation targeting higher-precision tf32x3.

For mma.sync on f32 input using tensor cores there are two possibilities:
(a) tf32   (1 `mma.sync` per warp-level matrix-multiply-accumulate)
(b) tf32x3 (3 `mma.sync` per warp-level matrix-multiply-accumulate)

Typically, tf32 tensor core acceleration comes at a cost of accuracy from missing precision bits. While f32 has 23 precision bits, tf32 has only 10 precision bits. tf32x3 aims to recover the precision bits by splitting each operand into two tf32 values and issuing three `mma.sync` tensor core operations.

Reviewed By: ThomasRaoux

Differential Revision: https://reviews.llvm.org/D130294
2022-08-01 23:23:27 +00:00
srishti-cb
b508c5649f [MLIR] Add a utility to sort the operands of commutative ops
Added a commutativity utility pattern and a function to populate it. The pattern sorts the operands of an op in ascending order of the "key" associated with each operand iff the op is commutative. This sorting is stable.

The function is intended to be used inside passes to simplify the matching of commutative operations. After the application of the above-mentioned pattern, since the commutative operands now have a deterministic order in which they occur in an op, the matching of large DAGs becomes much simpler, i.e., it requires far fewer checks to be written by the user in their pattern-matching function.
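
A hedged sketch of such in-pass use; the populate function name and header
are assumptions based on this description:

```
// Sketch: sort commutative operands before running a matcher.
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/CommutativityUtils.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

void canonicalizeCommutativeOperandOrder(mlir::Operation *root) {
  mlir::RewritePatternSet patterns(root->getContext());
  mlir::populateCommutativityUtilsPatterns(patterns);
  // The sort is stable and deterministic, so matching code downstream can
  // rely on the resulting operand order.
  (void)mlir::applyPatternsAndFoldGreedily(root, std::move(patterns));
}
```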

The "key" associated with an operand is the list of the "AncestorKeys" associated with the ancestors of this operand, in a breadth-first order.

The operand of any op is produced by a set of ops and block arguments. Each of these ops and block arguments is called an "ancestor" of this operand.

Now, the "AncestorKey" associated with:
1. A block argument is `{type: BLOCK_ARGUMENT, opName: ""}`.
2. A non-constant-like op, for example, `arith.addi`, is `{type: NON_CONSTANT_OP, opName: "arith.addi"}`.
3. A constant-like op, for example, `arith.constant`, is `{type: CONSTANT_OP, opName: "arith.constant"}`.

So, if an operand, say `A`, was produced as follows:

```
`<block argument>`  `<block argument>`
             \          /
              \        /
              `arith.subi`           `arith.constant`
                         \            /
                         `arith.addi`
                                |
                           returns `A`
```

Then, the block arguments and operations present in the backward slice of `A`, in the breadth-first order are:
`arith.addi`, `arith.subi`, `arith.constant`, `<block argument>`, and `<block argument>`.

Thus, the "key" associated with operand `A` is:
```
{
 {type: NON_CONSTANT_OP, opName: "arith.addi"},
 {type: NON_CONSTANT_OP, opName: "arith.subi"},
 {type: CONSTANT_OP, opName: "arith.constant"},
 {type: BLOCK_ARGUMENT, opName: ""},
 {type: BLOCK_ARGUMENT, opName: ""}
}
```

Now, if "keyA" is the key associated with operand `A` and "keyB" is the key associated with operand `B`, then:
"keyA" < "keyB" iff:
1. In the first unequal pair of corresponding AncestorKeys, the AncestorKey in operand `A` is smaller, or,
2. Both the AncestorKeys in every pair are the same and the size of operand `A`'s "key" is smaller.

AncestorKeys of type `BLOCK_ARGUMENT` are considered the smallest, those of type `CONSTANT_OP`, the largest, and `NON_CONSTANT_OP` types come in between. Within the types `NON_CONSTANT_OP` and `CONSTANT_OP`, the smaller ones are the ones with smaller op names (lexicographically).

---

Some examples of such a sorting:

Assume that the sorting is being applied to `foo.commutative`, which is a commutative op.

Example 1:

> %1 = foo.const 0
> %2 = foo.mul <block argument>, <block argument>
> %3 = foo.commutative %1, %2

Here,
1. The key associated with %1 is:
```
    {
     {CONSTANT_OP, "foo.const"}
    }
```
2. The key associated with %2 is:
```
    {
     {NON_CONSTANT_OP, "foo.mul"},
     {BLOCK_ARGUMENT, ""},
     {BLOCK_ARGUMENT, ""}
    }
```

The key of %2 < the key of %1
Thus, the sorted `foo.commutative` is:
> %3 = foo.commutative %2, %1

Example 2:

> %1 = foo.const 0
> %2 = foo.mul <block argument>, <block argument>
> %3 = foo.mul %2, %1
> %4 = foo.add %2, %1
> %5 = foo.commutative %1, %2, %3, %4

Here,
1. The key associated with %1 is:
```
    {
     {CONSTANT_OP, "foo.const"}
    }
```
2. The key associated with %2 is:
```
    {
     {NON_CONSTANT_OP, "foo.mul"},
     {BLOCK_ARGUMENT, ""},
     {BLOCK_ARGUMENT, ""}
    }
```
3. The key associated with %3 is:
```
    {
     {NON_CONSTANT_OP, "foo.mul"},
     {NON_CONSTANT_OP, "foo.mul"},
     {CONSTANT_OP, "foo.const"},
     {BLOCK_ARGUMENT, ""},
     {BLOCK_ARGUMENT, ""}
    }
```
4. The key associated with %4 is:
```
    {
     {NON_CONSTANT_OP, "foo.add"},
     {NON_CONSTANT_OP, "foo.mul"},
     {CONSTANT_OP, "foo.const"},
     {BLOCK_ARGUMENT, ""},
     {BLOCK_ARGUMENT, ""}
    }
```

Thus, the sorted `foo.commutative` is:
> %5 = foo.commutative %4, %3, %2, %1

Signed-off-by: Srishti Srivastava <srishti.srivastava@polymagelabs.com>

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D124750
2022-07-30 19:25:18 -04:00
Mahesh Ravishankar
485190df95 [mlir][Linalg] Deprecate tileAndFuseLinalgOps method and associated patterns.
The `tileAndFuseLinalgOps` method is a legacy approach for tiling + fusion of
Linalg operations. Since it was also intended to work on operations
with buffer operands, this method had fairly complex logic to make
sure tile and fuse was correct even with side-effecting linalg ops.
While complex, it still wasn't robust enough. This patch deprecates
this method, thereby deprecating the tiling + fusion method for ops
with buffer semantics. Note that the core transformation to do fusion
of a producer with a tiled consumer still exists. The deprecation here
only removes methods that auto-magically tried to tile and fuse
correctly in the presence of side effects.

The `tileAndFuseLinalgOps` method also works with operations with tensor
semantics. There are at least two other ways in which the same
functionality exists:
1) The `tileConsumerAndFuseProducers` method. This does a similar
   transformation, but using a slightly different logic to
   automatically figure out the legal tile + fuse code. Note that this
   is also to be deprecated soon.
2) The preferred way uses the `TilingInterface` for tile + fuse, and
   relies on the caller to set the tiling options correctly to ensure
   that the generated code is correct.
As proof that (2) is equivalent to the functionality provided by
`tileAndFuseLinalgOps`, relevant tests have been moved to use the
interface, where the test driver sets the tile sizes appropriately to
generate the expected code.

Differential Revision: https://reviews.llvm.org/D129901
2022-07-21 05:05:06 +00:00
Mahesh Ravishankar
3139cc766c [mlir][Linalg] Add a pattern to decompose linalg.generic ops.
This patch adds a pattern to decompose `linalg.generic` operations
that
- has only parallel iterator types
- has more than 2 statements (including the yield)

into multiple `linalg.generic` operations such that each operation has
a single statement and a yield.
The pattern added here just splits the matching `linalg.generic` into
two `linalg.generic`s, one containing the first statement, and the
other containing the remaining statements. The same pattern can be applied
repeatedly on the second op to ultimately fully decompose the generic
op.

Differential Revision: https://reviews.llvm.org/D129704
2022-07-15 23:01:18 +00:00
Nicolas Vasilache
cd6e02eebc [mlir][Linalg] Retire TestLinalgCodegenStrategy pass.
This pass tests patterns that are already tested elsewhere by applying them in a semi-targeted
fashion using anchor function and op names.

From now on, targeted tests should use the transform dialect interpreter.

Differential Revision: https://reviews.llvm.org/D129627
2022-07-13 04:20:42 -07:00