clang-p2996

Author	SHA1	Message	Date
Chuanqi Xu	c75cedc237	[Coroutines] Set presplit attribute in Clang and mlir This fixes bug49264. Simply, coroutine shouldn't be inlined before CoroSplit. And the marker for pre-splited coroutine is created in CoroEarly pass, which ran after AlwaysInliner Pass in O0 pipeline. So that the AlwaysInliner couldn't detect it shouldn't inline a coroutine. So here is the error. This patch set the presplit attribute in clang and mlir. So the inliner would always detect the attribute before splitting. Reviewed By: rjmccall, ezhulenev Differential Revision: https://reviews.llvm.org/D115790	2022-01-05 10:25:02 +08:00
Mehdi Amini	1fc096af1e	Apply clang-tidy fixes for performance-unnecessary-value-param to MLIR (NFC) Reviewed By: Mogball Differential Revision: https://reviews.llvm.org/D116250	2022-01-02 01:45:18 +00:00
Mehdi Amini	89de9cc8a7	Apply clang-tidy fixes for performance-for-range-copy to MLIR (NFC) Differential Revision: https://reviews.llvm.org/D116248	2022-01-02 01:13:42 +00:00
Jacques Pienaar	c0342a2de8	[mlir] Switching accessors to prefixed form (NFC) Makes eventual prefixing flag flip smaller change.	2021-12-20 08:03:43 -08:00
bakhtiyar	ec0e4545ca	Make AsyncParallelForRewrite parameterizable with a cost model which drives deciding the parallelization granularity. Reviewed By: ezhulenev, mehdi_amini Differential Revision: https://reviews.llvm.org/D115423	2021-12-19 08:41:01 -08:00
Eugene Zhulenev	49ce40e9ab	[mlir] AsyncParallelFor: align block size to be a multiple of inner loops iterations Depends On D115263 By aligning block size to inner loop iterations parallel_compute_fn LLVM can later unroll and vectorize some of the inner loops with small number of trip counts. Up to 2x speedup in multiple benchmarks. Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D115436	2021-12-09 06:50:50 -08:00
Eugene Zhulenev	9f151b784b	[mlir] AsyncParallelFor: sink constants into the parallel compute function With complex recursive structure of async dispatch function LLVM can't always propagate constants to the parallel_compute_fn and it often prevents optimizations like loop unrolling and vectorization. We help LLVM by pushing known constants into the parallel_compute_fn explicitly. Reviewed By: bkramer Differential Revision: https://reviews.llvm.org/D115263	2021-12-09 06:48:23 -08:00
Mehdi Amini	be0a7e9f27	Adjust "end namespace" comment in MLIR to match new agree'd coding style See D115115 and this mailing list discussion: https://lists.llvm.org/pipermail/llvm-dev/2021-December/154199.html Differential Revision: https://reviews.llvm.org/D115309	2021-12-08 06:05:26 +00:00
Eugene Zhulenev	68a7c001ad	[mlir] Improve async parallel for tests + fix typos Do load and store to verify that we process each element of the iteration space once. Reviewed By: cota Differential Revision: https://reviews.llvm.org/D115152	2021-12-06 13:27:54 -08:00
bakhtiyar	7bd87a03fd	Promote readability by factoring out creation of min/max operation. Remove unnecessary divisions. Reviewed By: ezhulenev Differential Revision: https://reviews.llvm.org/D110680	2021-11-24 16:17:23 -08:00
Jacques Pienaar	cfb72fd3a0	[mlir] Switch arith, llvm, std & shape dialects to accessors prefixed both form. Following https://llvm.discourse.group/t/psa-ods-generated-accessors-will-change-to-have-a-get-prefix-update-you-apis/4476, this follows flipping these dialects to _Both prefixed form. This changes the accessors to have a prefix. This was possibly mostly without breaking breaking changes if the existing convenience methods were used. (https://github.com/jpienaar/llvm-project/blob/main/clang-tools-extra/clang-tidy/misc/AddGetterCheck.cpp was used to migrate the callers post flipping, using the output from Operator.cpp) Differential Revision: https://reviews.llvm.org/D112383	2021-10-24 18:36:33 -07:00
Mogball	cb3aa49ec0	[MLIR][arith] fix references to std.constant in comments Reviewed By: jpienaar Differential Revision: https://reviews.llvm.org/D111820	2021-10-14 20:38:47 +00:00
Mogball	a54f4eae0e	[MLIR] Replace std ops with arith dialect ops Precursor: https://reviews.llvm.org/D110200 Removed redundant ops from the standard dialect that were moved to the `arith` or `math` dialects. Renamed all instances of operations in the codebase and in tests. Reviewed By: rriddle, jpienaar Differential Revision: https://reviews.llvm.org/D110797	2021-10-13 03:07:03 +00:00
bakhtiyar	bdde959533	Remove unnecessary async group creates and awaits. Reviewed By: ezhulenev Differential Revision: https://reviews.llvm.org/D110605	2021-09-28 14:52:08 -07:00
bakhtiyar	55dfab39a2	Rename target block size to min task size for clarity. Reviewed By: ezhulenev Differential Revision: https://reviews.llvm.org/D110604	2021-09-28 14:51:55 -07:00
Eugene Zhulenev	92db09cde0	[mlir] AsyncRuntime: use int64_t for ref counting operations Workaround for SystemZ ABI problem: https://bugs.llvm.org/show_bug.cgi?id=51898 Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D110550	2021-09-27 07:55:01 -07:00
River Riddle	b54c724be0	[mlir:OpConversionPattern] Add overloads for taking an Adaptor instead of ArrayRef This has been a TODO for a long time, and it brings about many advantages (namely nice accessors, and less fragile code). The existing overloads that accept ArrayRef are now treated as deprecated and will be removed in a followup (after a small grace period). Most of the upstream MLIR usages have been fixed by this commit, the rest will be handled in a followup. Differential Revision: https://reviews.llvm.org/D110293	2021-09-24 17:51:41 +00:00
Eugene Zhulenev	fd52b4357a	[mlir] Async: check awaited operand error state after sync await Previously only await inside the async function (coroutine after lowering to async runtime) would check the error state Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D109229	2021-09-04 05:00:17 -07:00
Eugene Zhulenev	b537c5b414	[mlir] Async: clone constants into async.execute functions and parallel compute functions Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D107007	2021-08-02 12:17:41 -07:00
bakhtiyar	1c144410e7	Refactor AsyncToAsyncRuntime pass to boost understandability. Depends On D106730 Reviewed By: ezhulenev Differential Revision: https://reviews.llvm.org/D106731	2021-07-29 12:01:07 -07:00
bakhtiyar	9a5bc83660	Add an escape-hatch for conversion of funcs with blocking awaits to coroutines. Currently TFRT does not support top-level coroutines, so this functionality will allow to have a single blocking await at the top level until TFRT implements the necessary functionality. Reviewed By: ezhulenev Differential Revision: https://reviews.llvm.org/D106730	2021-07-29 08:52:28 -07:00
bakhtiyar	6ea22d4626	Optionally eliminate blocking runtime.await calls by converting functions to coroutines. Interop parallelism requires needs awaiting on results. Blocking awaits are bad for performance. TFRT supports lightweight resumption on threads, and coroutines are an abstraction than can be used to lower the kernels onto TFRT threads. Reviewed By: ezhulenev Differential Revision: https://reviews.llvm.org/D106508	2021-07-28 12:37:05 -07:00
Eugene Zhulenev	de7a4e53a2	[mlir] Async: lower SCF operations into CFG inside coroutines Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D106747	2021-07-24 14:36:26 -07:00
Eugene Zhulenev	6c1f655818	[mlir] Async: special handling for parallel loops with zero iterations Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D106590	2021-07-23 01:22:59 -07:00
Benjamin Kramer	ce857d3cfd	[mlir][async] Remove unused variable. NFC.	2021-07-01 12:24:55 +02:00
Eugene Zhulenev	c1194c2ec3	[mlir:Async] Change async-parallel-for block size/count calculation Depends On D105037 Avoid creating too many tasks when the number of workers is large. Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D105126	2021-06-29 12:57:11 -07:00
Eugene Zhulenev	f57b2420b2	[mlir:Async] Add an async reference counting pass based on the user defined policy Depends On D104999 Automatic reference counting based on the liveness analysis can add a lot of reference counting overhead at runtime. If the IR is known to be constrained to few particular "shapes", it's much more efficient to provide a custom reference counting policy that will specify where it is required to update the async value reference count. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D105037	2021-06-29 12:53:09 -07:00
Eugene Zhulenev	9ccdaac8f9	[mlir:Async] Fix a bug in automatic refence counting around function calls Depends On D104998 Function calls "transfer ownership" to the callee and it puts additional constraints on the reference counting optimization pass Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D104999	2021-06-29 09:35:43 -07:00
Eugene Zhulenev	a8f819c6d8	[mlir:Async] Remove async operations if it is statically known that the parallel operation has a single compute block Depends On D104850 Add a test that verifies that canonicalization removes all async overheads if it is statically known that the scf.parallel operation will be computed using a single block. Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D104891	2021-06-29 09:26:28 -07:00
Eugene Zhulenev	34a164c938	[mlir:Async] Submit accidentally omitted changes Accidentally pushed old branches that did not include all the changes discussed in the PRs. https://reviews.llvm.org/rGd43b23608ad664f02f56e965ca78916bde220950 https://reviews.llvm.org/rG86ad0af87054c3cccd68d32e103a6f1f6c6194c7 Differential Revision: https://reviews.llvm.org/D104943	2021-06-25 12:23:02 -07:00
Eugene Zhulenev	86ad0af870	[mlir:Async] Implement recursive async work splitting for scf.parallel operation (async-parallel-for pass) Depends On D104780 Recursive work splitting instead of sequential async tasks submission gives ~20%-30% speedup in microbenchmarks. Algorithm outline: 1. Collapse scf.parallel dimensions into a single dimension 2. Compute the block size for the parallel operations from the 1d problem size 3. Launch parallel tasks 4. Each parallel task reconstructs its own bounds in the original multi-dimensional iteration space 5. Each parallel task computes the original parallel operation body using scf.for loop nest Reviewed By: herhut Differential Revision: https://reviews.llvm.org/D104850	2021-06-25 10:34:39 -07:00
Eugene Zhulenev	d43b23608a	[mlir:Async] Add the size parameter to the async.group Specify the `!async.group` size (the number of tokens that will be added to it) at construction time. `async.await_all` operation can potentially race with `async.execute` operations that keep updating the group, for this reason it is required to know upfront how many tokens will be added to the group. Reviewed By: ftynse, herhut Differential Revision: https://reviews.llvm.org/D104780	2021-06-25 10:26:50 -07:00
Eugene Zhulenev	8f23fac4da	[mlir:Async] Convert assertions to async errors only inside async functions Differential Revision: https://reviews.llvm.org/D103278	2021-05-27 12:49:00 -07:00
Eugene Zhulenev	9136b7d075	[mlir] AsyncRefCounting: check that LivenessBlockInfo is not nullptr Differential Revision: https://reviews.llvm.org/D103270	2021-05-27 10:54:21 -07:00
Eugene Zhulenev	d8c84d2a4e	[mlir] Async: Add error propagation support to async groups Depends On D103109 If any of the tokens/values added to the `!async.group` switches to the error state, than the group itself switches to the error state. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D103203	2021-05-27 09:35:11 -07:00
Eugene Zhulenev	39957aa424	[mlir] Add error state and error propagation to async runtime values Depends On D103102 Not yet implemented: 1. Error handling after synchronous await 2. Error handling for async groups Will be addressed in the followup PRs Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D103109	2021-05-27 09:28:47 -07:00
Eugene Zhulenev	c412979cde	[mlir] Async reference counting for block successors with divergent reference counted liveness Support reference counted values implicitly passed (live) only to some of the successors. Example: if branched to ^bb2 token will leak, unless `drop_ref` operation is properly created ``` ^entry: %token = async.runtime.create : !async.token cond_br %cond, ^bb1, ^bb2 ^bb1: async.runtime.await %token async.runtime.drop_ref %token br ^bb2 ^bb2: return ``` Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D103102	2021-05-27 09:21:59 -07:00
Nico Weber	297a5b7cbc	[mlir] hopefully final round of iwyu fixes after `ba7a92c01e`	2021-04-21 11:03:06 -04:00
River Riddle	4efb7754e0	[mlir][NFC] Add a using directive for llvm::SetVector Differential Revision: https://reviews.llvm.org/D100436	2021-04-15 16:09:34 -07:00
Eugene Zhulenev	8a316b00d6	[mlir] Convert async dialect passes from function passes to op agnostic passes Differential Revision: https://reviews.llvm.org/D100401	2021-04-13 11:46:00 -07:00
Eugene Zhulenev	a6628e596e	[mlir] Async: add automatic reference counting at async.runtime operations level Depends On D95311 Previous automatic-ref-counting pass worked with high level async operations (e.g. async.execute), however async values reference counting is a runtime implementation detail. New pass mostly relies on the save liveness analysis to place drop_ref operations, and does better verification of CFG with different liveIn sets in block successors. This is almost NFC change. No new reference counting ideas, just a cleanup of the previous version. Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D95390	2021-04-12 18:54:55 -07:00
Mehdi Amini	973ddb7d6e	Define a `NoTerminator` traits that allows operations with a single block region to not provide a terminator In particular for Graph Regions, the terminator needs is just a historical artifact of the generalization of MLIR from CFG region. Operations like Module don't need a terminator, and before Module migrated to be an operation with region there wasn't any needed. To validate the feature, the ModuleOp is migrated to use this trait and the ModuleTerminator operation is deleted. This patch is likely to break clients, if you're in this case: - you may iterate on a ModuleOp with `getBody()->without_terminator()`, the solution is simple: just remove the ->without_terminator! - you created a builder with `Builder::atBlockTerminator(module_body)`, just use `Builder::atBlockEnd(module_body)` instead. - you were handling ModuleTerminator: it isn't needed anymore. - for generic code, a `Block::mayNotHaveTerminator()` may be used. Differential Revision: https://reviews.llvm.org/D98468	2021-03-25 03:59:03 +00:00
Chris Lattner	dc4e913be9	[PatternMatch] Big mechanical rename OwningRewritePatternList -> RewritePatternSet and insert -> add. NFC This doesn't change APIs, this just cleans up the many in-tree uses of these names to use the new preferred names. We'll keep the old names around for a couple weeks to help transitions. Differential Revision: https://reviews.llvm.org/D99127	2021-03-22 17:20:50 -07:00
Chris Lattner	3a506b31a3	Change OwningRewritePatternList to carry an MLIRContext with it. This updates the codebase to pass the context when creating an instance of OwningRewritePatternList, and starts removing extraneous MLIRContext parameters. There are many many more to be removed. Differential Revision: https://reviews.llvm.org/D99028	2021-03-21 10:06:31 -07:00
Eugene Zhulenev	25f80e16d1	[mlir] Async: add a separate pass to lower from async to async.coro and async.runtime Depends On D95000 Move async.execute outlining and async -> async.runtime lowering into the separate Async transformation pass Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D95311	2021-01-26 03:33:20 -08:00
Eugene Zhulenev	9c53b8e52e	[mlir:Async] Add intermediate async.coro and async.runtime operations to simplify Async to LLVM lowering [NFC] No new functionality, mostly a cleanup and one more abstraction level between Async and LLVM IR. Instead of lowering from Async to LLVM coroutines and Async Runtime API in one shot, do it progressively via async.coro and async.runtime operations. 1. Lower from async to async.runtime/coro (e.g. async.execute to function with coro setup and runtime calls) 2. Lower from async.runtime/coro to LLVM intrinsics and runtime API calls Intermediate coro/runtime operations will allow to run transformations on a higher level IR and do not try to match IR based on the LLVM::CallOp properties. Although async.coro is very close to LLVM coroutines, it is not exactly the same API, instead it is optimized for usability in async lowering, and misses a lot of details that are present in @llvm.coro intrinsic. Reviewed By: ftynse Differential Revision: https://reviews.llvm.org/D94923	2021-01-25 14:04:33 -08:00
Kazuaki Ishizaki	f88fab5006	[mlir] NFC: fix trivial typos fix typo under include and lib directories Reviewed By: antiagainst Differential Revision: https://reviews.llvm.org/D94220	2021-01-08 02:10:12 +09:00
River Riddle	1b97cdf885	[mlir][IR][NFC] Move context/location parameters of builtin Type::get methods to the start of the parameter list This better matches the rest of the infrastructure, is much simpler, and makes it easier to move these types to being declaratively specified. Differential Revision: https://reviews.llvm.org/D93432	2020-12-17 13:01:36 -08:00
Eugene Zhulenev	94e645f9cc	[mlir] Async: Add numWorkerThreads argument to createAsyncParallelForPass Add an option to pass the number of worker threads to select the number of async regions for parallel for transformation. ``` std::unique_ptr<OperationPass<FuncOp>> createAsyncParallelForPass(int numWorkerThreads); ``` Reviewed By: mehdi_amini Differential Revision: https://reviews.llvm.org/D92835	2020-12-08 10:30:14 -08:00
Christian Sigg	c4a0405902	Add `Operation* OpState::operator->()` to provide more convenient access to members of Operation. Given that OpState already implicit converts to Operator*, this seems reasonable. The alternative would be to add more functions to OpState which forward to Operation. Reviewed By: rriddle, ftynse Differential Revision: https://reviews.llvm.org/D92266	2020-12-02 15:46:20 +01:00

1 2

52 Commits