clang-p2996

Author	SHA1	Message	Date
Johannes Doerfert	e87f10a771	[Attributor] CGSCC pass should not recompute results outside the SCC (reapply) When we run the CGSCC pass we should only invest time on the SCC. We can initialize AAs with information from the module slice but we should not update those AAs. We make an exception for call sites of the SCC, as they are helpful in providing information for the SCC. Minor modifications to pointer privatization allow us to perform it even in the CGSCC pass, similar to ArgumentPromotion.	2022-04-17 12:48:49 -05:00
Johannes Doerfert	39a68cc016	Revert "[Attributor] CGSCC pass should not recompute results outside the SCC" This reverts commit `0d7f81e313`, it caused the AMDGPU tests that use the Attributor to fail.	2022-04-15 15:29:51 -05:00
Johannes Doerfert	0d7f81e313	[Attributor] CGSCC pass should not recompute results outside the SCC When we run the CGSCC pass we should only invest time on the SCC. We can initialize AAs with information from the module slice but we should not update those AAs.	2022-04-15 14:56:09 -05:00
Johannes Doerfert	af30de7788	[Attributor] Introduce AAInstanceInfo The Attributor, as many other parts in LLVM, uses pointer equivalence for `llvm::Value`s. This only works as long as `llvm::Value`s are dynamically unique, or, to be exact, we will never end up with the same `llvm::Value` representing two dynamic instances. We already provided a helper to check the former, namely `AA::isDynamicallyUnique`, however we could not check the latter. In this patch we move the logic into a separate AA which helps with the growing complexity and use cases. We also extend the interface to answer the second question rather than the first. So we do not determine dynamic uniqueness but whether we might end up with the `llvm::Value` describing a different dynamic instance. Note that the latter is very much tied to the Attributor capabilities to look through memory, recursion, etc. so we need to update the logic as we go.	2022-04-05 23:07:13 -05:00
Johannes Doerfert	857bf306d7	[Attributor] Remove broken and duplicated load simplification We look through loads in the "generic value traversal" and we consequently don't need to look through them again in AAValueSimplify*. The test changes stem from the fact that we allowed any simplified value, incl. non-dynamically unique ones, as long as the underlying memory was an alloca. This doesn't seem to make sense as allocas do not protect against dynamically non-unique values. We need to make the unique check better rather than excluding allocas. That in mind, we can remove a lot of code by simply relying on the generic value traversal load look through. To soften the blow some minor adjustments have been made that allow more simplification through the now used scheme and some tests have been given a `norecurse` for now.	2022-04-05 20:49:03 -05:00
Augie Fackler	e90bce8f91	CallBase: fix getFnAttr so it also checks the function Prior to this change, CallBase::hasFnAttr checked the called function to see if it had an attribute if it wasn't set on the CallBase, but getFnAttr didn't do the same delegation, which led to very confusing behavior. This patch fixes the issue by making CallBase::getFnAttr also check the function under the same circumstances. Test changes look (to me) like they're cleaning up redundant attributes which no longer get specified both on the callee and call. We also clean up the one ad-hoc implementation of this getter over in InlineCost.cpp. Differential Revision: https://reviews.llvm.org/D122821	2022-04-03 23:19:23 -04:00
Johannes Doerfert	a81fff8afd	Reapply "[Intrinsics] Add `nocallback` to the default intrinsic attributes" This reverts commit `c5f789050d` and reapplies `7aea3ea8c3` with additional test changes.	2022-03-25 09:36:50 -05:00
Johannes Doerfert	c5f789050d	Revert "[Intrinsics] Add `nocallback` to the default intrinsic attributes" This reverts commit `7aea3ea8c3` as it breaks the buildbots. I didn't see these failures in the pre-merge checks, looking into it.	2022-03-24 14:04:41 -05:00
Johannes Doerfert	7aea3ea8c3	[Intrinsics] Add `nocallback` to the default intrinsic attributes Most intrinsics, especially "default" ones, will not call back into the IR module. `nocallback` encodes this nicely. As it was not used before, this patch also makes use of `nocallback` in the Attributor which results in many more `norecurse` deductions. Tablegen part is mechanical, test updates by script. Differential Revision: https://reviews.llvm.org/D118680	2022-03-24 13:50:54 -05:00
Johannes Doerfert	ee94a4a3d0	[Attributor][FIX] Avoid endless recursion, simple case There is potential for endless recursion if we try to determine the underlying objects of a load, just to end up with the load as underlying object. A proper solution will require us to pass a visited set around. This will happen as we cleanup genericValueTraversal soon.	2022-03-23 15:55:32 -05:00
Johannes Doerfert	4166738c38	[OpenMP][FIX] Do not crash when kernels are debug wrapper functions With debug information enabled (-g) Clang will wrap the actual target region into a new function which is called from the "kernel". The problem is that the "kernel" is now basically a wrapper without all the things we expect. More importantly, if we end up asking for an AAKernelInfo for the "target region function" we might try to turn it into SPMD mode. That used to cause an assertion as that function doesn't have an appropriately named `_exec_mode` global. While the global is going away soon we still need to make sure to properly handle this case, e.g., perform optimizations reliably. Differential Revision: https://reviews.llvm.org/D122043	2022-03-19 14:15:55 -05:00
Nikita Popov	875782bd9e	[OpenMPOpt] Avoid pointer element type access during region merging Hardcode the function type as ParallelTask, which is the guaranteed pointee type of this runtime function argument (if pointee types exist). The elimination of the callee bitcast is left for InstCombine. Differential Revision: https://reviews.llvm.org/D120885	2022-03-15 09:52:46 +01:00
Johannes Doerfert	e92891f864	[Attributor] Allow not to default initialize AAs for live internal functions Outside users of the Attributor, e.g., OpenMP-opt, want to seed AAs themselves. We should not seed all default AAs once an internal function becomes live. That said, there should be a callback such that they can do lazy seeding as well. Differential Revision: https://reviews.llvm.org/D121489	2022-03-11 16:46:03 -06:00
Johannes Doerfert	5af11ec34b	[Attributor] Determine potentially loaded values through memory We already look through memory to determine where a value that is stored might pop up again (potential copies). This patch introduces the other direction with similar logic. If a value is loaded, we can follow all the accesses to the pointer (or better object) and try to determine what value might have been stored.	2022-03-06 23:26:37 -06:00
Johannes Doerfert	eb73af4af4	[Attributor] Handle undef and null in AAAlignFloating Both `undef` and `nullptr` are maximally aligned. This is especially important as we often see `undef` until a proper value has been identified during simplification.	2022-03-06 23:26:22 -06:00
Johannes Doerfert	192a34ddb0	[Attributor][OpenMPOpt][FIX] Register simplification callbacks Heap-2-stack and heap-2-shared can replace an allocation call with something else. To avoid us deriving information from the allocator implementation we register a simplification callback now that will force us to stop at the call site. We probably should create the replacement memory eagerly and return that instead though.	2022-03-06 21:28:38 -06:00
Augie Fackler	e1895a46dc	OpenMP: add allocsize(0) attribute to __kmpc_alloc_shared This is the second step in obviating two columns about allocation functions in MemoryBuiltins.cpp. Differential Revision: https://reviews.llvm.org/D119583	2022-03-04 16:26:03 -05:00
Johannes Doerfert	f9c2d6005e	[OpenMP][FIX] Ensure custom state machine works The custom state machine had a check for surplus threads that filtered the main thread if the kernel was executed by a single warp only. We now first check for the main thread, then for surplus threads, avoiding to filter the former out. Fixes #54214. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D121011	2022-03-04 13:51:19 -05:00
Joseph Huber	0136a4401f	[OpenMP] Add an option to limit shared memory usage in OpenMPOpt One of the optimizations performed in OpenMPOpt pushes globalized variables to static shared memory. This is preferable to keeping the runtime call in all cases, however if too many variables are pushed to shared memory the kernel will crash. Since this is an optimization and not something the user specified explicitly, there should be an option to limit this optimization in those cases. This patch introduces the `-openmp-opt-shared-limit=` option to limit the amount of bytes that will be placed in shared memory from HeapToShared. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120079	2022-02-18 08:35:26 -05:00
Joseph Huber	74cacf212b	[OpenMP] Add RTL function to externalization RAII This patch adds the '_kmpc_get_hardware_num_threads_in_block' OpenMP RTL function to the externalization RAII struct. This was getting optimized out and then being replaced with an undefined value once added back in, causing bugs for complex reductions. Fixes #53909. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D120076	2022-02-17 14:30:58 -05:00
Johannes Doerfert	8ad39fbaf2	[Attributor][FIX] Heap2Stack needs to use the alloca AS When we move an allocation from the heap to the stack we need to allocate it in the alloca AS and then cast the result. This also prevents us from inserting the alloca after the allocation call but rather right before. Fixes https://github.com/llvm/llvm-project/issues/53858	2022-02-16 15:58:32 -06:00
Johannes Doerfert	6ed1ef0643	[Attributor][FIX] Pipe UsedAssumedInformation through more interfaces `UsedAssumedInformation` is a return argument utilized to determine what information is known. Most APIs used it already but `genericValueTraversal` did not. This adds it to `genericValueTraversal` and replaces `AllCallSitesKnown` of `checkForAllCallSites` with the commonly used `UsedAssumedInformation`. This was supposed to be a NFC commit, then the test change appeared. Turns out, we had one user of `AllCallSitesKnown` (AANoReturn) and the way we set `AllCallSitesKnown` was wrong as we ignored the fact some call sites were optimistically assumed dead. Included a dedicated test for this as well now. Fixes https://github.com/llvm/llvm-project/issues/53884	2022-02-16 14:44:20 -06:00
Johannes Doerfert	ede248e614	[OpenMP][FIX] The `llvm.amdgcn.s.barrier` is actually not aligned If we assume `llvm.amdgcn.s.barrier` is aligned we may remove it and cause OpenMP GPU applications on the AMD GPU to be stuck or wrongly synchronized. Reported by Carlo Bertolli.	2022-02-11 12:42:50 -06:00
Johannes Doerfert	d1387a26a5	[Attributor][FIX] Reachability needs to account for readonly callees The oversight caused us to ignore call sites that are effectively dead when we computed reachability (or, more precisely, the call edges of a function). The problem is that loads in the readonly callee might depend on stores prior to the callee. If we do not track the call edge we mistakenly assume the store before the call cannot reach the load. The problem is nicely visible in: `llvm/test/Transforms/Attributor/ArgumentPromotion/basictest.ll` Caused by D118673. Fixes https://github.com/llvm/llvm-project/issues/53726	2022-02-10 13:52:24 -06:00
Joseph Huber	6b78526b1b	[OpenMP] Emit remark on the captured call instead of the variable Changes the remark to emit on the function call that captures the globalized variable instead of the globalized variable itself. The user should be able to see which variable it was in the argument list of the function. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106980	2022-02-04 17:50:53 -05:00
Johannes Doerfert	ac3ec22df9	[Attributor] Use AAFunctionReachability to determine AANoRecurse We missed out on AANoRecurse in the module pass because we had no call graph. With AAFunctionReachability we can simply ask if the function may reach itself. Differential Revision: https://reviews.llvm.org/D110099	2022-02-01 01:40:44 -06:00
Johannes Doerfert	a1db0e523d	[Attributor][FIX] Liveness handling in the isAssumedDead helpers This fixes a conceptual problem with our AAIsDead usage which conflated call site liveness with call site return value liveness. Without the fix tests would obviously miscompile as we make genericValueTraversal more powerful (in a follow up). The effects on the tests are mixed but mostly marginal. The most prominent one is the lack of `noreturn` for functions. The reason is that we make entire blocks live at the same time (for time reasons). Now that we actually look at the block liveness, which we need to do, the return instructions are live and will survive. As an example, `noreturn_async.ll` has been modified to retain the `noreturn` even with block granularity. We could address this easily but there is little need in practice.	2022-02-01 01:18:52 -06:00
Johannes Doerfert	3c8a4c6f47	[OpenMP] Eliminate redundant barriers in the same block Patch originally by Giorgis Georgakoudis (@ggeorgakoudis), typos and bugs introduced later by me. This patch allows us to remove redundant barriers if they are part of a "consecutive" pair of barriers in a basic block with no impacted memory effect (read or write) in-between them. Memory accesses to local (=thread private) or constant memory are allowed to appear. Technically we could also allow any other memory that is not used to share information between threads, e.g., the result of a malloc that is also not captured. However, it will be easier to do more reasoning once the code is put into an AA. That will also allow us to look through phis/selects reasonably. At that point we should also deal with calls, barriers in different blocks, and other complexities. Differential Revision: https://reviews.llvm.org/D118002	2022-02-01 01:07:50 -06:00
Johannes Doerfert	989674f110	[OpenMP] Ensure to remove noinline from all runtime functions eventually We used to remove noinline from known OpenMP runtime functions (which are declared in OMPKinds.td). Now we remove noinline from all functions with the proper prefixes: __kmpc, _ZN4_OMP (= namespace omp), omp_	2022-02-01 01:07:50 -06:00
Giorgis Georgakoudis	7cb4c26173	[OMPIRBuilder] Generate aggregate argument for parallel region outlined functions Summary: This patch modifies code generation in OpenMPIRBuilder to pass arguments to the parallel region outlined function in an aggregate (struct), besides the global_tid and bound_tid arguments. It depends on the updated CodeExtractor (see D96854) for support. It mirrors functionality of Clang codegen (see D102107). Differential Revision: https://reviews.llvm.org/D110114	2022-01-25 20:53:45 -05:00
Joseph Huber	5eb49009eb	[OpenMP] Add more identifier to created shared globals Currently we push some variables to a global constant containing shared memory as an optimization. This generated constant had internal linkage and should not have collided with any known identifiers in the translation unit. However, there have been observed cases of this optimization unintentionally colliding with undocumented PTX identifiers. This patch adds a suffix to the created globals to hopefully bypass this. Depends on D118059 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D118068	2022-01-24 20:37:54 -05:00
Joseph Huber	06cfdd5224	[OpenMP][Fix] Properly inherit calling convention Previously in OpenMPOpt we did not correctly inherit the calling convention of the callee when creating new OpenMP runtime calls. This created issues when the calling convention was changed during `GlobalOpt` but a new call was created without the correct calling convention. This led to the call being replaced with a poison value in `InstCombine` due to undefined behaviour and causing large portions of the program to be incorrectly eliminated. This patch correctly inherits the existing calling convention from the callee. Reviewed By: tianshilei1992, jdoerfert Differential Revision: https://reviews.llvm.org/D118059	2022-01-24 20:37:52 -05:00
Johannes Doerfert	b4a7559844	[OpenMP][FIX] Replace ICVs only with values valid at the getter position While we might know the value of an ICV at a getter position it is not always clear that we can simply use it. Verify the value is valid first to avoid invalid IR. Fixes #53300.	2022-01-19 18:40:13 -06:00
Johannes Doerfert	944aa0421c	Reapply "[OpenMP][NFCI] Embed the source location string size in the ident_t" This reverts commit `73ece231ee` and reapplies `7bfcdbcbf3` with mlir changes. Also reverts commit `423ba12971` and includes the unit test changes of `16da214004`.	2021-12-29 01:10:38 -06:00
Mehdi Amini	73ece231ee	Revert "[OpenMP][NFCI] Embed the source location string size in the ident_t" This reverts commit `7bfcdbcbf3`. Broke MLIR build	2021-12-29 06:57:36 +00:00
Johannes Doerfert	3e0c512ce6	[OpenMP] Simplify all stores in the device code Similar to loads, we want to be aggressive when it comes to store simplification. Not everything in LLVM handles dead stores well when address space casts are involved, we can simply ask the Attributor to do it for us though. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D109998	2021-12-29 00:19:38 -06:00
Johannes Doerfert	7bfcdbcbf3	[OpenMP][NFCI] Embed the source location string size in the ident_t One of the unused ident_t fields now holds the size of the string (=const char *) field so we have an easier time dealing with those in the future. Differential Revision: https://reviews.llvm.org/D113126	2021-12-28 23:53:29 -06:00
Joseph Huber	6e220296d7	[OpenMP] Use alignment information in HeapToShared This patch uses the return alignment attribute now present in the `__kmpc_alloc_shared` runtime call to set the alignment of the shared memory global created to replace it. Depends on D115971 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D116319	2021-12-27 16:58:27 -05:00
Joseph Huber	9ea5b97203	[OpenMP][FIX] Invalidate the SPMDCompatibilityTracker explicitly Before SPMDzation it was sufficient to add an incompatible instruction to the SPMDCompatibilityTracker. However, now adding instructions means they need guarding. As calls cannot be guarded in general we need to explicitly prevent SPMD mode. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D115158	2021-12-06 12:31:57 -05:00
Joseph Huber	058c312a44	[OpenMP][FIX] SPMDzation guarding needs to account for all reaching kernels If two reaching kernels disagree on the execution mode we cannot guard a function right now. Ensure we do not as we otherwise will cause a deadlock. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D114866	2021-12-01 11:44:32 -05:00
Joel E. Denny	c9dfe322ee	[OpenMP] Fix main thread barrier for Pascal and amdgpu Fixes what's left of https://bugs.llvm.org/show_bug.cgi?id=51781. Reviewed By: jdoerfert, JonChesterfield, tianshilei1992 Differential Revision: https://reviews.llvm.org/D113602	2021-11-12 11:18:45 -05:00
Joseph Huber	e52937eba0	[OpenMP] Use AAAssumptionInfo to get assumptions in OpenMPOpt This patch uses the abstract attributor introduced in D111054 to get the assumption values instead of the `hasAssumption` function. This also calls it so assumption information should propagate through the device where applicable. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D111445	2021-11-09 17:39:21 -05:00
Johannes Doerfert	d61aac76bf	[OpenMP][FIX] Do not signal SPMD-mode but then keep generic-mode If we assume SPMD-mode during the fixpoint iteration we have to execute the kernel in SPMD-mode. If we change our mind during manifest there is the chance of a mismatch between the simplification, e.g., of `__kmpc_is_spmd_exec_mode` calls, and the execution mode. This problem was introduced in D109438. This patch is a compromise to resolve the problem purely in OpenMP-opt while trying to keep the benefits of D109438 around. This might not always work, see `get_hardware_num_threads_in_block_fold` but it often does. At the same time we do keep value specialization and execution mode in sync. Proper solutions to this problem should be considered. I believe a new execution mode is the easiest way forward (Singleton-SPMD). Alternatively, SPMD-mode execution can be used with a way to provide a new thread_limit (here 1) to the runtime. This is more general and could be useful if we see `num_threads` clauses or workshared loops with small trip counts in the kernel. In either proposal we need to disable the guarding for the kernel (which was the motivation for D109438). Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D112894	2021-11-02 23:22:04 -05:00
Johannes Doerfert	73720c8059	[OpenMP][FIX] Introduce and use a simple generic-mode barrier Before we had aligned barriers the `__kmpc_barrier_simple_spmd` was OK to be used in the custom state machine. Now that SPMD barriers are assumed to be aligned we need to use a "generic" barrier in places that are not aligned. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D112893	2021-11-02 23:22:01 -05:00
Johannes Doerfert	c690c1c977	[NVVM] Update intrinsic definitions to include more attributes A lot of NVVM intrinsics can use the default intrinsic attributes (e.g., nosync, nofree, ...) as well as `speculatable`. The latter is important if we want to recompute intrinsics results instead of communicating them via memory. I did use default attributes for almost all `readnone` attributes but speculatable only where I had reasonable confidence they cannot experience UB. That said, someone should double check. TODO: There seem to be various intrinsics marked `Commutative` which should not, e.g., fma and div. Reviewed By: tra Differential Revision: https://reviews.llvm.org/D109987	2021-11-02 23:21:57 -05:00
Johannes Doerfert	8a4551b893	[Attributor][FIX] Use right address space to avoid assertion When we strip and accumulate constant offsets we need to pick the right address space such that the offset APInt has the right bit width. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D112544	2021-10-27 18:22:37 -05:00
Arthur Eubanks	05392466f0	Reland [IR] Increase max alignment to 4GB Currently the max alignment representable is 1GB, see D108661. Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945. This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits. We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now. The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field. Updating clang's max allowed alignment will come in a future patch. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D110451	2021-10-06 13:29:23 -07:00
Arthur Eubanks	569346f274	Revert "Reland [IR] Increase max alignment to 4GB" This reverts commit `8d64314ffe`.	2021-10-06 11:38:11 -07:00
Arthur Eubanks	8d64314ffe	Reland [IR] Increase max alignment to 4GB Currently the max alignment representable is 1GB, see D108661. Setting the align of an object to 4GB is desirable in some cases to make sure the lower 32 bits are clear which can be used for some optimizations, e.g. https://crbug.com/1016945. This uses an extra bit in instructions that carry an alignment. We can store 15 bits of "free" information, and with this change some instructions (e.g. AtomicCmpXchgInst) use 14 bits. We can increase the max alignment representable above 4GB (up to 2^62) since we're only using 33 of the 64 values, but I've just limited it to 4GB for now. The one place we have to update the bitcode format is for the alloca instruction. It stores its alignment into 5 bits of a 32 bit bitfield. I've added another field which is 8 bits and should be future proof for a while. For backward compatibility, we check if the old field has a value and use that, otherwise use the new field. Updating clang's max allowed alignment will come in a future patch. Reviewed By: hans Differential Revision: https://reviews.llvm.org/D110451	2021-10-06 11:03:51 -07:00
Arthur Eubanks	72cf8b6044	Revert "[IR] Increase max alignment to 4GB" This reverts commit `df84c1fe78`. Breaks some bots	2021-10-06 10:21:35 -07:00

205 Commits