clang-p2996

Author	SHA1	Message	Date
Joseph Huber	29a74a3915	[OpenMP] Add an option to always inline OpenMP device functions. Performance on GPU targets can be highly variable, sometimes inlining everything hurts performance and sometimes it greatly improves it. Add an option to toggle this behaviour to better investigate it. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D109014	2021-08-31 18:48:30 -04:00
Roman Lebedev	564d85e090	The maximal representable alignment in LLVM IR is 1GiB, not 512MiB In LLVM IR, `AlignmentBitfieldElementT` is 5-bit wide But that means that the maximal alignment exponent is `(1<<5)-2`, which is `30`, not `29`. And indeed, alignment of `1073741824` roundtrips IR serialization-deserialization. While this doesn't seem all that important, this doubles the maximal supported alignment from 512MiB to 1GiB, and there's actually one noticeable use-case for that; On X86, the huge pages can have sizes of 2MiB and 1GiB (!). So while this doesn't add support for truly huge alignments, which i think we can easily-ish do if wanted, i think this adds zero-cost support for a not-trivially-dismissable case. I don't believe we need any upgrade infrastructure, and since we don't explicitly record the IR version, we don't need to bump one either. As @craig.topper speculates in D108661#2963519, this might be an artificial limit imposed by the original implementation of the `getAlignment()` functions. Differential Revision: https://reviews.llvm.org/D108661	2021-08-26 12:53:39 +03:00
Vyacheslav Zakharin	2e192ab1f4	[CodeExtractor] Preserve topological order for the return blocks. Differential Revision: https://reviews.llvm.org/D108673	2021-08-25 08:09:01 -07:00
Jon Chesterfield	21d91a8ef3	[libomptarget][devicertl] Replace lanemask with uint64 at interface Use uint64_t for lanemask on all GPU architectures at the interface with clang. Updates tests. The deviceRTL is always linked as IR so the zext and trunc introduced for wave32 architectures will fold after inlining. Simplification partly motivated by amdgpu gfx10 which will be wave32 and is awkward to express in the current arch-dependent typedef interface. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108317	2021-08-18 20:47:33 +01:00
Joseph Huber	58f9326487	[OpenMP] Change AAKernelInfo to ignore non-kernels Currently, AAKernelInfo will fail on an assertion if we attempt to run it on a kernel without the init / deinit runtime calls. However, this occurs for global constructors on the device. This will cause OpenMPOpt to crash whenever global constructors are present. This patch removes this assertion and just gives up instead. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D108258	2021-08-18 11:24:29 -04:00
Johannes Doerfert	e0c5d83a92	[OpenMP][FIX] Disabled optimizations have to be made known To avoid simplification with wrong constants we need to make sure we know that we won't perform specific optimizations based on the user's request. The non-SPMDization and non-CustomStateMachine flags only prevented the final transformation but allowed value simplification to go ahead. Differential Revision: https://reviews.llvm.org/D107862	2021-08-11 00:49:53 -05:00
Joseph Huber	640091884f	[OpenMP] AlwaysInline __kmpc_parallel_51 to improve inlining heuristics This patch adds the `AlwaysInline` attribute to the `__kmpc_parallel_51` device runtime call. This improves inlining heuristics which encourages the indirect function pointer argument to also be inlined. This greatly improves performance for a few applications whose outlined regions were not inlined otherwise. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D107839	2021-08-10 14:41:43 -04:00
Giorgis Georgakoudis	29a3e3dd7b	[OpenMPOpt] Expand SPMDization with guarding for target parallel regions This patch expands SPMDization (converting generic execution mode to SPMD for target regions) by guarding code regions that should be executed only by the main thread. Specifically, it generates guarded regions, which only the main thread executes, and the synchronization with worker threads using simple barriers. For correctness, the patch aborts SPMDization for target regions if the same code executes in a parallel region, and thus must not be guarded. This check is implemented using the ParallelLevels AA. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D106892	2021-08-04 11:49:24 -07:00
Joseph Huber	cd0dd8ece8	[OpenMP] Adding flags for disabling the following optimizations: Deglobalization SPMDization State machine rewrites Folding This work provides four flags to disable four different sets of OpenMP optimizations. These flags take effect in llvm/lib/Transforms/IPO/OpenMPOpt.cpp and include the following: - openmp-opt-disable-deglobalization: Defaults to false, adding this flag sets the variable DisableOpenMPOptDeglobalization to true. This prevents AA registration for HeapToStack and HeapToShared. - openmp-opt-disable-spmdization: Defaults to false, adding this flag sets the variable DisableOpenMPOptSPMDization to true. This indicates a pessimistic fixpoint in changeToSPMDMode. - openmp-opt-disable-folding: Defaults to false, adding this flag sets the variable DisableOpenMPOptFolding to true. This indicates a pessimistic fixpoint in the attributor init for AAFoldRuntimeCall. - openmp-opt-disable-state-machine-rewrite: Defaults to false, adding this flag sets the variable DisableOpenMPOptStateMachineRewrite to true. This first prevents changes to the state machine in rewriteDeviceCodeStateMachine by returning before changes are made, and if a custom state machine is built in buildCustomStateMachine, stops by returning a pessimistic fixpoint. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D106802	2021-07-29 19:28:31 -04:00
Joseph Huber	adbaa39dfc	[Attributor] Change function internalization to not replace uses in internalized callers The current implementation of function internalization creates a copy of each function and replaces every use. This has the downside that the external versions of the functions will call into the internalized versions of the functions. This prevents them from being fully independent of each other. This patch replaces the current internalization scheme with a method that creates all the copies of the functions intended to be internalized first and then replaces the uses as long as their caller is not already internalized. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106931	2021-07-28 18:57:28 -04:00
Jose M Monsalve Diaz	5ab6aedda9	[OpenMP] Folding threadLimit and numThreads when single value in kernels The device runtime contains several calls to `__kmpc_get_hardware_num_threads_in_block` and `__kmpc_get_hardware_num_blocks`. If the thread_limit and the num_teams are constant, these calls can be folded to the constant value. In this patch we use the already introduced `AAFoldRuntimeCall` and the `NumTeams` and `NumThreads` kernel attributes (to be introduced in a different patch) to fold these functions. The code checks all the kernels, and if their attributes match, the functions are folded. In the future we will explore specializing for multiple values of NumThreads and NumTeams. Depends on D106390 Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D106033	2021-07-27 21:47:12 -04:00
Johannes Doerfert	41bd26dff9	[Attributor] Delete dead stores D106185 allows us to determine if a store is needed easily. Using that knowledge we can start to delete dead stores. In AAIsDead we now track more state as an instruction can be dead (= the old optimistic state) or just "removable". A store instruction can be removable while being very much alive, e.g., if it stores a constant into an alloca or internal global. If we would pretend it was dead instead of only removable we would ignore it when we determine what values a load can see, so that is not what we want. Differential Revision: https://reviews.llvm.org/D106188	2021-07-26 23:33:36 -05:00
Johannes Doerfert	adddd3dbda	[Attributor] Introduce getPotentialCopiesOfStoredValue and use it This patch introduces `getPotentialCopiesOfStoredValue` which uses AAPointerInfo to determine all "aliases" or "potential copies" of a value that is stored into memory. This operation can fail but if it succeeds it means we can visit all "uses" of a value even if it is temporarily stored in memory. There are two users for the function: 1) `Attributor::checkForAllUses` which will now ignore the value use in a store if all "potential copies" can be identified and instead be visited. This allows various AAs, including AAPointerInfo itself, to look through memory. 2) `AANoCapture` which uses a custom use tracking through the CaptureTracker interface and therefore needs to be taught explicitly. Differential Revision: https://reviews.llvm.org/D106185	2021-07-26 23:33:36 -05:00
Shilei Tian	e97e0a4fad	[AbstractAttributor] Fold __kmpc_parallel_level if possible Similar to D105787, this patch tries to fold `__kmpc_parallel_level` if possible. Note that `__kmpc_parallel_level` doesn't take activeness into consideration, based on current `deviceRTLs`, its return value can be such as 0, 1, 2, instead of 0, 129, 130, etc. that also indicate activeness. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106154	2021-07-26 22:46:19 -04:00
Johannes Doerfert	be2b569646	[OpenMP] Run rewriteDeviceCodeStateMachine in the Module not CGSCC pass While rewriteDeviceCodeStateMachine should probably be folded into buildCustomStateMachine, we at least need the optimization to happen. This was not reliably the case in the CGSCC pass but in the Module pass it seems to work reliably. This also ports a test to the new kernel encoding (target_init/deinit), and makes sure we cannot run the kernel in SPMD mode. Differential Revision: https://reviews.llvm.org/D106345	2021-07-26 21:26:07 -05:00
Giorgis Georgakoudis	f97de4cb0b	[OpenMPOpt] Move dedup runtime calls after init for target regions Deduplication in OpenMPOpt finds redundant OpenMP runtime calls and replaces them with a single call placed in the earliest safe location in the IR. When deduplication happens in a target region this patch makes sure replacement calls are put after target_init. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106556	2021-07-23 05:54:01 -07:00
Giorgis Georgakoudis	f8c40ed8f8	[OpenMP] Use AAHeapToStack/AAHeapToShared analysis in SPMDization SPMDization D102307 detects incompatible OpenMP runtime calls to abort converting a target region to SPMD mode. Calls to memory allocation/de-allocation routines kmpc_alloc_shared, kmpc_free_shared are incompatible unless they are removed by AAHeapToStack/AAHeapToShared analysis. This patch extends SPMDization detection to include AAHeapToStack/AAHeapToShared analysis results for enlarging the scope of possible SPMDized regions detected. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105634	2021-07-22 18:08:37 -07:00
Alexey Bataev	b88a68c45e	[OPENMP]Fix PR49787: Codegen for calling __tgt_target_teams_nowait_mapper has too few arguments. Added missed arguments in __tgt_target_teams_nowait_mapper/__tgt_target_nowait_mapper runtime functions calls. Differential Revision: https://reviews.llvm.org/D106542	2021-07-22 08:44:37 -07:00
Joseph Huber	16206d17cd	[OpenMP] Strip NoInline from known OpenMP runtime functions This patch strips the NoInline attribute from known OpenMP runtime functions. This is done so that we can denote certain runtime functions as NoInline to ensure their call sites are intact so they can be checked by OpenMPOpt. We don't want this noinline attribute to remain for any functions after OpenMPOpt has been run however. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106482	2021-07-21 21:18:26 -04:00
Joseph Huber	196fe994b8	[OpenMP] Fold `__kmpc_is_generic_main_thread_id` if possible This patch adds the ability to fold `__kmpc_is_generic_main_thread_id` if we know for a fact that it is executed by the initial thread using AAExecutionDomain. This combined with folding `__kmpc_is_spmd_exec_mode` will allow us to fully fold `__kmpc_is_generic_main_thread`. Depends on D106438 D106437 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106439	2021-07-21 21:18:22 -04:00
Joseph Huber	7d57639264	[OpenMP] Add new execution mode for SPMD execution with Generic semantics Qualified kernels can be transformed from generic-mode to SPMD mode using an optimization in OpenMPOpt. This patch introduces a new execution mode to indicate kernels that have been transformed from generic-mode to SPMD-mode. These kernels have SPMD-mode execution, but need generic-mode semantics for scheduling the blocks and threads. Without this far too few blocks will be scheduled for a generic region as SPMD mode expects the trip count to be divided by the number of threads. Reviewed By: ggeorgakoudis Differential Revision: https://reviews.llvm.org/D106460	2021-07-21 20:57:28 -04:00
Joseph Huber	754eb1c210	[OpenMP] Change `__kmpc_free_shared` to include the paired allocation size This patch changes `__kmpc_free_shared` to take an additional argument corresponding to the associated allocation's size. This makes it easier to implement the allocator in the runtime. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D106496	2021-07-21 20:56:21 -04:00
Giorgis Georgakoudis	e8439ec893	[OpenMP] Set RequiresFullRuntime false in SPMDization SPMDization in D102307 does not change the RequiresFullRuntime argument of kmpc_target_init/deinit calls. However, the constraints of SPMDization detection for converting a target region to SPMD mode should guarantee that the region does not require full runtime support. Hence, this patch sets RequiresFullRuntime to false for improved execution performance. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105556	2021-07-20 09:54:51 -07:00
Johannes Doerfert	205c520387	[OpenMP] Remove XFAIL and update check lines properly Undo `15c5701c83` and update check lines.	2021-07-20 00:35:13 -05:00
Johannes Doerfert	15c5701c83	[OpenMP][FIX] Temporarily XFAIL tests waiting for new check lines The test is not wrong nor is the current main broken, it is just an interplay issue. Check lines will be updated shortly.	2021-07-19 23:14:35 -05:00
Johannes Doerfert	c2281f1565	[Attributor] Introduce AAPointerInfo This patch introduces AAPointerInfo which tracks the uses of a pointer and places them in "bins" based on their offset from the base and access size. As with other AAs, any pointer can be tracked but it is up to the user to make sense of the results. The user in this patch is AAValueSimplify and AAPotentialValues which both utilize AAPointerInfo to determine the value of a load. For now, this is restricted to loads of allocas and internal globals. Through the use of AAPointerInfo and the "bins" we can track struct members separately. The users also know that storing only zeros (at unknown indices) will result in loading only 0 (from unknown indices). Other than that, the users are flow and context insensitive (for now). To deal with the "bins" more easily, AAPointerInfo provides a forallInterfearingAccesses that applies a callback on all accesses that might interfere with a given load or store. Differential Revision: https://reviews.llvm.org/D104432	2021-07-19 22:48:35 -05:00
Joseph Huber	eef6601b0f	[OpenMP] Rework OpenMP remarks This patch rewrites and reworks a few of the existing remarks to make them more concise and consistent prior to writing the documentation for them. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105898	2021-07-16 14:07:00 -04:00
Shilei Tian	ca662297d5	[AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode` to query the execution mode of current kernels. In many cases, user programs only contain target regions executing in one mode. As a consequence, those runtime function calls will only return one value. If we can get rid of these function calls during compilation, it can potentially improve performance. In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for each kernel (device) function `F`, we collect all kernel entries `K` that can reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each iteration, it will check all reaching kernel entries, and update the folded value accordingly. In the future we will support more functions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105787	2021-07-15 18:23:23 -04:00
Shilei Tian	a70ef3f568	Revert "[AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible" This reverts commit `1100e4aafe`.	2021-07-15 11:19:28 -04:00
Shilei Tian	1100e4aafe	[AbstractAttributor] Fold function calls to `__kmpc_is_spmd_exec_mode` if possible In the device runtime there are many function calls to `__kmpc_is_spmd_exec_mode` to query the execution mode of current kernels. In many cases, user programs only contain target regions executing in one mode. As a consequence, those runtime function calls will only return one value. If we can get rid of these function calls during compilation, it can potentially improve performance. In this patch, we use `AAKernelInfo` to analyze kernel execution. Basically, for each kernel (device) function `F`, we collect all kernel entries `K` that can reach `F`. A new AA, `AAFoldRuntimeCall`, is created for each call site. In each iteration, it will check all reaching kernel entries, and update the folded value accordingly. In the future we will support more functions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105787	2021-07-13 22:28:35 -04:00
Johannes Doerfert	514c033db1	[OpenMP] Detect SPMD compatible kernels and execute them as such In the spirit of TRegions [0], this patch analyzes a kernel and tracks if it can be executed in SPMD-mode. If so, we flip the arguments of the __kmpc_target_init and deinit call to enable the mode. We also update the `<kernel>_exec_mode` flag to indicate to the runtime we changed the mode to SPMD. The code analysis is done interprocedurally by extending the AAKernelInfo abstract attribute to track SPMD compatibility as well. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 Differential Revision: https://reviews.llvm.org/D102307	2021-07-10 18:44:25 -05:00
Johannes Doerfert	8cb7d71355	[OpenMP][FIX] Add missing `)` to remark	2021-07-10 18:40:32 -05:00
Johannes Doerfert	d9659bf6a0	[OpenMP] Create custom state machines for generic target regions In the spirit of TRegions [0], this patch creates a custom state machine for a generic target region based on the potentially called parallel regions. The code analysis is done interprocedurally via an abstract attribute (AAKernelInfo). All outermost parallel regions are collected and we check if there might be unknown outermost parallel regions for which we need an indirect call. Other AAKernelInfo extensions are expected. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 Differential Revision: https://reviews.llvm.org/D101977	2021-07-10 17:57:08 -05:00
Johannes Doerfert	e2cfbfcc0c	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 17:53:56 -05:00
Johannes Doerfert	5b05a5f6ce	[OpenMP][FIX] Update remark in test file after rewording	2021-07-10 16:36:02 -05:00
Johannes Doerfert	c1c1fe9385	[Attributor] Reorganize AAHeapToStack In order to simplify future extensions, e.g., the merge of AAHeapToShared into AAHeapToStack, we reorganize AAHeapToStack and the state we keep for each malloc-like call. The result is also less confusing as we only track malloc-like calls, not all calls. Further, we only perform the updates necessary for a malloc-like to argue it can go to the stack, e.g., we won't check all uses if we moved on to the "must-be-freed" argument. This patch also uses Attributor helpers to simplify the allocated size, alignment, and the potentially freed objects. Overall, this is mostly a reorganization and only the use of the optimistic helpers should change (=improve) the capabilities a bit. Differential Revision: https://reviews.llvm.org/D104993	2021-07-10 16:32:24 -05:00
Johannes Doerfert	5ef18e2421	[Attributor] Use AAValueSimplify to simplify returned values We should use AAValueSimplify for all value simplification, however there was some leftover logic that predates AAValueSimplify in AAReturnedValues. This removes the AAReturnedValues part and provides a replacement by making AAValueSimplifyReturned strong enough to handle all previously covered cases. Further, this improves AAValueSimplifyCallSiteReturned to handle returned arguments. AAReturnedValues is now much easier and the collected returned values/instructions are now from the associated function only, making it much more sane. We also do not have the brittle logic anymore that looks for unresolved calls. Instead, we use AAValueSimplify to handle recursion. Useful code has been split into helper functions, e.g., an Attributor interface to get a simplified value. Differential Revision: https://reviews.llvm.org/D103860	2021-07-10 15:52:36 -05:00
Nico Weber	d3e7491333	Revert Attributor patch series Broke check-clang, see https://reviews.llvm.org/D102307#2869065 Ran `git revert -n ebbe149a6f08535ede848a531a601ae6591cfbc5..269416d41908bb670f67af689155d5ab8eea689a`	2021-07-10 16:15:55 -04:00
Johannes Doerfert	d39179d7fa	[OpenMP] Detect SPMD compatible kernels and execute them as such In the spirit of TRegions [0], this patch analyzes a kernel and tracks if it can be executed in SPMD-mode. If so, we flip the arguments of the __kmpc_target_init and deinit call to enable the mode. We also update the `<kernel>_exec_mode` flag to indicate to the runtime we changed the mode to SPMD. The code analysis is done interprocedurally by extending the AAKernelInfo abstract attribute to track SPMD compatibility as well. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 Differential Revision: https://reviews.llvm.org/D102307	2021-07-10 12:32:51 -05:00
Johannes Doerfert	f0628c6ff7	[OpenMP] Create custom state machines for generic target regions In the spirit of TRegions [0], this patch creates a custom state machine for a generic target region based on the potentially called parallel regions. The code analysis is done interprocedurally via an abstract attribute (AAKernelInfo). All outermost parallel regions are collected and we check if there might be unknown outermost parallel regions for which we need an indirect call. Other AAKernelInfo extensions are expected. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 Differential Revision: https://reviews.llvm.org/D101977	2021-07-10 12:32:50 -05:00
Johannes Doerfert	1d5711c3ee	[OpenMP] Unified entry point for SPMD & generic kernels in the device RTL In the spirit of TRegions [0], this patch provides a simpler and uniform interface for a kernel to set up the device runtime. The OMPIRBuilder is used for reuse in Flang. A custom state machine will be generated in the follow up patch. The "surplus" threads of the "master warp" will not exit early anymore so we need to use non-aligned barriers. The new runtime will not have an extra warp but also require these non-aligned barriers. [0] https://link.springer.com/chapter/10.1007/978-3-030-28596-8_11 This was in parts extracted from D59319. Reviewed By: ABataev, JonChesterfield Differential Revision: https://reviews.llvm.org/D101976	2021-07-10 12:32:50 -05:00
Johannes Doerfert	1eb31d6de3	[Attributor] Reorganize AAHeapToStack In order to simplify future extensions, e.g., the merge of AAHeapToShared into AAHeapToStack, we reorganize AAHeapToStack and the state we keep for each malloc-like call. The result is also less confusing as we only track malloc-like calls, not all calls. Further, we only perform the updates necessary for a malloc-like to argue it can go to the stack, e.g., we won't check all uses if we moved on to the "must-be-freed" argument. This patch also uses Attributor helpers to simplify the allocated size, alignment, and the potentially freed objects. Overall, this is mostly a reorganization and only the use of the optimistic helpers should change (=improve) the capabilities a bit. Differential Revision: https://reviews.llvm.org/D104993	2021-07-10 12:32:50 -05:00
Johannes Doerfert	374e573cfc	[Attributor] Use AAValueSimplify to simplify returned values We should use AAValueSimplify for all value simplification, however there was some leftover logic that predates AAValueSimplify in AAReturnedValues. This removes the AAReturnedValues part and provides a replacement by making AAValueSimplifyReturned strong enough to handle all previously covered cases. Further, this improves AAValueSimplifyCallSiteReturned to handle returned arguments. AAReturnedValues is now much easier and the collected returned values/instructions are now from the associated function only, making it much more sane. We also do not have the brittle logic anymore that looks for unresolved calls. Instead, we use AAValueSimplify to handle recursion. Useful code has been split into helper functions, e.g., an Attributor interface to get a simplified value. Differential Revision: https://reviews.llvm.org/D103860	2021-07-10 12:32:50 -05:00
Joseph Huber	ecabc6684f	[OpenMP] Change analysis remarks to not emit on cold functions The remarks will trigger on some functions that are marked cold, such as the `__muldc3` intrinsic functions. Change the remarks to avoid these functions. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105196	2021-06-30 11:54:24 -04:00
Joseph Huber	0edb87773b	[OpenMP] Add additional remarks for OpenMPOpt This patch adds additional remarks, suggesting the use of `noescape` for failed globalization and indicating when internalization failed. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D105150	2021-06-30 09:49:25 -04:00
Nikita Popov	f8aaec19e6	[OpaquePtr] Support forward references in textual IR Currently, LLParser will create a Function/GlobalVariable forward reference based on the desired pointer type and then modify it when it is declared. With opaque pointers, we generally do not know the correct type to use until we see the declaration. Solve this by creating the forward reference with a dummy type, and then performing a RAUW with the correct Function/GlobalVariable when it is declared. The approach is adopted from `b5b55963f6`. This results in a change to the use list order, which is why we see test changes on some module passes that are not stable under use list reordering. Differential Revision: https://reviews.llvm.org/D104950	2021-06-29 20:10:31 +02:00
Joseph Huber	57ad2e1067	[OpenMP] Prevent OpenMPOpt from internalizing uncalled functions Currently OpenMPOpt will only check if a function is a kernel before deciding not to internalize it. Any uncalled function that gets internalized will be trivially dead in the module so this is unnecessary. Depends on D102423 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104890	2021-06-28 16:47:53 -04:00
Joseph Huber	5ccb7424fa	[OpenMP] Change OpenMPOpt to check openmp metadata The metadata added in D102361 introduces a module flag that we can check to determine if the module was compiled with `-fopenmp` enabled. We can now check for the presence of this instead of scanning the call graph for OpenMP runtime functions. Depends on D102361 Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D102423	2021-06-25 16:34:22 -04:00
Joseph Huber	44feacc736	[OpenMP] Change remaining globalization from an analysis remark to missed After landing the globalization optimizations, the presence of globalization on the device that was not put in shared or stack memory is a failed optimization with performance consequences so it should indicate a missed remark. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D104735	2021-06-22 16:52:06 -04:00
Joseph Huber	30e36c9b3c	[Attributor] Add interface to emit remarks in Attributor Summary: This patch adds support for the Attributor to emit remarks on behalf of some other pass. The attributor can now optionally take a callback function that returns an OptimizationRemarkEmitter object when given a Function pointer. If this is available then a remark will be emitted for the corresponding pass name. Depends on D102197 Reviewed By: sstefan1 thegameg Differential Revision: https://reviews.llvm.org/D102444	2021-06-22 14:12:46 -04:00
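
The alignment arithmetic in commit 564d85e090 (a 5-bit field giving a maximal exponent of 30, i.e. 1GiB rather than 512MiB) can be checked with a small sketch. The function names here are hypothetical and are not LLVM APIs. The sketch assumes the encoding implied by the commit message: a stored value of 0 means "no alignment recorded", and a nonzero stored value v encodes an alignment of 2**(v-1).

```python
# Hypothetical illustration of the arithmetic in commit 564d85e090.
# These helpers are NOT part of LLVM; the encoding below is an assumption.

FIELD_BITS = 5  # width of the alignment bitfield, as stated in the commit message


def max_alignment_exponent(field_bits: int) -> int:
    """Largest log2(alignment) that fits in a field of the given width.

    Assumed encoding: stored 0 = unspecified, stored v > 0 = 2**(v - 1).
    The largest stored value is (1 << field_bits) - 1, so the largest
    exponent is (1 << field_bits) - 2.
    """
    return (1 << field_bits) - 2


def max_alignment_bytes(field_bits: int) -> int:
    """Largest alignment in bytes representable under the assumed encoding."""
    return 1 << max_alignment_exponent(field_bits)


if __name__ == "__main__":
    exp = max_alignment_exponent(FIELD_BITS)
    print("max exponent:", exp)
    print("max alignment in bytes:", max_alignment_bytes(FIELD_BITS))
    print("old limit in bytes (exponent 29):", 1 << 29)
```

Under these assumptions the maximal exponent is 30 and the maximal alignment is 1073741824 bytes, matching the value the commit message says roundtrips through IR serialization.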

137 Commits