clang-p2996

Author	SHA1	Message	Date
Joseph Huber	1776dc8124	[Libomptarget][Obvious] Fix uninitialized pointer Summary: This pointer was not initialized to null, which meant that it would be erroneously deleted by plugins that were not in use.	2023-07-11 15:41:46 -05:00
Joseph Huber	8a0763f19c	[Libomptarget] Remove RPCHandleTy indirection The 'RPCHandleTy' was intended to capture the intention that a specific device owns its slot in the RPC server. However, this required creating a temporary store to hold these pointers. This was causing really weird spurious failure due to undefined behaviour in the order of library teardown. For example, the x64 plugin would be torn down, set this to some invalid memory, and then the CUDA plugin would crash. Rather than spend the time to fully diagnose this problem I found it pertinent to simply remove the failure mode. This patch removes this indirection so now the usage of the RPC server must always be done with the intended device. This just requires some extra handling for the AMDGPU indirection where we need to store a reference to the device. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D154971	2023-07-11 10:54:40 -05:00
Michael Halkenhaeuser	142faf56f5	[OpenMP] [OMPT] [amdgpu] [5/8] Implemented device init/fini/load callbacks Added support in the generic plugin to invoke registered callbacks. Depends on D124070 Patch from John Mellor-Crummey <johnmc@rice.edu> (With contributions from Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com>) Differential Revision: https://reviews.llvm.org/D124652	2023-07-11 07:13:22 -04:00
Shao-Ce SUN	048423702d	[OpenMP] Fix build warnings ``` llvm-project/openmp/libomptarget/src/private.h:260:9: warning: 'DEBUG_PREFIX' macro redefined [-Wmacro-redefined] #define DEBUG_PREFIX GETNAME(TARGET_NAME) ^ llvm-project/openmp/libomptarget/include/ompt_device_callbacks.h:22:9: note: previous definition is here #define DEBUG_PREFIX "OMPT" ^ 1 warning generated. ``` ``` llvm-project/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp:458:14: warning: moving a local object in a return statement prevents copy elision [-Wpessimizing-move] return std::move(Err); ^ llvm-project/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp:458:14: note: remove std::move call here return std::move(Err); ^~~~~~~~~~ ~ llvm-project/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp:552:12: warning: moving a local object in a return statement prevents copy elision [-Wpessimizing-move] return std::move(Err); ^ llvm-project/openmp/libomptarget/plugins-nextgen/common/PluginInterface/PluginInterface.cpp:552:12: note: remove std::move call here return std::move(Err); ^~~~~~~~~~ ~ 2 warnings generated. ``` Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D154787	2023-07-09 22:12:23 +08:00
Joseph Huber	e526a7fc15	[Libomptarget][NFC] Clean up warnings and format	2023-07-07 18:59:26 -05:00
Joseph Huber	338c80516b	[Libomptarget] Refine logic for determining if we support RPC Summary: Add a requirement for the GPU libc to only be on if it's enabled explicitly. Fix the logic around the pythonification of the variable.	2023-07-07 14:06:58 -05:00
Joseph Huber	691dc2d10d	[Libomptarget] Begin implementing support for RPC services This patch adds the initial support for running an RPC server in libomptarget to handle host services. We interface with the library provided by the `libc` project to stand up a basic server. We introduce a new type that is controlled by the plugin and has each device initialize its interface. We then run a basic server to check the RPC buffer. This patch does not fully implement the interface. In the future each plugin will want to define special handlers via the interface to support things like malloc or H2D copies coming from RPC. We will also want to allow the plugin to specify the number of ports. This is currently capped in the implementation but will be adjusted soon. Right now running the server is handled by whatever thread ends up doing the waiting. This is probably not a completely sound solution but I am not overly familiar with the behaviour of OpenMP tasks and what would be required here. This works okay with synchronous regions, and somewhat fine with `nowait` regions, but I've observed some weird behavior when one of those regions calls `exit`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D154312	2023-07-07 12:36:46 -05:00
Joseph Huber	6764301a6b	[Libomptarget] Correctly implement `getWTime` on AMDGPU AMDGPU provides a fixed frequency clock since some generations back. However, the frequency is variable by card and must be looked up at runtime. This patch adds a new device environment line for the clock frequency so that we can use it in the same way as NVPTX. This is the correct implementation and the version in ASO should be replaced. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D154456	2023-07-04 21:50:43 -05:00
Job Noorman	8de9f2b558	Move SubtargetFeature.h from MC to TargetParser SubtargetFeature.h is currently part of MC while it doesn't depend on anything in MC. Since some LLVM components might have the need to work with target features without necessarily needing MC, it might be worthwhile to move SubtargetFeature.h to a different location. This will reduce the dependencies of said components. Note that I chose TargetParser as the destination because that's where Triple lives and SubtargetFeatures feels related to that. This issue came up during a JITLink review (D149522). JITLink would like to avoid a dependency on MC while still needing to store target features. Reviewed By: MaskRay, arsenm Differential Revision: https://reviews.llvm.org/D150549	2023-06-26 11:20:08 +02:00
Shao-Ce SUN	f042890521	[openmp] remove initializeRewriteSymbolsLegacyPassPass Fix build error caused by D153679 Reviewed By: nikic Differential Revision: https://reviews.llvm.org/D153704	2023-06-25 00:35:01 +08:00
Johannes Doerfert	6629a96a8c	[OpenMP] Improve default block count selection for low block counts If a combined loop has insufficient parallelism (= low trip count), we might end up with too few teams/blocks. To counter that we can reduce the number of threads per team we use. This patch implements a heuristic and exposes a new environment variable to control the minimum of threads to be employed in this case. Issue reported by: Felipe Cabarcas Jaramillo <cabarcas@udel.edu> (@fel-cab). Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D152014	2023-06-05 16:35:44 -07:00
Kevin Sala	843f496b71	[OpenMP][libomptarget] Improve device info printing in NextGen plugins This patch improves the device info printing in the NextGen plugins. The device info properties are composed of keys, values and units (if necessary). These properties are pushed into a queue by each vendor-specific plugin, and later, these properties are processed and printed by the common Plugin Interface. The printing format is common across the different plugins. Differential Revision: https://reviews.llvm.org/D148178	2023-05-09 15:34:15 +02:00
Dhruva Chakrabarti	01035dc04d	[OpenMP] [OMPT] [amdgpu] [4/8] Implemented callback registration in nextgen plugins The purpose of this patch is to Implement registration of callback functions in the generic plugin by looking up corresponding callbacks in libomptarget. The overall design document is https://rice.app.box.com/s/pf3gix2hs4d4o1aatwir1set05xmjljc Defined an object of type OmptDeviceCallbacksTy in the amdgpu plugin for holding the tool-provided callback functions. Implemented a global constructor in the plugin that creates a connector object to connect with libomptarget. The callbacks that are already registered with libomptarget are looked up and registered with the plugin. Combined with an internal patch from Dhruva Chakrabarti, which fixes the OMPT initialization ordering. Achieved through removal of the constructor attribute from ompt_init. Patch from John Mellor-Crummey <johnmc@rice.edu> With contributions from: Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com> Michael Halkenhaeuser <MichaelGerald.Halkenhauser@amd.com> Reviewed By: dhruvachak, tianshilei1992 Differential Revision: https://reviews.llvm.org/D124070	2023-05-05 07:16:15 -04:00
Shilei Tian	c7de29e7bb	Revert "[OpenMP] [OMPT] [amdgpu] [4/8] Implemented callback registration in nextgen plugins" This reverts commit `8cd1f0d888`. It causes issues when OMPT is disabled explicitly and dependences are not set correctly.	2023-05-02 14:33:12 -04:00
Dhruva Chakrabarti	8cd1f0d888	[OpenMP] [OMPT] [amdgpu] [4/8] Implemented callback registration in nextgen plugins The purpose of this patch is to Implement registration of callback functions in the generic plugin by looking up corresponding callbacks in libomptarget. The overall design document is https://rice.app.box.com/s/pf3gix2hs4d4o1aatwir1set05xmjljc Defined an object of type OmptDeviceCallbacksTy in the amdgpu plugin for holding the tool-provided callback functions. Implemented a global constructor in the plugin that creates a connector object to connect with libomptarget. The callbacks that are already registered with libomptarget are looked up and registered with the plugin. Combined with an internal patch from Dhruva Chakrabarti, which fixes the OMPT initialization ordering. Achieved through removal of the constructor attribute from ompt_init. Patch from John Mellor-Crummey <johnmc@rice.edu> With contributions from: Dhruva Chakrabarti <Dhruva.Chakrabarti@amd.com> Michael Halkenhaeuser <MichaelGerald.Halkenhauser@amd.com> Differential Revision: https://reviews.llvm.org/D124070	2023-05-02 18:35:30 +02:00
Shilei Tian	d4ecd1241c	Revert "[OpenMP] Introduce kernel environment" This reverts commit `35cfadfbe2`. It makes a couple of buildbots unhappy because of the following test failures: - `Transforms/OpenMP/add_attributes.ll'` - `mapping/declare_mapper_target_data.cpp` on AMDGPU	2023-04-22 20:56:35 -04:00
Shilei Tian	35cfadfbe2	[OpenMP] Introduce kernel environment This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime. This is a combination and refinement of patch series D116908, D116909, and D116910. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D142569	2023-04-22 20:46:38 -04:00
Kevin Sala	221350965a	[OpenMP][libomptarget][NFC] Remove error data member from AsyncInfoWrapperTy This patch removes the Err data member from the AsyncInfoWrapperTy class. Now the error is stored externally, in the caller side, and it is explicitly passed to the AsyncInfoWrapperTy::finalize() function as a reference. Differential Revision: https://reviews.llvm.org/D148027	2023-04-18 18:52:01 +02:00
Johannes Doerfert	110cf873ad	[OpenMP][NFC] Silence warning	2023-04-17 15:57:10 -07:00
Kevin Sala	8dad7f4953	[OpenMP][libomptarget] Do not rely on AsyncInfoWrapperTy's destructor	2023-04-04 17:51:28 +02:00
Kevin Sala	48cd8b54d1	[NFC][OpenMP][libomptarget] Remove unnecessary AsyncInfoWrapperTy parameter	2023-03-28 17:28:12 +02:00
Joseph Huber	edc0355006	[Libomptarget] Add missing explicit moves on llvm::Error Summary: Some older compilers, which we still support, have problems handling the copy elision that allows us to directly move an `Error` to an `Expected`. This patch adds explicit moves to remove the error.	2023-03-20 11:49:59 -05:00
Nikita Popov	a8f6b5763e	[PassBuilder] Support O0 in default pipelines The default and pre-link pipeline builders currently require you to call a separate method for optimization level O0, even though they have perfectly well-defined O0 optimization pipelines. Accept O0 optimization level and call buildO0DefaultPipeline() internally, so all consumers don't need to repeat this. Differential Revision: https://reviews.llvm.org/D146200	2023-03-17 10:00:05 +01:00
Joseph Huber	48d5ad93cd	[OpenMP][NFC] Clean up Twines and other issues in plugins Summary: This patch is mostly NFC to fix some warnings currently present in OpenMP offloading plugins. Specifically this mostly removes the use of Twine variables in favor of LLVM's small string. Twine variables are prone to use-after-free and this is a cleaner way to concatenate a string.	2023-03-01 15:03:21 -06:00
Joseph Huber	656378085e	[Libomptarget] Fix block and thread limit environment variables not being respected The next-gen plugins did not properly set the values from `OMP_NUM_TEAMS` and `OMP_TEAMS_THREAD_LIMIT`. This is because these maximum values are set by each plugin to its hardware maximum. This happens after the previous initialization. Move it to the correct place and then add a test. Fixes https://github.com/llvm/llvm-project/issues/61082 Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D145105	2023-03-01 14:12:46 -06:00
JP Lehr	b82ac74f7e	[OpenMP][AMDGPU] More detail in AMDGPU kernel launch info Makes the info that is printed for kernel launches configurable for different plugins. Adds all machinery to print the detailed launch info that the current AMD plugin provides and includes e.g. register spill counts. The files msgpack.cpp, msgpack.def, and msgpack.h are copied from the old plugin and are untouched. The contents of UtilitiesHSA.cpp and .h are copied together from various files from the old plugin. The code was originally written by Jon Chesterfield. I updated the function and type names visible to the outside, i.e. in headers, to respect the LLVM conventions. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D144521	2023-02-28 07:41:48 -05:00
Joseph Huber	9b8e4b4f96	[Libomptarget] Remove unused image argument from global handler function Summary: A previous patch got rid of the use of this image but forgot to remove it from this function. Simply remove it as it is unused now.	2023-02-24 07:24:29 -06:00
Joseph Huber	22d618f543	[libomptarget] Remove unused image from global data movement function This interface function does not actually need the device image type. It's unused in the function, so it should be able to be safely removed. The motivation for this is to facilitate downstream porting of the amd-stg-open RPC module into the nextgen plugin so we can delete the old plugin entirely. For that to work we need to be able to call this function at kernel-launch time, which doesn't have the image. Also it's cleaner. Reviewed By: jplehr Differential Revision: https://reviews.llvm.org/D144436	2023-02-21 07:09:36 -06:00
Samuel Parker	2a58be4239	[HardwareLoops] NewPM support. With the NPM, we're now defaulting to preserving LCSSA, so a couple of tests have changed slightly. Differential Revision: https://reviews.llvm.org/D140982	2023-02-13 09:46:31 +00:00
Archibald Elliott	62c7f035b4	[NFC][TargetParser] Remove llvm/ADT/Triple.h I also ran `git clang-format` to get the headers in the right order for the new location, which has changed the order of other headers in two files.	2023-02-07 12:39:46 +00:00
Kevin Sala	230d976853	[NFC][OpenMP][libomptarget] Fix format in PluginInterface header	2023-02-06 10:15:50 +01:00
Kevin Sala	6ca034644d	[OpenMP][libomptarget] Notify the plugins regarding new mapping/unmappings The NextGen plugins use the information regarding new mapping/unmappings to lock/unlock the corresponding host buffer and speed up the host-device memory transfers involving those buffers. The locking/unlocking is disabled by default and can be enabled by the LIBOMPTARGET_LOCK_MAPPED_HOST_BUFFERS envar. The envar accepts boolean values (on/off) and a special option: - off: Do not lock mapped host buffers (default). - on: Lock mapped host buffers automatically, but do not report lock failures if the plugin fails to lock them. - mandatory: Lock mapped host buffers automatically and treat locking failures in the plugins as fatal errors. This option may be useful for debugging purposes. Differential Revision: https://reviews.llvm.org/D142514	2023-02-06 10:09:35 +01:00
Kevin Sala	2a539ee17d	[OpenMP][libomptarget] Implement memory lock/unlock API in NextGen plugins This patch implements the memory lock/unlock API, introduced in patch https://reviews.llvm.org/D139208, in the NextGen plugins. Locked buffers feature reference counting and we allow certain overlapping. Given an already locked buffer A, other buffers that are fully contained inside A can be locked again, even if they are smaller than A. In this case, the reference count of locked buffer A will be incremented. However, extending an existing locked buffer is not allowed. The original buffer is actually unlocked once all its users have released the locked buffer and sub-buffers (i.e., the reference counter becomes zero). Differential Revision: https://reviews.llvm.org/D141227	2023-01-25 00:11:38 +01:00
Scott Linder	25c0ea2a53	[NFC] Consolidate llvm::CodeGenOpt::Level handling Add free functions llvm::CodeGenOpt::{getLevel,getID,parseLevel} to provide common implementations for functionality that has been duplicated in many places across the codebase. Differential Revision: https://reviews.llvm.org/D141968	2023-01-23 22:50:49 +00:00
Joseph Huber	b280e12a3d	[Libomptarget][NFC] Address a few warnings in libomptarget Summary: Fix a few minor warnings that show up in `libomptarget`.	2023-01-23 08:56:03 -06:00
Johannes Doerfert	40f9bf082f	[OpenMP] Introduce the `ompx_dyn_cgroup_mem(<N>)` clause Dynamic memory allows users to allocate fast shared memory when a kernel is launched. We support a single size for all kernels via the `LIBOMPTARGET_SHARED_MEMORY_SIZE` environment variable but now we can control it per kernel invocation, hence allow computed values. Note: Only the nextgen plugins will allocate memory based on the clause, the old plugins will silently miscompile. Differential Revision: https://reviews.llvm.org/D141233	2023-01-21 18:46:36 -08:00
Johannes Doerfert	16a385ba21	[OpenMP] Modernize the kernel launching interface and APIs We already created a versioned `__tgt_kernel_arguments` struct but it was only briefly used and its content was passed in isolation anyway. This makes it hard to add more information in the future. With this patch we fully embrace the struct as means to pass information from the compiler to the plugin as part of a kernel launch. The patch also extends and renames the struct, bumping the version number to 2. Version 1 entries are auto-upgraded. This is in preparation for "bare" kernel launches, per kernel dynamic shared memory, CUDA/HIP lowering, etc. The `__tgt_target_kernel_nowait` interface was deprecated as it was unused. Once we actually implement support for something like that, we can add an appropriate API. Note: Only plugins with the `launch_kernel` interface are now supported. That means that a new clang won't be able to use an old runtime. An old clang can still use the new runtime since the libomptarget interface did not change. Differential Revision: https://reviews.llvm.org/D141232	2023-01-21 11:16:21 -08:00
Giorgis Georgakoudis	0f4b4e8e4d	[OpenMP] RecordReplay saves bitcode when JIT-ing This patch enables to store bitcode images when JIT is enabled for the record-and-replay functionality (see https://reviews.llvm.org/D138931). Credits to @jdoerfert for refactoring the code. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D141986	2023-01-18 11:25:25 -08:00
Giorgis Georgakoudis	94c772dc92	[OpenMP] Support kernel record and replay This patch adds functionality for recording and replaying the execution of OpenMP offload kernels, based on an original implementation by Steve Rangel. The patch extends libomptarget to extract a json description of the kernel, the device image binary, and a device memory snapshot before and after the execution of a recorded kernel. Kernel recording/replaying in libomptarget is controlled through env vars (LIBOMPTARGET_RECORD, LIBOMPTARGET_REPLAY). It provides a tool, llvm-omp-kernel-replay, for replaying a kernel using the extracted information with the ability to verify replayed execution using the post-execution device memory snapshot, also supporting changing the number of teams/threads for replaying. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D138931	2023-01-17 16:29:03 -08:00
Joseph Huber	566ecc2231	[Libomptarget][NFC] Rename device environment variable This variable is used by the runtime. Before kernel launch we set it to indicate several configuration options from the host. This patch renames it to be more in line with the rest of the names exported from the runtime. This is better because this is the only symbol visible to the host from the runtime, so it should have a reserved name. Reviewed By: tianshilei1992 Differential Revision: https://reviews.llvm.org/D141960	2023-01-17 14:28:04 -06:00
Johannes Doerfert	d9415cd024	[OpenMP][JIT] Introduce more debugging configuration options The JIT is a great debugging tool since we can modify the IR manually before launching it in an existing test case. The new flags allow to skip optimizations, to use the exact given IR, as well as to provide a finished object file. The latter is useful to try out different backend options and to have complete freedom with pass pipelines. Documentation is included. Minimal refactoring was performed to make the second object fit in nicely.	2023-01-15 11:44:10 -08:00
Johannes Doerfert	f8e094be81	[OpenMP][JIT] Cleanup JIT interface, caching, and races The JIT interface was somewhat irregular as it used multiple global functions. It also did not cache the results of the JIT, hence multiple GPU systems would perform the work multiple times. Finally, there might have been races on the state if we have multi-threaded initialization of different embedded images, or one image initialized on multiple devices. This patch tries to rectify all of the above. The JITEngine is now a part of the GenericPluginTy and tied to one target triple. To support multiple "ComputeUnitKind"s (previously confusingly called Arch or [M]CPU) and to avoid re-jitting for the same ComputeUnitKind, we keep a map of JIT results per ComputeUnitKind. All interaction with the JIT happens through the JITEngine directly, two functions are exposed. Both use (shared) locks to avoid races and cache the result. All JIT-related environment variables are now defined together. Differential Revision: https://reviews.llvm.org/D141081	2023-01-15 11:43:50 -08:00
Johannes Doerfert	158aa99d39	[OpenMP][NFC] Introduce helper functions to hide casts and such Differential Revision: https://reviews.llvm.org/D140719	2023-01-15 11:43:50 -08:00
Ron Lieberman	d179dfe8ce	fix : add missing open brace [OpenMP][FIX] Avoid using an Error object after a std::move.	2023-01-09 19:52:16 -05:00
Johannes Doerfert	bdcbf6c85d	[OpenMP][FIX] Avoid using an Error object after a std::move. The error was always a success even if the error case happened, as the std::move reset the error object.	2023-01-09 16:03:52 -08:00
Joseph Huber	2d588461bc	[Libomptarget] Add more moves to expected conversion Summary: Fixes other instances of the same problem in the previous patch.	2023-01-06 09:09:45 -06:00
Joseph Huber	75c03596b8	[Libomptarget] Add move to expected conversion Summary: These implicit conversions from move-only types to expected seem to only work with newer compilers. This should hopefully fix it.	2023-01-06 09:09:45 -06:00
Johannes Doerfert	ccc1324120	Introduce environment variables to deal with JIT IR We can now dump the IR before and after JIT optimizations into the files passed via `LIBOMPTARGET_JIT_PRE_OPT_IR_MODULE` and `LIBOMPTARGET_JIT_POST_OPT_IR_MODULE`, respectively. Similarly, users can set `LIBOMPTARGET_JIT_REPLACEMENT_MODULE` to replace the IR in the image with a custom IR module in a file. All options take file paths, documentation was added. Reviewed by: tianshilei1992 Differential revision: https://reviews.llvm.org/D140945	2023-01-05 00:17:46 -08:00
Johannes Doerfert	5524952c14	[OpenMP][JIT][FIX] Create the default O0 pipeline for -O0	2023-01-03 17:07:52 -08:00
Johannes Doerfert	428bc510bf	[OpenMP] Unify "exec_mode" query code and default to SPMD Defaulting to Generic mode doesn't make much sense as the kernel needs to be prepared for it. SPMD mode is the "native" execution, e.g., for "bare" kernels. It also is the execution method for constructors and destructors (as we might otherwise throw an extra warp onto them). Differential Revision: https://reviews.llvm.org/D140718	2023-01-03 16:58:13 -08:00

78 Commits