Improve the check for whether a type can be passed by copy. Currently,
passing by copy is done via the OMP_MAP_LITERAL mapping, which can only
transfer as much data as can be contained in a pointer representation.
The HAS_DEVICE_ADDR clause indicates that the listed objects exist at
addresses that are valid device addresses. Specifically,
`has_device_addr(x)` means that (in C/C++ terms) `&x` is a device
address.
When entering a target region, `x` does not need to be allocated on the
device, or have its contents copied over (in the absence of additional
mapping clauses). Passing its address verbatim to the region for use is
sufficient, and is the intended goal of the clause.
Some Fortran objects use descriptors in their in-memory representation.
If `x` had a descriptor, both the descriptor and the contents of `x`
would be located in the device memory. However, the descriptors are
managed by the compiler, and can be regenerated at various points as
needed. The address of the effective descriptor may change, hence it's
not safe to pass the address of the descriptor to the target region.
Instead, the descriptor itself is always copied, but for objects like
`x`, no further mapping takes place (as this keeps the storage pointer
in the descriptor unchanged).
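For illustration only, a minimal Fortran sketch of the intended usage
(the names and the `use_device_addr` setup are assumptions, not part of
this change):
```
subroutine sketch(x)
  integer, pointer :: x(:)        ! descriptor-backed object
  !$omp target data map(tofrom: x) use_device_addr(x)
  !$omp target has_device_addr(x)
  ! The descriptor is copied in; the payload of x is neither allocated
  ! nor copied, and its storage pointer is left unchanged.
  x(1) = 42
  !$omp end target
  !$omp end target data
end subroutine sketch
```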
---------
Co-authored-by: Sergio Afonso <safonsof@amd.com>
This PR adds an initial implementation for the map modifiers `close`,
`present` and `ompx_hold`; primarily this just required adding the
appropriate map-type flags to the map-type bits. In the case of
`ompx_hold` it also required adding the map type to the OpenMP dialect.
`close` has a bit of a problem when utilised alongside the ALWAYS map
type on descriptors, so when we later apply `always` to descriptors (to
facilitate movement of descriptor information to the device for
consistency) we will likely have to make sure `close` and `always` are
not applied to a descriptor simultaneously; however, we may find an
alternative to this with further investigation. For the moment, a
TODO/note keeps track of it.
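As a rough sketch (not taken from this patch's tests; the array name is
illustrative), the modifiers appear in source like this:
```
real :: a(100)
!$omp target data map(close, tofrom: a)      ! close: place a near the device
  !$omp target map(present, tofrom: a)       ! present: runtime error if a is unmapped
  a = a + 1.0
  !$omp end target
!$omp end target data
!$omp target data map(ompx_hold, tofrom: a)  ! ompx_hold: LLVM extension, pins the mapping
!$omp end target data
```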
This patch adds OpenMPToLLVMIRTranslation support for the OpenMP Declare
Mapper directive.
Since both MLIR and Clang now support custom mappers, I've changed the
respective function params to no longer be optional as well.
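For reference, a hedged sketch of the kind of Fortran declare mapper
this translation covers (the type and mapper names are made up):
```
type t
  integer :: n
  real, allocatable :: payload(:)
end type t
! Named mapper: map the scalar and the first n elements of payload.
!$omp declare mapper(custom : t :: v) map(tofrom: v%n, v%payload(1:v%n))
```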
Depends on #121005
Summary:
The test failed because it no longer passed `-Rpass` by default without
LTO. I think that's desirable, as it matches the standard behavior.
This reverts commit 6fd99de318.
This pull request is the second part of an ongoing effort to extend PGO
instrumentation to GPU device code and depends on #76587. This PR makes
the following changes:
- Introduces `__llvm_write_custom_profile` to the PGO compiler-rt
library. This is an external function that can be used to write profiles
with custom data to target-specific files.
- Adds `__llvm_write_custom_profile` as a weak symbol to libomptarget so
that it can write the collected data to a profraw file.
- Adds a `PGODump` debug flag and only displays the dump when that flag
is set.
Currently if we generate code for the below target data map that uses an
optional mapping:
```
!$omp target data if(present(a)) map(alloc:a)
  do i = 1, 10
    a(i) = i
  end do
!$omp end target data
```
We yield an LLVM-IR error because the branch for the else path is not
generated. This occurs because we enter the NoDupPriv path of the
callback function when generating the else branch; however, the
emitBranch function needs the builder to be set to a block for it to
generate and link in a follow-up branch. The NoDupPriv path currently
doesn't do this. While it's not supposed to generate anything (as far as
I am aware), we still need to at least restore the builder's insertion
point so that it emits the appropriate follow-up branch. This avoids the
missing-terminator LLVM-IR verification error by correctly generating
the follow-up branch.
This PR aims to fix a mapping error when trying to map null (unallocated
or disassociated) members of a record type; the primary example at the
moment is allocatable/pointer components in Fortran. Mapping such a
member should be legal; writing to it within the target region without
it pointing to anything is not. A common Fortran OpenMP idiom where this
is useful can be found in the added Fortran offload example.
The runtime error arises when we try to map the pointer member utilising
a prescribed constant size that we receive from the lowered type,
resulting in the mapping of data that does not exist when nothing has
been allocated. The fix in this case is to emit a runtime check to see
whether the data has been allocated: if it hasn't been, we select a size
of 0; if it has, we emit the usual type size.
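A minimal sketch of the idiom in question (illustrative only, not the
added example itself):
```
type t
  real, allocatable :: data(:)
end type t
type(t) :: v                    ! v%data deliberately left unallocated
!$omp target map(tofrom: v)
  ! Legal: the member is mapped (with a runtime size of 0 here) as long
  ! as it is never dereferenced without being allocated.
  if (allocated(v%data)) v%data(1) = 1.0
!$omp end target
```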
Summary:
The previous offloading entry type did not fit the current use cases
very well. This widens it and adds a version field to prevent further
annoyances. It also includes the kind to better sort out who is using
it. The first 64 bits are reserved as zero so the OpenMP runtime can
detect the old format for binary compatibility.
Exposed by -Warray-bounds:
```
In file included from ../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:252:
../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1: error: array index 4 is past the end of the array (that has type 'const std::remove_const<const uint16_t>::type[4]' (aka 'const unsigned short[4]')) [-Werror,-Warray-bounds]
  109 | INSTR_PROF_DATA(const uint16_t, Int16ArrayTy, NumValueSites[IPVK_Last+1], \
      | ^                                             ~~~~~~~~~~~
../../../../../../../llvm/offload/plugins-nextgen/common/src/GlobalHandler.cpp:250:15: note: expanded from macro 'INSTR_PROF_DATA'
  250 |     outs() << ProfData.Name << " "; \
      |               ^        ~~~~
../../../../../../../llvm/llvm/include/llvm/ProfileData/InstrProfData.inc:109:1: note: array 'NumValueSites' declared here
  109 | INSTR_PROF_DATA(const uint16_t, Int16ArrayTy, NumValueSites[IPVK_Last+1], \
      | ^
../../../../../../../llvm/offload/plugins-nextgen/common/include/GlobalHandler.h:62:3: note: expanded from macro 'INSTR_PROF_DATA'
   62 |   std::remove_const<Type>::type Name;
```
Avoid accessing out-of-bounds data by skipping the printing of array
data for now, as there is no simple way to print it without hardcoding
the NumValueSites field.
---------
Co-authored-by: Ethan Luis McDonough <ethanluismcdonough@gmail.com>
Currently the compiler will ICE during the device lowering pass on
programs like the following:
```
program main
  implicit none
  type i1_t
    integer :: val(1000)
  end type i1_t
  integer :: i
  type(i1_t), pointer :: newi1
  type(i1_t), pointer :: tab => null()
  integer, dimension(:), pointer :: tabval
!$omp THREADPRIVATE(tab)
  allocate(newi1)
  tab => newi1
  tab%val(:) = 1
  tabval => tab%val
!$omp target teams distribute parallel do
  do i = 1, 1000
    tabval(i) = i
  end do
!$omp end target teams distribute parallel do
end program main
```
This is due to the fact that THREADPRIVATE returns a result operation,
and this result can be used by other LLVM dialect (or other dialect)
operations. However, we currently skip the lowering of threadprivate on
device, so we never generate and bind an LLVM-IR value to the
threadprivate operation's result. When we later lower dependent LLVM
dialect operations, the required LLVM-IR value is missing; we try to
access and use it, and then ICE. The fix in this particular PR is to
allow compilation of threadprivate for device as well as host, and
simply treat the device compilation as a no-op, binding the LLVM-IR
value of the threadprivate input to its result with no alterations. This
allows the rest of the compilation to proceed, and we eventually discard
the host segment in any case.
The other possible solution I can think of is doing something similar to
Flang's passes that run prior to CodeGen to the LLVM dialect, where they
erase/no-op certain unrequired operations or transform them into a
lower-level series of operations. We would erase/no-op threadprivate on
device, as we'd never have these in target regions.
The main issue I can see with this is that we currently do not
specialise this stage based on whether we're compiling for device or
host, so it would set a precedent and add another point where the
separation between target and host compilation has to be understood. I
am also not sure we'd necessarily want to enforce this at the dialect
level, in case someone else wishes to add a different lowering or
translation flow. Another possible issue is that a target operation we
have/utilise might depend on the result of threadprivate, meaning we'd
not be allowed to entirely erase/no-op it; I am not aware of any
situation where this is an issue currently, though.
Attempt to prevent the following example from causing an assert or ICE:
```
subroutine test(a)
  use iso_fortran_env, only: real64   ! needed for real64 with implicit none
  implicit none
  integer :: i
  real(kind=real64), dimension(:) :: a
  real(kind=real64), dimension(size(a, 1)) :: b
!$omp target map(tofrom: b)
  do i = 1, 10
    b(i) = i
  end do
!$omp end target
end subroutine
```
Here we utilise a Fortran intrinsic (`size`) to calculate the extent of
an array at runtime and then map the resulting array to the device.
This re-applies #117867 with a small fix that hopefully prevents
buildbot failures. The fix is to avoid `dyn_cast` on the result of
`getOperation()`. Instead, we can assign the result to `mlir::ModuleOp`
directly, since the type of the operation is known statically (`OpT` in
`OperationPass`).
This is a starting PR to implicitly map allocatable record fields.
This PR contains the following changes:
1. Re-purposes some of the utils used in `Lower/OpenMP.cpp` so that
   these utils work on the `mlir::Value` level rather than the
   `semantics::Symbol` level. This takes one step towards enabling
   MLIR passes to more easily do some lowering themselves (e.g. creating
   `omp.map.bounds` ops for implicitly captured data like this PR
   does).
2. Adds support for implicitly capturing and mapping allocatable fields
   in record types.
There is still quite some distance to cover before we have full support
for this. I added a number of todos to guide further development.
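A hedged sketch of the pattern this enables (assuming the field is
allocated before the region; names are illustrative):
```
type record_t
  real, allocatable :: field(:)
end type record_t
type(record_t) :: r
allocate(r%field(10))
!$omp target
  r%field(1) = 3.14     ! r%field is captured and mapped implicitly
!$omp end target
```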
Co-authored-by: Andrew Gozillon <andrew.gozillon@amd.com>
Summary:
This is going to be deprecated in
https://github.com/llvm/llvm-project/pull/112849. This patch ports it to
use the builtin instead. This isn't a compile-time constant, so it could
slightly negatively affect codegen. There really should be an IR pass to
turn it into a constant when the function has known attributes.
Using the builtin is correct when we just use it to query the size, as
we do here. Obviously, guarding w32/w64 code with this check would be
broken.
This PR is one in a series of 3 that aim to add support for explicit
member mapping of allocatable components in derived types within
OpenMP+Fortran for Flang.
This PR provides all of the runtime tests that are currently
upstreamable; unfortunately, some of the other tests would require
linking the Fortran runtime for offload, which we currently do not do.
Regardless, this is plenty to ensure that the mapping works in most
cases.
This patch primarily updates the MapInfoFinalization pass to utilise the
BlockArgument interface. It also shuffles arguments newly added by the
MapInfoFinalization pass to the end of the BlockArg/relevant MapInfo
lists, instead of placing them just before the owning descriptor type.
During this it was noted that the use_device_ptr/addr handling of target
data was a little too order-dependent, so I've attempted to make it less
so, as we cannot depend on argument ordering being the same as Fortran's
for any future frontends.
This patch fixes the mapping and lowering of arrays with dynamic extents
and adds a new test for the same. The fix discards the incomplete
dynamic extent information and replaces it with just the base type. When
lowering to LLVM later, the bounds information is used instead.
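A minimal sketch of an array with a dynamic extent (names are
illustrative):
```
subroutine fill(n, a)
  integer, intent(in) :: n
  real :: a(n)                  ! extent known only at run time
  !$omp target map(tofrom: a)
  a(1) = 1.0
  !$omp end target
end subroutine fill
```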
It appears that the RUNTIMES build prefers the x86_64-unknown-linux-gnu
triple notation for the host. This fixes runtime/test breakages when
compiler-rt is used as the CLANG_DEFAULT_RTLIB.
This PR aims to unify the map argument generation behavior across both
the implicit capture (captured in a target region) and the explicit
capture (via the map clause); currently, the varPtr field of the MapInfo
for the same variable will differ depending on how it is captured. This
PR tries to align that across the generation of MapInfoOps in the OpenMP
lowering.
Currently, I have opted to utilise the rawInput (the input memref to an
HLFIR DeclareOp) as opposed to the addr field, which includes more
information. The side effect of this is that we have to deal with
BoxTypes less often, which results in simpler maps in these cases. The
negative side effect is that we don't have access to the bounds
information through the resulting value; however, I believe the bounds
information we require in our case is still appropriately stored in the
map bounds, and this seems to be the case from testing so far.
The other fix is for cases where we end up with a BoxType argument to a
function (certain assumed shape and size cases do this) that has no
fir.ref wrapping it. As we need the box to be a reference type to
actually utilise the operation to access the base address stored inside
and create the correct mappings, we currently generate an intermediate
allocation in these cases, store the box into it, and utilise this as
the map argument, as opposed to the original.
However, as we were not sharing the same intermediate allocation across
all of the maps for a variable, this resulted in errors in certain cases
when detaching/attaching the data, e.g. via enter and exit. This PR
adjusts this so the intermediate allocation is shared in these cases.
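A hedged sketch of the kind of argument this affects (an assumed-shape
dummy that lowers to a bare descriptor/BoxType; names are illustrative):
```
subroutine stage(a)
  real :: a(:)                  ! assumed shape: arrives as a descriptor
  !$omp target enter data map(to: a)
  !$omp target exit data map(from: a)
end subroutine stage
```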
Currently we only track intermediate allocations within the current
function scope, as opposed to module scope, primarily because the only
case I am aware of that requires this is where we pass certain types of
arguments to functions (so I opted to minimise the overhead of the pass
for now). It could likely be extended to module scope if we find other
cases where it is applicable and causing issues.
This pull request is a revised version of #76587. This pull request
fixes some build issues that were present in the previous version of
this change.
> This pull request is the first part of an ongoing effort to extend PGO
instrumentation to GPU device code. This PR makes the following changes:
>
> - Adds blank registration functions to device RTL
> - Gives PGO globals protected visibility when targeting a supported
GPU
> - Handles any addrspace casts for PGO calls
> - Implements PGO global extraction in GPU plugins (currently only
dumps info)
>
> These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
When we use the device, e.g., with an API that interacts with it, we
need to ensure the image is loaded and the constructors are executed.
Two tests are included to verify that we 1) load images and run
constructors when needed, and 2) do so lazily, only if the device is
actually used.
---------
Co-authored-by: Joseph Huber <huberjn@outlook.com>
Summary:
Now that we use the linker to do LTO / device linking, we need to inform
the `clang` invocation to use `-flto` so it forwards arguments like
`-On` correctly.
Through the new `-foffload-via-llvm` flag, CUDA kernels can now be
lowered to the LLVM/Offload API. On the Clang side, this is simply done
by using the OpenMP offload toolchain and emitting calls to `llvm*`
functions to orchestrate the kernel launch rather than `cuda*`
functions. These `llvm*` functions are implemented on top of the
existing LLVM/Offload API.
As we are about to redefine the Offload API, this will help us in the
design process as a second offload language.
We do not support any CUDA APIs yet; however, we could:
https://www.osti.gov/servlets/purl/1892137
For proper host execution we need to resurrect/rebase
https://tianshilei.me/wp-content/uploads/2021/12/llpp-2021.pdf
(which was designed for debugging).
```
❯❯❯ cat test.cu
#include <cstdio>   // for printf; added so the example compiles standalone

extern "C" {
void *llvm_omp_target_alloc_shared(size_t Size, int DeviceNum);
void llvm_omp_target_free_shared(void *DevicePtr, int DeviceNum);
}

__global__ void square(int *A) { *A = 42; }

int main(int argc, char **argv) {
  int DevNo = 0;
  int *Ptr = reinterpret_cast<int *>(llvm_omp_target_alloc_shared(4, DevNo));
  *Ptr = 7;
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  square<<<1, 1>>>(Ptr);
  printf("Ptr %p, *Ptr %i\n", Ptr, *Ptr);
  llvm_omp_target_free_shared(Ptr, DevNo);
}

❯❯❯ clang++ test.cu -O3 -o test123 -foffload-via-llvm --offload-arch=native
❯❯❯ llvm-objdump --offloading test123

test123: file format elf64-x86-64

OFFLOADING IMAGE [0]:
kind            elf
arch            gfx90a
triple          amdgcn-amd-amdhsa
producer        openmp

❯❯❯ LIBOMPTARGET_INFO=16 ./test123
Ptr 0x155448ac8000, *Ptr 7
Ptr 0x155448ac8000, *Ptr 42
```
In debug mode there is a wrapper (the kernel) around the function in
which we generate the kernel code. We worked around this before to get
the correct kernel name, but now we properly distinguish the two so that
we can attach the launch bounds to the kernel, not the inner function.
This patch handles dependencies specified by the `depend` clause on an
OpenMP target construct. It does this much the same way clang does it by
materializing an OpenMP `task` that is tagged with the dependencies.
The following functions are relevant to this patch -
1) `createTarget` - This function itself is largely unchanged except
that it now accepts a vector of `DependData` objects that it simply
forwards to `emitTargetCall`
2) `emitTargetCall` - This function now checks whether an outer
target-task needs to be materialized (i.e. whether the `target`
construct has a `nowait` or a `depend` clause). If yes, it calls
`emitTargetTask` to do all the heavy lifting for creating and
dispatching the task.
3) `emitTargetTask` - The bulk of the change is here. See the large
comment explaining what it does at the beginning of this function.
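As a rough illustration (not from this patch's tests; names are made
up), a `target` construct carrying a `depend` clause like the following
is what gets wrapped in a target-task:
```
integer :: x
x = 0
!$omp target map(tofrom: x) depend(inout: x) nowait
  x = x + 1
!$omp end target
!$omp taskwait
```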
Many tests in the `offload` project have requirements defined by which
targets are not supported rather than which platforms are supported.
This patch aims to streamline the requirement definitions by adding four
new feature tags: `host`, `gpu`, `amdgpu`, and `nvidiagpu`.
This pull request is the first part of an ongoing effort to extend PGO
instrumentation to GPU device code. This PR makes the following changes:
- Adds blank registration functions to device RTL
- Gives PGO globals protected visibility when targeting a supported GPU
- Handles any addrspace casts for PGO calls
- Implements PGO global extraction in GPU plugins (currently only dumps
info)
These changes can be tested by supplying `-fprofile-instrument=clang`
while targeting a GPU.
This PR attempts to fix common block mapping, both for regular mapping
of these types and for when they have been marked as "declare target
link". It should allow correct mapping of both the members of a common
block and the full common block via its block symbol.
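A hedged sketch of the two mapping forms this should permit (block and
member names are illustrative):
```
subroutine demo()
  integer :: x, y
  common /vars/ x, y
  !$omp declare target link(/vars/)
  !$omp target map(tofrom: /vars/)   ! map the full block via its symbol
  x = 1
  !$omp end target
  !$omp target map(tofrom: y)        ! map an individual member
  y = 2
  !$omp end target
end subroutine demo
```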
The main changes were some adjustments to the Fortran OpenMP lowering to
HLFIR/FIR, to the lowering of the LLVM+OpenMP dialect to LLVM-IR, and to
the way we handle target kernel map argument rebinding inside of the
OMPIRBuilder.
For the Fortran OpenMP lowering there were two changes: one prevents the
implicit capture of common block members when the common block symbol
itself has been marked, and the other creates intermediate member
accesses inside of the target region to be used in place of those
external to the target region; this prevents external usages from
breaking the IsolatedFromAbove pact.
In the latter case, there was an adjustment to the size calculation for
types to better handle cases where we pass an array as the type of a map
(as opposed to the bounds and the type of the element), which occurs in
the case of common blocks. There is also some adjustment to how
handleDeclareTargetMapVar handles the renaming of declare target symbols
in the module to the reference pointer: it now only applies to those
within the kernel currently being generated, and we also replace
constants with instructions as necessary, as we cannot replace constant
uses with our reference pointer (non-constants and constants do not mix
nicely).
In the case of the OpenMPIRBuilder, some changes were made to defer
global symbol rebinding to kernel arguments until all other arguments
have been rebound. This makes sure we do not replace uses that may refer
to the global (e.g. a GEP) but are themselves actually a separate
argument that needs to be bound.
Currently "declare target to" still needs some work, but this may be the
case for all types in conjunction with "declare target to" at the
moment.
Sometimes it might be beneficial to spawn more thread blocks instead of
reusing the existing ones for multiple loop iterations.
**Alternatives considered:**
Make `DefaultNumBlocks` settable via an environment variable.
---------
Co-authored-by: Joseph Huber <huberjn@outlook.com>
This PR seeks to expand/replace the Constant -> Instruction conversion
that needs to occur inside of the OpenMP target kernel generation to
allow kernel argument replacement of uses within the kernel (we cannot
replace constant uses within constant expressions with non-constants).
It does so by making use of the new-ish utility
convertUsersOfConstantsToInstructions, a much more expansive version of
the smaller function I wrote previously; it effectively expands uses of
the input argument that are constant expressions into instructions so
that we can replace them with the appropriate kernel argument.
It also alters convertUsersOfConstantsToInstructions to optionally
restrict the replacement to a single function and to optionally leave
dead constants alone. The latter is necessary when lowering from MLIR,
as we cannot be sure we can remove the constants at this stage: even if
they are rewritten to instructions, the ModuleTranslation may maintain
links to the original constants and utilise them in further lowering
steps (when we're lowering the kernel, the module is still in the
process of being lowered), which can result in unusual ICEs later. These
dead constants can be tidied up later (and appear to be in subsequent
lowering, from checking with emit-llvm).
The current test is not really correct, because the mask is set to
0xffffffff even if it is on an AMDGPU whose wavefront size is 64.
Besides, `__AMDGCN_WAVEFRONT_SIZE` is not set on host compilation, so
the verification happens to work.