clang-p2996

Author	SHA1	Message	Date
Joseph Huber	470aefb240	[Offload][NFC] Remove `omp_` prefix from offloading entries (#88071 ) Summary: These entires are generic for offloading with the new driver now. Having the `omp` prefix was a historical artifact and is confusing when used for CUDA. This patch just renames them for now, future patches will rework the binary format to make it more common.	2024-04-09 15:50:15 -05:00
Billy Zhu	6f6336858e	[MLIR][LLVM] Add DebugNameTableKind to DICompileUnit (#87974 ) Add the DebugNameTableKind field to DICompileUnit, along with its importer & exporter.	2024-04-09 06:18:07 -07:00
Matthias Braun	4a812b5912	Verify threadlocal_address constraints (#87841 ) Check invariants for `llvm.threadlocal.address` intrinsic in IR Verifier.	2024-04-08 17:47:57 -07:00
Billy Zhu	81a7b6454e	[MLIR][LLVM] Recursion importer handle repeated self-references (#87295 ) Followup to this discussion: https://github.com/llvm/llvm-project/pull/80251#discussion_r1535599920. The previous debug importer was correct but inefficient. For cases with mutual recursion that contain more than one back-edge, each back-edge would result in a new translated instance. This is because the previous implementation never caches any translated result with unbounded self-references. This means all translation inside a recursive context is performed from scratch, which will incur repeated run-time cost as well as repeated attribute sub-trees in the translated IR (differing only in their `recId`s). This PR refactors the importer to handle caching inside a recursive context. - In the presence of unbound self-refs, the translation result is cached in a separate cache that keeps track of the set of dependent unbound self-refs. - A dependent cache entry is valid only when all the unbound self-refs are in scope. Whenever a cached entry goes out of scope, it will be removed the next time it is looked up.	2024-04-08 01:09:54 -07:00
Fabian Mora	a2c4b7c8e2	[mlir] Add `convertInstruction` and `getSupportedInstructions` to `LLVMImportInterface` (#86799 ) This patch adds the `convertInstruction` and `getSupportedInstructions` to `LLVMImportInterface`, allowing any non-LLVM dialect to specify how to import LLVM IR instructions and overriding the default import of LLVM instructions.	2024-04-07 08:46:21 +02:00
Jan Leyonberg	9708d09003	[MLIR][OpenMP] Skip host omp ops when compiling for the target device (#85239 ) This patch separates the lowering dispatch for host and target devices. For the target device, if the current operation is not a top-level operation (e.g. omp.target) or is inside a target device code region it will be ignored, since it belongs to the host code. This is an alternative approach to #84611, the new test in this PR was taken from there.	2024-04-05 09:25:28 -04:00
Tom Eccles	cc34ad91f0	[MLIR][OpenMP] Add cleanup region to omp.declare_reduction (#87377 ) Currently, by-ref reductions will allocate the per-thread reduction variable in the initialization region. Adding a cleanup region allows that allocation to be undone. This will allow flang to support reduction of arrays stored on the heap. This conflation of allocation and initialization in the initialization should be fixed in the future to better match the OpenMP standard, but that is beyond the scope of this patch.	2024-04-04 11:19:42 +01:00
Tom Eccles	099ecdf1ec	[mlir][OpenMP] map argument to reduction initialization region (#86979 ) The argument to the initialization region of reduction declarations was never mapped. This meant that if this argument was accessed inside the initialization region, that mlir operation would be translated to an llvm operation with a null argument (failing verification). Adding the mapping ensures that the right LLVM value can be found when inlining and converting the initialization region. We have to separately establish and clean up these mappings for each use of the reduction declaration because repeated usage of the same declaration will inline it using a different concrete value for the block argument. This argument was never used previously because for most cases the initialized value depends only upon the type of the reduction, not on the original variable. It is needed now so that we can read the array extents for the local copy from the mold. Flang support for reductions on assumed shape arrays patch 2/3	2024-04-04 10:55:42 +01:00
Simon Camphausen	1f268092c7	[mlir][EmitC] Add support for pointer and opaque types to subscript op (#86266 ) For pointer types the indices are restricted to one integer-like operand. For opaque types no further restrictions are made.	2024-04-03 13:06:14 +02:00
Alex Voicu	ab7dba233a	[CodeGen][LLVM] Make the `va_list` related intrinsics generic. (#85460 ) Currently, the builtins used for implementing `va_list` handling unconditionally take their arguments as unqualified `ptr`s i.e. pointers to AS 0. This does not work for targets where the default AS is not 0 or AS 0 is not a viable AS (for example, a target might choose 0 to represent the constant address space). This patch changes the builtins' signature to take generic `anyptr` args, which corrects this issue. It is noisy due to the number of tests affected. A test for an upstream target which does not use 0 as its default AS (SPIRV for HIP device compilations) is added as well.	2024-03-27 11:41:34 +00:00
Victor Perez	77cbc9bf60	[MLIR][LLVM] Add `llvm.experimental.constrained.fptrunc` operation (#86260 ) Add operation mapping to the LLVM `llvm.experimental.constrained.fptrunc.*` intrinsic. The new operation implements the new `LLVM::FPExceptionBehaviorOpInterface` and `LLVM::RoundingModeOpInterface` interfaces. --------- Signed-off-by: Victor Perez <victor.perez@codeplay.com>	2024-03-26 11:02:50 +01:00
Tobias Gysi	aeeb7d566c	[MLIR][LLVM] Make subprogram flags optional (#86433 ) This revision makes the subprogramFlags field in the DISubprogrammAttr optional. This is necessary since the DISubprogram attached to a declaration may have none of the subprogram flags set.	2024-03-25 12:19:22 +01:00
agozillon	8612fa0d84	[MLIR][OpenMP] Refactor bounds offsetting and fix to apply to all directives (#84349 ) This PR refactors bounds offsetting by combining the two differing implementations (one applying to initial derived type member map implementation for descriptors and the other for regular arrays, effectively allocatable array vs regular array in fortran) now that it's a little simpler to do. The PR also moves the utilization of createAlteredByCaptureMap into genMapInfoOp, where it will be correctly applied to all MapInfoData, appropriately offsetting and altering Pointer data set in the kernel argument structure on the host. This primarily means bounds offsets will now correctly apply to enter/exit/update map clauses as opposed to just the Target directive that is currently the case. A few fortran runtime tests have been added to verify this new behavior. This PR depends on: https://github.com/llvm/llvm-project/pull/84328 and is an extraction of the larger derived type member map PR stack (so a requirement for it to land).	2024-03-22 15:32:39 +01:00
Tobias Gysi	adda597388	[MLIR] Add index bitwidth to the DataLayout (#85927 ) When importing from LLVM IR the data layout of all pointer types contains an index bitwidth that should be used for index computations. This revision adds a getter to the DataLayout that provides access to the already stored bitwidth. The function returns an optional since only pointer-like types have an index bitwidth. Querying the bitwidth of a non-pointer type returns std::nullopt. The new function works for the built-in Index type and, using a type interface, for the LLVMPointerType.	2024-03-21 09:07:57 +01:00
Christian Ulmann	4095a326c0	[MLIR][LLVM] Add extraData field to the DIDerivedType attribute (#85935 ) This commit extends the DIDerivedTypeAttr with the `extraData` field. For now, the type of it is limited to be a `DINodeAttr`, as extending the debug metadata handling to support arbitrary metadata nodes does not seem to be necessary so far.	2024-03-20 16:08:38 +01:00
Sergio Afonso	d84252e064	[MLIR][OpenMP] NFC: Uniformize OpenMP ops names (#85393 ) This patch proposes the renaming of certain OpenMP dialect operations with the goal of improving readability and following a uniform naming convention for MLIR operations and associated classes. In particular, the following operations are renamed: - `omp.map_info` -> `omp.map.info` - `omp.target_update_data` -> `omp.target_update` - `omp.ordered_region` -> `omp.ordered.region` - `omp.cancellationpoint` -> `omp.cancellation_point` - `omp.bounds` -> `omp.map.bounds` - `omp.reduction.declare` -> `omp.declare_reduction` Also, the following MLIR operation classes have been renamed: - `omp::TaskLoopOp` -> `omp::TaskloopOp` - `omp::TaskGroupOp` -> `omp::TaskgroupOp` - `omp::DataBoundsOp` -> `omp::MapBoundsOp` - `omp::DataOp` -> `omp::TargetDataOp` - `omp::EnterDataOp` -> `omp::TargetEnterDataOp` - `omp::ExitDataOp` -> `omp::TargetExitDataOp` - `omp::UpdateDataOp` -> `omp::TargetUpdateOp` - `omp::ReductionDeclareOp` -> `omp::DeclareReductionOp` - `omp::WsLoopOp` -> `omp::WsloopOp`	2024-03-20 11:19:38 +00:00
Tom Eccles	27534d69e2	[mlir][LLVM] erase call mappings in forgetMapping() (#84955 ) It looks like the mappings for call instructions were forgotten here. This fixes a bug in OpenMP when in-lining a region containing call operations multiple times. OpenMP array reductions 4/6 Previous PR: https://github.com/llvm/llvm-project/pull/84954 Next PR: https://github.com/llvm/llvm-project/pull/84957	2024-03-20 10:03:26 +00:00
Billy Zhu	c242302492	[MLIR][LLVM] DI Recursive Type fix for recursion via scope of composites (#85850 ) Fixes this bug for the previous recursive DI type PR: https://github.com/llvm/llvm-project/pull/80251#issuecomment-2007254788. Drawing inspiration from how clang uses DIBuilder to build forward decls, this PR changes how placeholders are created & updated. Instead of requiring each recursive DIType to do in-place mutation, we simply ask for a temporary node as the placeholder, and run RAUW at the end when the concrete node is translated. This has the side effect of simplifying what's needed to add recursion support for a type. Now only one additional method needs to be created for exporting. Concretely, for this PR, `translateImpl` for DICompositeType is back to the state it was before the previous PR, and the only net addition for DICompositeType is `translateTemporaryImpl`. --------- Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>	2024-03-19 22:49:50 -07:00
Matthias Gehre	5cc228148e	TranslateToCpp: Emit floating point literals with suffix (#85392 ) Emits `2.0e+00f` instead of `(float)2.0e+00`. This helps consumers of the emitted code, especially when there are large numbers of floating point literals, to have a simple AST.	2024-03-18 09:58:57 +01:00
Billy Zhu	1e8dad3bef	[MLIR][LLVM] Support Recursive DITypes (#80251 ) Following the discussion from [this thread](https://discourse.llvm.org/t/handling-cyclic-dependencies-in-debug-info/67526/11), this PR adds support for recursive DITypes. This PR adds: 1. DIRecursiveTypeAttrInterface: An interface that DITypeAttrs can implement to indicate that it supports recursion. See full description in code. 2. Importer & exporter support (The only DITypeAttr that implements the interface is DICompositeTypeAttr, so the exporter is only implemented for composites too. There will be two methods that each llvm DI type that supports mutation needs to implement since there's nothing general). --------- Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>	2024-03-15 09:58:25 -07:00
Matthias Gehre	01a31cee56	[MLIR] EmitC: Add subscript operator (#84783 ) Introduces a SubscriptOp that allows to write IR like ``` func.func @load_store(%arg0: !emitc.array<4x8xf32>, %arg1: !emitc.array<3x5xf32>, %arg2: index, %arg3: index) { %0 = emitc.subscript %arg0[%arg2, %arg3] : <4x8xf32>, index, index %1 = emitc.subscript %arg1[%arg2, %arg3] : <3x5xf32>, index, index emitc.assign %0 : f32 to %1 : f32 return } ``` which gets translated into the C++ code ``` v1[v2][v3] = v0[v1][v2]; ``` To make this happen, this - adds the SubscriptOp - allows the subscript op as rhs of emitc.assign - updates the emitter to print SubscriptOps The emitter prints emitc.subscript in a delayed fashing to allow it being used as lvalue. I.e. while processing ``` %0 = emitc.subscript %arg0[%arg2, %arg3] : <4x8xf32>, index, index ``` it will not emit any text, but record in the `valueMapper` that the name for `%0` is `v0[v1][v2]`, see `CppEmitter::getSubscriptName`. Only when that result is then used (here in `emitc.assign`), that name is inserted into the text.	2024-03-15 11:08:34 +01:00
Zahi Moudallal	8481fb1698	[MLIR][ROCDL] Fix BallotOp LLVM translation and add doc (#85116 ) This modifies the return type of the intrinsic call to handle 32 and 64 bits properly and document the MLIR operation.	2024-03-14 08:43:48 -07:00
Marius Brehler	071f72a8ec	[mlir][Target][Cpp] Remove unused dialects (#85102 ) Removes linking and registering dialects that are not support any more.	2024-03-14 07:26:57 +01:00
Tom Eccles	f46f5a01f4	[flang][OpenMP][OMPIRBuilder][mlir] Optionally pass reduction vars by ref (#84304 ) Previously reduction variables were always passed by value into and out of the initialization and combiner regions of the OpenMP reduction declare operation. This worked well for reductions of primitive types (and might perform better than passing by reference). But passing by reference will be useful for array and derived type reductions (e.g. to move allocation inside of the init region). Passing reductions by reference requires different LLVM-IR generation when lowering from MLIR because some of the loads/stores/allocations will now be moved inside of the init and combiner regions. This alternate code generation is requested using a new attribute to omp.wsloop and omp.parallel. Existing lowerings from mlir are unaffected (these will continue to use the by-value argument passing. Flang will continue to pass by-value argument passing for trivial types unless a (hidden) command line argument is supplied. Non-trivial types will always use the by-ref lowering. Array reductions are not ready yet (but are coming very soon). In the meantime, this is tested by forcing existing reductions to use by-ref. Commit series for by-ref OpenMP reductions 3/3 --------- Co-authored-by: Mats Petersson <mats.petersson@arm.com>	2024-03-13 14:51:09 +00:00
Zahi Moudallal	accfbf4e49	[MLIR][ROCDL] Add BallotOp and lit test (#84856 )	2024-03-12 17:07:16 -07:00
Marius Brehler	19266ca389	[mlir][EmitC] Add an `emitc.conditional` operator (#84883 ) This adds an `emitc.conditional` operation for the ternary conditional operator. Furthermore, this adds a converion from `arith.select` to the new op.	2024-03-12 11:27:26 +01:00
Matthias Gehre	818af71b72	[mlir][emitc] Add ArrayType (#83386 ) This models a one or multi-dimensional C/C++ array. The type implements the `ShapedTypeInterface` and prints similar to memref/tensor: ``` %arg0: !emitc.array<1xf32>, %arg1: !emitc.array<10x20x30xi32>, %arg2: !emitc.array<30x!emitc.ptr<i32>>, %arg3: !emitc.array<30x!emitc.opaque<"int">> ``` It can be translated to a C array type when used as function parameter or as `emitc.variable` type.	2024-03-11 16:40:57 +01:00
Krzysztof Drewniak	b05c15259b	[mlir][AMDGPU] Improve amdgpu.lds_barrier, add warnings (#77942 ) On some architectures (currently gfx90a, gfx94, and gfx10*), we can implement an LDS barrier using compiler intrinsics instead of inline assembly, improving optimization possibilities and decreasing the fragility of the underlying code. Other AMDGPU chipsets continue to require inline assembly to implement this barrier, as, by the default, the LLVM backend will insert waits on global memory (s_waintcnt vmcnt(0)) before barriers in order to ensure memory watchpoints set by debuggers work correctly. Use of amdgpu.lds_barrier, on these architectures, imposes a tradeoff between debugability and performance. The documentation, as well as the generated inline assembly, have been updated to explicitly call attention to this fact. For chipsets that did not require the inline assembly hack, we move to the s.waitcnt and s.barrier intrinsics, which have been added to the ROCDL dialect. The magic constants used as an argument to the waitcnt intrinsic can be derived from llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp	2024-03-11 10:06:49 -05:00
Tina Jung	0ddb122147	[mlir][emitc] Arith to EmitC conversion: constants (#83798 ) * Add a conversion from `arith.constant` to `emitc.constant`. * Drop the translation for `arith.constant`s.	2024-03-08 09:16:10 +01:00
Marius Brehler	df9be017b7	[mlir][EmitC] Add `unary_{minus,plus}` operators (#84329 ) This adds operations for the unary minus and the unary plus operator.	2024-03-08 08:34:56 +01:00
Kareem Ergawy	5c54f72901	[MLIR][OpenMP] Extend omp.private materialization support: `firstprivate` (#82164 ) Extends current support for delayed privatization during translation to LLVM IR. This adds support for one-block `firstprivate` `omp.private` ops.	2024-03-04 12:28:30 +01:00
Marius Brehler	7ac03e8a36	[mlir][EmitC] Add bitwise operators (#83387 ) This adds operations for bitwise operators. Furthermore, an UnaryOp class and a helper to print unary operations are introduced.	2024-03-01 13:21:11 +01:00
Marius Brehler	b81bb0e1d0	[mlir][EmitC] Add logical operators (#83123 ) This adds operations for the logical operators AND, NOT and OR.	2024-02-28 20:41:05 +01:00
Leandro Lupori	64422cf826	[llvm][mlir][OMPIRBuilder] Translate omp.single's copyprivate (#80488 ) Use the new copyprivate list from omp.single to emit calls to __kmpc_copyprivate, during the creation of the single operation in OMPIRBuilder. This is patch 4 of 4, to add support for COPYPRIVATE in Flang. Original PR: https://github.com/llvm/llvm-project/pull/73128	2024-02-28 13:33:42 -03:00
Tobias Gysi	e39e30e952	[mlir][llvm] Fix access group translation (#83257 ) This commit fixes the translation of access group metadata to LLVM IR. Previously, it did not use a temporary metadata node to model the placeholder of the self-referencing access group nodes. This is dangerous since, the translation may produce a metadata list with a null entry that is later on changed changed with a self reference. At the same time, for example the debug info translation may create the same uniqued node, which after setting the self-reference the suddenly references the access group metadata. The commit avoids such breakages.	2024-02-28 14:45:18 +01:00
Kareem Ergawy	9d56be010c	[MLIR][OpenMP] Support basic materialization for `omp.private` ops (#81715 ) Adds basic support for materializing delayed privatization. So far, the restrictions on the implementation are: - Only `private` clauses are supported (`firstprivate` support will be added in a later PR).	2024-02-28 05:00:07 +01:00
Xiang Li	70a7b1e8df	Remove test since no test on --debug output. (#83189 )	2024-02-27 17:00:01 -05:00
Krzysztof Drewniak	563f414e04	[mlir][AMDGPU] Set uniform-work-group-size=true by default (#79077 ) GPU kernels generated via typical MLIR mechanisms make the assumption that all workgroups are of uniform size, and so, as in OpenMP, it is appropriate to set the "uniform-work-group-size"="true" attribute on these functions by default. This commit makes that choice. In the event it is needed, this commit adds `rocdl.uniform_work_group_size` as an attribute to be set on LLVM functions that can be used to override the default. In addition, add proper failure messages to translation	2024-02-27 12:35:48 -06:00
Xiang Li	f4fad827ca	[NFC] Add REQUIRES: asserts to limit the test to debug only. (#83145 )	2024-02-27 12:59:56 -05:00
Xiang Li	c11627c2f4	[MLIR][LLVM] Fix memory explosion when converting global variable bodies in ModuleTranslation (#82708 ) There is memory explosion when converting the body or initializer region of a large global variable, e.g. a constant array. For example, when translating a constant array of 100000 strings: llvm.mlir.global internal constant @cats_strings() {addr_space = 0 : i32, alignment = 16 : i64} : !llvm.array<100000 x ptr<i8>> { %0 = llvm.mlir.undef : !llvm.array<100000 x ptr<i8>> %1 = llvm.mlir.addressof @om_1 : !llvm.ptr<array<1 x i8>> %2 = llvm.getelementptr %1[0, 0] : (!llvm.ptr<array<1 x i8>>) -> !llvm.ptr<i8> %3 = llvm.insertvalue %2, %0[0] : !llvm.array<100000 x ptr<i8>> %4 = llvm.mlir.addressof @om_2 : !llvm.ptr<array<1 x i8>> %5 = llvm.getelementptr %4[0, 0] : (!llvm.ptr<array<1 x i8>>) -> !llvm.ptr<i8> %6 = llvm.insertvalue %5, %3[1] : !llvm.array<100000 x ptr<i8>> %7 = llvm.mlir.addressof @om_3 : !llvm.ptr<array<1 x i8>> %8 = llvm.getelementptr %7[0, 0] : (!llvm.ptr<array<1 x i8>>) -> !llvm.ptr<i8> %9 = llvm.insertvalue %8, %6[2] : !llvm.array<100000 x ptr<i8>> %10 = llvm.mlir.addressof @om_4 : !llvm.ptr<array<1 x i8>> %11 = llvm.getelementptr %10[0, 0] : (!llvm.ptr<array<1 x i8>>) -> !llvm.ptr<i8> %12 = llvm.insertvalue %11, %9[3] : !llvm.array<100000 x ptr<i8>> ... (ignore the remaining part) } where @om_1, @om_2, ... are string global constants. Each time an operation is converted to LLVM, a new constant is created. When it comes to llvm.insertvalue, a new constant array of 100000 elements is created and the old constant array (input) is not destroyed. This causes memory explosion. We observed that, on a system with 128 GB memory, the translation of 100000 elements got killed due to using up all the memory. On a system with 64 GB, 65536 elements was enough to cause the translation killed. There is a previous patch (https://reviews.llvm.org/D148487) which fix this issue but was reverted for https://github.com/llvm/llvm-project/issues/62802 The old patch checks generated constants and destroyed them if there is no use. But the check of use for the constant is too early, which cause the constant be removed before use. This new patch added a map was added a map to save expected use count for a constant. Then decrease when reach each use. And only erase the constant when the use count reach to zero With new patch, the repro in https://github.com/llvm/llvm-project/issues/62802 finished correctly.	2024-02-26 21:15:12 -05:00
Mehdi Amini	e5332482c6	[MLIR][Spirv] Use StringAttr for linkage_name (NFC) (#82953 ) std::string was used here, likely by mistake. The usual convention for attributes is to use StringAttr.	2024-02-25 20:47:08 -08:00
agozillon	dcf4ca558c	[OpenMP][MLIR][OMPIRBuilder] Add a small optional constant alloca raise function pass to finalize, utilised in convertTarget (#78818 ) This patch seeks to add a mechanism to raise constant (not ConstantExpr or runtime/dynamic) sized allocations into the entry block for select functions that have been inserted into a list for processing. This processing occurs during the finalize call, after OutlinedInfo regions have completed. This currently has only been utilised for createOutlinedFunction, which is triggered for TargetOp generation in the OpenMP MLIR dialect lowering to LLVM-IR. This currently is required for Target kernels generated by createOutlinedFunction to avoid subsequent optimization passes doing some unintentional malformed optimizations for AMD kernels (unsure if it occurs for other vendors). If the allocas are generated inside of the kernel and are not in the entry block and are subsequently passed to a function this can lead to required instructions being erased or manipulated in a way that causes the kernel to run into a HSA access error. This fix is related to a series of problems found in: https://github.com/llvm/llvm-project/issues/74603 This problem primarily presents itself for Flang's HLFIR AssignOp currently, when utilised with a scalar temporary constant on the RHS and a descriptor type on the LHS. It will generate a call to a runtime function, wrap the RHS temporary in a newly allocated descriptor (an llvm struct), and pass both the LHS and RHS descriptor into the runtime function call. This will currently be embedded into the middle of the target region in the user entry block, which means the allocas are also embedded in the middle, which seems to pose issues when later passes are executed. This issue may present itself in other HLFIR operations or unrelated operations that generate allocas as a by product, but for the moment, this one test case is the only scenario I've found this problem. Perhaps this is not the appropriate fix, I am very open to other suggestions, I've tried a few others (at varying levels of the flang/mlir compiler flow), but this one is the smallest and least intrusive change set. The other two, that come to mind (but I've not fully looked into, the former I tried a little with blocks but it had a few issues I'd need to think through): - Having a proper alloca only block (or region) generated for TargetOps that we could merge into the entry block that's generated by convertTarget's createOutlinedFunction. - Or diverging a little from Clang's current target generation and using the CodeExtractor to generate the user code as an outlined function region invoked from the kernel we make, with our kernel arguments passed into it. Similar to the current parallel generation. I am not sure how well this would intermingle with the existing parallel generation though that's layered in. Both of these methods seem like quite a divergence from the current status quo, which I am not entirely sure is merited for the small test this change aims to fix.	2024-02-23 22:59:41 +01:00
Tobias Gysi	335d34d9ea	[MLIR][LLVM] Fix debug intrinsic import (#82637 ) This revision handles the case that the translation of a scope fails due to cyclic metadata. This mainly affects the import of debug intrinsics that indirectly take such a scope as metadata argument (e.g. via local variable or label metadata). This commit ensures we drop intrinsics with such a dependency on cyclic metadata.	2024-02-23 10:30:19 +01:00
Joseph Huber	cc374d8056	[OpenMP] Remove `register_requires` global constructor (#80460 ) Summary: Currently, OpenMP handles the `omp requires` clause by emitting a global constructor into the runtime for every translation unit that requires it. However, this is not a great solution because it prevents us from having a defined order in which the runtime is accessed and used. This patch changes the approach to no longer use global constructors, but to instead group the flag with the other offloading entires that we already handle. This has the effect of still registering each flag per requires TU, but now we have a single constructor that handles everything. This function removes support for the old `__tgt_register_requires` and replaces it with a warning message. We just had a recent release, and the OpenMP policy for the past four releases since we switched to LLVM is that we do not provide strict backwards compatibility between major LLVM releases now that the library is versioned. This means that a user will need to recompile if they have an old binary that relied on `register_requires` having the old behavior. It is important that we actively deprecate this, as otherwise it would not solve the problem of having no defined init and shutdown order for `libomptarget`. The problem of `libomptarget` not having a define init and shutdown order cascades into a lot of other issues so I have a strong incentive to be rid of it. It is worth noting that the current `__tgt_offload_entry` only has space for a 32-bit integer here. I am planning to overhaul these at some point as well.	2024-02-21 11:33:32 -06:00
Guray Ozen	b5d694ba14	[mlir][nvvm] Introduce `nvvm.barrier` OP (#81487 ) This PR that introduces the `nvvm.barrier` OP to the NVVM dialect. Currently, NVVM only supports the `nvvm.barrier0`, which synchronizes all threads using barrier resource 0. The new `nvvm.barrier` has two essential arguments: the barrier resource and the number of threads. This added flexibility allows for selective synchronization of threads within a CTA, aligning with the capabilities provided by LLVM intrinsics or the PTX model. I think we can deprecate `nvvm.barrier0` in favor of the more generic `nvvm.barrier`. ``` // Equivalent to nvvm.barrier0 (or __syncthreads() in CUDA) nvvm.barrier // Synchronize all threads using the 3rd barrier resource. nvvm.barrier id = 3 // Synchronize %numberOfThreads threads using the 3rd barrier resource. nvvm.barrier id = 3 number_of_threads = %numberOfThreads ```	2024-02-14 08:28:45 +01:00
David Truby	be9f8ffd81	[mlir][flang][openmp] Rework wsloop reduction operations (#80019 ) This patch reworks the way that wsloop reduction operations function to better match the expected semantics from the OpenMP specification, following the rework of parallel reductions. The new semantics create a private reduction variable as a block argument which should be used normally for all operations on that variable in the region; this private variable is then combined with the others into the shared variable. This way no special omp.reduction operations are needed inside the region. These block arguments follow the loop control block arguments. --------- Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>	2024-02-13 19:13:54 +00:00
Giuseppe Rossini	16140ff219	[mlir][ROCDL] Add synchronization primitives (#80888 ) This PR adds two LLVM intrinsics to MLIR: - llvm.amdgcn.s.setprio which sets the priority of a wave for the GPU scheduler - llvm.amdgcn.sched.barrier which sets a software barrier so that the scheduler cannot move instructions around	2024-02-13 12:29:49 -06:00
Rishi Surendran	fa6850a998	[mlir][nvvm]Add support for grid_constant attribute on LLVM function arguments (#78228 ) Add support for attribute nvvm.grid_constant on LLVM function arguments. The attribute can be attached only to arguments of type llvm.ptr that have llvm.byval attribute. Generate LLVM metadata for functions with nvvm.grid_constant arguments. The metadata node is a list of integers, where each integer n denotes that the nth parameter has the grid_constant annotation (numbering from 1). The generated metadata node will be handled by NVVM compiler. See https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#supported-properties for documentation on grid_constant property. This patch also adds convertParameterAttr to LLVMTranslationDialectInterface for supporting the translation of derived dialect attributes on function parameters	2024-02-12 13:16:59 -08:00
David Truby	9ecf4d20bb	[mlir][flang][openmp] Rework parallel reduction operations (#79308 ) This patch reworks the way that parallel reduction operations function to better match the expected semantics from the OpenMP specification. Previously specific omp.reduction operations were used inside the region, meaning that the reduction only applied when the correct operation was used, whereas the specification states that any change to the variable inside the region should be taken into account for the reduction. The new semantics create a private reduction variable as a block argument which should be used normally for all operations on that variable in the region; this private variable is then combined with the others into the shared variable. This way no special omp.reduction operations are needed inside the region. This patch only makes the change for the `parallel` operation, the change for the `wsloop` operation will be in a separate patch. --------- Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>	2024-02-12 17:19:49 +00:00
Benjamin Maxwell	413e82a087	[mlir][ArmSVE] Add intrinsics for the SME2 multi-vector zips (#80985 ) These are added to the ArmSVE dialect for consistency with LLVM, which registers SME2 intrinsics that don't require ZA under SVE.	2024-02-09 13:33:09 +00:00

1 2 3 4 5 ...

882 Commits