clang-p2996

Author	SHA1	Message	Date
Joseph Huber	470aefb240	[Offload][NFC] Remove `omp_` prefix from offloading entries (#88071 ) Summary: These entries are generic for offloading with the new driver now. Having the `omp` prefix was a historical artifact and is confusing when used for CUDA. This patch just renames them for now, future patches will rework the binary format to make it more common.	2024-04-09 15:50:15 -05:00
Billy Zhu	6f6336858e	[MLIR][LLVM] Add DebugNameTableKind to DICompileUnit (#87974 ) Add the DebugNameTableKind field to DICompileUnit, along with its importer & exporter.	2024-04-09 06:18:07 -07:00
Matthias Braun	4a812b5912	Verify threadlocal_address constraints (#87841 ) Check invariants for `llvm.threadlocal.address` intrinsic in IR Verifier.	2024-04-08 17:47:57 -07:00
Billy Zhu	81a7b6454e	[MLIR][LLVM] Recursion importer handle repeated self-references (#87295 ) Followup to this discussion: https://github.com/llvm/llvm-project/pull/80251#discussion_r1535599920. The previous debug importer was correct but inefficient. For cases with mutual recursion that contain more than one back-edge, each back-edge would result in a new translated instance. This is because the previous implementation never caches any translated result with unbounded self-references. This means all translation inside a recursive context is performed from scratch, which will incur repeated run-time cost as well as repeated attribute sub-trees in the translated IR (differing only in their `recId`s). This PR refactors the importer to handle caching inside a recursive context. - In the presence of unbound self-refs, the translation result is cached in a separate cache that keeps track of the set of dependent unbound self-refs. - A dependent cache entry is valid only when all the unbound self-refs are in scope. Whenever a cached entry goes out of scope, it will be removed the next time it is looked up.	2024-04-08 01:09:54 -07:00
Fabian Mora	a2c4b7c8e2	[mlir] Add `convertInstruction` and `getSupportedInstructions` to `LLVMImportInterface` (#86799 ) This patch adds the `convertInstruction` and `getSupportedInstructions` to `LLVMImportInterface`, allowing any non-LLVM dialect to specify how to import LLVM IR instructions and overriding the default import of LLVM instructions.	2024-04-07 08:46:21 +02:00
Jan Leyonberg	9708d09003	[MLIR][OpenMP] Skip host omp ops when compiling for the target device (#85239 ) This patch separates the lowering dispatch for host and target devices. For the target device, if the current operation is not a top-level operation (e.g. omp.target) or is inside a target device code region it will be ignored, since it belongs to the host code. This is an alternative approach to #84611, the new test in this PR was taken from there.	2024-04-05 09:25:28 -04:00
Tom Eccles	cc34ad91f0	[MLIR][OpenMP] Add cleanup region to omp.declare_reduction (#87377 ) Currently, by-ref reductions will allocate the per-thread reduction variable in the initialization region. Adding a cleanup region allows that allocation to be undone. This will allow flang to support reduction of arrays stored on the heap. This conflation of allocation and initialization in the initialization should be fixed in the future to better match the OpenMP standard, but that is beyond the scope of this patch.	2024-04-04 11:19:42 +01:00
Tom Eccles	099ecdf1ec	[mlir][OpenMP] map argument to reduction initialization region (#86979 ) The argument to the initialization region of reduction declarations was never mapped. This meant that if this argument was accessed inside the initialization region, that mlir operation would be translated to an llvm operation with a null argument (failing verification). Adding the mapping ensures that the right LLVM value can be found when inlining and converting the initialization region. We have to separately establish and clean up these mappings for each use of the reduction declaration because repeated usage of the same declaration will inline it using a different concrete value for the block argument. This argument was never used previously because for most cases the initialized value depends only upon the type of the reduction, not on the original variable. It is needed now so that we can read the array extents for the local copy from the mold. Flang support for reductions on assumed shape arrays patch 2/3	2024-04-04 10:55:42 +01:00
Alex Voicu	ab7dba233a	[CodeGen][LLVM] Make the `va_list` related intrinsics generic. (#85460 ) Currently, the builtins used for implementing `va_list` handling unconditionally take their arguments as unqualified `ptr`s i.e. pointers to AS 0. This does not work for targets where the default AS is not 0 or AS 0 is not a viable AS (for example, a target might choose 0 to represent the constant address space). This patch changes the builtins' signature to take generic `anyptr` args, which corrects this issue. It is noisy due to the number of tests affected. A test for an upstream target which does not use 0 as its default AS (SPIRV for HIP device compilations) is added as well.	2024-03-27 11:41:34 +00:00
Victor Perez	77cbc9bf60	[MLIR][LLVM] Add `llvm.experimental.constrained.fptrunc` operation (#86260 ) Add operation mapping to the LLVM `llvm.experimental.constrained.fptrunc.*` intrinsic. The new operation implements the new `LLVM::FPExceptionBehaviorOpInterface` and `LLVM::RoundingModeOpInterface` interfaces. --------- Signed-off-by: Victor Perez <victor.perez@codeplay.com>	2024-03-26 11:02:50 +01:00
Tobias Gysi	aeeb7d566c	[MLIR][LLVM] Make subprogram flags optional (#86433 ) This revision makes the subprogramFlags field in the DISubprogramAttr optional. This is necessary since the DISubprogram attached to a declaration may have none of the subprogram flags set.	2024-03-25 12:19:22 +01:00
agozillon	8612fa0d84	[MLIR][OpenMP] Refactor bounds offsetting and fix to apply to all directives (#84349 ) This PR refactors bounds offsetting by combining the two differing implementations (one applying to initial derived type member map implementation for descriptors and the other for regular arrays, effectively allocatable array vs regular array in fortran) now that it's a little simpler to do. The PR also moves the utilization of createAlteredByCaptureMap into genMapInfoOp, where it will be correctly applied to all MapInfoData, appropriately offsetting and altering Pointer data set in the kernel argument structure on the host. This primarily means bounds offsets will now correctly apply to enter/exit/update map clauses as opposed to just the Target directive that is currently the case. A few fortran runtime tests have been added to verify this new behavior. This PR depends on: https://github.com/llvm/llvm-project/pull/84328 and is an extraction of the larger derived type member map PR stack (so a requirement for it to land).	2024-03-22 15:32:39 +01:00
Tobias Gysi	adda597388	[MLIR] Add index bitwidth to the DataLayout (#85927 ) When importing from LLVM IR the data layout of all pointer types contains an index bitwidth that should be used for index computations. This revision adds a getter to the DataLayout that provides access to the already stored bitwidth. The function returns an optional since only pointer-like types have an index bitwidth. Querying the bitwidth of a non-pointer type returns std::nullopt. The new function works for the built-in Index type and, using a type interface, for the LLVMPointerType.	2024-03-21 09:07:57 +01:00
Christian Ulmann	4095a326c0	[MLIR][LLVM] Add extraData field to the DIDerivedType attribute (#85935 ) This commit extends the DIDerivedTypeAttr with the `extraData` field. For now, the type of it is limited to be a `DINodeAttr`, as extending the debug metadata handling to support arbitrary metadata nodes does not seem to be necessary so far.	2024-03-20 16:08:38 +01:00
Sergio Afonso	d84252e064	[MLIR][OpenMP] NFC: Uniformize OpenMP ops names (#85393 ) This patch proposes the renaming of certain OpenMP dialect operations with the goal of improving readability and following a uniform naming convention for MLIR operations and associated classes. In particular, the following operations are renamed: - `omp.map_info` -> `omp.map.info` - `omp.target_update_data` -> `omp.target_update` - `omp.ordered_region` -> `omp.ordered.region` - `omp.cancellationpoint` -> `omp.cancellation_point` - `omp.bounds` -> `omp.map.bounds` - `omp.reduction.declare` -> `omp.declare_reduction` Also, the following MLIR operation classes have been renamed: - `omp::TaskLoopOp` -> `omp::TaskloopOp` - `omp::TaskGroupOp` -> `omp::TaskgroupOp` - `omp::DataBoundsOp` -> `omp::MapBoundsOp` - `omp::DataOp` -> `omp::TargetDataOp` - `omp::EnterDataOp` -> `omp::TargetEnterDataOp` - `omp::ExitDataOp` -> `omp::TargetExitDataOp` - `omp::UpdateDataOp` -> `omp::TargetUpdateOp` - `omp::ReductionDeclareOp` -> `omp::DeclareReductionOp` - `omp::WsLoopOp` -> `omp::WsloopOp`	2024-03-20 11:19:38 +00:00
Tom Eccles	27534d69e2	[mlir][LLVM] erase call mappings in forgetMapping() (#84955 ) It looks like the mappings for call instructions were forgotten here. This fixes a bug in OpenMP when in-lining a region containing call operations multiple times. OpenMP array reductions 4/6 Previous PR: https://github.com/llvm/llvm-project/pull/84954 Next PR: https://github.com/llvm/llvm-project/pull/84957	2024-03-20 10:03:26 +00:00
Billy Zhu	c242302492	[MLIR][LLVM] DI Recursive Type fix for recursion via scope of composites (#85850 ) Fixes this bug for the previous recursive DI type PR: https://github.com/llvm/llvm-project/pull/80251#issuecomment-2007254788. Drawing inspiration from how clang uses DIBuilder to build forward decls, this PR changes how placeholders are created & updated. Instead of requiring each recursive DIType to do in-place mutation, we simply ask for a temporary node as the placeholder, and run RAUW at the end when the concrete node is translated. This has the side effect of simplifying what's needed to add recursion support for a type. Now only one additional method needs to be created for exporting. Concretely, for this PR, `translateImpl` for DICompositeType is back to the state it was before the previous PR, and the only net addition for DICompositeType is `translateTemporaryImpl`. --------- Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>	2024-03-19 22:49:50 -07:00
Billy Zhu	1e8dad3bef	[MLIR][LLVM] Support Recursive DITypes (#80251 ) Following the discussion from [this thread](https://discourse.llvm.org/t/handling-cyclic-dependencies-in-debug-info/67526/11), this PR adds support for recursive DITypes. This PR adds: 1. DIRecursiveTypeAttrInterface: An interface that DITypeAttrs can implement to indicate that it supports recursion. See full description in code. 2. Importer & exporter support (The only DITypeAttr that implements the interface is DICompositeTypeAttr, so the exporter is only implemented for composites too. There will be two methods that each llvm DI type that supports mutation needs to implement since there's nothing general). --------- Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>	2024-03-15 09:58:25 -07:00
Zahi Moudallal	8481fb1698	[MLIR][ROCDL] Fix BallotOp LLVM translation and add doc (#85116 ) This modifies the return type of the intrinsic call to handle 32 and 64 bits properly and document the MLIR operation.	2024-03-14 08:43:48 -07:00
Tom Eccles	f46f5a01f4	[flang][OpenMP][OMPIRBuilder][mlir] Optionally pass reduction vars by ref (#84304 ) Previously reduction variables were always passed by value into and out of the initialization and combiner regions of the OpenMP reduction declare operation. This worked well for reductions of primitive types (and might perform better than passing by reference). But passing by reference will be useful for array and derived type reductions (e.g. to move allocation inside of the init region). Passing reductions by reference requires different LLVM-IR generation when lowering from MLIR because some of the loads/stores/allocations will now be moved inside of the init and combiner regions. This alternate code generation is requested using a new attribute to omp.wsloop and omp.parallel. Existing lowerings from mlir are unaffected (these will continue to use the by-value argument passing). Flang will continue to use by-value argument passing for trivial types unless a (hidden) command line argument is supplied. Non-trivial types will always use the by-ref lowering. Array reductions are not ready yet (but are coming very soon). In the meantime, this is tested by forcing existing reductions to use by-ref. Commit series for by-ref OpenMP reductions 3/3 --------- Co-authored-by: Mats Petersson <mats.petersson@arm.com>	2024-03-13 14:51:09 +00:00
Zahi Moudallal	accfbf4e49	[MLIR][ROCDL] Add BallotOp and lit test (#84856 )	2024-03-12 17:07:16 -07:00
Krzysztof Drewniak	b05c15259b	[mlir][AMDGPU] Improve amdgpu.lds_barrier, add warnings (#77942 ) On some architectures (currently gfx90a, gfx94, and gfx10*), we can implement an LDS barrier using compiler intrinsics instead of inline assembly, improving optimization possibilities and decreasing the fragility of the underlying code. Other AMDGPU chipsets continue to require inline assembly to implement this barrier, as, by default, the LLVM backend will insert waits on global memory (s_waitcnt vmcnt(0)) before barriers in order to ensure memory watchpoints set by debuggers work correctly. Use of amdgpu.lds_barrier, on these architectures, imposes a tradeoff between debuggability and performance. The documentation, as well as the generated inline assembly, have been updated to explicitly call attention to this fact. For chipsets that did not require the inline assembly hack, we move to the s.waitcnt and s.barrier intrinsics, which have been added to the ROCDL dialect. The magic constants used as an argument to the waitcnt intrinsic can be derived from llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp	2024-03-11 10:06:49 -05:00
Kareem Ergawy	5c54f72901	[MLIR][OpenMP] Extend omp.private materialization support: `firstprivate` (#82164 ) Extends current support for delayed privatization during translation to LLVM IR. This adds support for one-block `firstprivate` `omp.private` ops.	2024-03-04 12:28:30 +01:00
Leandro Lupori	64422cf826	[llvm][mlir][OMPIRBuilder] Translate omp.single's copyprivate (#80488 ) Use the new copyprivate list from omp.single to emit calls to __kmpc_copyprivate, during the creation of the single operation in OMPIRBuilder. This is patch 4 of 4, to add support for COPYPRIVATE in Flang. Original PR: https://github.com/llvm/llvm-project/pull/73128	2024-02-28 13:33:42 -03:00
Tobias Gysi	e39e30e952	[mlir][llvm] Fix access group translation (#83257 ) This commit fixes the translation of access group metadata to LLVM IR. Previously, it did not use a temporary metadata node to model the placeholder of the self-referencing access group nodes. This is dangerous since the translation may produce a metadata list with a null entry that is later changed to a self-reference. At the same time, the debug info translation, for example, may create the same uniqued node, which suddenly references the access group metadata once the self-reference is set. The commit avoids such breakages.	2024-02-28 14:45:18 +01:00
Kareem Ergawy	9d56be010c	[MLIR][OpenMP] Support basic materialization for `omp.private` ops (#81715 ) Adds basic support for materializing delayed privatization. So far, the restrictions on the implementation are: - Only `private` clauses are supported (`firstprivate` support will be added in a later PR).	2024-02-28 05:00:07 +01:00
Xiang Li	70a7b1e8df	Remove test since no test on --debug output. (#83189 )	2024-02-27 17:00:01 -05:00
Krzysztof Drewniak	563f414e04	[mlir][AMDGPU] Set uniform-work-group-size=true by default (#79077 ) GPU kernels generated via typical MLIR mechanisms make the assumption that all workgroups are of uniform size, and so, as in OpenMP, it is appropriate to set the "uniform-work-group-size"="true" attribute on these functions by default. This commit makes that choice. In the event it is needed, this commit adds `rocdl.uniform_work_group_size` as an attribute to be set on LLVM functions that can be used to override the default. In addition, add proper failure messages to translation	2024-02-27 12:35:48 -06:00
Xiang Li	f4fad827ca	[NFC] Add REQUIRES: asserts to limit the test to debug only. (#83145 )	2024-02-27 12:59:56 -05:00
Xiang Li	c11627c2f4	[MLIR][LLVM] Fix memory explosion when converting global variable bodies in ModuleTranslation (#82708 ) There is a memory explosion when converting the body or initializer region of a large global variable, e.g. a constant array. For example, when translating a constant array of 100000 strings: llvm.mlir.global internal constant @cats_strings() {addr_space = 0 : i32, alignment = 16 : i64} : !llvm.array<100000 x ptr<i8>> { %0 = llvm.mlir.undef : !llvm.array<100000 x ptr<i8>> %1 = llvm.mlir.addressof @om_1 : !llvm.ptr<array<1 x i8>> %2 = llvm.getelementptr %1[0, 0] : (!llvm.ptr<array<1 x i8>>) -> !llvm.ptr<i8> %3 = llvm.insertvalue %2, %0[0] : !llvm.array<100000 x ptr<i8>> %4 = llvm.mlir.addressof @om_2 : !llvm.ptr<array<1 x i8>> %5 = llvm.getelementptr %4[0, 0] : (!llvm.ptr<array<1 x i8>>) -> !llvm.ptr<i8> %6 = llvm.insertvalue %5, %3[1] : !llvm.array<100000 x ptr<i8>> %7 = llvm.mlir.addressof @om_3 : !llvm.ptr<array<1 x i8>> %8 = llvm.getelementptr %7[0, 0] : (!llvm.ptr<array<1 x i8>>) -> !llvm.ptr<i8> %9 = llvm.insertvalue %8, %6[2] : !llvm.array<100000 x ptr<i8>> %10 = llvm.mlir.addressof @om_4 : !llvm.ptr<array<1 x i8>> %11 = llvm.getelementptr %10[0, 0] : (!llvm.ptr<array<1 x i8>>) -> !llvm.ptr<i8> %12 = llvm.insertvalue %11, %9[3] : !llvm.array<100000 x ptr<i8>> ... (ignore the remaining part) } where @om_1, @om_2, ... are string global constants. Each time an operation is converted to LLVM, a new constant is created. When it comes to llvm.insertvalue, a new constant array of 100000 elements is created and the old constant array (input) is not destroyed. This causes the memory explosion. We observed that, on a system with 128 GB memory, the translation of 100000 elements got killed due to using up all the memory. On a system with 64 GB, 65536 elements were enough to get the translation killed.
There is a previous patch (https://reviews.llvm.org/D148487) which fixed this issue but was reverted for https://github.com/llvm/llvm-project/issues/62802. The old patch checked generated constants and destroyed them if they had no uses. But the use check for a constant happened too early, which caused the constant to be removed before its use. This new patch adds a map that saves the expected use count for each constant. The count is decreased when each use is reached, and the constant is erased only when the use count reaches zero. With the new patch, the repro in https://github.com/llvm/llvm-project/issues/62802 finishes correctly.	2024-02-26 21:15:12 -05:00
agozillon	dcf4ca558c	[OpenMP][MLIR][OMPIRBuilder] Add a small optional constant alloca raise function pass to finalize, utilised in convertTarget (#78818 ) This patch seeks to add a mechanism to raise constant (not ConstantExpr or runtime/dynamic) sized allocations into the entry block for select functions that have been inserted into a list for processing. This processing occurs during the finalize call, after OutlinedInfo regions have completed. This currently has only been utilised for createOutlinedFunction, which is triggered for TargetOp generation in the OpenMP MLIR dialect lowering to LLVM-IR. This currently is required for Target kernels generated by createOutlinedFunction to avoid subsequent optimization passes doing some unintentional malformed optimizations for AMD kernels (unsure if it occurs for other vendors). If the allocas are generated inside of the kernel and are not in the entry block and are subsequently passed to a function this can lead to required instructions being erased or manipulated in a way that causes the kernel to run into a HSA access error. This fix is related to a series of problems found in: https://github.com/llvm/llvm-project/issues/74603 This problem primarily presents itself for Flang's HLFIR AssignOp currently, when utilised with a scalar temporary constant on the RHS and a descriptor type on the LHS. It will generate a call to a runtime function, wrap the RHS temporary in a newly allocated descriptor (an llvm struct), and pass both the LHS and RHS descriptor into the runtime function call. This will currently be embedded into the middle of the target region in the user entry block, which means the allocas are also embedded in the middle, which seems to pose issues when later passes are executed. This issue may present itself in other HLFIR operations or unrelated operations that generate allocas as a by product, but for the moment, this one test case is the only scenario I've found this problem. 
Perhaps this is not the appropriate fix, I am very open to other suggestions, I've tried a few others (at varying levels of the flang/mlir compiler flow), but this one is the smallest and least intrusive change set. The other two, that come to mind (but I've not fully looked into, the former I tried a little with blocks but it had a few issues I'd need to think through): - Having a proper alloca only block (or region) generated for TargetOps that we could merge into the entry block that's generated by convertTarget's createOutlinedFunction. - Or diverging a little from Clang's current target generation and using the CodeExtractor to generate the user code as an outlined function region invoked from the kernel we make, with our kernel arguments passed into it. Similar to the current parallel generation. I am not sure how well this would intermingle with the existing parallel generation though that's layered in. Both of these methods seem like quite a divergence from the current status quo, which I am not entirely sure is merited for the small test this change aims to fix.	2024-02-23 22:59:41 +01:00
Tobias Gysi	335d34d9ea	[MLIR][LLVM] Fix debug intrinsic import (#82637 ) This revision handles the case that the translation of a scope fails due to cyclic metadata. This mainly affects the import of debug intrinsics that indirectly take such a scope as metadata argument (e.g. via local variable or label metadata). This commit ensures we drop intrinsics with such a dependency on cyclic metadata.	2024-02-23 10:30:19 +01:00
Joseph Huber	cc374d8056	[OpenMP] Remove `register_requires` global constructor (#80460 ) Summary: Currently, OpenMP handles the `omp requires` clause by emitting a global constructor into the runtime for every translation unit that requires it. However, this is not a great solution because it prevents us from having a defined order in which the runtime is accessed and used. This patch changes the approach to no longer use global constructors, but to instead group the flag with the other offloading entries that we already handle. This has the effect of still registering each flag per requires TU, but now we have a single constructor that handles everything. This function removes support for the old `__tgt_register_requires` and replaces it with a warning message. We just had a recent release, and the OpenMP policy for the past four releases since we switched to LLVM is that we do not provide strict backwards compatibility between major LLVM releases now that the library is versioned. This means that a user will need to recompile if they have an old binary that relied on `register_requires` having the old behavior. It is important that we actively deprecate this, as otherwise it would not solve the problem of having no defined init and shutdown order for `libomptarget`. The problem of `libomptarget` not having a defined init and shutdown order cascades into a lot of other issues so I have a strong incentive to be rid of it. It is worth noting that the current `__tgt_offload_entry` only has space for a 32-bit integer here. I am planning to overhaul these at some point as well.	2024-02-21 11:33:32 -06:00
Guray Ozen	b5d694ba14	[mlir][nvvm] Introduce `nvvm.barrier` OP (#81487 ) This PR introduces the `nvvm.barrier` OP to the NVVM dialect. Currently, NVVM only supports the `nvvm.barrier0`, which synchronizes all threads using barrier resource 0. The new `nvvm.barrier` has two essential arguments: the barrier resource and the number of threads. This added flexibility allows for selective synchronization of threads within a CTA, aligning with the capabilities provided by LLVM intrinsics or the PTX model. I think we can deprecate `nvvm.barrier0` in favor of the more generic `nvvm.barrier`. ``` // Equivalent to nvvm.barrier0 (or __syncthreads() in CUDA) nvvm.barrier // Synchronize all threads using the 3rd barrier resource. nvvm.barrier id = 3 // Synchronize %numberOfThreads threads using the 3rd barrier resource. nvvm.barrier id = 3 number_of_threads = %numberOfThreads ```	2024-02-14 08:28:45 +01:00
David Truby	be9f8ffd81	[mlir][flang][openmp] Rework wsloop reduction operations (#80019 ) This patch reworks the way that wsloop reduction operations function to better match the expected semantics from the OpenMP specification, following the rework of parallel reductions. The new semantics create a private reduction variable as a block argument which should be used normally for all operations on that variable in the region; this private variable is then combined with the others into the shared variable. This way no special omp.reduction operations are needed inside the region. These block arguments follow the loop control block arguments. --------- Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>	2024-02-13 19:13:54 +00:00
Giuseppe Rossini	16140ff219	[mlir][ROCDL] Add synchronization primitives (#80888 ) This PR adds two LLVM intrinsics to MLIR: - llvm.amdgcn.s.setprio which sets the priority of a wave for the GPU scheduler - llvm.amdgcn.sched.barrier which sets a software barrier so that the scheduler cannot move instructions around	2024-02-13 12:29:49 -06:00
Rishi Surendran	fa6850a998	[mlir][nvvm]Add support for grid_constant attribute on LLVM function arguments (#78228 ) Add support for attribute nvvm.grid_constant on LLVM function arguments. The attribute can be attached only to arguments of type llvm.ptr that have llvm.byval attribute. Generate LLVM metadata for functions with nvvm.grid_constant arguments. The metadata node is a list of integers, where each integer n denotes that the nth parameter has the grid_constant annotation (numbering from 1). The generated metadata node will be handled by NVVM compiler. See https://docs.nvidia.com/cuda/nvvm-ir-spec/index.html#supported-properties for documentation on grid_constant property. This patch also adds convertParameterAttr to LLVMTranslationDialectInterface for supporting the translation of derived dialect attributes on function parameters	2024-02-12 13:16:59 -08:00
David Truby	9ecf4d20bb	[mlir][flang][openmp] Rework parallel reduction operations (#79308 ) This patch reworks the way that parallel reduction operations function to better match the expected semantics from the OpenMP specification. Previously specific omp.reduction operations were used inside the region, meaning that the reduction only applied when the correct operation was used, whereas the specification states that any change to the variable inside the region should be taken into account for the reduction. The new semantics create a private reduction variable as a block argument which should be used normally for all operations on that variable in the region; this private variable is then combined with the others into the shared variable. This way no special omp.reduction operations are needed inside the region. This patch only makes the change for the `parallel` operation, the change for the `wsloop` operation will be in a separate patch. --------- Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>	2024-02-12 17:19:49 +00:00
Benjamin Maxwell	413e82a087	[mlir][ArmSVE] Add intrinsics for the SME2 multi-vector zips (#80985 ) These are added to the ArmSVE dialect for consistency with LLVM, which registers SME2 intrinsics that don't require ZA under SVE.	2024-02-09 13:33:09 +00:00
Kolya Panchenko	9f6c00565a	[MLIR][VCIX] Support VCIX intrinsics in LLVMIR dialect (#75875 ) The changeset extends LLVMIR intrinsics with VCIX intrinsics. The VCIX intrinsics allow MLIR users to interact with RISC-V co-processors that are compatible with `XSfvcp` extension Source: https://www.sifive.com/document-file/sifive-vector-coprocessor-interface-vcix-software	2024-02-07 15:23:28 -05:00
Kojo Acquah	16d890ced6	[mlir][ArmNeon] Adds Arm Neon SMMLA, UMMLA, and USMMLA Intrinsics (#80511 ) This adds the SMMLA, UMMLA, and USMMLA intrinsics to the Neon dialect, bringing it in line with the SVE dialect. These ops enable matrix multiply-accumulate instructions with two 2x8 matrix inputs of respective signedness into a 2x2 32-bit integer accumulator. This is equivalent to performing an 8-way dot product per destination element. Op details: https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&q=mmla	2024-02-06 14:12:40 -08:00
agozillon	95fe47ca7e	[Flang][OpenMP] Initial mapping of Fortran pointers and allocatables for target devices (#71766 ) This patch seeks to add an initial lowering for pointers and allocatable variables captured by implicit and explicit map in Flang OpenMP for Target operations that take map clauses, e.g. Target, Target Update, Target Exit/Enter, etc. Currently this is done by treating the type that lowers to a descriptor (allocatable/pointer/assumed shape) as a map of a record type (e.g. a structure) as that's effectively what descriptor types lower to in LLVM-IR and what they're represented as in the Fortran runtime (written in C/C++). The descriptor effectively lowers to a structure containing scalar and array elements that represent various aspects of the underlying data being mapped (lower bound, upper bound, extent being the main ones of interest in most cases) and a pointer to the allocated data. In this current iteration of the mapping we map the structure in its entirety and then attach the underlying data pointer and map the data to the device; this allows most of the required data to be resident on the device for use. Currently we do not support the addendum (another block of pointer data), but it shouldn't be too difficult to extend this to support it. The MapInfoOp generation for descriptor types is primarily handled in an optimization pass, where it expands BoxType (descriptor types) map captures into two maps, one for the structure (scalar elements) and the other for the pointer data (base address) and links them in a Parent <-> Child relationship. The later lowering processes will then treat them as a conjoined structure with a pointer member map.	2024-02-05 18:45:07 +01:00
Sergio Afonso	bc82d1a6b7	[OpenMPIRBuilder][MLIR] Pass target-cpu and target-features to outlined functions (#80283 ) This patch adds support for forwarding the target-cpu and target-features attributes to functions outlined in the OpenMPIRBuilder. This, in turn, results in the addition of these attributes for functions created during the translation of the `omp.parallel`, `omp.task` and `omp.teams` operations, and for the `omp.wsloop` operation when doing codegen for an OpenMP target device.	2024-02-05 12:22:56 +00:00
Sergio Afonso	92bbf615f5	[Flang][MLIR][OpenMP] Use function-attached target attributes for OpenMP lowering (#78291 ) This patch removes the omp.target module attribute, since the information it held on the target CPU and features is available through the fir.target_cpu and fir.target_features module attributes. Target outlining during the MLIR to LLVM IR translation stage is updated, so that these attributes, at that point available as llvm.func attributes, are passed along to the newly created function.	2024-02-02 13:16:36 +00:00
Kai Sasaki	65ac8c16e0	[mlir] Skip invalid test on big endian platform (s390x) (#80246 ) The buildbot test running on the s390x platform has kept failing since [this time](https://lab.llvm.org/buildbot/#/builders/199/builds/31136). This is because the test depends on the endianness of the platform: it expects a format that is invalid on the big-endian platform (s390x). We can simply skip it. See: https://discourse.llvm.org/t/mlir-s390x-linux-failure/76695	2024-02-02 00:07:44 -08:00
Sander de Smalen	d313614b60	[AArch64] Replace LLVM IR function attributes for PSTATE.ZA. (#79166 ) Since https://github.com/ARM-software/acle/pull/276 the ACLE defines attributes to better describe the use of a given SME state. Previously the attributes merely described the possibility of it being 'shared' or 'preserved', whereas the new attributes have more semantics and also describe how the data flows through the program. For ZT0 we already had to add new LLVM IR attributes: * aarch64_new_zt0 * aarch64_in_zt0 * aarch64_out_zt0 * aarch64_inout_zt0 * aarch64_preserves_zt0 We have now done the same for ZA, such that we add: * aarch64_new_za (previously `aarch64_pstate_za_new`) * aarch64_in_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_out_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_inout_za (more specific variation of `aarch64_pstate_za_shared`) * aarch64_preserves_za (previously `aarch64_pstate_za_shared, aarch64_pstate_za_preserved`) This explicitly removes 'pstate' from the name, because with SME2 and the new ACLE attributes there is a difference between "sharing ZA" (sharing the ZA matrix register with the caller) and "sharing PSTATE.ZA" (sharing either the ZA or ZT0 register, both part of PSTATE.ZA with the caller).	2024-02-01 13:37:37 +00:00
Dominik Adamski	b4370140b4	[OpenMPIRBuilder] Do not call host runtime for GPU teams codegen (#79984 ) Patch ensures that host runtime functions are not called for handling OpenMP teams clause on the device. GPU code for pragma `omp target teams distribute parallel do` will require only one call to OpenMP loop-worksharing GPU runtime. Support for it will be added later. This patch does not include changes required for handling `omp target teams` for the host side.	2024-01-31 12:16:35 +01:00
Cullen Rhodes	95ef8e3868	[mlir][ArmSME] Support 2-way widening outer products (#78975 ) This patch introduces support for 2-way widening outer products. This enables the fusion of 2 'arm_sme.outerproduct' operations that are chained via the accumulator into a 2-way widening outer product operation. Changes: - Add 'llvm.aarch64.sme.[us]mop[as].za32' intrinsics for 2-way variants. These map to instruction variants added in SME2 and use different intrinsics. Intrinsics are already implemented for widening variants from SME1. - Adds the following operations: - fmopa_2way, fmops_2way - smopa_2way, smops_2way - umopa_2way, umops_2way - Implements conversions for the above ops to intrinsics in ArmSMEToLLVM. - Adds a pass 'arm-sme-outer-product-fusion' that fuses 'arm_sme.outerproduct' operations. For a detailed description of these operations see the 'arm_sme.fmopa_2way' description. The reason for introducing many operations rather than one is the signed/unsigned variants can't be distinguished with types (e.g., ui16, si16) since 'arith.extui' and 'arith.extsi' only support signless integers. A single operation would require this information and an attribute (for example) for the sign doesn't feel right if floating-point types are also supported where this wouldn't apply. Furthermore, the SME FP8 extensions (FEAT_SME_F8F16, FEAT_SME_F8F32) introduce FMOPA 2-way (FP8 to FP16) and 4-way (FP8 to FP32) variants but no subtract variant. Whilst these are not supported in this patch, it felt simpler to have separate ops for add/subtract given this.	2024-01-31 09:13:18 +00:00
Alex Bradbury	748c295908	[MLIR][LLVM] Add fast-math related function attribute support (#79812 ) Adds unsafe-fp-math, no-infs-fp-math, no-nans-fp-math, approx-func-fp-math, and no-signed-zeros-fp-math function attributes. This allows code generators using the LLVMIR dialect to match the codegen of Clang.	2024-01-30 14:03:51 +00:00
Mirko Brkušanin	7fdf608cef	[AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795 ) Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>	2024-01-24 13:43:07 +01:00


584 Commits