Commit Graph

789 Commits

Author SHA1 Message Date
Dominik Adamski
ffabf73553 [NFC][OpenMP][MLIR] Add test for lowering parallel workshare GPU loop (#76144)
This test checks if MLIR code is lowered according to schema presented
below:

func1() {
    call __kmpc_parallel_51(..., func2, ...)
}

func2() {
   call __kmpc_for_static_loop_4u(..., func3, ...)
}

func3() {
   //loop body
}
2023-12-22 11:58:04 +01:00
Jakub Kuderski
a03c53c530 [mlir][spirv] Add physical storage buffer extension test. NFC. (#76196)
This test demonstrates how the PhysicalStorageBuffer extension can be
used end-2-end in a spir-v module.

This module has been verified to pass serialization, deserialization,
and validation with spirv-val.
2023-12-21 23:06:26 -05:00
Gil Rapaport
d9803841f2 [mlir][emitc] Add op modelling C expressions (#71631)
Add an emitc.expression operation that models C expressions, and provide
transforms to form and fold expressions. The translator emits the body
of
emitc.expression ops as a single C expression.
This expression is emitted by default as the RHS of an EmitC SSA value,
but if
possible, expressions with a single use that is not another expression
are
instead inlined. Specific expression's inlining can be fine tuned by
lowering
passes and transforms.
2023-12-20 15:04:46 +02:00
Oleksandr "Alex" Zinenko
9519e3ecbf [mlir] support dialect attribute translation to LLVM IR (#75309)
Extend the `amendOperation` mechanism for translating dialect attributes
attached to operations from another dialect when translating MLIR to
LLVM IR. Previously, this mechanism would have no knowledge of the LLVM
IR instructions created for the given operation, making it impossible
for it to perform local modifications such as attaching operation-level
metadata. Collect instructions inserted by the LLVM IR builder and pass
them to `amendOperation`.
2023-12-19 14:18:16 +01:00
Dominik Adamski
6deb5d4e44 [NFC][OpenMP][MLIR] Verify if empty workshare loop is lowered correctly (#75518)
Check if workshare loop without loop body is lowered correctly i.e.:
  1) null pointer is passed to OpenMP device RTL function as a
     parameter which denotes loop function body aggregated parameters
  2) Outlined loop function body has only one parameter - loop counter
2023-12-18 11:59:35 +01:00
Kareem Ergawy
d777504355 [MLIR][OpenMP][Offload] Lower target update op to DeviceRT (#75159)
Adds support for lowring `UpdateDataOp` to the DeviceRT. This reuses the
existing utils used by other device directive.
2023-12-18 11:14:46 +01:00
Jessica Del
32f9983c06 [AMDGPU] - Add address space for strided buffers (#74471)
This is an experimental address space for strided buffers. These buffers
can have structs as elements and
a stride > 1.
These pointers allow the indexed access in units of stride, i.e., they
point at `buffer[index * stride]`.
Thus, we can use the `idxen` modifier for buffer loads.

We assign address space 9 to 192-bit buffer pointers which contain a
128-bit descriptor, a 32-bit offset and a 32-bit index. Essentially,
they are fat buffer pointers with an additional 32-bit index.
2023-12-15 15:49:25 +01:00
Tobias Gysi
25d942403c [mlir][llvm] Add invariant intrinsics (#75354)
This commit implements the LLVM IR invariant intrinsics in LLVM dialect.
These intrinsics can be used to mark a program regions in which the
contents of a specific memory object will not change.

The LLVM dialect implementation also implements the
PromotableOpInterface to ensure Mem2Reg & SROA are able to promote
pointers that are marked using the invariant intrinsics.
2023-12-14 14:58:45 +01:00
Tom Eccles
79524ba527 [mlir][ArmSME] Add sve streaming compatible attribute (#75222)
Following the same path already used for ArmStreaming and
ArmLocallyStreaming.

This should correspond to clang's __arm_streaming_compatible attribute.
2023-12-13 13:53:01 +00:00
Christian Ulmann
eab62971cd [MLIR][LLVM] Support nameless and scopeless global constants (#75307)
This commit ensures that we model DI information for global constants
correctly. These constructs can lack scopes, names, and linkage names,
so these parameters were made optional for the DIGlobalVariable
attribute.
2023-12-13 10:47:59 +01:00
Dominik Adamski
b730703726 [NFC][MLIR][OpenMP] Add test to check lowering omp.wsloop for GPU (#74857)
This test checks if proper OpenMP device RTL function is called to
handle workshare loop for GPU.

The code generation for GPU worksharing loops is implemented by the
patch: https://github.com/llvm/llvm-project/pull/73360
2023-12-12 15:05:26 +01:00
Ivan R. Ivanov
d5fb4c0f11 [MLIR][NVVM] Enable nvvm intrinsics import to LLVMIR (#68843)
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
Co-authored-by: Christian Ulmann <christianulmann@gmail.com>
2023-12-12 13:31:55 +09:00
Tom Eccles
e9e1c411b6 [mlir][LLVM] Add nsw and nuw flags (#74508)
The implementation of these are modeled after the existing fastmath
flags for floating point arithmetic.
2023-12-07 10:35:00 +00:00
Billy Zhu
2ea60f4197 [MLIR][LLVM] Fuse Scope into CallsiteLoc Callee (#74546)
There's an issue in the translator today where, for a CallsiteLoc, if
the callee does not have a DI scope (perhaps due to compile options or
optimizations), it may get propagated the DI scope of its callsite's
parent function, which will create a non-existent DILocation combining
line & col number from one file, and the filename from another.

The root problem is we cannot propagate the parent scope when
translating the callee location, as it no longer applies to inlined
locations (see code diff and hopefully this will make sense).

To facilitate this, the importer is also changed so that callee scopes
are fused with the callee FileLineCol loc, instead of on the Callsite
loc itself. This comes with the benefit that we now have a symmetric
Callsite loc representation. If we required the callee scope be always
annotated on the Callsite loc, it would be hard for generic inlining
passes to maintain that, since it would have to somehow understand the
semantics of the fused metadata and pull it out while inlining.
2023-12-06 09:13:12 +01:00
Sang Ik Lee
7fc792cba7 [MLIR] Enable GPU Dialect to SYCL runtime integration (#71430)
GPU Dialect lowering to SYCL runtime is driven by spirv.target_env
attached to gpu.module. As a result of this, spirv.target_env remains as
an input to LLVMIR Translation.
A SPIRVToLLVMIRTranslation without any actual translation is added to
avoid an unregistered error in mlir-cpu-runner.
SelectObjectAttr.cpp is updated to
1) Pass binary size argument to getModuleLoadFn
2) Pass parameter count to getKernelLaunchFn
This change does not impact CUDA and ROCM usage since both
mlir_cuda_runtime and mlir_rocm_runtime are already updated to accept
and ignore the extra arguments.
2023-12-05 16:55:24 -05:00
Billy Zhu
12e5148f9c [MLIR][LLVM] Fix CallOp asm parser for attr-dict (#74372)
Currently the parser & printer of `CallOp` do not match when both
varargs and attr-dict are present (round tripping is broken). This fixes
the parser so that it conforms to the written asm format in the
comments.
2023-12-05 21:18:52 +01:00
Rik Huijzer
13da9a58c5 [mlir][llvm] Fix verifier for const int and dense (#74340)
Continuation of https://github.com/llvm/llvm-project/pull/74247 to fix
https://github.com/llvm/llvm-project/issues/56962. Fixes verifier for
(Integer Attr):
```mlir
llvm.mlir.constant(1 : index) : f32
```
and (Dense Attr):
```mlir
llvm.mlir.constant(dense<100.0> : vector<1xf64>) : f32
```

## Integer Attr

The addition that this PR makes to `LLVM::ConstantOp::verify` is meant
to be exactly verifying the code in
`mlir::LLVM::detail::getLLVMConstant`:


9f78edbd20/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp (L350-L353)

One failure mode is when the `type` (`llvm.mlir.constant(<value>) :
<type>`) is not an `Integer`, because then the `cast` in
`getIntegerBitWidth` will crash:


dca432cb7b/llvm/include/llvm/IR/DerivedTypes.h (L97-L99)

So that's now caught in the verifier.

Apart from that, I don't see anything we could check for. `sextOrTrunc`
means "Sign extend or truncate to width" and that one is quite
permissive. For example, the following doesn't have to be caught in the
verifier as it doesn't crash during `mlir-translate -mlir-to-llvmir`:

```mlir
llvm.func @main() -> f32 {
  %cst = llvm.mlir.constant(100 : i64) : f32
  llvm.return %cst : f32
}
```

## Dense Attr

Crash if not either a MLIR Vector type or one of these:


9f78edbd20/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp (L375-L391)
2023-12-05 12:31:49 +01:00
Benjamin Maxwell
17de468df1 [mlir][llvm] Add llvm.target_features features attribute (#71510)
This patch adds a target_features (TargetFeaturesAttr) to the LLVM
dialect to allow setting and querying the features in use on a function.

The motivation for this comes from the Arm SME dialect where we would
like a convenient way to check what variants of an operation are
available based on the CPU features.

Intended usage:

The target_features attribute is populated manually or by a pass:

```mlir
func.func @example() attributes {
   target_features = #llvm.target_features<["+sme", "+sve", "+sme-f64f64"]>
} {
 // ...
}
```

Then within a later rewrite the attribute can be checked, and used to
make lowering decisions.

```c++
// Finds the "target_features" attribute on the parent
// FunctionOpInterface.
auto targetFeatures = LLVM::TargetFeaturesAttr::featuresAt(op);

// Check a feature.
// Returns false if targetFeatures is null or the feature is not in
// the list.
if (!targetFeatures.contains("+sme-f64f64"))
    return failure();
```

For now, this is rather simple just checks if the exact feature is in
the list, though it could be possible to extend with implied features
using information from LLVM.
2023-12-05 11:29:31 +00:00
Radu Salavat
3257e4ca16 [MLIR] Add support for frame pointers in MLIR (#72145)
Add support for frame pointers in MLIR.

---------

Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
Co-authored-by: Christian Ulmann <christianulmann@gmail.com>
2023-12-05 11:52:13 +01:00
Billy Zhu
fd870c6fa9 [MLIR][LLVM] Translate Debug EmissionKind (#74376)
Translate debug emission kind into LLVM (the importer already supports
this).
2023-12-05 11:05:21 +01:00
Durga
f81eb7daf4 [MLIR][NVVM]: Add cp.async.mbarrier.arrive Op (#74241)
Add:
* an Op for 'cp.async.mbarrier.arrive', targeting the
nvvm_cp_async_mbarrier_arrive* family of intrinsics.
* The 'noinc' intrinsic property is modelled as a default-valued-attr of
type I1.
* Test cases are added to verify the Op as well as the intrinsic
lowering.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2023-12-04 10:04:58 -08:00
Justin Wilson
6da578cec1 [mlir] Add support for DIGlobalVariable and DIGlobalVariableExpression (#73367)
This PR introduces DIGlobalVariableAttr and
DIGlobalVariableExpressionAttr so that ModuleTranslation can emit the
required metadata needed for debug information about global variable.
The translator implementation for debug metadata needed to be refactored
in order to allow translation of nodes based on MDNode
(DIGlobalVariableExpressionAttr and DIExpression) in addition to
DINode-based nodes.

A DIGlobalVariableExpressionAttr can now be passed to the GlobalOp
operation directly and ModuleTranslation will create the respective
DIGlobalVariable and DIGlobalVariableExpression nodes. The compile unit
that DIGlobalVariable is expected to be configured with will be updated
with the created DIGlobalVariableExpression.
2023-12-04 15:52:02 +01:00
Rik Huijzer
67f9cd4670 [mlir][llvm] Fix verifier for const float (#74247)
Fixes one of the cases of
https://github.com/llvm/llvm-project/issues/56962.

This PR basically moves some code from
`mlir::LLVM::detail::getLLVMConstant`
([source](9f78edbd20/mlir/lib/Target/LLVMIR/ModuleTranslation.cpp (L354-L371)))
over to the verifier of `LLVM::ConstantOp`. For now, I focused just on
the case where the attribute is a float and ignored the integer case of
https://github.com/llvm/llvm-project/issues/56962. Note that without
this patch, both added tests will crash inside `getLLVMConstant` during
`mlir-translate -mlir-to-llvmir`.
2023-12-04 08:06:02 +01:00
Benjamin Maxwell
eaff02f28e [mlir][ArmSME] Switch to an attribute-based tile allocation scheme (#73253)
This reworks the ArmSME dialect to use attributes for tile allocation.
This has a number of advantages and corrects some issues with the
previous approach:

* Tile allocation can now be done ASAP (i.e. immediately after
`-convert-vector-to-arm-sme`)
* SSA form for control flow is now supported (e.g.`scf.for` loops that
yield tiles)
* ArmSME ops can be converted to intrinsics very late (i.e. after
lowering to control flow)
 * Tests are simplified by removing constants and casts
* Avoids correctness issues with representing LLVM `immargs` as MLIR
values
- The tile ID on the SME intrinsics is an `immarg` (so is required to be
a compile-time constant), `immargs` should be mapped to MLIR attributes
(this is already the case for intrinsics in the LLVM dialect)
- Using MLIR values for `immargs` can lead to invalid LLVM IR being
generated (and passes such as -cse making incorrect optimizations)

As part of this patch we bid farewell to the following operations:

```mlir
arm_sme.get_tile_id : i32
arm_sme.cast_tile_to_vector : i32 to vector<[4]x[4]xi32>
arm_sme.cast_vector_to_tile : vector<[4]x[4]xi32> to i32
```

These are now replaced with:
```mlir
// Allocates a new tile with (indeterminate) state:
arm_sme.get_tile : vector<[4]x[4]xi32>
// A placeholder operation for lowering ArmSME ops to intrinsics:
arm_sme.materialize_ssa_tile : vector<[4]x[4]xi32>
```

The new tile allocation works by operations implementing the
`ArmSMETileOpInterface`. This interface says that an operation needs to
be assigned a tile ID, and may conditionally allocate a new SME tile.

Operations allocate a new tile by implementing...
```c++
std::optional<arm_sme::ArmSMETileType> getAllocatedTileType()
```
...and returning what type of tile the op allocates (ZAB, ZAH, etc).

Operations that don't allocate a tile return `std::nullopt` (which is
the default behaviour).

Currently the following ops are defined as allocating:
```mlir
arm_sme.get_tile
arm_sme.zero
arm_sme.tile_load
arm_sme.outerproduct // (if no accumulator is specified)
```

Allocating operations become the roots for the tile allocation pass,
which currently just (naively) assigns all transitive uses of a root
operation the same tile ID. However, this is enough to handle current
use cases.

Once tile IDs have been allocated subsequent rewrites can forward the
tile IDs to any newly created operations.
2023-11-30 10:22:22 +00:00
Guray Ozen
edf5cae739 [mlir][gpu] Support Cluster of Thread Blocks in gpu.launch_func (#72871)
NVIDIA Hopper architecture introduced the Cooperative Group Array (CGA).
It is a new level of parallelism, allowing clustering of Cooperative
Thread Arrays (CTA) to synchronize and communicate through shared memory
while running concurrently.

This PR enables support for CGA within the `gpu.launch_func` in the GPU
dialect. It extends `gpu.launch_func` to accommodate this functionality.

The GPU dialect remains architecture-agnostic, so we've added CGA
functionality as optional parameters. We want to leverage mechanisms
that we have in the GPU dialects such as outlining and kernel launching,
making it a practical and convenient choice.

An example of this implementation can be seen below:

```
gpu.launch_func @kernel_module::@kernel
                clusters in (%1, %0, %0) // <-- Optional
                blocks in (%0, %0, %0)
                threads in (%0, %0, %0)
```

The PR also introduces index and dimensions Ops specific to clusters,
binding them to NVVM Ops:

```
%cidX = gpu.cluster_id  x
%cidY = gpu.cluster_id  y
%cidZ = gpu.cluster_id  z

%cdimX = gpu.cluster_dim  x
%cdimY = gpu.cluster_dim  y
%cdimZ = gpu.cluster_dim  z
```

We will introduce cluster support in `gpu.launch` Op in an upcoming PR. 

See [the
documentation](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#cluster-of-cooperative-thread-arrays)
provided by NVIDIA for details.
2023-11-27 11:05:07 +01:00
Oleksandr "Alex" Zinenko
8735b7dcc9 [mlir] do not inject malloc/free in to-LLVM translation (#73224)
In the early days of MLIR-to-LLVM IR translation, it had to forcefully
inject declarations of `malloc` and `free` functions as then-standard
(now `memref`) dialect ops were unconditionally lowering to libc calls.
This is no longer the case. Even when they do lower to libc calls, the
signatures of those methods are injected at lowering since calls must
target declared functions in valid IR. Don't inject those declarations
anymore.
2023-11-23 13:38:25 +01:00
Dominik Adamski
bda723f5ba [NFC][OpenMP][MLIR] Add MLIR test for lowering parallel if (#71788)
Add test for clause omp target parallel if (parallel : cond )

Test checks if corresponding MLIR construct is correctly lowered to LLVM IR.
2023-11-23 13:29:55 +01:00
Benjamin Maxwell
dbb8643333 [mlir][LLVM] Support immargs in LLVM_IntrOpBase intrinsics (#73013)
This extends `LLVM_IntrOpBase` so that it can be passed a list of
`immArgPositions` and a list (of the same length) of `immArgAttrNames`.
`immArgPositions` contains the positions of `immargs` on the LLVM IR
intrinsic, and `immArgAttrNames` maps those to a corresponding MLIR
attribute.

This allows modeling LLVM `immargs` as MLIR attributes, which is the
closest match semantically (and had already been done manually for the
LLVM dialect intrinsics).

This has two upsides:
* It's slightly easier to implement intrinsics with immargs now
(especially if they make use of other features, such as overloads)
* It clearly defines that `immargs` should map to attributes, before
there was no mention of `immargs` in LLVMOpBase.td, so implementing them
was unclear

This works with other features of the `LLVM_IntrOpBase`, so `immargs`
can be marked as overloaded too (which is used in some intrinsics).

As part of this patch (and to test correctness) existing intrinsics have
been updated to use these new parameters.

This also uncovered a few issues with the
`llvm.intr.vector.insert/extract` intrinsics. First, the argument order
for insert did not match the LLVM intrinsic, and secondly, both were
missing a mlirBuilder (so failed to import from LLVM IR). This is
corrected with this patch (and a test case added).
2023-11-23 10:12:12 +00:00
Oleksandr "Alex" Zinenko
8134a8fc3f [mlir] use TypeSize and uint64_t in DataLayout (#72874)
Data layout queries may be issued for types whose size exceeds the range
of 32-bit integer as well as for types that don't have a size known at
compile time, such as scalable vectors. Use best practices from LLVM IR
and adopt `llvm::TypeSize` for size-related queries and `uint64_t` for
alignment-related queries.

See #72678.
2023-11-21 16:12:27 +01:00
Ivan Butygin
b84fe8ff16 [mlir][spirv] Add some op decorations (#72809)
NoSignedWrap, NoUnsignedWrap, FPFastMathMode.
2023-11-21 15:31:23 +03:00
Sam Tebbs
f7b5c25507 [AArch64][SME] Remove immediate argument restriction for svldr and svstr (#68565)
The svldr_vnum and svstr_vnum builtins always modify the base register
and tile slice and provide immediate offsets of zero, even when the
offset provided to the builtin is an immediate. This patch optimises the
output of the builtins when the offset is an immediate, to pass it
directly to the instruction and to not need the base register and tile
slice updates.
2023-11-20 09:57:29 +00:00
Marius Brehler
c4fd1fd6d4 [mlir][emitc] Rename call op to call_opaque (#72494)
This renames the `emitc.call` op to `emitc.call_opaque` as the existing
call op does not refer to the callee by symbol. The rename allows to
introduce a new call op alongside with a future `emitc.func` op to model
and facilitate functions and function calls.
2023-11-17 10:22:15 +01:00
Jakub Kuderski
c6f7b631a9 [mlir][spirv] Fix VectorShuffle assembly format (#72568)
Align with the rest of the spirv dialect by using a functional type
syntax.

Regex for updating existing code:
`spirv\.VectorShuffle (\[.+\]) (%[^:]+): ([^,]+), (%[^:]+): ([^\s]+) ->(.+)`
 ==>
`spirv.VectorShuffle $1 $2, $4 : $3, $5 ->$6`
2023-11-16 19:34:00 -05:00
Billy Zhu
0ab6b20c36 [MLIR] Add DIExpression to LLVM dialect (#72462)
Add initial support for DIExpression in LLVM dialect.

Similar to LLVM IR, DI Expression is encoded as a list of uint64. The
difference is that LLVM IR has helpers for understanding the expression
(e.g. for verification and pretty printing), whereas the current support
added by this PR treats the expression elements as opaque.
2023-11-16 11:32:02 -08:00
agozillon
718793ce6a [OpenMP][OMPIRBuilder] Handle replace uses of ConstantExpr's inside of Target regions (#71891)
Currently there's an edge cases where constant indexing in target
regions can lead to incorrect results as we do not correctly replace
uses of mapped variables in generated target functions with the target
arguments (and accessor instructions) that replace them. This patch
seeks to fix that by extending the current logic in the OMPIRBuilder.

Things like GEP's can come in the form of Constants/ConstantExprs,
Constants and ConstantExpr's do not have access to the knowledge of what
they're contained in, so we must dig a little to find an instruction so
we can tell if they're used inside of the function we're outlining so we
can be sure they are replaceable and we are not accidentally replacing a
usage somewhere else in the module that's still necessary.

This patch handles these by replacing the original constant expression
with a new instruction equivalent; an instruction as it allows easy
modification in the following loop, as we can now know the constant
(instruction) is owned by our target function (as it holds this
knowledge) and replaceUsesOfWith can now be invoked on it (cannot do
this with constants it seems), a brand new one also allows us to be
cautious as it is perhaps possible the old expression was used inside of
the function but exists and is used externally (unlikely by the nature
of a Constant, but still a positive side affect).
2023-11-15 15:45:32 +01:00
Benjamin Maxwell
783ac3b6fb [mlir][ArmSME] Make use of backend function attributes for enabling ZA storage (#71044)
Previously, we were inserting za.enable/disable intrinsics for functions
with the "arm_za" attribute (at the MLIR level), rather than using the
backend attributes. This was done to avoid a dependency on the SME ABI
functions from compiler-rt (which have only recently been implemented).

Doing things this way did have correctness issues, for example, calling
a streaming-mode function from another streaming-mode function (both
with ZA enabled) would lead to ZA being disabled after returning to the
caller (where it should still be enabled). Fixing issues like this would
require re-doing the ABI work already done in the backend within MLIR.

Instead, this patch switches to use the "arm_new_za" (backend) attribute
for enabling ZA for an MLIR function. For the integration tests, this
requires some way of linking the SME ABI functions. This is done via the
`%arm_sme_abi_shlib` lit substitution. By default, this expands to a
stub implementation of the SME ABI functions, but this can be overridden
by providing the `ARM_SME_ABI_ROUTINES_SHLIB` CMake cache variable
(pointing it at an alternative implementation). For now, the ArmSME
integration tests pass with just stubs, as we don't make use of nested
ZA-enabled calls.

A future patch may add an option to compiler-rt to build the SME
builtins into a standalone shared library to allow easily
building/testing with the actual implementation.
2023-11-14 12:50:38 +00:00
Shraiysh
c9626e6264 [OpenMP][mlir] Add enter capture attribute to declare target (#72062)
This patch adds support for enter attribute in declare target. As the
enter attribute is a replacement for `to` attribute, it has the same
tests.
2023-11-13 14:51:20 -06:00
David Truby
a72e034f13 [mlir] Add llvm.linker.options operation to the LLVM IR Dialect (#71720)
This patch adds a `llvm.linker.options` operation taking a list of
strings to pass to the linker when the resulting object file is linked.
This is particularly useful on Windows to specify the CRT version to use
for this object file.
2023-11-13 14:13:05 +00:00
Dominik Adamski
f2f5f1bfb6 [OMPIRBuilder] Do not call __kmpc_push_num_threads for device parallel (#71934)
Function __kmpc_push_num_threads should be called only if we specify
number of threads for host parallel region.

Number of threads specified by the user should be passed as one of
arguments of __kmpc_parallel_51 function.
2023-11-10 20:38:56 +01:00
Dominik Adamski
cee3b5ef85 [NFC][MLIR][OpenMP] Add test for lowering omp target parallel (#70795)
Added MLIR test which checks if MLIR sample code with omp target
parallel construct is correctly lowered to LLVM IR for the device.
2023-11-07 14:36:29 +01:00
Nikita Popov
d1ee26ba9f [MLIR] Use different constant expression in test (NFC)
Mul expressions will be removed, so use something else.
2023-11-07 11:10:13 +01:00
Johannes Doerfert
3de645efe3 [OpenMP][NFC] Split the reduction buffer size into two components
Before we tracked the size of the teams reduction buffer in order to
allocate it at runtime per kernel launch. This patch splits the number
into two parts, the size of the reduction data (=all reduction
variables) and the (maximal) length of the buffer. This will allow us to
allocate less if we need less, e.g., if we have less teams than the
maximal length. It also allows us to move code from clangs codegen into
the runtime as we now know how large the reduction data is.
2023-11-06 11:50:41 -08:00
Simon Camphausen
68b071d9a2 [mlir][emitc] Fix corner case in translation of literal ops (#71375)
Fix a corner case missed in #71296 when operands generated by literals
are mixed with the args attribute of a call op.

Additionally remove a range check that is already handled by the CallOp
verifier.
2023-11-06 16:17:20 +01:00
Akash Banerjee
72e2387c05 [OpenMP][MLIR] Add "IsolatedFromAbove" trait to omp.target
This patch adds the MLIR translation changes required for add the IsolatedFromAbove and OutlineableOpenMPOpInterface traits to omp.target. It links the newly added block arguments to their corresponding llvm values.

Depends on #67164.
2023-11-06 13:24:02 +00:00
Gil Rapaport
6c59f0e1b0 [mlir][emitc] Fix literal translation (#71296)
- Do not emit variables-at-top for literals
- Do not emit an error for a missing name for literals used as call
operands.
2023-11-05 17:06:24 -08:00
Gil Rapaport
1bb48c440b [mlir][emitc] Add literal op testing (NFC)
- Literal ops are emitted as unused variables under declare-variables-at-top
- Translator fails to emit literals used as emitc.call arguments
2023-11-04 19:47:38 +02:00
Christian Ulmann
fcc26bad82 [MLIR][LLVM] Remove typed pointer remnants from target tests (#71210)
This commit removes all LLVM dialect typed pointers from the target
tests. Typed pointers have been deprecated for a while now and it's
planned to soon remove them from the LLVM dialect.

Related PSA:
https://discourse.llvm.org/t/psa-removal-of-typed-pointers-from-the-llvm-dialect/74502
2023-11-03 21:21:45 +01:00
Johannes Doerfert
b8cbc5c02c [OpenMP] Introduce the KernelLaunchEnvironment as implicit argument (#70401)
The KernelEnvironment is for compile time information about a kernel. It
allows the compiler to feed information to the runtime. The
KernelLaunchEnvironment is for dynamic information *per* kernel launch.
It allows the rutime to feed information to the kernel that is not
shared with other invocations of the kernel. The first use case is to
replace the globals that synchronize teams reductions with per-launch
versions. This allows concurrent teams reductions. More uses cases will
follow, e.g., per launch memory pools.

Fixes: https://github.com/llvm/llvm-project/issues/70249
2023-10-31 19:38:43 -07:00
Christian Ulmann
4ce93d531d [MLIR][LLVM] Avoid creating invalid DICompositeTypes in import (#70797)
This commit ensures that the debug info import skips `DICompositeTypes`
that have an array type tag and failed to translate the base type. This
is necessary because array `DICompositeTypes` require a base type to be
valid.
Note that this is currently not verified in LLVM, instead it leads to an
explosion of the `ASMPrinter`.
2023-10-31 14:30:31 +01:00
agozillon
6a62707c04 [Flang][OpenMP][MLIR] Initial array section mapping MLIR -> LLVM-IR lowering utilising omp.bounds (#68689)
This patch seeks to add initial lowering of OpenMP array sections within
target region map clauses from MLIR to LLVM IR.

This patch seeks to support fixed sized contiguous (don't think OpenMP
supports anything other than contiguous sections from my reading but i
could be wrong) arrays initially, before looking toward assumed size and
shaped arrays. The patch also currently does not include stride, it's
left for future work.

Although, assumed size works in some fashion (dummy arguments) with some
minor alterations to the OMPEarlyOutliner, so it is possible changes
made in the IsolatedFromAbove series may allow this to work with no
further required patches.

It utilises the generated omp.bounds to calculate the size of the mapped
OpenMP array (both for sectioned and un-sectioned arrays) as well as the
offset to be passed to the kernel argument structure.

Alongside these changes some refactoring of how map data is handled is
attempted, using a new MapData structure to keep track of information
utilised in the lowering of mapped values.

The initial addition of a more complex createDeviceArgumentAccessor that
utilises capture kinds similarly to (and loosely based on) Clang to
generate different kernel argument accesses is also added.

A similar function for altering how the kernel argument is passed to the
kernel argument structure on the host is also utilised
(createAlteredByCaptureMap), which allows modification of the
pointer/basePointer based on their capture (and bounds information).
It's of note ByRef, is the default for explicit mappings and ByCopy will
be the default for implicit captures, so the former is currently tested
in this patch and the latter is not for the moment.
2023-10-30 16:00:23 +01:00