261 Commits

Author SHA1 Message Date
Kareem Ergawy
84c3b05e5e [OpenMP][flang][MLIR] Decouple alloc, init, and copy regions for omp.private|declare_reduction ops (#125699)
This PR slightly changes the emitted block structure of alloc, init, and copy
regions for `omp.private` and `omp.declare_reduction` ops.
In particular, it decouples init and copy regions from the alloca
insertion point. The main motivation is to fix "Instruction does not
dominate all uses!" errors that happen especially when an init region
uses a value from the OpenMP region it is being inlined into. The issue
happens because, prior to this PR, we inlined the init region right
after the latest alloc block (since we used the alloca IP), which in
some cases (see example below) is too early and causes the use
dominance issue.

Example that would break without this PR (when delayed privatization is
enabled for `omp.wsloop`s):
```fortran
subroutine test2 (xyz)
  integer :: i
  integer :: xyz(:)

  !$omp target map(from:xyz)
    !$omp do private(xyz)
      do i = 1, 10
        xyz(i) = i
      end do
  !$omp end target
end subroutine
```
2025-02-06 11:45:40 +01:00
Abid Qadeer
5f7acf7259 [flang][OMPIRbuilder] Set debug loc on terminator created by splitBB. (#125897)
Fixes #125088.

When splitBB is called with createBranch=true, it creates a branch
instruction in the old block, but no debug loc is set on that branch
instruction. If that branch is used as the InsertPoint in restoreIP, it has the
potential to set the current debug location to null, and subsequent
instructions will come out without a debug location. This caused the
verification check to fail as shown in the bug report.

This PR changes the splitBB and spliceBB functions to also take a debugLoc
parameter, which can be used to set the debug location of the branch
instruction.
2025-02-05 22:35:43 +00:00
Abid Qadeer
e151b1d1f6 [MLIR][OpenMP] Use correct DebugLoc in target construct callbacks. (#125856)
This is the same as PR #125106, which has somehow been stuck in a "Processing
Update" loop for many hours now. I am going to close that one and push
this one instead.

While working on https://github.com/llvm/llvm-project/issues/125088, I
noticed a problem with the TargetBodyGenCallbackTy and
TargetGenArgAccessorsCallbackTy. The OMPIRBuilder and the MLIR side both
maintain their own IRBuilder, and when control goes from one to the other, we
have to take care not to use a stale debug location. The code currently
relies on restoreIP to set the insertion point and the debug location. But
if the passed InsertPointTy has an empty block, then the debug location
will not be updated (see SetInsertPoint). This can cause an invalid debug
location to be attached to instructions, and the verifier will complain.

Similarly when we exit the callback, the debug location of the Builder
is not set to what it was before the callback. This again can cause
verification failures.

This PR resets the debug location at the start and also uses an
InsertPointGuard to restore the debug location at exit.
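
As a rough illustration of the reset-on-entry and restore-on-exit pattern described above, here is a minimal, self-contained C++ sketch. `ToyBuilder`, `DebugLocGuard`, and `runCallback` are hypothetical names invented for this example; they only model the idea with plain strings and are not the IRBuilder or OMPIRBuilder API.

```cpp
#include <cassert>
#include <string>

// Toy stand-in for an IR builder (assumption, not the LLVM IRBuilder).
// An empty string models "no debug location".
struct ToyBuilder {
  std::string debugLoc;
};

// RAII guard: remembers the builder's debug location on construction and
// puts it back on destruction, however the callback exits.
class DebugLocGuard {
public:
  explicit DebugLocGuard(ToyBuilder &b) : builder(b), saved(b.debugLoc) {}
  ~DebugLocGuard() { builder.debugLoc = saved; }
  DebugLocGuard(const DebugLocGuard &) = delete;
  DebugLocGuard &operator=(const DebugLocGuard &) = delete;

private:
  ToyBuilder &builder;
  std::string saved;
};

// Models a body-generation callback: the location is reset explicitly on
// entry so a stale one is never reused, and the guard restores the
// caller's location on exit.
std::string runCallback(ToyBuilder &builder, const std::string &targetLoc) {
  assert(!targetLoc.empty() && "callback needs a valid location");
  DebugLocGuard guard(builder);
  builder.debugLoc = targetLoc; // explicit reset at the start
  return builder.debugLoc;      // location seen by code emitted in here
}
```

In this sketch the caller's location survives the call: after `runCallback` returns, `builder.debugLoc` holds whatever it held before the callback ran.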

Both of these problems would have been caught by the unit tests, but they
were not setting the debug location of the builder before calling
createTarget, so the problem was hidden. I have updated the tests
accordingly.
2025-02-05 14:59:37 +00:00
Tom Eccles
9ad4ebd82b [mlir][OpenMP][NFC] break out priv var init into helper (#125303) 2025-02-03 09:10:44 +00:00
Tom Eccles
aeaafce464 [mlir][OpenMP][flang] make private variable allocation implicit in omp.private (#124019)
The intention of this work is to give MLIR->LLVMIR conversion freedom to
control how the private variable is allocated so that it can be
allocated on the stack in ordinary cases or as part of a structure used
to give closure context for tasks which might outlive the current stack
frame. See RFC:

https://discourse.llvm.org/t/rfc-openmp-supporting-delayed-task-execution-with-firstprivate-variables/83084

For example, a privatizer for an integer used to look like
```mlir
  omp.private {type = private} @x.privatizer : !fir.ref<i32> alloc {
  ^bb0(%arg0: !fir.ref<i32>):
    %0 = ... allocate proper memory for the private clone ...
    omp.yield(%0 : !fir.ref<i32>)
  }
```

After this change, allocation becomes implicit in the operation:
```mlir
  omp.private {type = private} @x.privatizer : i32
```

For more complex types that require initialization after allocation, an
init region can be used:
```mlir
  omp.private {type = private} @x.privatizer : !some.type init {
  ^bb0(%arg0: !some.pointer<!some.type>, %arg1: !some.pointer<!some.type>):
    // initialize %arg1, using %arg0 as a mold for allocations
    omp.yield(%arg1 : !some.pointer<!some.type>)
  } dealloc {
    ^bb0(%arg0: !some.pointer<!some.type>):
    ... deallocate memory allocated by the init region ...
    omp.yield
  }
```

This patch lays the groundwork for delayed task execution but is not
enough on its own.

After this patch, all gfortran tests which previously passed still pass.
There are the following changes to the Fujitsu test suite:
- 0380_0009 and 0435_0009 are fixed
- 0688_0041 now fails at runtime. This test exercises firstprivate
variables with tasks. Previously we got lucky with the undefined
behavior and won the race. After these changes we no longer get lucky.
This patch lays the groundwork for a proper fix for this issue.

In flang the lowering re-uses the existing lowering used for reduction
init and dealloc regions.

In flang, before this patch we hit a TODO with the same wording when
generating the copy region for firstprivate polymorphic variables. After
this patch, the box-like fir.class is passed by reference into the copy
region, leading to a different path that didn't hit that old TODO. The
generated code still didn't work, so I added a new TODO in
DataSharingProcessor.
2025-01-31 09:35:26 +00:00
agozillon
2428b6ec40 [Flang][MLIR][OpenMP] Fix Target Data if (present(...)) causing LLVM-IR branching error (#123771)
Currently if we generate code for the below target data map that uses an
optional mapping:

       !$omp target data if(present(a)) map(alloc:a)
            do i = 1, 10
                a(i) = i
            end do
       !$omp end target data

We yield an LLVM-IR error as the branch for the else path is not
generated. This occurs because we enter the NoDupPriv path of the
callback function when generating the else branch; however, the emitBranch
function needs the builder to be set to a block in order to generate and
link in a follow-up branch. The NoDupPriv path currently doesn't do
this. While it's not supposed to generate anything (as far as I am
aware), we still need to at least set the builder's placement back so that
it emits the appropriate follow-up branch. This avoids the missing
terminator LLVM-IR verification error by correctly generating the
follow-up branch.
2025-01-30 17:33:36 +01:00
Tom Eccles
2bde7a1b7c [mlir][OpenMP][NFC] Remove dead uses of OpenMPVarMappingStackFrame (#125061)
This is left over from the old way reductions were implemented.
OpenMPVarMappingStackFrame doesn't actually do anything anymore so these
uses can go away.
2025-01-30 14:35:10 +00:00
agozillon
e0054e984c [MLIR][OpenMP] Emit nullary check for mapped pointer members and appropriate size select based on results (#124604)
This PR aims to fix a mapping error when trying to map nullary elements
of a record type (primary example is allocatables/pointer types in
Fortran at the moment). This should be legal to map, just not write to
without pointing to anything within the target region. A common Fortran
OpenMP idiom/example where this is useful can be found in the added
Fortran offload example.

The runtime error arises when we try to map the pointer member utilising
a prescribed constant size that we receive from the lowered type,
resulting in mapping of data that will be non-existent when there is no
allocated data. The fix in this case is to emit a runtime check to see
if the data has been allocated: if it hasn't been, we select a size of 0;
if it has, we emit the usual type size.
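
The size selection described above boils down to a null check followed by a select. The sketch below shows that logic in plain C++; `mappedSize` and its parameters are hypothetical names for illustration, and the real change emits the equivalent check as LLVM IR rather than calling a helper like this.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of bytes to map for a pointer member.
// `data` is the member's current address and `typeSize` is the constant
// size derived from the lowered type.
std::size_t mappedSize(const void *data, std::size_t typeSize) {
  // Unallocated data maps zero bytes; allocated data maps the usual size.
  return data != nullptr ? typeSize : 0;
}
```

With this shape, mapping an unallocated member stays legal because nothing is transferred for it.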
2025-01-29 17:51:33 +01:00
Jeremy Morse
749443a307 [NFC][DebugInfo] Mop up final instruction-insertion call sites (#124289)
These are the final places in the monorepo that make use of instruction
insertion for methods like insertBefore and moveBefore. As part of the
RemoveDIs project, instead use iterators for insertion. (see:
https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
).
2025-01-27 16:07:27 +00:00
Anchu Rajendran S
afcbcae668 [mlir][OpenMP] inscan reduction modifier and scan op mlir support (#114737)
The scan directive allows specifying scan reductions within a worksharing
loop, worksharing loop simd, or simd directive, which should have an
`InScan` modifier associated with it. This change adds MLIR support
for it.

Related PR: [Parsing and Semantic Support for
scan](https://github.com/llvm/llvm-project/pull/102792)
2025-01-22 09:53:54 -08:00
Kareem Ergawy
937cbce14c Revert "[flang][OpenMP] Enable delayed privatization by default omp.wsloop (#122471)" (#123324)
This seems to have caused some regressions in Fujitsu's test-suite:
https://linaro.atlassian.net/browse/LLVM-1521

This reverts commit 6f82408bb5.
2025-01-22 10:16:40 +01:00
Thirumalai Shaktivel
c2aa11d148 [Flang] Add LLVM lowering support for UNTIED clause in Task (#121052)
Implementation details:
The UNTIED clause is recognized by setting the flag=0 for the default
case or performing logical OR to flag if other clauses are specified,
and this flag is passed as an argument to the `__kmpc_omp_task_alloc`
runtime call.

Resubmitting the PR with a fix for the failure, as it was reverted here:
927a70daf3
and previously merged here: https://github.com/llvm/llvm-project/pull/115283
2025-01-21 09:10:25 +05:30
Kareem Ergawy
6b3ba6677d [flang][OpenMP] Unconditionally create after_alloca block in allocatePrivateVars (#123168)
While https://github.com/llvm/llvm-project/pull/122866 fixed some
issues, it introduced a regression in worksharing loops. The new bug
comes from the fact that we now conditionally create the `after_alloca`
block based on the number of successors of the alloca insertion point.
This is unnecessary; we can just always create the block. If we do this,
we respect the post-conditions expected after calling
`allocatePrivateVars` (i.e. that the `afterAlloca` block has a single
predecessor).
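
To make the post-condition concrete, here is a small self-contained C++ model of "always create the block". `ToyBlock`, `ToyFunction`, and `createAfterAlloca` are hypothetical names for this sketch; it uses a toy CFG rather than LLVM basic blocks and is not the code from the PR.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Toy CFG node (assumption): a name plus a list of successor blocks.
struct ToyBlock {
  std::string name;
  std::vector<ToyBlock *> successors;
};

// Owns the blocks so raw pointers to them stay valid.
struct ToyFunction {
  std::vector<std::unique_ptr<ToyBlock>> blocks;

  ToyBlock *addBlock(const std::string &name) {
    blocks.push_back(std::make_unique<ToyBlock>());
    blocks.back()->name = name;
    return blocks.back().get();
  }
};

// Unconditionally split: the new block inherits whatever successors the
// alloca block had (none, one, or the two targets of a conditional
// branch), and the alloca block is left with exactly one successor.
ToyBlock *createAfterAlloca(ToyFunction &fn, ToyBlock &allocaBlock) {
  ToyBlock *afterAlloca = fn.addBlock("after_alloca");
  afterAlloca->successors.swap(allocaBlock.successors);
  allocaBlock.successors.push_back(afterAlloca);
  return afterAlloca;
}
```

Because the split no longer depends on the successor count, the new block's only predecessor is the alloca block in every case.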
2025-01-16 19:08:38 +01:00
Kareem Ergawy
6f82408bb5 [flang][OpenMP] Enable delayed privatization by default omp.wsloop (#122471)
This enables delayed privatization by default for `omp.wsloop` ops, with
one caveat: I had to work around the "impure" alloc region issue that is
being resolved at the moment. The workaround detects whether the alloc
region's argument is used in the region and at the same time defined in
a block that does not dominate the chosen alloca insertion point. If so,
we move the alloca insertion point below the defining instruction of the
alloc region argument. This basically reverts to the
non-delayed-privatization behavior.
2025-01-16 15:44:59 +01:00
Thirumalai Shaktivel
1d890b06ee [Flang, OpenMP] Add LLVM lowering support for PRIORITY in TASK (#120710)
Implementation details:
The PRIORITY clause is recognized by setting flags = 32 in the
`__kmpc_omp_task_alloc` runtime call. The priority value is also stored
in a `kmp_task_t` struct member.
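
A minimal sketch of how such a flags argument could be assembled is shown below. The value 32 for PRIORITY comes from the message above, and 0 as the base value for an untied task comes from the UNTIED commits in this log; the value 1 for a tied task and the names `kTiedFlag`, `kPriorityFlag`, and `taskAllocFlags` are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstdint>

// kPriorityFlag (32) is taken from the commit message.
// kTiedFlag (1) is an assumption for this sketch.
constexpr std::int32_t kTiedFlag = 1;
constexpr std::int32_t kPriorityFlag = 32;

// Builds the flags value that would be passed to the task allocation call.
std::int32_t taskAllocFlags(bool untied, bool hasPriority) {
  std::int32_t flags = untied ? 0 : kTiedFlag; // untied starts from 0
  if (hasPriority)
    flags |= kPriorityFlag; // OR in further clause bits
  return flags;
}
```

Starting from a base value and OR-ing in one bit per clause keeps the clauses independent of each other.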
2025-01-16 10:02:30 +05:30
Kareem Ergawy
a32c45631b [flang][OpenMP] Generalize fixing alloca IP pre-condition for private ops (#122866)
This PR generalizes a fix that we implemented previously for
`omp.wsloop`s. The fix makes sure that the pre-condition that the `alloca`
block has a single successor whenever we inline delayed privatizers is
respected. I simply moved the fix to `allocatePrivateVars` so that it
kicks in for any op, not just `omp.wsloop`.

This handles a bug uncovered by [a
test](https://github.com/OpenMP-Validation-and-Verification/OpenMP_VV/blob/master/tests/4.5/target_simd/test_target_simd_safelen.F90)
in the OpenMP_VV test suite.
2025-01-15 14:52:10 +01:00
Sergio Afonso
9bc8828093 [OMPIRBuilder][MLIR] Add support for target 'if' clause (#122478)
This patch implements support for handling the 'if' clause of OpenMP
'target' constructs in the OMPIRBuilder and updates MLIR to LLVM IR
translation of the `omp.target` MLIR operation to make use of this new
feature.
2025-01-15 10:16:19 +00:00
Sergio Afonso
d2d4c3bd59 [MLIR][OpenMP] LLVM IR translation of host_eval (#116052)
This patch adds support for processing the `host_eval` clause of
`omp.target` to populate default and runtime kernel launch attributes.
Specifically, these are related to the `num_teams`, `thread_limit` and
`num_threads` clauses attached to operations nested inside of
`omp.target`. As a result, the `thread_limit` clause of `omp.target` is
also supported.

The implementation of `initTargetDefaultAttrs()` is intended to reflect
clang's own processing of multiple constructs and clauses in order to
define a default number of teams and threads to be used as kernel
attributes and to populate global variables in the target device module.

One side effect of this change is that it is no longer possible to
translate to LLVM IR target device MLIR modules unless they have a
supported target triple. This is because the local `getGridValue()`
function in the `OpenMPIRBuilder` only works for certain architectures,
and it is called whenever the maximum number of threads has not been
explicitly defined. This limitation also matches clang.

Evaluating the collapsed loop trip count of SPMD and Generic-SPMD
kernels remains unsupported.
2025-01-14 13:07:38 +00:00
Sergio Afonso
fabc443e93 [OMPIRBuilder] Support runtime number of teams and threads, and SPMD mode (#116051)
This patch introduces a `TargetKernelRuntimeAttrs` structure to hold
host-evaluated `num_teams`, `thread_limit`, `num_threads` and trip count
values passed to the runtime kernel offloading call.

Additionally, kernel type information is used to influence target device
code generation and the `IsSPMD` flag is replaced by `ExecFlags`, which
provides more granularity.
2025-01-14 12:34:37 +00:00
Sergio Afonso
27bc6bdaba [OMPIRBuilder] Introduce struct to hold default kernel teams/threads (#116050)
This patch introduces the `OpenMPIRBuilder::TargetKernelDefaultAttrs`
structure used to simplify passing default and constant values for
number of teams and threads, and possibly other target kernel-related
information in the future.

This is used to forward values passed to `createTarget` to
`createTargetInit`, which previously used a default unrelated set of
values.
2025-01-14 11:08:55 +00:00
Sergio Afonso
9d7d8d2c87 [MLIR][OpenMP] Add host_eval clause to omp.target (#116049)
This patch adds the `host_eval` clause to the `omp.target` operation.
Additionally, it updates its op verifier to make sure all uses of block
arguments defined by this clause fall within one of the few cases where
they are allowed.

MLIR to LLVM IR translation fails on translation of this clause with a
not-yet-implemented error.
2025-01-14 10:21:46 +00:00
Kareem Ergawy
42da12063f [flang][OpenMP] Extend delayed privatization for omp.simd (#122156)
Adds support for delayed privatization for `simd` directives. This PR
includes PFT down to LLVM IR lowering.
2025-01-12 07:46:58 +01:00
Kareem Ergawy
6f9e688203 [flang][OpenMP] Fix reduction init region block management (#122079)
Replaces https://github.com/llvm/llvm-project/pull/121886
Fixes https://github.com/llvm/llvm-project/issues/120254 (hopefully 🤞)

## Problem

Consider the following example:
```fortran
program test
  real :: x(1)
  integer :: i
  !$omp parallel do reduction(+:x)
    do i = 1,1
      x = 1
    end do
  !$omp end parallel do
end program
```

The HLFIR+OMP IR for this example looks like this:
```mlir
  func.func @_QQmain() {
    ...
    omp.parallel {
      %5 = fir.embox %4#0(%3) : (!fir.ref<!fir.array<1xf32>>, !fir.shape<1>) -> !fir.box<!fir.array<1xf32>>
      %6 = fir.alloca !fir.box<!fir.array<1xf32>>
      ...
      omp.wsloop private(@_QFEi_private_ref_i32 %1#0 -> %arg0 : !fir.ref<i32>) reduction(byref @add_reduction_byref_box_1xf32 %6 -> %arg1 : !fir.ref<!fir.box<!fir.array<1xf32>>>) {
        omp.loop_nest (%arg2) : i32 = (%c1_i32) to (%c1_i32_0) inclusive step (%c1_i32_1) {
          ...
          omp.yield
        }
      }
      omp.terminator
    }
    return
  }
```

The problem addressed by this PR is related to the `alloca` in the
`omp.parallel` region and the related `reduction` clause on the
`omp.wsloop` op. When we try to translate the reduction from MLIR to LLVM,
we have to choose an `alloca` insertion point. This happens in
`convertOmpWsloop`; at entry to that function, this is what the
LLVM module looks like:

```llvm
define void @_QQmain() {
  %tid.addr = alloca i32, align 4
  ...

entry:
  %omp_global_thread_num = call i32 @__kmpc_global_thread_num(ptr @1)
  br label %omp.par.entry

omp.par.entry:
  %tid.addr.local = alloca i32, align 4
  ...
  br label %omp.par.region

omp.par.region:
  br label %omp.par.region1

omp.par.region1:
  ...
  %5 = alloca { ptr, i64, i32, i8, i8, i8, i8, [1 x [3 x i64]] }, align 8
```

Now, when we choose an `alloca` insertion point for the reduction, the
chosen block is `omp.par.entry` (without the changes in this PR).
The problem is that the allocation needed for the reduction needs to
reference the `%5` SSA value. This results in inserting allocations in
`omp.par.entry` that reference allocations in a later block,
`omp.par.region1`, which causes the `Instruction does not dominate all
uses!` error.

## Possible solution - take 2:

This PR contains a more localized solution than
https://github.com/llvm/llvm-project/pull/121886. It makes sure that on
entry to `initReductionVars`, the IR builder is at a point where we can
start inserting initialization regions; to make things cleaner, we
still split the builder insertion point to a dedicated
`omp.reduction.init` block. This way we avoid splitting after the latest
allocation block, which is what was causing the issue.
2025-01-09 16:11:18 +01:00
agozillon
fa56e8bb64 [OpenMP][MLIR] Fix threadprivate lowering when compiling for target when target operations are in use (#119310)
Currently the compiler will ICE in programs like the following on the
device lowering pass:

```fortran
program main
    implicit none

    type i1_t
       integer :: val(1000)
    end type i1_t
    integer :: i
    type(i1_t), pointer :: newi1
    type(i1_t), pointer :: tab=>null()

    integer, dimension(:), pointer :: tabval

!$omp THREADPRIVATE(tab)

allocate(newi1)

tab=>newi1
tab%val(:)=1
tabval=>tab%val

!$omp target teams distribute parallel do
  do i = 1, 1000
   tabval(i) = i
 end do
!$omp end target teams distribute parallel do

end program main
```

This is due to the fact that THREADPRIVATE returns a result operation,
and this operation can actually be used by other LLVM dialect (or other
dialect) operations. However, we currently skip the lowering of
threadprivate, so we effectively never generate and bind an LLVM-IR
result to the threadprivate operation result. So when we later go on to
lower dependent LLVM dialect operations, we are missing the required
LLVM-IR result, try to access and use it and then ICE. The fix in this
particular PR is to allow compilation of threadprivate for device as
well as host, and simply treat the device compilation as a no-op,
binding the LLVM-IR result of threadprivate with no alterations.
This allows the rest of the compilation to proceed,
where we'll eventually discard the host segment in any case.

The other possible solution to this I can think of, is doing something
similar to Flang's passes that occur prior to CodeGen to the LLVM
dialect, where they erase/no-op certain unrequired operations or
transform them to lower level series of operations. And we would
erase/no-op threadprivate on device as we'd never have these in target
regions.

The main issues I can see with this are that we currently do not
specialise this stage based on whether we're compiling for device or
host, so it's setting a precedent and adding another point of having to
understand the separation between target and host compilation. I am also
not sure we'd necessarily want to enforce this at a dialect level in case
someone else wishes to add a different lowering flow or translation
flow. Another possible issue is that a target operation we have/utilise
could depend on the result of threadprivate, meaning we'd not be allowed
to entirely erase/no-op it; I am not aware of any situations where this
may be an issue currently, though.
2025-01-03 18:01:01 +01:00
Kaviya Rajendiran
d3eb65f15d [MLIR][OpenMP] Lowering aligned clause to LLVM IR for SIMD directive (#119536)
This patch:
- Adds translation support for the aligned clause in the SIMD directive by passing the alignment details to the "llvm.assume" intrinsic.
- Updates the insertion point for the llvm.assume intrinsic call in "OMPIRBuilder.cpp".
- Adds a check in the aligned clause MLIR lowering to ensure that the alignment value is a power of 2.
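
For reference, a power-of-2 check of the kind mentioned in the last item is commonly written with a single bit trick. The helper below is a generic sketch with a hypothetical name, `isPowerOfTwo`; it is not necessarily how the patch implements the check.

```cpp
#include <cassert>
#include <cstdint>

// A power of two has exactly one bit set, so clearing the lowest set bit
// (x & (x - 1)) must leave zero. Zero itself is rejected explicitly.
bool isPowerOfTwo(std::uint64_t alignment) {
  return alignment != 0 && (alignment & (alignment - 1)) == 0;
}
```

Under this sketch, alignments such as 16 or 64 pass, while 0 and 48 are rejected.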
2025-01-03 16:22:38 +05:30
Thirumalai Shaktivel
cbe583b0bd [Flang] Add translation support for MutexInOutSet and InOutSet (#120715)
Implementation details:
Mutexinoutset and Inoutset are recognized as flag=0x4
and 0x8 respectively; the flag is set in `kmp_depend_info` and
passed as an argument to the `__kmpc_omp_task_with_deps` runtime call.
2024-12-26 15:02:09 +05:30
Muhammad Omair Javaid
927a70daf3 Revert "[Flang OpenMP] Add LLVM translation support for UNTIED in Task (#115283)"
This reverts commit 919aead1db.
It breaks the following LLVM bots:
https://lab.llvm.org/buildbot/#/builders/199
https://lab.llvm.org/buildbot/#/builders/143
https://lab.llvm.org/buildbot/#/builders/17
2024-12-24 01:47:24 +05:00
Thirumalai Shaktivel
919aead1db [Flang OpenMP] Add LLVM translation support for UNTIED in Task (#115283)
Implementation details:
The UNTIED clause is recognized by setting the flag=0 for the default
case or performing logical OR to flag if other clauses are specified,
and this flag is passed as an argument to the `__kmpc_omp_task_alloc`
runtime call.
2024-12-20 16:36:51 +05:30
Ivan R. Ivanov
7c9404c279 [flang][OpenMP] Add frontend support for ompx_bare clause (#111106) 2024-12-13 21:44:43 +09:00
Jie Fu
46ec271e03 [mlir] Fix -Wunused-variable in OpenMPToLLVMIRTranslation.cpp (NFC)
/llvm-project/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp:3921:12:
 error: unused variable 'varType' [-Werror,-Wunused-variable]
      Type varType = mapInfoOp.getVarType();
           ^
1 error generated.
2024-12-12 22:11:41 +08:00
Kareem Ergawy
f9734b9df1 [mlir][OpenMP] - MLIR to LLVMIR translation support for delayed privatization of allocatables in omp.target ops (#116576)
This PR adds support to translate the `private` clause from MLIR to
LLVMIR when used on allocatables in the context of an `omp.target` op.

This replaces https://github.com/llvm/llvm-project/pull/113208.

Parent PR: https://github.com/llvm/llvm-project/pull/116770. Only the
latest commit is relevant to the PR.
2024-12-12 14:39:58 +01:00
Kareem Ergawy
0e70e0edd5 [reapply (#118463)][OpenMP][OMPIRBuilder] Add delayed privatization support for wsloop (#119170)
This reapplies PR #118463 after introducing a fix for a bug uncovered by
the test suite. The problem is that when the alloca block is terminated
with a conditional branch, this violates a pre-condition of
`allocatePrivateVars` (which assumes the alloca block has a single
successor). This new PR includes a test that reproduces the issue.

Extend MLIR to LLVM lowering by adding support for `omp.wsloop` for
delayed privatization. This also refactors a few bits of code to isolate
the logic needed for `firstprivate` initialization in a shared util that
can be used across constructs that need it. The same is done for
`dealloc` regions.
2024-12-09 14:32:04 +01:00
NimishMishra
9eb4056144 [mlir][llvm] Translation support for task detach (#116601)
This PR adds translation support for task detach. Essentially, if the
`detach` clause is present on a task, emit a
`__kmpc_task_allow_completion_event` on it, and store its return (of
type `kmp_event_t*`) into the `event_handle`.
2024-12-08 06:09:52 -08:00
Kareem Ergawy
c54616ea48 Revert "[OpenMP][OMPIRBuilder] Add delayed privatization support for wsloop (#118463)" (#118848) 2024-12-05 20:49:13 +01:00
Kareem Ergawy
0993335134 [OpenMP][OMPIRBuilder] Add delayed privatization support for wsloop (#118463)
Extend MLIR to LLVM lowering by adding support for `omp.wsloop` for
delayed privatization. This also refactors a few bits of code to isolate
the logic needed for `firstprivate` initialization in a shared util that
can be used across constructs that need it. The same is done for
`dealloc` regions.

Parent PR: https://github.com/llvm/llvm-project/pull/118447. Only latest
commit is relevant for this PR.
2024-12-05 05:59:52 +01:00
Kareem Ergawy
7f72d71de7 [OpenMP][OMPIRBuilder] Refactor reduction initialization logic into one util (#118447)
This refactors the logic needed to emit init logic for reductions by
moving some duplicated code into a shared util. The logic for doing so is
quite involved and is needed for any construct that has reductions.
Moreover, when a construct has both private and reduction clauses, both
sets of clauses need to cooperate with each other when emitting the
logic needed for allocation and initialization. Therefore, this PR
clearly sets the boundaries for the logic needed to initialize
reductions.
2024-12-05 05:23:49 +01:00
NimishMishra
b9e3a769b9 [flang][mlir][llvm][OpenMP] Add lowering and translation support for mergeable clause on task (#114662)
Add FIR generation and LLVMIR translation support for mergeable clause
on task construct. If mergeable clause is present on a task, the
relevant flag in `ompt_task_flag_t` is set and passed to
`__kmpc_omp_task_alloc`.
2024-11-26 02:40:26 -08:00
Tom Eccles
a6385a3fc8 [mlir][OpenMP][NFC] use llvm::zip_equal for firstprivate copy region translation (#116416)
I think this is a bit easier to read.
2024-11-18 10:25:19 +00:00
agozillon
b5db75bfce [OpenMP][MLIR] Descriptor explicit member map lowering changes (#113556)
This is one of 3 PRs in a PR stack that aims to add support for explicit
mapping of allocatable members in derived types.

The primary changes in this PR are the OpenMPToLLVMIRTranslation.cpp
changes, which are small and seek to alter the current member mapping to
add an additional map insertion for pointers. Effectively, if the member
is a pointer (currently indicated by having a varPtrPtr field) we add an
additional map for the pointer and then alter the subsequent mapping of
the member (the data) to utilise the member rather than the parent's base
pointer. This appears to be necessary in certain cases when mapping
pointer data within record types to avoid segfaulting on device (due to
incorrect data mapping). In general this record type mapping may be
simplifiable in the future.

There are also additions of tests which should help to showcase the
effect of the changes above.
2024-11-16 12:26:29 +01:00
agozillon
d84d0caf28 [Flang][OpenMP] Update MapInfoFinalization to use BlockArgs Interface and modify use_device_ptr/addr to be order independent (#113919)
This patch primarily updates the MapInfoFinalization pass to utilise the
BlockArgument interface. It also moves arguments newly added by the
MapInfoFinalization pass to the end of the BlockArg/relevant MapInfo
lists, instead of one position prior to the owning descriptor type.

During this it was noted that the use_device_ptr/addr handling of target
data was a little bit too order dependent so I've attempted to make it
less so, as we cannot depend on argument ordering to be the same as
Fortran for any future frontends.
2024-11-14 15:47:37 +01:00
Tom Eccles
8269c400b4 [mlir][OpenMP][NFC] delayed privatisation cleanup (#115298)
Upstreaming some code cleanups ahead of supporting delayed task
execution.
- Make allocatePrivateVars not need to be a template (it will need to
  operate separately on firstprivate and private variables for delayed
  task execution so it can't index into lists of all variables in the
  operation).
- Use llvm::SmallVectorImpl for function arguments
- collectPrivatizationDecls already reserves size for privateDecls so we
  don't need to do that in callers
- Use llvm::zip_equal instead of C-style array indexing
2024-11-07 12:27:31 +00:00
Tom Eccles
28452acac0 [mlir][OpenMP] delayed privatisation for TASK (#114785)
This uses essentially an identical implementation to that used for
ParallelOp. The private variable allocation and deallocation use shared
functions to avoid code duplication. FIRSTPRIVATE variable copying uses
duplicated code for now because I anticipate the implementation
diverging in the near future once I store data for firstprivate
variables in the task description structure.

After enabling delayed privatisation for TASK in flang, one more test in
the fujitsu test suite passes (I haven't looked into why).
2024-11-06 13:19:12 +00:00
Sergio Afonso
d3e796c2d0 [MLIR][OpenMP] Update not-yet-implemented errors, NFC (#114966)
This patch improves not-yet-implemented error diagnostics to more
closely follow the format used by Flang lowering for the same kind of
errors. This helps keep some level of uniformity from a user
perspective.
2024-11-05 12:48:54 +00:00
Sergio Afonso
6c28530ed0 [Flang][OpenMP] Properly bind arguments of composite operations (#113682)
When composite constructs are lowered, clauses for each leaf construct
are lowered before creating the set of loop wrapper operations, using
these outside values to populate their operand lists. Then, when the
loop nest associated to that composite construct is lowered, the binding
of Fortran symbols to the entry block arguments defined by these loop
wrappers is performed, resulting in the creation of `hlfir.declare`
operations in the entry block of the `omp.loop_nest`.

This approach prevents `hlfir.declare` operations related to the binding
and other operations resulting from the evaluation of the clauses from
being inserted between loop wrapper operations, which would be an
illegal MLIR representation. However, this introduces the problem of
entry block arguments defined by a wrapper that then should be used by
one of its nested wrappers, because the corresponding Fortran symbol
would still be mapped to an outside value at the time of gathering the
list of operands for the nested wrapper.

This patch adds operand re-mapping logic to update wrappers without
changing when clauses are evaluated or where the `hlfir.declare`
creation is performed.
2024-10-31 16:39:53 +00:00
Sergio Afonso
bd6c21460f [MLIR][OpenMP] Emit descriptive errors for all unsupported clauses (#114037)
This patch improves error reporting in the MLIR to LLVM IR translation
pass for the 'omp' dialect by emitting descriptive errors when
encountering clauses not yet supported by that pass.

Additionally, not-yet-implemented errors previously missing for some
clauses are added, to avoid silently ignoring them.

Error messages related to inlining of `omp.private` and
`omp.declare_reduction` regions have been updated to use the same
format.
2024-10-31 11:59:51 +00:00
Sergio Afonso
21a6032eca [MLIR][OpenMP] Simplify translation to LLVM IR error handling (#114036)
This patch unifies the handling of errors passed through the
OpenMPIRBuilder and removes some redundant error messages through the
introduction of a custom `ErrorInfo` subclass.

Additionally, the current list of operations and clauses unsupported by
the MLIR to LLVM IR translation pass is added to a new Lit test to check
they are being reported to the user.
2024-10-31 11:34:24 +00:00
Sergio Afonso
a1f2fb6078 [MLIR][OpenMP] Prevent composite omp.simd related crashes (#113680)
This patch updates the translation of `omp.wsloop` with a nested
`omp.simd` to prevent uses of block arguments defined by the latter from
triggering null pointer dereferences.

This happens because the inner `omp.simd` operation representing
composite `do simd` constructs is currently skipped and not translated,
but this results in block arguments defined by it not being mapped to an
LLVM value. The proposed solution is to map these block arguments to the
LLVM value associated to the corresponding operand, which is defined
above.
2024-10-29 17:05:12 +00:00
Sergio Afonso
d87964de78 [OpenMP][OMPIRBuilder] Error propagation across callbacks (#112533)
This patch implements an approach to communicate errors between the
OMPIRBuilder and its users. It introduces `llvm::Error` and
`llvm::Expected` objects to replace the values returned by callbacks
passed to `OMPIRBuilder` codegen functions. These functions then check
the result for errors when callbacks are called and forward them back to
the caller, which has the flexibility to recover, exit cleanly or dump a
stack trace.

This prevents a failed callback from leaving the IR in an invalid state while
the codegen process continues, which would trigger unrelated assertions or
segmentation faults. In the case of MLIR to LLVM IR translation of the
'omp' dialect, this change results in the compiler emitting errors and
exiting early instead of triggering a crash for not-yet-implemented
errors. The behavior in Clang and openmp-opt stays unchanged, since
callbacks will continue to always return 'success'.
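
The control flow described above can be modelled without any LLVM types. In the sketch below, `ToyError` is a deliberately simplified stand-in for `llvm::Error` (an empty message means success), and `BodyGenCallback` and `createRegion` are hypothetical names; the point is only that the codegen function checks the callback's result and forwards a failure instead of carrying on.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Simplified stand-in for an error object (assumption, not llvm::Error).
// An empty message means success.
struct ToyError {
  std::string message;
  bool isSuccess() const { return message.empty(); }
  static ToyError success() { return ToyError{}; }
};

// A user callback that emits code and reports whether it succeeded.
// `emitted` counts the instructions produced so far.
using BodyGenCallback = std::function<ToyError(int &emitted)>;

// Models a codegen function: it calls the user callback and, if the
// callback failed, stops emitting and forwards the error to its caller
// rather than continuing with half-built output.
ToyError createRegion(const BodyGenCallback &bodyGen, int &emitted) {
  emitted += 1; // prologue
  ToyError err = bodyGen(emitted);
  if (!err.isSuccess())
    return err; // propagate; the epilogue is never emitted
  emitted += 1; // epilogue
  return ToyError::success();
}
```

The caller then decides what to do with a forwarded failure, for example report it and exit early.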
2024-10-25 11:30:16 +01:00
Kareem Ergawy
ad70f3e095 [flang][OpenMP] Support target enter|update|exit .. nowait (#113305)
Extends `nowait` support for other device directives. This PR refactors
the task generation utils used for the `target` directive so that they
are general enough to be reused for other device directives as well.
2024-10-23 10:48:54 +02:00
Tom Eccles
621fcf892b [mlir][OpenMP] rewrite conversion of privatisation for omp.parallel (#111844)
The existing conversion inlined private alloc regions and firstprivate
copy regions in mlir, then undid the modification of the mlir module
before completing the conversion. To make this work, LLVM IR had to be
generated using the wrong mapping for privatised values and then later
fixed inside of OpenMPIRBuilder.

This approach violated an assumption in OpenMPIRBuilder that private
variables would be values, not constants. Flang sometimes generates code
where private variables are promoted to globals, the address of which is
treated as a constant in LLVM IR. This prevented the incorrect values for
the private variable from being replaced by OpenMPIRBuilder, ultimately
resulting in programs producing incorrect results.

This patch rewrites delayed privatisation for omp.parallel to work more
similarly to reductions: translating directly into LLVMIR with correct
mappings for private variables.

RFC:
https://discourse.llvm.org/t/rfc-openmp-fix-issue-in-mlir-to-llvmir-translation-for-delayed-privatisation/81225

Tested against the gfortran testsuite and our internal test suite.
Linaro's post-commit bots will check against the fujitsu test suite.

I decided to add the new tests as flang integration tests rather than in
mlir/test/Target/LLVMIR:
- The regression test is for an issue filed against flang. I wanted to
keep the reproducer similar to the code in the ticket.
- I found the "worst case" CFG test difficult to reason about in the
abstract; it helped me to think about what was going on in terms of a
Fortran program.

Fixes #106297
2024-10-16 14:43:57 +01:00