Commit Graph

2412 Commits

Author SHA1 Message Date
Valentin Clement (バレンタイン クレメン)
ab1ee912be [flang][cuda] Remove the need of special compile definition for CUFInit (#124965)
This patch addresses post commit review comments from #124859. 

The extra compile definition is not necessary and goes against the
effort to separate the runtimes from the flang compiler itself. The
function declaration for `CUFInit` can be accessed anyway since the
header are always present. The insertion of the call is only based on
the language feature options from the folding context.

A program compiled with cuda enabled but no cufruntime would just fail
at link time as expected.
2025-01-29 13:12:04 -08:00
Krzysztof Parzyszek
15ab7be2e0 [flang][OpenMP] Parse WHEN, OTHERWISE, MATCH clauses plus METADIRECTIVE (#121817)
Parse METADIRECTIVE as a standalone executable directive at the moment.
This will allow testing the parser code.

There is no lowering, not even clause conversion yet. There is also no
verification of the allowed values for trait sets, trait properties.
2025-01-29 15:07:20 -06:00
Slava Zakharin
bac9575274 [flang] Reset all extents to zero for empty hlfir.elemental loops. (#124867)
An hlfir.elemental with a shape `(0, HUGE)` still runs `HUGE`
number of iterations when expanded into a loop nest.
HLFIR transformational operations inlined as hlfir.elemental
may execute slower comparing to Fortran runtime implementation.
This patch adds an option for BufferizeHLFIR pass to reset all
upper bounds in the elemental loop nests to zero, if the result
is an empty array.

A separate patch will enable this option in the driver after I do
more performance testing. The option is off by default now.
2025-01-29 12:03:05 -08:00
Krzysztof Parzyszek
c8593239a3 [flang][OpenMP] Make parsing of trait properties more context-sensitive (#122900)
A trait poperty can be one of serveral alternatives (name, expression,
etc.), and each property in a list was parsed as if it could be any of
these alternatives independently from other properties. This made the
parsing vulnerable to certain ambiguities in the trait grammar (provided
in the OpenMP spec).
At the same time the OpenMP spec gives the expected types of properties
for almost every trait: all properties listed for a given trait are
usually of the same type, e.g. names, clauses, etc.

Incorporate these restrictions into the parser, and additionally use
property extensions as the fallback if the parsing of the expected
property type failed. This is intended to allow the parser to succeed,
and instead let the semantic-checking code emit a more user-friendly
message.
2025-01-29 11:54:52 -06:00
Jean-Didier PAILLEUX
ecc71de53f [flang] Implement IERRNO intrinsic (#124281)
Add the implementation of the IERRNO intrinsic to get the last system
error number, as given by the C errno variable.
This intrinsic is also used in RAMSES
(https://github.com/ramses-organisation/ramses/).
2025-01-29 12:01:07 +01:00
Jean-Didier PAILLEUX
5a34e6fdce [flang] Implement CHDIR intrinsic (#124280)
This intrinsic is a gnu extension
(https://gcc.gnu.org/onlinedocs/gfortran/CHDIR.html) and is used in
FLEUR (https://github.com/JuDFTteam/FLEUR).
2025-01-29 09:44:58 +01:00
Jean-Didier PAILLEUX
e811cb00e5 [flang] Implement !DIR$ UNROLL [N] (#123331)
This patch implements support for the UNROLL directive to control how
many times a loop should be unrolled.
It must be placed immediately before a `DO LOOP` and applies only to the
loop that follows. N is an integer that specifying the unrolling factor.
This is done by adding an attribute to the branch into the loop in LLVM
to indicate that the loop should unrolled.
The code pushed to support the directive `VECTOR ALWAYS` has been
modified to take account of the fact that several directives can be used
before a `DO LOOP`.
2025-01-29 09:44:09 +01:00
Valentin Clement (バレンタイン クレメン)
654b76321a [flang][cuda] Allow to set the stack limit size (#124859)
This patch adds a call to the CUFInit function just after `ProgramStart`
when CUDA Fortran is enabled to initialize the CUDA context. This allows
us to set up some context information like the stack limit that can be
defined by an environment variable `ACC_OFFLOAD_STACKSIZE=<value>`.
2025-01-28 20:57:33 -08:00
vdonaldson
9d8dc45d17 [flang] IEEE underflow control for Arm (#124807)
Update IEEE_SUPPORT_UNDERFLOW_CONTROL, IEEE_GET_UNDERFLOW_MODE, and
IEEE_SET_UNDERFLOW_MODE code for Arm.
2025-01-28 15:41:04 -05:00
Renaud Kauffmann
56a0a7f6d1 [flang][cuda] Adding support for more atomic calls (#124671)
The PR follows the earlier
https://github.com/llvm/llvm-project/pull/123840 PR for atomic operation
support in CUF
2025-01-28 08:36:43 -08:00
Muhammad Omair Javaid
c0861e9cbb Revert "[flang] IEEE underflow control for Arm (#124617)"
This reverts commit c4c76eabb8.

This breaks LLVM build on Windows:
https://lab.llvm.org/buildbot/#/builders/161/builds/4322
2025-01-28 17:32:45 +05:00
Slava Zakharin
c489108912 [flang] Added hlfir.reshape definition/lowering/codegen. (#124226)
Lower Fortran RESHAPE intrinsic into hlfir.reshape, and then lower
hlfir.reshape into a runtime call.
A later patch will add hlfir.reshape inlining as hlfir.elemental.
2025-01-27 18:14:02 -08:00
vdonaldson
c4c76eabb8 [flang] IEEE underflow control for Arm (#124617)
Update IEEE_SUPPORT_UNDERFLOW_CONTROL, IEEE_GET_UNDERFLOW_MODE, and
IEEE_SET_UNDERFLOW_MODE code for Arm.
2025-01-27 15:11:24 -05:00
Peter Klausler
d732c86c92 [flang] Don't take corank from actual intrinsic argument (#124029)
When constructing the characteristics of a particular reference to an
intrinsic procedure that was passed a non-coindexed reference to local
coarray data as an actual argument, don't add the corank of the actual
argument to those characteristics.

Also clean up the TypeAndShape characteristics class a little; the
Attr::Coarray is redundant since the corank() accessor can be used to
the same effect.
2025-01-27 11:57:01 -08:00
Peter Klausler
e252c40210 [flang] Fix spurious error due to bad expression shape calculation (#124323)
GetShape() needed to be called with a FoldingContext in order to
properly construct an extent expression for the shape of an array
constructor whose elements (nested in an implied DO loop) were not
scalars.

Fixes https://github.com/llvm/llvm-project/issues/124191.
2025-01-27 08:59:43 -08:00
Peter Klausler
2625510ef8 [flang] Refine EVENT_TYPE/LOCK_TYPE usage checks (#123244)
The event variable in an EVENT POST/WAIT statement can be a coarray
reference, and need not be an entire coarray.

Variables and potential subobject components with EVENT_TYPE/LOCK_TYPE
must be coarrays, unless they are potential subobjects nested within
coarrays or pointers.
2025-01-27 08:45:11 -08:00
Peter Klausler
038b42ba5b [flang] Safer hermetic module file reading (#121002)
When a hermetic module file is read, use a new scope to hold its
dependent modules so that they don't conflict with any modules in the
global scope.
2025-01-27 08:43:41 -08:00
vdonaldson
3322ba493a Revert "[flang] IEEE underflow control for Arm" (#124570)
Reverts llvm/llvm-project#124170
2025-01-27 10:40:23 -05:00
vdonaldson
3684ec4259 [flang] IEEE underflow control for Arm (#124170)
Update IEEE_SUPPORT_UNDERFLOW_CONTROL, IEEE_GET_UNDERFLOW_MODE, and
IEEE_SET_UNDERFLOW_MODE code for Arm.
2025-01-27 09:18:47 -05:00
Mats Petersson
8035d38daa [Flang][OpenMP]Add parsing support for DISPATCH construct (#121982)
This allows the Flang parser to accept the !$OMP DISPATCH and related
clauses.

Lowering is currently not implemented. Tests for unparse and parse-tree
dump is provided, and one for checking that the lowering ends in a "not
yet implemented"

---------

Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
2025-01-26 09:44:04 +00:00
Valentin Clement (バレンタイン クレメン)
48657bf29b [flang][cuda] Handle launch of cooperative kernel (#124362)
Add `CUFLaunchCooperativeKernel` entry points and lower gpu.launch_func
with grid_global attribute to this entry point.
2025-01-24 15:52:05 -08:00
Slava Zakharin
3da7de34a2 [flang][runtime] Disable optimization for traceback related functions. (#124172)
The backtrace may at least print the backtrace name in the call stack,
but this does not happen with the release builds of the runtime.
Surprisingly, specifying "no-omit-frame-pointer" did not work
with GCC, so I decided to fall back to -O0 for these functions.
2025-01-24 08:49:35 -08:00
Jianjian Guan
990837f91d [mlir][arith][tensor] Disable index type for bitcast (#121455)
Fixes #121397.
2025-01-24 16:53:04 +08:00
Valentin Clement (バレンタイン クレメン)
67a8857989 [flang][cuda] Handle pointer allocation with double descriptors (#124183) 2025-01-23 15:54:22 -08:00
Valentin Clement (バレンタイン クレメン)
8c138bee6e [flang][cuda] Handle pointer allocation with source (#124070) 2025-01-23 09:24:06 -08:00
Renaud Kauffmann
652ff20140 [flang][cuda] Adding atomicadd as a cudadevice intrinsic and converting it LLVM dialect (#123840)
With these changes, CUF atomic operations are handled as cudadevice
intrinsics and are converted straight to the LLVM dialect with the
`llvm.atomicrw` operation.

I am only submitting changes for `atomicadd` to gather feedback. If we
are to proceed with these changes I will add support for all other
applicable atomic operations following this pattern.
2025-01-22 19:06:28 -08:00
Valentin Clement (バレンタイン クレメン)
6e498bc2cd [flang][cuda] Handle simple device pointer allocation (#123996) 2025-01-22 15:59:32 -08:00
jeanPerier
662133a278 [flang][OpenMP][OpenACC] remove libEvaluate dependency in passes (#123784)
Move OpenACC/OpenMP helpers from Lower/DirectivesCommon.h that are also
used in OpenACC and OpenMP mlir passes into a new
Optimizer/Builder/DirectivesCommon.h so that parser and evaluate headers
are not included in Optimizer libraries (this both introduce
compile-time and link-time pointless overheads).

This should fix https://github.com/llvm/llvm-project/issues/123377
2025-01-21 20:32:42 +01:00
Kiran Chandramohan
ce32625966 Reland "[Flang][Driver] Add a flag to control zero initialization" (#123606)
Reverts llvm/llvm-project#123330
2025-01-21 07:57:44 +00:00
Valentin Clement (バレンタイン クレメン)
2523d3b102 [flang][cuda] Perform scalar assignment of c_devptr inlined (#123407)
Because `c_devptr` has a `c_ptr` field, any assignment were done via the
Assign runtime function. This leads to stack overflow on the device and
taking too much memory. As we know the c_devptr can be directly copied
on assignment, make it a special case.
2025-01-17 14:34:47 -08:00
Slava Zakharin
71ff486bee Reland "[flang] Inline hlfir.dot_product. (#123143)" (#123385)
This reverts commit afc43a7b62.
+Fixed declaration of hlfir::genExtentsVector().

Some good results for induct2, where dot_product is applied
to a vector of unknow size and a known 3-element vector:
the inlining ends up generating a 3-iteration loop, which
is then fully unrolled. With late FIR simplification
it is not happening even when the simplified intrinsics
implementation is inlined by LLVM (because the loop bounds
are not known).

This change just follows the current approach to expose
the loops for later worksharing application.
2025-01-17 12:09:44 -08:00
Kiran Chandramohan
8a229f595a Revert "Revert "Revert "[Flang][Driver] Add a flag to control zero initializa…" (#123330)
Reverts llvm/llvm-project#123097

Reverting due to buildbot failure
https://lab.llvm.org/buildbot/#/builders/89/builds/14577.
2025-01-17 12:27:58 +00:00
Kiran Chandramohan
8c63648117 Revert "Revert "[Flang][Driver] Add a flag to control zero initializa… (#123097)
…tion of global v…" (#123067)"

This reverts commit 44ba43aa2b.

Adds the flag to bbc as well.
2025-01-17 12:14:20 +00:00
Philip Reames
afc43a7b62 Revert "[flang] Inline hlfir.dot_product. (#123143)"
This reverts commit 9a6433f0ff.  ninja check-flang on x86 host fails to compile.
2025-01-16 17:38:40 -08:00
Slava Zakharin
9a6433f0ff [flang] Inline hlfir.dot_product. (#123143)
Some good results for induct2, where dot_product is applied
to a vector of unknow size and a known 3-element vector:
the inlining ends up generating a 3-iteration loop, which
is then fully unrolled. With late FIR simplification
it is not happening even when the simplified intrinsics
implementation is inlined by LLVM (because the loop bounds
are not known).

This change just follows the current approach to expose
the loops for later worksharing application.
2025-01-16 12:52:59 -08:00
Valentin Clement (バレンタイン クレメン)
12ba74e181 [flang] Do not produce result for void runtime call (#123155)
Runtime function call to a void function are producing a ssa value
because the FunctionType result is set to NoneType with is later
translated to a empty struct. This is not an issue when going to LLVM IR
but it breaks when lowering a gpu module to PTX. This patch update the
RTModel to correctly set the FunctionType result type to nothing.

This is one runtime call before this patch at the LLVM IR dialect step.
```
%45 = llvm.call @_FortranAAssign(%arg0, %1, %44, %4) : (!llvm.ptr, !llvm.ptr, !llvm.ptr, i32) -> !llvm.struct<()>
```

After the patch the call would be correctly formed
```
llvm.call @_FortranAAssign(%arg0, %1, %44, %4) : (!llvm.ptr, !llvm.ptr, !llvm.ptr, i32) -> ()
```

Without the patch it would lead to error like:
```
ptxas /tmp/mlir-cuda_device_mod-nvptx64-nvidia-cuda-sm_60-e804b6.ptx, line 10; error   : Output parameter cannot be an incomplete array.
ptxas /tmp/mlir-cuda_device_mod-nvptx64-nvidia-cuda-sm_60-e804b6.ptx, line 125; error   : Call has wrong number of parameters
```

The change is pretty much mechanical.
2025-01-16 12:34:38 -08:00
Kareem Ergawy
a0406ce823 [flang][OpenMP] Add hostIsSource paramemter to copyHostAssociateVar (#123162)
This fixes a bug when the same variable is used in `firstprivate` and
`lastprivate` clauses on the same construct. The issue boils down to the
fact that `copyHostAssociateVar` was deciding the direction of the copy
assignment (i.e. the `lhs` and `rhs`) based on whether the
`copyAssignIP`
parameter is set. This is not the best way to do it since it is not
related to whether we doing a copy from host to localized copy or the
other way around. When we set the insertion for `firstprivate` in
delayed privatization, this resulted in switching the direction of the
copy assignment. Instead, this PR adds a new paramter to explicitely
tell
the function the direction of the assignment.

This is a follow up PR for
https://github.com/llvm/llvm-project/pull/122471, only the latest commit
is relevant.
2025-01-16 19:10:12 +01:00
Matthias Springer
f023da12d1 [mlir][IR] Remove factory methods from FloatType (#123026)
This commit removes convenience methods from `FloatType` to make it
independent of concrete interface implementations.

See discussion here:
https://discourse.llvm.org/t/rethink-on-approach-to-low-precision-fp-types/82361

Note for LLVM integration: Replace `FloatType::getF32(` with
`Float32Type::get(` etc.
2025-01-16 08:56:09 +01:00
David Truby
0195ec452e [flang] Add -f[no-]unroll-loops flag (#122906) 2025-01-16 06:43:32 +00:00
Slava Zakharin
3bb969f3eb [flang] Inline hlfir.matmul[_transpose]. (#122821)
Inlining `hlfir.matmul` as `hlfir.eval_in_mem` does not allow
to get rid of a temporary array in many cases, but it may still be
much better allowing to:
  * Get rid of any overhead related to calling runtime MATMUL
    (such as descriptors creation).
  * Use CPU-specific vectorization cost model for matmul loops,
    which Fortran runtime cannot currently do.
  * Optimize matmul of known-size arrays by complete unrolling.

One of the drawbacks of `hlfir.eval_in_mem` inlining is that
the ops inside it with store memory effects block the current
MLIR CSE, so I decided to run this inlining late in the pipeline.
There is a source commen explaining the CSE issue in more detail.

Straightforward inlining of `hlfir.matmul` as an `hlfir.elemental`
is not good for performance, and I got performance regressions
with it comparing to Fortran runtime implementation. I put it
under an enigneering option for experiments.

At the same time, inlining `hlfir.matmul_transpose` as `hlfir.elemental`
seems to be a good approach, e.g. it allows getting rid of a temporay
array in cases like: `A(:)=B(:)+MATMUL(TRANSPOSE(C(:,:)),D(:))`.

This patch improves performance of galgel and tonto a little bit.
2025-01-15 08:42:57 -08:00
vdonaldson
ff862d6de9 [flang] Modifications to ieee floating point environment procedures (#121949)
Intrinsic module procedures ieee_get_modes, ieee_set_modes,
ieee_get_status, and ieee_set_status store and retrieve opaque data
values whose size varies by machine and OS environment. These data
values are usually, but not always small. Their sizes are not directly
known in a cross compilation environment. Address this issue by
implementing two mechanisms for processing these data values.
Environments that use typical small data sizes can access storage
defined at compile time. When this is not valid, data storage of any
size can be allocated at runtime.
2025-01-15 10:55:09 -05:00
Kiran Chandramohan
44ba43aa2b Revert "[Flang][Driver] Add a flag to control zero initialization of global v…" (#123067)
Reverts llvm/llvm-project#122144

Reverting due to CI failure
https://lab.llvm.org/buildbot/#/builders/89/builds/14422
2025-01-15 15:23:34 +00:00
Kiran Chandramohan
c593e3d0f7 [Flang][Driver] Add a flag to control zero initialization of global v… (#122144)
…ariables

Patch adds a flag to control zero initialization of global variables
without default initialization. The default is to zero initialize.
2025-01-15 15:06:57 +00:00
Valentin Clement (バレンタイン クレメン)
a19919f4cd [flang][cuda] Add cuf.device_address operation (#122975)
Introduce a new op to get the device address from a host symbol. This
simplify the current conversion and this is also in preparation for some
legalization work that need to be done in cuf kernel and cuf kernel
launch similar to
https://github.com/llvm/llvm-project/pull/122802
2025-01-14 17:38:04 -08:00
Peter Klausler
ecf264d3b4 [flang] Fix spurious error message due to inaccessible generic binding (#122810)
Generic operator/assignment checks for distinguishable specific
procedures must ignore inaccessible generic bindings.

Fixes https://github.com/llvm/llvm-project/issues/122764.
2025-01-14 13:02:21 -08:00
Peter Klausler
9f0f54a629 [flang] Improve error messages for module self-USE (#122747)
A module can't USE itself, either directly within the top-level module
or from one of its submodules. Add a test for this case (which we
already caught), and improve the diagnostic for the more confusing case
involving a submodule.
2025-01-14 13:00:03 -08:00
Peter Klausler
9696355484 [flang] Better messages and error recovery for a bad RESHAPE (#122604)
Add tests for negative array extents where necessary, motivated by a
compiler crash exposed by yet another fuzzer test, and improve overall
error message quality for RESHAPE().

Fixes https://github.com/llvm/llvm-project/issues/122060.
2025-01-14 12:57:49 -08:00
Peter Klausler
ebec4d6369 [flang] Fix use-after-free cases found by valgrind (#122394)
The expression traversal library needs to use interfaces into triplets
(and substrings) that return pointers to nested expressions, rather than
optional copies of them, since at least one semantic analysis collects a
set of references to some subexpression representation class instances,
and those references obviously can't point to local copies of objects.

Fixes https://github.com/llvm/llvm-project/issues/121999.
2025-01-14 12:56:33 -08:00
Peter Klausler
b720b6cbe9 [flang] Fix crash from fuzzy test. (#122364)
Fixes https://github.com/llvm/llvm-project/issues/122002.
2025-01-14 12:56:03 -08:00
Razvan Lupusoru
01a0d212a6 [flang][acc] Implement MappableType interfaces for fir.box and fir.array (#122495)
The newly introduced MappableType interface in `acc` dialect was
primarily intended to allow variables with non-materialized storage to
be used in acc data clauses (previously everything was required to be
`pointer-like`). One motivator for this was `fir.box` since it is
possible to be passed to functions without a wrapping `fir.ref` and also
it can be generated directly via operations like `fir.embox` - and
unlike other variable representations in FIR, the underlying storage for
it does not get materialized until LLVM codegen.

The new interface is being attached to both `fir.box` and `fir.array`.
Strictly speaking, attaching to the latter is primarily for consistency
since the MappableType interface requires implementation of utilities to
compute byte size - and it made sense that a
`fir.box<fir.array<10xi32>>` and `fir.array<10xi32>` would have a
consistently computable size. This decision may be revisited as
MappableType interface evolves.

The new interface attachments are made in a new library named
`FIROpenACCSupport`. The reason for this is to avoid circular
dependencies since the implementation of this library is reusing code
from lowering of OpenACC. More specifically, the types are defined in
`FIRDialect` and `FortranLower` depends on it. Thus we cannot attach
these interfaces in `FIRDialect`.
2025-01-14 10:42:57 -08:00