Commit Graph

134 Commits

Author SHA1 Message Date
Slava Zakharin
36fdeb2ade [flang] Set LLVM specific attributes to fir.call's of Fortran runtime. (#128093)
This change is inspired by a case in facerec benchmark, where
performance
of scalar code may improve by about 6%@aarch64 due to getting rid of
redundant
loads from Fortran descriptors. These descriptors are corresponding
to subroutine local ALLOCATABLE, SAVE variables. The scalar loop nest
in LocalMove subroutine contains call to Fortran runtime IO functions,
and LLVM globals-aa analysis cannot prove that these calls do not modify
the globalized descriptors with internal linkage.

This patch sets and propagates llvm.memory_effects attribute for
fir.call
operations calling Fortran runtime functions. In particular, it tries
to set the Other memory effect to NoModRef. The Other memory effect
includes accesses to globals and captured pointers, so we cannot set
it for functions taking Fortran descriptors with one exception
for calls where the Fortran descriptor arguments are all null.

As long as different calls to the same Fortran runtime function may have
different attributes, I decided to attach the attributes to the calls
rather than functions. Moreover, attaching the attributes to func.func
will require propagating these attributes to llvm.func, which is not
happening right now.

In addition to llvm.memory_effects, the new pass sets llvm.nosync
and llvm.nocallback attributes that may also help LLVM alias analysis
(e.g. see #127707). These attributes are ignored currently.
I will support them in LLVM IR dialect in a separate patch.

I also added another pass for developers to be able to print
declarations/calls of all Fortran runtime functions that are recognized
by the attributes setting pass. It should help with maintenance
of the LIT tests.
2025-02-24 09:27:48 -08:00
Krzysztof Parzyszek
d553e5d4b6 [flang] Fix build break after bac9575274
.../flang/lib/Optimizer/Builder/FIRBuilder.cpp: In function ‘llvm::Small
Vector<mlir::Value> fir::factory::updateRuntimeExtentsForEmptyArrays(fir
::FirOpBuilder&, mlir::Location, mlir::ValueRange)’:
.../flang/lib/Optimizer/Builder/FIRBuilder.cpp:1786:10: error: could not
 convert ‘newExtents’ from ‘SmallVector<[...],15>’ to ‘SmallVector<[...]
,6>’
 1786 |   return newExtents;
      |          ^~~~~~~~~~
      |          |
      |          SmallVector<[...],15>

Remove size from template parameters in the declaration of `newExtents`.
2025-01-30 07:14:16 -06:00
Slava Zakharin
bac9575274 [flang] Reset all extents to zero for empty hlfir.elemental loops. (#124867)
An hlfir.elemental with a shape `(0, HUGE)` still runs `HUGE`
number of iterations when expanded into a loop nest.
HLFIR transformational operations inlined as hlfir.elemental
may execute slower comparing to Fortran runtime implementation.
This patch adds an option for BufferizeHLFIR pass to reset all
upper bounds in the elemental loop nests to zero, if the result
is an empty array.

A separate patch will enable this option in the driver after I do
more performance testing. The option is off by default now.
2025-01-29 12:03:05 -08:00
Valentin Clement (バレンタイン クレメン)
05fd4d5775 [flang][cuda] Perform inlined assignment when field is c_devptr (#124322)
When a field in a derived type is `c_devptr`, keep check if we can do a
memcpy instead of falling back to the runtime assignment.

Many internal CUDA Fortran derived type have a `c_devptr` field and this
would lead to stack overflow on the device if the assignment is
performed by the runtime function.
2025-01-24 14:32:07 -08:00
Valentin Clement (バレンタイン クレメン)
2523d3b102 [flang][cuda] Perform scalar assignment of c_devptr inlined (#123407)
Because `c_devptr` has a `c_ptr` field, any assignment were done via the
Assign runtime function. This leads to stack overflow on the device and
taking too much memory. As we know the c_devptr can be directly copied
on assignment, make it a special case.
2025-01-17 14:34:47 -08:00
Matthias Springer
f023da12d1 [mlir][IR] Remove factory methods from FloatType (#123026)
This commit removes convenience methods from `FloatType` to make it
independent of concrete interface implementations.

See discussion here:
https://discourse.llvm.org/t/rethink-on-approach-to-low-precision-fp-types/82361

Note for LLVM integration: Replace `FloatType::getF32(` with
`Float32Type::get(` etc.
2025-01-16 08:56:09 +01:00
Slava Zakharin
3bb969f3eb [flang] Inline hlfir.matmul[_transpose]. (#122821)
Inlining `hlfir.matmul` as `hlfir.eval_in_mem` does not allow
to get rid of a temporary array in many cases, but it may still be
much better allowing to:
  * Get rid of any overhead related to calling runtime MATMUL
    (such as descriptors creation).
  * Use CPU-specific vectorization cost model for matmul loops,
    which Fortran runtime cannot currently do.
  * Optimize matmul of known-size arrays by complete unrolling.

One of the drawbacks of `hlfir.eval_in_mem` inlining is that
the ops inside it with store memory effects block the current
MLIR CSE, so I decided to run this inlining late in the pipeline.
There is a source commen explaining the CSE issue in more detail.

Straightforward inlining of `hlfir.matmul` as an `hlfir.elemental`
is not good for performance, and I got performance regressions
with it comparing to Fortran runtime implementation. I put it
under an enigneering option for experiments.

At the same time, inlining `hlfir.matmul_transpose` as `hlfir.elemental`
seems to be a good approach, e.g. it allows getting rid of a temporay
array in cases like: `A(:)=B(:)+MATMUL(TRANSPOSE(C(:,:)),D(:))`.

This patch improves performance of galgel and tonto a little bit.
2025-01-15 08:42:57 -08:00
Valentin Clement (バレンタイン クレメン)
878a57468b [flang][cuda] Add c_devloc as intrinsic and inline it during lowering (#120648)
Add `c_devloc` as intrinsic and inline it during lowering. `c_devloc` is
used in CUDA Fortran to get the address of device variables.

For the moment, we borrow almost all semantic checks from `c_loc` except
for the pointer or target restriction. The specifications of `c_devloc`
are are pretty vague and we will relax/enforce the restrictions based on
library and apps usage comparing them to the reference compiler.
2025-01-08 11:23:05 -08:00
agozillon
e508bacce4 [Flang][OpenMP] Derived type explicit allocatable member mapping (#113557)
This PR is one of 3 in a PR stack, this is the primary change set which
seeks to extend the current derived type explicit member mapping support
to handle descriptor member mapping at arbitrary levels of nesting. The
PR stack seems to do this reasonably (from testing so far) but as you
can create quite complex mappings with derived types (in particular when
adding allocatable derived types or arrays of allocatable derived types)
I imagine there will be hiccups, which I am more than happy to address.
There will also be further extensions to this work to handle the
implicit auto-magical mapping of descriptor members in derived types and
a few other changes planned for the future (with some ideas on
optimizing things).

The changes in this PR primarily occur in the OpenMP lowering and the
OMPMapInfoFinalization pass.

In the OpenMP lowering several utility functions were added or extended
to support the generation of appropriate intermediate member mappings
which are currently required when the parent (or multiple parents) of a
mapped member are descriptor types. We need to map the entirety of these
types or do a "deep copy" for lack of a better term, where we map both
the base address and the descriptor as without the copying of both of
these we lack the information in the case of the descriptor to access
the member or attach the pointers data to the pointer and in the latter
case we require the base address to map the chunk of data. Currently we
do not segment descriptor based derived types as we do with regular
non-descriptor derived types, we effectively map their entirety in all
cases at the moment, I hope to address this at some point in the future
as it adds a fair bit of a performance penalty to having nestings of
allocatable derived types as an example. The process of mapping all
intermediate descriptor members in a members path only occurs if a
member has an allocatable or object parent in its symbol path or the
member itself is a member or allocatable. This occurs in the
createParentSymAndGenIntermediateMaps function, which will also generate
the appropriate address for the allocatable member within the derived
type to use as a the varPtr field of the map (for intermediate
allocatable maps and final allocatable mappings). In this case it's
necessary as we can't utilise the usual Fortran::lower functionality
such as gatherDataOperandAddrAndBounds without causing issues later in
the lowering due to extra allocas being spawned which seem to affect the
pointer attachment (at least this is my current assumption, it results
in memory access errors on the device due to incorrect map information
generation). This is similar to why we do not use the MLIR value
generated for this and utilise the original symbol provided when mapping
descriptor types external to derived types. Hopefully this can be
rectified in the future so this function can be simplified and more
closely aligned to the other type mappings. We also make use of
fir::CoordinateOp as opposed to the HLFIR version as the HLFIR version
doesn't support the appropriate lowering to FIR necessary at the moment,
we also cannot use a single CoordinateOp (similarly to a single GEP) as
when we index through a descriptor operation (BoxType) we encounter
issues later in the lowering, however in either case we need access to
intermediate descriptors so individual CoordinateOp's aid this
(although, being able to compress them into a smaller amount of
CoordinateOp's may simplify the IR and perhaps result in a better end
product, something to consider for the future).

The other large change area was in the OMPMapInfoFinalization pass,
where the pass had to be extended to support the expansion of box types
(or multiple nestings of box types) within derived types, or box type
derived types. This requires expanding each BoxType mapping from one
into two maps and then modifying all of the existing member indices of
the overarching parent mapping to account for the addition of these new
members alongside adjusting the existing member indices to support the
addition of these new maps which extend the original member indices (as
a base address of a box type is currently considered a member of the box
type at a position of 0 as when lowered to LLVM-IR it's a pointer
contained at this position in the descriptor type, however, this means
extending mapped children of this expanded descriptor type to
additionally incorporate the new member index in the correct location in
its own index list). I believe there is a reasonable amount of comments
that should aid in understanding this better, alongside the test
alterations for the pass.

A subset of the changes were also aimed at making some of the utilities
for packing and unpacking the DenseIntElementsAttr containing the member
indices shareable across the lowering and OMPMapInfoFinalization, this
required moving some functions to the Lower/Support/Utils.h header, and
transforming the lowering structure containing the member index data
into something more similar to the version used in
OMPMapInfoFinalization. There we also some other attempts at tidying
things up in relation to the member index data generation in the
lowering, some of which required creating a logical operator for the
OpenMP ID class so it can be utilised as a map key (it simply utilises
the symbol address for the moment as ordering isn't particularly
important).

Otherwise I have added a set of new tests encompassing some of the
mappings currently supported by this PR (unfortunately as you can have
arbitrary nestings of all shapes and types it's not very feasible to
cover them all).
2024-11-16 12:28:37 +01:00
Leandro Lupori
390943f25b [flang] Implement conversion of compatible derived types (#111165)
With some restrictions, BIND(C) derived types can be converted to
compatible BIND(C) derived types.
Semantics already support this, but ConvertOp was missing the
conversion of such types.

Fixes https://github.com/llvm/llvm-project/issues/107783
2024-10-09 10:37:46 -03:00
jeanPerier
1753de2d95 [flang][FIR] remove fir.complex type and its fir.real element type (#111025)
Final patch of
https://discourse.llvm.org/t/rfc-flang-replace-usages-of-fir-complex-by-mlir-complex-type/82292

Since fir.real was only still used as fir.complex element type, this
patch removes it at the same time.
2024-10-04 09:57:03 +02:00
jeanPerier
c4204c0b29 [flang] replace fir.complex usages with mlir complex (#110850)
Core patch of
https://discourse.llvm.org/t/rfc-flang-replace-usages-of-fir-complex-by-mlir-complex-type/82292.
After that, the last step is to remove fir.complex from FIR types.
2024-10-03 17:10:57 +02:00
Yusuke MINATO
b91a25ef58 [flang] add nsw to operations in subscripts (#110060)
This patch adds nsw to operations when lowering subscripts.

See also the discussion in the following discourse post.
https://discourse.llvm.org/t/rfc-add-nsw-flags-to-arithmetic-integer-operations-using-the-option-fno-wrapv/77584/9
2024-10-03 10:56:01 +09:00
jeanPerier
e6618aae43 [flang] fix ignore_tkr(tk) with character dummy (#108168)
The test code with ignore_tkr(tk) on character dummy passed by
fir.boxchar<> was crashing the compiler in [an
assert](2afe678f0a/flang/lib/Optimizer/Dialect/FIRType.cpp (L632))
in `changeElementType`.

It makes little sense to call changeElementType on a fir.boxchar since
this type is lossy (the shape is not part of it). Just skip it in the
code dealing with ignore(tk) when hitting this case
2024-09-16 16:27:11 +02:00
Tom Eccles
5aaf384b16 [flang][NFC] use llvm.intr.stacksave/restore instead of opaque calls (#108562)
The new LLVM stack save/restore intrinsic operations are more convenient
than function calls because they do not add function declarations to the
module and therefore do not block the parallelisation of passes.
Furthermore they could be much more easily marked with memory effects
than function calls if that ever proved useful.

This builds on top of #107879.

Resolves #108016
2024-09-16 12:33:37 +01:00
Valentin Clement (バレンタイン クレメン)
e67a6667dc [flang][cuda] Avoid extra load in c_f_pointer lowering with c_devptr (#108090)
Remove unnecessary load of the `cptr` component when getting the
`__address`. `fir.coordinate_of` operation can be chained so the load is
not needed.
2024-09-10 19:33:33 -07:00
Valentin Clement (バレンタイン クレメン)
cd8229bb4b [flang][cuda] Support c_devptr in c_f_pointer intrinsic (#107470)
This is an extension of CUDA Fortran. The iso_c_binding intrinsic can
accept a `TYPE(c_devptr)` as its first argument. This patch relax the
semantic check to accept it and update the lowering to unwrap the cptr
field from the c_devptr.
2024-09-09 10:32:35 -07:00
Valentin Clement (バレンタイン クレメン)
5bb379f6f0 [flang][cuda] Fix allocation of descriptor for cray pointer (#103474)
The cray pointee descriptor with device attribute was not allocated with
cuf.alloc so it leads to error on deallocation with cuf.free.
2024-08-13 16:52:00 -07:00
khaki3
26d92826a5 [mlir][flang] Add an interface of OpenACC compute regions for further getAllocaBlock support (#100675)
This PR implements `ComputeRegionOpInterface` to define `getAllocaBlock`
of OpenACC loop and compute constructs (parallel/kernels/serial). The
primary objective here is to accommodate local variables in OpenACC
compute regions. The change in `fir::FirOpBuilder::getAllocaBlock`
allows local variable allocation inside loops and kernels.
2024-07-26 13:52:27 -07:00
jeanPerier
1ead51a86c [flang] fix C_PTR function result lowering (#100082)
Functions returning C_PTR were lowered to function returning intptr (i64
on 64bit arch). This caused conflicts when these functions were defined
as returning !fir.ref<none>/llvm.ptr in other compiler generated
contexts (e.g., malloc).

Lower them to return !fir.ref<none>.

This should deal with https://github.com/llvm/llvm-project/issues/97325
and https://github.com/llvm/llvm-project/issues/98644.
2024-07-24 10:24:04 +02:00
jeanPerier
31087c5e4c [flang] handle alloca outside of entry blocks in MemoryAllocation (#98457)
This patch generalizes the MemoryAllocation pass (alloca -> heap) to
handle fir.alloca regardless of their postion in the IR. Currently, it
only dealt with fir.alloca in function entry blocks. The logic is placed
in a utility that can be used to replace alloca in an operation on
demand to whatever kind of allocation the utility user wants via
callbacks (allocmem, or custom runtime calls to instrument the code...).

To do so, a concept of ownership, that was already implied a bit and
used in passes like stack-reclaim, is formalized. Any operation with the
LoopLikeInterface, AutomaticAllocationScope, or IsolatedFromAbove owns
the alloca directly nested inside its regions, and they must not be used
after the operation.

The pass then looks for the exit points of region with such interface,
and use that to insert deallocation. If dominance is not proved, the
pass fallbacks to storing the new address into a C pointer variable
created in the entry of the owning region which allows inserting
deallocation as needed, included near the alloca itself to avoid leaks
when the alloca is executed multiple times due to block CFGs loops.

This should fix https://github.com/llvm/llvm-project/issues/88344.

In a next step, I will try to refactor lowering a bit to introduce
lifetime operation for alloca so that the deallocation points can be
inserted as soon as possible.
2024-07-17 09:15:47 +02:00
jeanPerier
66d5ca2a3d Reland "[flang] add extra component information in fir.type_info" (#97404)
Reland #96746 with the proper Support/CMakelist.txt change.

fir.type does not contain all Fortran level information about
components. For instance, component lower bounds and default initial
value are lost. For correctness purpose, this does not matter because
this information is "applied" in lowering (e.g., when addressing the
components, the lower bounds are reflected in the hlfir.designate).

However, this "loss" of information will prevent the generation of
correct debug info for the type (needs to know about lower bounds). The
initial value could help building some optimization pass to get rid of
initialization runtime calls.

This patch adds lower bound and initial value information into
fir.type_info via a new fir.dt_component operation. This operation is
generated only for component that needs it, which helps keeping the IR
small for "boring" types.

In general, adding Fortran level info in fir.type_info will allow
delaying the generation of "type descriptors" gobals that are very
verbose in FIR and make it hard to work with FIR dumps from applications
with many derived types.
2024-07-02 15:19:49 +02:00
jeanPerier
6a66b8224d Revert "[flang] add extra component information in fir.type_info" (#96937)
Reverts llvm/llvm-project#96746
Breaking shared library buillds:
https://lab.llvm.org/buildbot/#/builders/89/builds/931
2024-06-27 19:22:48 +02:00
jeanPerier
1448ed2000 [flang] add extra component information in fir.type_info (#96746)
fir.type does not contain all Fortran level information about
components. For instance, component lower bounds and default initial
value are lost. For correctness purpose, this does not matter because
this information is "applied" in lowering (e.g., when addressing the
components, the lower bounds are reflected in the hlfir.designate).

However, this "loss" of information will prevent the generation of
correct debug info for the type (needs to know about lower bounds). The
initial value could help building some optimization pass to get rid of
initialization runtime calls.

This patch adds lower bound and initial value information into
fir.type_info via a new fir.dt_component operation. This operation is
generated only for component that needs it, which helps keeping the IR
small for "boring" types.

In general, adding Fortran level info in fir.type_info will allow
delaying the generation of "type descriptors" gobals that are very
verbose in FIR and make it hard to work with FIR dumps from applications
with many derived types.
2024-06-27 18:59:03 +02:00
Kareem Ergawy
d0413438ec [flang][OpenMP] Handle omp.private in FirOpBuilder::getAllocaBlock() (#93927)
Fixes a crash uncovered by
[pr89651](https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/gomp/pr89651.f90)
in the test suite.

Fixes a crash caused by missing handling of `omp.private` ops in
`FirOpBuilder::getAllocaBlock()`.
2024-06-04 05:03:39 +02:00
Valentin Clement (バレンタイン クレメン)
45daa4fdc6 [flang][cuda] Move CUDA Fortran operations to a CUF dialect (#92317)
The number of operations dedicated to CUF grew and where all still in
FIR. In order to have a better organization, the CUF operations,
attributes and code is moved into their specific dialect and files. CUF
dialect is tightly coupled with HLFIR/FIR and their types.

The CUF attributes are bundled into their own library since some
HLFIR/FIR operations depend on them and the CUF dialect depends on the
FIR types. Without having the attributes into a separate library there
would be a dependency cycle.
2024-05-17 09:37:53 -07:00
Valentin Clement (バレンタイン クレメン)
26060de063 [flang][cuda] Lower device/managed/unified allocation to cuda ops (#90623)
Lower locals allocation of cuda device, managed and unified variables to
fir.cuda_alloc. Add fir.cuda_free in the function context finalization.

@vzakhari For some reason the PR #90526 has been closed when I merged PR
#90525. Just reopening one.
2024-05-02 14:32:53 -07:00
Christian Sigg
fac349a169 Reapply "[mlir] Mark isa/dyn_cast/cast/... member functions depreca… (#90406)
…ted. (#89998)" (#90250)

This partially reverts commit 7aedd7dc75.

This change removes calls to the deprecated member functions. It does
not mark the functions deprecated yet and does not disable the
deprecation warning in TypeSwitch. This seems to cause problems with
MSVC.
2024-04-28 22:01:42 +02:00
dyung
7aedd7dc75 Revert "[mlir] Mark isa/dyn_cast/cast/... member functions deprecated. (#89998)" (#90250)
This reverts commit 950b7ce0b8.

This change is causing build failures on a bot
https://lab.llvm.org/buildbot/#/builders/216/builds/38157
2024-04-26 12:09:13 -07:00
Christian Sigg
950b7ce0b8 [mlir] Mark isa/dyn_cast/cast/... member functions deprecated. (#89998)
See https://mlir.llvm.org/deprecation and
https://discourse.llvm.org/t/preferred-casting-style-going-forward.
2024-04-26 16:28:30 +02:00
Tom Eccles
8cc34fadec [flang][OpenMP] Support reduction of allocatable variables (#88392)
Both arrays and trivial scalars are supported. Both cases must use
by-ref reductions because both are boxed.

My understanding of the standards are that OpenMP says that this should
follow the rules of the intrinsic reduction operators in fortran, and
fortran says that unallocated allocatable variables can only be
referenced to allocate them or test if they are already allocated.
Therefore we do not need a null pointer check in the combiner region.
2024-04-23 10:34:28 +01:00
jeanPerier
8ddfb66903 [flang] Fix MASKR/MASKL lowering for INTEGER(16) (#87496)
The all one masks was not properly created for i128 types because
builder.createIntegerConstant ended-up truncating -1 to something
positive.

Add a builder.createAllOnesInteger/createMinusOneInteger helpers and use
them where createIntegerConstant(..., -1) was used.
Add an assert in createIntegerConstant to catch negative numbers for
i128 type.
2024-04-08 10:18:56 +02:00
jeanPerier
a4798bb0b6 [flang][NFC] use mlir::SymbolTable in lowering (#86673)
Whenever lowering is checking if a function or global already exists in
the mlir::Module, it was doing module->lookup.

On big programs (~5000 globals and functions), this causes important
slowdowns because these lookups are linear. Use mlir::SymbolTable to
speed-up these lookups. The SymbolTable has to be created from the
ModuleOp and maintained in sync. It is therefore placed in the
converter, and FirOPBuilders can take a pointer to it to speed-up the
lookups.

This patch does not bring mlir::SymbolTable to FIR/HLFIR passes, but
some passes creating a lot of runtime calls could benefit from it too.
More analysis will be needed.

As an example of the speed-ups, this patch speeds-up compilation of
Whizard compare_amplitude_UFO.F90 from 5 mins to 2 mins on my machine
(there is still room for speed-ups).
2024-04-02 14:29:29 +02:00
Sergio Afonso
d84252e064 [MLIR][OpenMP] NFC: Uniformize OpenMP ops names (#85393)
This patch proposes the renaming of certain OpenMP dialect operations with the
goal of improving readability and following a uniform naming convention for
MLIR operations and associated classes. In particular, the following operations
are renamed:

- `omp.map_info` -> `omp.map.info`
- `omp.target_update_data` -> `omp.target_update`
- `omp.ordered_region` -> `omp.ordered.region`
- `omp.cancellationpoint` -> `omp.cancellation_point`
- `omp.bounds` -> `omp.map.bounds`
- `omp.reduction.declare` -> `omp.declare_reduction`

Also, the following MLIR operation classes have been renamed:

- `omp::TaskLoopOp` -> `omp::TaskloopOp`
- `omp::TaskGroupOp` -> `omp::TaskgroupOp`
- `omp::DataBoundsOp` -> `omp::MapBoundsOp`
- `omp::DataOp` -> `omp::TargetDataOp`
- `omp::EnterDataOp` -> `omp::TargetEnterDataOp`
- `omp::ExitDataOp` -> `omp::TargetExitDataOp`
- `omp::UpdateDataOp` -> `omp::TargetUpdateOp`
- `omp::ReductionDeclareOp` -> `omp::DeclareReductionOp`
- `omp::WsLoopOp` -> `omp::WsloopOp`
2024-03-20 11:19:38 +00:00
Tom Eccles
e12b46fef7 [flang] support fir.alloca operations inside of omp reduction ops (#84952)
Advise to place the alloca at the start of the first block of whichever
region (init or combiner) we are currently inside.

It probably isn't safe to put an alloca inside of a combiner region
because this will be executed multiple times. But that would be a bug to
fix in Lower/OpenMP.cpp, not here.

OpenMP array reductions 1/6
Next PR: https://github.com/llvm/llvm-project/pull/84953
2024-03-15 11:46:12 +00:00
Tom Eccles
860a40057d [flang][NFC] move loadIfRef to FIRBuilder (#84306)
This will be useful for OpenMP too.

I changed the definition slightly to use `fir::isa_ref_type` (which also
includes llvm pointers) because I think it reads better using the common
type helpers. There shouldn't be any llvm pointers in lowering so this
isn't a functional change.
2024-03-08 10:33:43 +00:00
jeanPerier
06f775a82f [flang] Give internal linkage to internal procedures (#81929)
Internal procedures cannot be called directly from outside the host
procedure, so there is no point giving them external linkage. The only
reason flang did is because it is the default in MLIR.

Giving external linkage to them:
- prevents deleting them when not used/inlined by LLVM
- causes bugs with shared libraries (at least on linux x86-64) because
the call to the internal function could lead to a dynamic loader call
that would overwrite r10 register (the static chain pointer) due to
system calls and did not restore (it seems it does not expect r10 to be
used for PLT calls).

This patch gives internal linkage to internal procedures:

Note: the llvm.linkage attribute name cannot be obtained via a
getLinkageAttrName since it is not the same name as the one used in the
LLVM dialect. It is just a placeholder defined in
mlir/lib/Conversion/FuncToLLVM/FuncToLLVM.cpp until the func dialect
gets a real linkage model. So simply avoid hard coding it too many times
in lowering.
2024-02-28 14:30:29 +01:00
Valentin Clement (バレンタイン クレメン)
7ff488708c [flang][cuda][NFC] Rename CUDAAttribute to CUDADataAttribute (#81323)
The newly introduced `CUDAAttribute` is meant for CUDA attributes
associated with variable. In order to not clash with the future
attribute for function/subroutine, rename `CUDAAttribute` to
`CUDADataAttribute`.
2024-02-09 13:57:26 -08:00
Valentin Clement (バレンタイン クレメン)
314ef9617e [flang][cuda] Lower attribute for module variables (#81226)
Propagate the CUDA attribute to fir.global operation for simple module
variables.
2024-02-09 10:41:37 -08:00
jeanPerier
a49f630cf6 [flang] Lower passing non assumed-rank/size to assumed-ranks (#79145)
Start implementing assumed-rank support as described in
https://github.com/llvm/llvm-project/blob/main/flang/docs/AssumedRank.md

This commit holds the minimal support for lowering calls to procedure
with assumed-rank arguments where the procedure implementation is done
in C.

The case for passing assumed-size to assumed-rank is left TODO since it
will be done a change in assumed-size lowering that is better done in
another patch.

Care is taken to set the lower bounds to zero when passing non allocatable no pointer as descriptor
to a BIND(C) procedure as required per 18.5.3 point 3. This was not done before while the requirements also applies to non assumed-rank descriptors. This change  required special attention with IGNORE_TKR(t) to avoid emitting invalid fir.rebox operations (the actual argument type must be used in this case as the output type). 

Implementation of Fortran procedure with assumed-rank arguments is still
TODO.
2024-01-26 16:01:51 +01:00
Daniel Chen
af09219edd [Flang] Add partial support for lowering procedure pointer assignment. (#70461)
**Scope of the PR:**
1. Lowering global and local procedure pointer declaration statement
with explicit or implicit interface. The explicit interface can be from
an interface block, a module procedure or an internal procedure.
2. Lowering procedure pointer assignment, where the target procedure
could be external, module or internal procedures.
3. Lowering reference to procedure pointers so that it works end to end.

**PR notes:**
1. The first commit of the PR does not include testing. I would like to
collect some comments first, which may alter the output. Once I confirm
the implementation, I will add some testing as a follow up commit to
this PR.
2. No special handling of the host-associated entities when an internal
procedure is the target of a procedure pointer assignment in this PR.

**Implementation notes:**
1. The implementation is using the HLFIR path.
2. Flang currently uses `getUntypedBoxProcType` to get the
`fir::BoxProcType` for `ProcedureDesignator` when getting the address of
a procedure in order to pass it as an actual argument. This PR inherits
the same design decision for procedure pointer as the `fir::StoreOp`
requires the same memory type.

Note: this commit is actually resubmitting the original commit from
PR #70461 that was reverted. See PR #73221.
2023-11-23 13:43:35 +01:00
Muhammad Omair Javaid
49f55d1075 Revert "[Flang] Add partial support for lowering procedure pointer assignment. (#70461)"
This reverts commit e07fec10ac.

This change appears to have broken following buildbots:
https://lab.llvm.org/buildbot/#/builders/176
https://lab.llvm.org/buildbot/#/builders/179
https://lab.llvm.org/buildbot/#/builders/184
https://lab.llvm.org/buildbot/#/builders/197
https://lab.llvm.org/buildbot/#/builders/198

All bots fails in testsuite where following tests seems broken:
(eg: https://lab.llvm.org/buildbot/#/builders/176/builds/7131)

test-suite::gfortran-regression-compile-regression__proc_ptr_46_f90.test
test-suite::gfortran-regression-compile-regression__proc_ptr_37_f90.test
2023-11-23 12:30:40 +05:00
Daniel Chen
e07fec10ac [Flang] Add partial support for lowering procedure pointer assignment. (#70461)
**Scope of the PR:**
1. Lowering global and local procedure pointer declaration statement
with explicit or implicit interface. The explicit interface can be from
an interface block, a module procedure or an internal procedure.
2. Lowering procedure pointer assignment, where the target procedure
could be external, module or internal procedures.
3. Lowering reference to procedure pointers so that it works end to end.

**PR notes:**
1. The first commit of the PR does not include testing. I would like to
collect some comments first, which may alter the output. Once I confirm
the implementation, I will add some testing as a follow up commit to
this PR.
2. No special handling of the host-associated entities when an internal
procedure is the target of a procedure pointer assignment in this PR.

**Implementation notes:**
1. The implementation is using the HLFIR path.
2. Flang currently uses `getUntypedBoxProcType` to get the
`fir::BoxProcType` for `ProcedureDesignator` when getting the address of
a procedure in order to pass it as an actual argument. This PR inherits
the same design decision for procedure pointer as the `fir::StoreOp`
requires the same memory type.
2023-11-22 11:51:12 -05:00
Fabian Mora
fd389f46de [flang] Change uniqueCGIdent separator from . to X (#71338)
Change the separator in the `uniqueCGIdent` method to `X`. This change
is required to enable OpenMP offloading for the NVPTX target, as dots
are not valid identifiers in PTX and `uniqueCGIdent` is used to mangle
some literals. Follow up patches will change the remainder of `.`
appearances in names to `X` and add support for the NVPTX target.
2023-11-08 15:04:00 -05:00
Slava Zakharin
86b44f3760 [flang][openacc] Added acc::RecipeInterface for getting alloca insertion point. (#68464)
Conversion of `hlfir.assign` operations inside OpenACC recipe operations
may result in `fir.alloca` insertion. FIRBuilder can only handle
alloca insertion inside FuncOp's and outlineable OpenMP operations.
I added a simple interface for OpenACC recipe operations that have
executable code inside all their regions, and alloca may be inserted
into the entry blocks of those regions always.

With our current approach the OptimizedBufferization pass is supposed
to lower these `hlfir.assign` operations into loops, because there
should not be conflicts between lhs/rhs. The pass is currently
only working on FuncOp, and this is why it does not optimize
`hlfir.assign` inside the recipes. I will fix it in a separate commit.

Since we run OptimizedBufferization only at >O0, these changes
should still be useful.

Note that the OpenACC codegen that applies the recipes should be aware
of potential alloca operations and produce appropriate stack clean-ups.
2023-10-09 10:49:52 -07:00
jeanPerier
87b2682ad2 [flang][hlfir] use fir.type_info to skip runtime call if nofinal is set (#68397)
HLFIR was always calling Destroy runtime when doing derived type scalar
assignments because the IR did not contain the info of whether
finalization was needed or not.

This info is now available in fir.type_info which allow skipping the
runtime call when not needed.

Also, when finalization is needed, simply use Assign runtime. This makes
no difference from a semantic point of view with the current code that
generated a call to Destroy and did the assignment inline, but if some
piece of runtime must be called anyway, it is simpler to just call
Assign that deals with everything.
2023-10-09 09:27:08 +02:00
jeanPerier
4ccd57ddb1 [flang][nfc] replace fir.dispatch_table with more generic fir.type_info (#68309)
The goal is to progressively propagate all the derived type info that is
currently in the runtime type info globals into a FIR operation that can
be easily queried and used by FIR/HLFIR passes.

When this will be complete, the last step will be to stop generating the
runtime info global in lowering, but to do that later in or just before
codegen to keep the FIR files readable (on the added type-info.f90
tests, the lowered runtime info globals takes a whooping 2.6 millions
characters on 1600 lines of the FIR textual output. The fir.type_info that
contains all the info required to generate those globals for such
"trivial" types takes 1721 characters on 9 lines).

So far this patch simply starts by replacing the fir.dispatch_table
operation by the fir.type_info operation and to add the noinit/
nofinal/nodestroy flags to it. These flags will soon be used in HLFIR to
better rewrite hlfir.assign with derived types.
2023-10-06 09:29:57 +02:00
Kiran Chandramohan
90f58eb37b [Flang][OpenMP] Fix loop index privatisation in Parallel region in HLFIR
HLFIR lowering always adds hlfir.declare when symbols are bound to their
address allocated on the stack. Ensure that the declare is placed along
with the alloca if it is hoisted. And always return the mlir value that
is bound to the symbol (i.e the alloca in FIR lowering and the declare
in HLFIR lowering).

Context: Loop index variables in OpenMP parallel regions should be
privatised to work correctly.

Reviewed By: tblah

Differential Revision: https://reviews.llvm.org/D158594
2023-09-01 10:59:14 +00:00
Slava Zakharin
b63698727d [flang][hlfir] Fixed finalization in hlfir.assign codegen.
When hlfir.assign is lowered into simple load/store,
we may still need to finalize the LHS.
The patch passes `needFinalization` to `genScalarAssignment`
for LHS of any derived type, so some `Destroy` calls
might be redundant. They can be removed later by propagating/deducing
IsFinalizable information about the LHS type.

Reviewed By: clementval

Differential Revision: https://reviews.llvm.org/D155664
2023-07-19 14:38:31 -07:00
David Truby
8cb0c3bb21 [flang] Add COMDAT to global variables where needed
On platforms which support COMDAT sections we should use them when
linkonce or linkonce_odr linkage is requested. This is required on
Windows (PE/COFF) and provides better behaviour than weak symbols on
ELF-based platforms.

This patch also reverts string literals to use linkonce instead of
internal linkage now that comdats are supported.

Differential Revision: https://reviews.llvm.org/D153768
2023-06-28 13:49:30 +01:00