Commit Graph

152 Commits

Author SHA1 Message Date
Kareem Ergawy
b1774222c7 [flang] Emit fir.global in the global address space (#146653)
Instead of emitting globals in the program/default address space, emit
them in the global address space. This also requires changes how address
of code-gen is handled, we need to cast to the default address space to
prevent code-gen issues.
2025-07-02 17:15:22 +02:00
Kareem Ergawy
59d6fbb8ff [flang][fir] Provide allocation block for fir.local when required (#144521)
Extends `fir::FirOpBuilder::getAllocaBlock()` to support `fir.local`.
This allows us to retrieve an allocation block when needed for
`fir.local`.
2025-06-18 10:24:08 +02:00
jeanPerier
6a41f53c39 [flang][hlfir] do not propagate polymorphic temporary as allocatables (#142609)
Polymorphic temporary are currently propagated as
fir.ref<fir.class<fir.heap<>>> because their allocation may be delayed
to the hlfir.assign copy (using realloc).

This patch moves away from this and directly allocate the temp and
propagate it as a fir.class.

The polymorphic temporaries creating is also simplified by avoiding the
need to call the runtime to setup the descriptor altogether (the runtime
is still call for the allocation currently because alloca/allocmem do
not support polymorphism).
2025-06-06 09:53:41 +02:00
jeanPerier
1f5b6ae89f [flang] optionally add lifetime markers to alloca created in stack-arrays (#140901)
Flang at Ofast usually produces executables that consume more stack that
other Fortran compilers.
This is in part because the alloca created from temporary heap
allocation by the StackArray pass are created at the function scope
level without lifetimes, and LLVM does not/is not able to merge alloca
that do not have overlapping lifetimes.

This patch adds an option to generate LLVM lifetime in the StackArray
pass at the previous heap allocation/free using the LLVM dialect
operation for it.
2025-05-22 09:26:14 +02:00
Kareem Ergawy
2fb288d4b8 [flang][fir] Lower do concurrent loop nests to fir.do_concurrent (#137928)
Adds support for lowering `do concurrent` nests from PFT to the new
`fir.do_concurrent` MLIR op as well as its special terminator
`fir.do_concurrent.loop` which models the actual loop nest.

To that end, this PR emits the allocations for the iteration variables
within the block of the `fir.do_concurrent` op and creates a region for
the `fir.do_concurrent.loop` op that accepts arguments equal in number
to the number of the input `do concurrent` iteration ranges.

For example, given the following input:
```fortran
   do concurrent(i=1:10, j=11:20)
   end do
```
the changes in this PR emit the following MLIR:
```mlir
    fir.do_concurrent {
      %22 = fir.alloca i32 {bindc_name = "i"}
      %23:2 = hlfir.declare %22 {uniq_name = "_QFsub1Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
      %24 = fir.alloca i32 {bindc_name = "j"}
      %25:2 = hlfir.declare %24 {uniq_name = "_QFsub1Ej"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
      fir.do_concurrent.loop (%arg1, %arg2) = (%18, %20) to (%19, %21) step (%c1, %c1_0) {
        %26 = fir.convert %arg1 : (index) -> i32
        fir.store %26 to %23#0 : !fir.ref<i32>
        %27 = fir.convert %arg2 : (index) -> i32
        fir.store %27 to %25#0 : !fir.ref<i32>
      }
    }
```
2025-05-07 12:52:25 +02:00
Asher Mancinelli
8836bce842 [flang] Add lowering of volatile references (#132486)
[RFC on
discourse](https://discourse.llvm.org/t/rfc-volatile-representation-in-flang/85404/1)

Flang currently lacks support for volatile variables. For some cases,
the compiler produces TODO error messages and others are ignored. Some
of our tests are like the example from _C.4 Clause 8 notes: The VOLATILE
attribute (8.5.20)_ and require volatile variables.

Prior commits:
```
c9ec1bc753 [flang] Handle volatility in lowering and codegen (#135311)
e42f860985 [flang][nfc] Support volatility in Fir ops (#134858)
b2711e1526 [flang][nfc] Support volatile on ref, box, and class types (#134386)
```
2025-04-30 08:46:33 -07:00
Kareem Ergawy
30990c09c9 Revert "[flang][fir] Lower do concurrent loop nests to fir.do_concurrent (#132904)" (#135904)
This reverts commit 04b87e15e4.

The reasons for reverting is that the following:
1. I still need need to upstream some part of the do concurrent to
OpenMP pass from our downstream implementation and taking this in
downstream will make things more difficult.
2. I still need to work on a solution for modeling locality specifiers
on `hlfir.do_concurrent` ops. I would prefer to do that and merge the
entire stack together instead of having a partial solution.

After merging the revert I will reopen the origianl PR and keep it
updated against main until I finish the above.
2025-04-16 07:20:27 -05:00
Kareem Ergawy
04b87e15e4 [flang][fir] Lower do concurrent loop nests to fir.do_concurrent (#132904)
Adds support for lowering `do concurrent` nests from PFT to the new
`fir.do_concurrent` MLIR op as well as its special terminator
`fir.do_concurrent.loop` which models the actual loop nest.

To that end, this PR emits the allocations for the iteration variables
within the block of the `fir.do_concurrent` op and creates a region for
the `fir.do_concurrent.loop` op that accepts arguments equal in number
to the number of the input `do concurrent` iteration ranges.

For example, given the following input:
```fortran
   do concurrent(i=1:10, j=11:20)
   end do
```
the changes in this PR emit the following MLIR:
```mlir
    fir.do_concurrent {
      %22 = fir.alloca i32 {bindc_name = "i"}
      %23:2 = hlfir.declare %22 {uniq_name = "_QFsub1Ei"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
      %24 = fir.alloca i32 {bindc_name = "j"}
      %25:2 = hlfir.declare %24 {uniq_name = "_QFsub1Ej"} : (!fir.ref<i32>) -> (!fir.ref<i32>, !fir.ref<i32>)
      fir.do_concurrent.loop (%arg1, %arg2) = (%18, %20) to (%19, %21) step (%c1, %c1_0) {
        %26 = fir.convert %arg1 : (index) -> i32
        fir.store %26 to %23#0 : !fir.ref<i32>
        %27 = fir.convert %arg2 : (index) -> i32
        fir.store %27 to %25#0 : !fir.ref<i32>
      }
    }
```
2025-04-16 06:14:38 +02:00
Asher Mancinelli
c9ec1bc753 [flang] Handle volatility in lowering and codegen (#135311)
* Enable lowering and conversion patterns to pass volatility information
from higher level operations to lower level ones.
* Enable codegen to pass volatility to LLVM dialect ops by setting an
attribute on loads, stores, and memory intrinsics.
* Add utilities for passing along the volatility from an input type to
an output type.

To introduce volatile types into the IR, entities with the volatile
attribute will be given a volatile type in the bridge; this is not
enabled in this patch. User code should not result in IR with volatile
types yet, so this patch contains no tests with Fortran source, only IR
that already contains volatile types.

Part 3 of #132486.
2025-04-14 11:02:23 -07:00
Asher Mancinelli
8f23d4296c Reland "[flang][nfc] Support volatility in Fir ops" (#135039)
#134858 had an extraneous include which caused the shared library builds
to break.
2025-04-09 12:45:55 -07:00
David Spickett
fb73086dd2 Revert "[flang][nfc] Support volatility in Fir ops" (#135034)
Reverts llvm/llvm-project#134858

Fails to build when shared libraries are enabled:
https://lab.llvm.org/buildbot/#/builders/80/builds/12361
```
: && /usr/local/bin/c++ -fPIC -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -Wno-deprecated-copy -Wno-string-conversion -Wno-ctad-maybe-unsupported -Wno-unused-command-line-argument -Wstring-conversion           -Wcovered-switch-default -Wno-nested-anon-types -O3 -DNDEBUG  -Wl,-z,defs -Wl,-z,nodelete   -Wl,-rpath-link,/home/tcwg-buildbot/worker/flang-aarch64-sharedlibs/build/./lib  -Wl,--gc-sections -shared -Wl,-soname,libFIRDialect.so.21.0git -o lib/libFIRDialect.so.21.0git tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIRAttr.cpp.o tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIRDialect.cpp.o tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIRType.cpp.o tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FirAliasTagOpInterface.cpp.o tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FortranVariableInterface.cpp.o tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/Inliner.cpp.o  -Wl,-rpath,"\$ORIGIN/../lib:/home/tcwg-buildbot/worker/flang-aarch64-sharedlibs/build/lib:"  lib/libCUFAttrs.so.21.0git  lib/libFIRDialectSupport.so.21.0git  lib/libLLVMAsmPrinter.so.21.0git  lib/libMLIRBuiltinToLLVMIRTranslation.so.21.0git  lib/libMLIROpenMPToLLVM.so.21.0git  lib/libMLIRLLVMToLLVMIRTranslation.so.21.0git  lib/libMLIRFuncToLLVM.so.21.0git  lib/libMLIRArithToLLVM.so.21.0git  lib/libMLIRArithAttrToLLVMConversion.so.21.0git  lib/libMLIRArithTransforms.so.21.0git  lib/libMLIRBufferizationTransforms.so.21.0git  lib/libMLIRBufferizationDialect.so.21.0git  lib/libMLIRSparseTensorDialect.so.21.0git  lib/libMLIRSCFDialect.so.21.0git  lib/libMLIRFuncTransforms.so.21.0git  lib/libMLIRShardingInterface.so.21.0git  lib/libMLIRMeshDialect.so.21.0git  lib/libMLIRVectorDialect.so.21.0git  lib/libMLIRTensorDialect.so.21.0git  lib/libMLIRParallelCombiningOpInterface.so.21.0git  lib/libMLIRMaskableOpInterface.so.21.0git  lib/libMLIRMaskingOpInterface.so.21.0git  lib/libMLIRVectorInterfaces.so.21.0git  lib/libMLIRControlFlowToLLVM.so.21.0git  lib/libMLIRControlFlowDialect.so.21.0git  lib/libMLIRMemRefToLLVM.so.21.0git  lib/libMLIRLLVMCommonConversion.so.21.0git  lib/libMLIRMemRefUtils.so.21.0git  lib/libMLIRAffineDialect.so.21.0git  lib/libMLIRMemRefDialect.so.21.0git  lib/libMLIRArithUtils.so.21.0git  lib/libMLIRComplexDialect.so.21.0git  lib/libMLIRArithDialect.so.21.0git  lib/libMLIRCastInterfaces.so.21.0git  lib/libMLIRInferIntRangeCommon.so.21.0git  lib/libMLIRShapedOpInterfaces.so.21.0git  lib/libMLIRDialect.so.21.0git  lib/libMLIRDialectUtils.so.21.0git  lib/libMLIROpenMPDialect.so.21.0git  lib/libMLIROpenACCMPCommon.so.21.0git  lib/libMLIRTargetLLVMIRExport.so.21.0git  lib/libMLIRDLTIDialect.so.21.0git  lib/libMLIRLLVMIRTransforms.so.21.0git  lib/libMLIRTransforms.so.21.0git  lib/libMLIRUBDialect.so.21.0git  lib/libMLIRRuntimeVerifiableOpInterface.so.21.0git  lib/libMLIRFuncDialect.so.21.0git  lib/libMLIRNVVMDialect.so.21.0git  lib/libMLIRTranslateLib.so.21.0git  lib/libMLIRParser.so.21.0git  lib/libMLIRBytecodeReader.so.21.0git  lib/libMLIRAsmParser.so.21.0git  lib/libMLIRTransformUtils.so.21.0git  lib/libMLIRSubsetOpInterface.so.21.0git  lib/libMLIRValueBoundsOpInterface.so.21.0git  lib/libMLIRDestinationStyleOpInterface.so.21.0git  lib/libMLIRRewrite.so.21.0git  lib/libMLIRRewritePDL.so.21.0git  lib/libMLIRPDLToPDLInterp.so.21.0git  lib/libMLIRPass.so.21.0git  lib/libMLIRAnalysis.so.21.0git  lib/libMLIRInferIntRangeInterface.so.21.0git  lib/libMLIRLoopLikeInterface.so.21.0git  lib/libMLIRPresburger.so.21.0git  lib/libMLIRViewLikeInterface.so.21.0git  lib/libMLIRPDLInterpDialect.so.21.0git  lib/libMLIRPDLDialect.so.21.0git  lib/libLLVMFrontendOpenMP.so.21.0git  lib/libLLVMTransformUtils.so.21.0git  lib/libMLIRLLVMDialect.so.21.0git  lib/libMLIRInferTypeOpInterface.so.21.0git  lib/libMLIRControlFlowInterfaces.so.21.0git  lib/libMLIRDataLayoutInterfaces.so.21.0git  lib/libMLIRFunctionInterfaces.so.21.0git  lib/libMLIRCallInterfaces.so.21.0git  lib/libMLIRMemorySlotInterfaces.so.21.0git  lib/libMLIRSideEffectInterfaces.so.21.0git  lib/libMLIRIR.so.21.0git  lib/libLLVMBitWriter.so.21.0git  lib/libLLVMAnalysis.so.21.0git  lib/libLLVMAsmParser.so.21.0git  lib/libLLVMBitReader.so.21.0git  lib/libMLIRSupport.so.21.0git  lib/libLLVMCore.so.21.0git  lib/libLLVMRemarks.so.21.0git  lib/libLLVMBinaryFormat.so.21.0git  lib/libLLVMTargetParser.so.21.0git  lib/libLLVMSupport.so.21.0git  -Wl,-rpath-link,/home/tcwg-buildbot/worker/flang-aarch64-sharedlibs/build/lib && :
/usr/bin/ld: tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o: in function `fir::CharBoxValue::dump() const':
FIROps.cpp:(.text._ZNK3fir12CharBoxValue4dumpEv[_ZNK3fir12CharBoxValue4dumpEv]+0x20): undefined reference to `fir::operator<<(llvm::raw_ostream&, fir::CharBoxValue const&)'
/usr/bin/ld: tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o: in function `fir::PolymorphicValue::dump() const':
FIROps.cpp:(.text._ZNK3fir16PolymorphicValue4dumpEv[_ZNK3fir16PolymorphicValue4dumpEv]+0x20): undefined reference to `fir::operator<<(llvm::raw_ostream&, fir::PolymorphicValue const&)'
/usr/bin/ld: tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o: in function `fir::ArrayBoxValue::dump() const':
FIROps.cpp:(.text._ZNK3fir13ArrayBoxValue4dumpEv[_ZNK3fir13ArrayBoxValue4dumpEv]+0x20): undefined reference to `fir::operator<<(llvm::raw_ostream&, fir::ArrayBoxValue const&)'
/usr/bin/ld: tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o: in function `fir::CharArrayBoxValue::dump() const':
FIROps.cpp:(.text._ZNK3fir17CharArrayBoxValue4dumpEv[_ZNK3fir17CharArrayBoxValue4dumpEv]+0x20): undefined reference to `fir::operator<<(llvm::raw_ostream&, fir::CharArrayBoxValue const&)'
/usr/bin/ld: tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o: in function `fir::ProcBoxValue::dump() const':
FIROps.cpp:(.text._ZNK3fir12ProcBoxValue4dumpEv[_ZNK3fir12ProcBoxValue4dumpEv]+0x20): undefined reference to `fir::operator<<(llvm::raw_ostream&, fir::ProcBoxValue const&)'
/usr/bin/ld: tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o: in function `fir::BoxValue::dump() const':
FIROps.cpp:(.text._ZNK3fir8BoxValue4dumpEv[_ZNK3fir8BoxValue4dumpEv]+0x20): undefined reference to `fir::operator<<(llvm::raw_ostream&, fir::BoxValue const&)'
/usr/bin/ld: tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o: in function `fir::MutableBoxValue::dump() const':
FIROps.cpp:(.text._ZNK3fir15MutableBoxValue4dumpEv[_ZNK3fir15MutableBoxValue4dumpEv]+0x20): undefined reference to `fir::operator<<(llvm::raw_ostream&, fir::MutableBoxValue const&)'
/usr/bin/ld: tools/flang/lib/Optimizer/Dialect/CMakeFiles/FIRDialect.dir/FIROps.cpp.o: in function `fir::ExtendedValue::dump() const':
FIROps.cpp:(.text._ZNK3fir13ExtendedValue4dumpEv[_ZNK3fir13ExtendedValue4dumpEv]+0x18): undefined reference to `fir::operator<<(llvm::raw_ostream&, fir::ExtendedValue const&)'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
```
2025-04-09 15:41:45 +01:00
Asher Mancinelli
e42f860985 [flang][nfc] Support volatility in Fir ops (#134858)
Part two of merging #132486. Support volatility in fir ops.

* Introduce a new operation fir.volatile_cast, whose only purpose is to
add or take away the volatility of an SSA value's type. The types must
be otherwise identical, and any other type conversions must be handled
by fir.convert. fir.convert will give an error if the volatility of the
inputs does not match, such that all changes to volatility must be
handled explicitly through fir.volatile_cast.
* Add memory effects to ops that read from or write to memory. The
precedent for this comes from the LLVM dialect (feb7beaf70) where
llvm.load/store ops with the volatile attribute report read/write
effects to a generic memory resource. This change is similar in spirit
but different in two ways: the volatility of an operation is determined
by the type of its memref, not an attribute on the op, and the memory
effects of a load- or store-like operation on a volatile reference type
are reported against a particular memory resource,
`VolatileMemoryResource`. This is so MLIR optimizations are able to
reorder operations that are not volatile around operations that are,
which we believe more precisely models LLVM's volatile memory semantics.
@vzakhari suggested this in #132486 citing LangRef. See
https://llvm.org/docs/LangRef.html#volatile-memory-accesses

Changes needed to generate IR with volatile types are not included in
this change, so it should be non-functional, containing only the changes
to Fir ops and op utilities that will be needed once we enable lowering
to generate volatile types.
2025-04-09 05:55:24 -07:00
Asher Mancinelli
b2711e1526 [flang][nfc] Support volatile on ref, box, and class types (#134386)
Part one of merging #132486. Add support for representing volatility in
the type system for reference, box, and class types. Don't do anything
with volatile just yet, only support and test their representation and
utility functions.

The naming convention is a little goofy - `fir::isa_volatile_type` and
`fir::updateTypeWithVolatility` use different capitalization, but I put
them near similar functions and tried to match the surrounding
conventions and [the
docs](https://github.com/llvm/llvm-project/blob/main/flang/docs/C%2B%2Bstyle.md#naming)
best I could.
2025-04-07 06:51:02 -07:00
Slava Zakharin
5f268d04f9 [flang] Code generation for fir.pack/unpack_array. (#132080)
The code generation relies on `ShallowCopyDirect` runtime
to copy data between the original and the temporary arrays
(both directions). The allocations are done by the compiler
generated code. The heap allocations could have been passed
to `ShallowCopy` runtime, but I decided to expose the allocations
so that the temporary descriptor passed to `ShallowCopyDirect`
has `nocapture` - maybe this will be better for LLVM optimizations.
2025-03-31 11:42:17 -07:00
jeanPerier
3ff3b29dd6 [flang] lower remaining cases of pointer assignments inside forall (#130772)
Implement handling of `NULL()` RHS, polymorphic pointers, as well as
lower bounds or bounds remapping in pointer assignment inside FORALL.

These cases eventually do not require updating hlfir.region_assign,
lowering can simply prepare the new descriptor for the LHS inside the
RHS region.

Looking more closely at the polymorphic cases, there is not need to call
the runtime, fir.rebox and fir.embox do handle the dynamic type setting
correctly.

After this patch, the last remaining TODO is the allocatable assignment
inside FORALL, which like some cases here, is more likely an accidental
feature given FORALL was deprecated in F2003 at the same time than
allocatable components where added.
2025-03-14 10:51:46 +01:00
Valentin Clement (バレンタイン クレメン)
e350485595 [flang][cuda] Set alloca block in cuf kernel (#128776)
Temporary created during lowering in a cuf kernel must be set in the cuf
kernel itself otherwise they will be allocated on the host.
2025-02-25 17:25:23 -08:00
Slava Zakharin
0caa8f42be Reland "[flang] Set LLVM specific attributes to fir.call's of Fortran runtime. (#128093)"
This change is inspired by a case in facerec benchmark, where
performance
of scalar code may improve by about 6%@aarch64 due to getting rid of
redundant
loads from Fortran descriptors. These descriptors are corresponding
to subroutine local ALLOCATABLE, SAVE variables. The scalar loop nest
in LocalMove subroutine contains call to Fortran runtime IO functions,
and LLVM globals-aa analysis cannot prove that these calls do not modify
the globalized descriptors with internal linkage.

This patch sets and propagates llvm.memory_effects attribute for
fir.call
operations calling Fortran runtime functions. In particular, it tries
to set the Other memory effect to NoModRef. The Other memory effect
includes accesses to globals and captured pointers, so we cannot set
it for functions taking Fortran descriptors with one exception
for calls where the Fortran descriptor arguments are all null.

As long as different calls to the same Fortran runtime function may have
different attributes, I decided to attach the attributes to the calls
rather than functions. Moreover, attaching the attributes to func.func
will require propagating these attributes to llvm.func, which is not
happening right now.

In addition to llvm.memory_effects, the new pass sets llvm.nosync
and llvm.nocallback attributes that may also help LLVM alias analysis
(e.g. see #127707). These attributes are ignored currently.
I will support them in LLVM IR dialect in a separate patch.

I also added another pass for developers to be able to print
declarations/calls of all Fortran runtime functions that are recognized
by the attributes setting pass. It should help with maintenance
of the LIT tests.
2025-02-24 14:18:17 -08:00
Slava Zakharin
69cc16fb55 Revert "[flang] Set LLVM specific attributes to fir.call's of Fortran runtime. (#128093)"
This reverts commit 36fdeb2ade.
2025-02-24 10:52:53 -08:00
Slava Zakharin
36fdeb2ade [flang] Set LLVM specific attributes to fir.call's of Fortran runtime. (#128093)
This change is inspired by a case in facerec benchmark, where
performance
of scalar code may improve by about 6%@aarch64 due to getting rid of
redundant
loads from Fortran descriptors. These descriptors are corresponding
to subroutine local ALLOCATABLE, SAVE variables. The scalar loop nest
in LocalMove subroutine contains call to Fortran runtime IO functions,
and LLVM globals-aa analysis cannot prove that these calls do not modify
the globalized descriptors with internal linkage.

This patch sets and propagates llvm.memory_effects attribute for
fir.call
operations calling Fortran runtime functions. In particular, it tries
to set the Other memory effect to NoModRef. The Other memory effect
includes accesses to globals and captured pointers, so we cannot set
it for functions taking Fortran descriptors with one exception
for calls where the Fortran descriptor arguments are all null.

As long as different calls to the same Fortran runtime function may have
different attributes, I decided to attach the attributes to the calls
rather than functions. Moreover, attaching the attributes to func.func
will require propagating these attributes to llvm.func, which is not
happening right now.

In addition to llvm.memory_effects, the new pass sets llvm.nosync
and llvm.nocallback attributes that may also help LLVM alias analysis
(e.g. see #127707). These attributes are ignored currently.
I will support them in LLVM IR dialect in a separate patch.

I also added another pass for developers to be able to print
declarations/calls of all Fortran runtime functions that are recognized
by the attributes setting pass. It should help with maintenance
of the LIT tests.
2025-02-24 09:27:48 -08:00
Krzysztof Parzyszek
d553e5d4b6 [flang] Fix build break after bac9575274
.../flang/lib/Optimizer/Builder/FIRBuilder.cpp: In function ‘llvm::Small
Vector<mlir::Value> fir::factory::updateRuntimeExtentsForEmptyArrays(fir
::FirOpBuilder&, mlir::Location, mlir::ValueRange)’:
.../flang/lib/Optimizer/Builder/FIRBuilder.cpp:1786:10: error: could not
 convert ‘newExtents’ from ‘SmallVector<[...],15>’ to ‘SmallVector<[...]
,6>’
 1786 |   return newExtents;
      |          ^~~~~~~~~~
      |          |
      |          SmallVector<[...],15>

Remove size from template parameters in the declaration of `newExtents`.
2025-01-30 07:14:16 -06:00
Slava Zakharin
bac9575274 [flang] Reset all extents to zero for empty hlfir.elemental loops. (#124867)
An hlfir.elemental with a shape `(0, HUGE)` still runs `HUGE`
number of iterations when expanded into a loop nest.
HLFIR transformational operations inlined as hlfir.elemental
may execute slower comparing to Fortran runtime implementation.
This patch adds an option for BufferizeHLFIR pass to reset all
upper bounds in the elemental loop nests to zero, if the result
is an empty array.

A separate patch will enable this option in the driver after I do
more performance testing. The option is off by default now.
2025-01-29 12:03:05 -08:00
Valentin Clement (バレンタイン クレメン)
05fd4d5775 [flang][cuda] Perform inlined assignment when field is c_devptr (#124322)
When a field in a derived type is `c_devptr`, keep check if we can do a
memcpy instead of falling back to the runtime assignment.

Many internal CUDA Fortran derived type have a `c_devptr` field and this
would lead to stack overflow on the device if the assignment is
performed by the runtime function.
2025-01-24 14:32:07 -08:00
Valentin Clement (バレンタイン クレメン)
2523d3b102 [flang][cuda] Perform scalar assignment of c_devptr inlined (#123407)
Because `c_devptr` has a `c_ptr` field, any assignment were done via the
Assign runtime function. This leads to stack overflow on the device and
taking too much memory. As we know the c_devptr can be directly copied
on assignment, make it a special case.
2025-01-17 14:34:47 -08:00
Matthias Springer
f023da12d1 [mlir][IR] Remove factory methods from FloatType (#123026)
This commit removes convenience methods from `FloatType` to make it
independent of concrete interface implementations.

See discussion here:
https://discourse.llvm.org/t/rethink-on-approach-to-low-precision-fp-types/82361

Note for LLVM integration: Replace `FloatType::getF32(` with
`Float32Type::get(` etc.
2025-01-16 08:56:09 +01:00
Slava Zakharin
3bb969f3eb [flang] Inline hlfir.matmul[_transpose]. (#122821)
Inlining `hlfir.matmul` as `hlfir.eval_in_mem` does not allow
to get rid of a temporary array in many cases, but it may still be
much better allowing to:
  * Get rid of any overhead related to calling runtime MATMUL
    (such as descriptors creation).
  * Use CPU-specific vectorization cost model for matmul loops,
    which Fortran runtime cannot currently do.
  * Optimize matmul of known-size arrays by complete unrolling.

One of the drawbacks of `hlfir.eval_in_mem` inlining is that
the ops inside it with store memory effects block the current
MLIR CSE, so I decided to run this inlining late in the pipeline.
There is a source commen explaining the CSE issue in more detail.

Straightforward inlining of `hlfir.matmul` as an `hlfir.elemental`
is not good for performance, and I got performance regressions
with it comparing to Fortran runtime implementation. I put it
under an enigneering option for experiments.

At the same time, inlining `hlfir.matmul_transpose` as `hlfir.elemental`
seems to be a good approach, e.g. it allows getting rid of a temporay
array in cases like: `A(:)=B(:)+MATMUL(TRANSPOSE(C(:,:)),D(:))`.

This patch improves performance of galgel and tonto a little bit.
2025-01-15 08:42:57 -08:00
Valentin Clement (バレンタイン クレメン)
878a57468b [flang][cuda] Add c_devloc as intrinsic and inline it during lowering (#120648)
Add `c_devloc` as intrinsic and inline it during lowering. `c_devloc` is
used in CUDA Fortran to get the address of device variables.

For the moment, we borrow almost all semantic checks from `c_loc` except
for the pointer or target restriction. The specifications of `c_devloc`
are are pretty vague and we will relax/enforce the restrictions based on
library and apps usage comparing them to the reference compiler.
2025-01-08 11:23:05 -08:00
agozillon
e508bacce4 [Flang][OpenMP] Derived type explicit allocatable member mapping (#113557)
This PR is one of 3 in a PR stack, this is the primary change set which
seeks to extend the current derived type explicit member mapping support
to handle descriptor member mapping at arbitrary levels of nesting. The
PR stack seems to do this reasonably (from testing so far) but as you
can create quite complex mappings with derived types (in particular when
adding allocatable derived types or arrays of allocatable derived types)
I imagine there will be hiccups, which I am more than happy to address.
There will also be further extensions to this work to handle the
implicit auto-magical mapping of descriptor members in derived types and
a few other changes planned for the future (with some ideas on
optimizing things).

The changes in this PR primarily occur in the OpenMP lowering and the
OMPMapInfoFinalization pass.

In the OpenMP lowering several utility functions were added or extended
to support the generation of appropriate intermediate member mappings
which are currently required when the parent (or multiple parents) of a
mapped member are descriptor types. We need to map the entirety of these
types or do a "deep copy" for lack of a better term, where we map both
the base address and the descriptor as without the copying of both of
these we lack the information in the case of the descriptor to access
the member or attach the pointers data to the pointer and in the latter
case we require the base address to map the chunk of data. Currently we
do not segment descriptor based derived types as we do with regular
non-descriptor derived types, we effectively map their entirety in all
cases at the moment, I hope to address this at some point in the future
as it adds a fair bit of a performance penalty to having nestings of
allocatable derived types as an example. The process of mapping all
intermediate descriptor members in a members path only occurs if a
member has an allocatable or object parent in its symbol path or the
member itself is a member or allocatable. This occurs in the
createParentSymAndGenIntermediateMaps function, which will also generate
the appropriate address for the allocatable member within the derived
type to use as a the varPtr field of the map (for intermediate
allocatable maps and final allocatable mappings). In this case it's
necessary as we can't utilise the usual Fortran::lower functionality
such as gatherDataOperandAddrAndBounds without causing issues later in
the lowering due to extra allocas being spawned which seem to affect the
pointer attachment (at least this is my current assumption, it results
in memory access errors on the device due to incorrect map information
generation). This is similar to why we do not use the MLIR value
generated for this and utilise the original symbol provided when mapping
descriptor types external to derived types. Hopefully this can be
rectified in the future so this function can be simplified and more
closely aligned to the other type mappings. We also make use of
fir::CoordinateOp as opposed to the HLFIR version as the HLFIR version
doesn't support the appropriate lowering to FIR necessary at the moment,
we also cannot use a single CoordinateOp (similarly to a single GEP) as
when we index through a descriptor operation (BoxType) we encounter
issues later in the lowering, however in either case we need access to
intermediate descriptors so individual CoordinateOp's aid this
(although, being able to compress them into a smaller amount of
CoordinateOp's may simplify the IR and perhaps result in a better end
product, something to consider for the future).

The other large change area was in the OMPMapInfoFinalization pass,
where the pass had to be extended to support the expansion of box types
(or multiple nestings of box types) within derived types, or box type
derived types. This requires expanding each BoxType mapping from one
into two maps and then modifying all of the existing member indices of
the overarching parent mapping to account for the addition of these new
members alongside adjusting the existing member indices to support the
addition of these new maps which extend the original member indices (as
a base address of a box type is currently considered a member of the box
type at a position of 0 as when lowered to LLVM-IR it's a pointer
contained at this position in the descriptor type, however, this means
extending mapped children of this expanded descriptor type to
additionally incorporate the new member index in the correct location in
its own index list). I believe there is a reasonable amount of comments
that should aid in understanding this better, alongside the test
alterations for the pass.

A subset of the changes were also aimed at making some of the utilities
for packing and unpacking the DenseIntElementsAttr containing the member
indices shareable across the lowering and OMPMapInfoFinalization, this
required moving some functions to the Lower/Support/Utils.h header, and
transforming the lowering structure containing the member index data
into something more similar to the version used in
OMPMapInfoFinalization. There we also some other attempts at tidying
things up in relation to the member index data generation in the
lowering, some of which required creating a logical operator for the
OpenMP ID class so it can be utilised as a map key (it simply utilises
the symbol address for the moment as ordering isn't particularly
important).

Otherwise I have added a set of new tests encompassing some of the
mappings currently supported by this PR (unfortunately as you can have
arbitrary nestings of all shapes and types it's not very feasible to
cover them all).
2024-11-16 12:28:37 +01:00
Leandro Lupori
390943f25b [flang] Implement conversion of compatible derived types (#111165)
With some restrictions, BIND(C) derived types can be converted to
compatible BIND(C) derived types.
Semantics already support this, but ConvertOp was missing the
conversion of such types.

Fixes https://github.com/llvm/llvm-project/issues/107783
2024-10-09 10:37:46 -03:00
jeanPerier
1753de2d95 [flang][FIR] remove fir.complex type and its fir.real element type (#111025)
Final patch of
https://discourse.llvm.org/t/rfc-flang-replace-usages-of-fir-complex-by-mlir-complex-type/82292

Since fir.real was only still used as fir.complex element type, this
patch removes it at the same time.
2024-10-04 09:57:03 +02:00
jeanPerier
c4204c0b29 [flang] replace fir.complex usages with mlir complex (#110850)
Core patch of
https://discourse.llvm.org/t/rfc-flang-replace-usages-of-fir-complex-by-mlir-complex-type/82292.
After that, the last step is to remove fir.complex from FIR types.
2024-10-03 17:10:57 +02:00
Yusuke MINATO
b91a25ef58 [flang] add nsw to operations in subscripts (#110060)
This patch adds nsw to operations when lowering subscripts.

See also the discussion in the following discourse post.
https://discourse.llvm.org/t/rfc-add-nsw-flags-to-arithmetic-integer-operations-using-the-option-fno-wrapv/77584/9
2024-10-03 10:56:01 +09:00
jeanPerier
e6618aae43 [flang] fix ignore_tkr(tk) with character dummy (#108168)
The test code with ignore_tkr(tk) on character dummy passed by
fir.boxchar<> was crashing the compiler in [an
assert](2afe678f0a/flang/lib/Optimizer/Dialect/FIRType.cpp (L632))
in `changeElementType`.

It makes little sense to call changeElementType on a fir.boxchar since
this type is lossy (the shape is not part of it). Just skip it in the
code dealing with ignore(tk) when hitting this case
2024-09-16 16:27:11 +02:00
Tom Eccles
5aaf384b16 [flang][NFC] use llvm.intr.stacksave/restore instead of opaque calls (#108562)
The new LLVM stack save/restore intrinsic operations are more convenient
than function calls because they do not add function declarations to the
module and therefore do not block the parallelisation of passes.
Furthermore they could be much more easily marked with memory effects
than function calls if that ever proved useful.

This builds on top of #107879.

Resolves #108016
2024-09-16 12:33:37 +01:00
Valentin Clement (バレンタイン クレメン)
e67a6667dc [flang][cuda] Avoid extra load in c_f_pointer lowering with c_devptr (#108090)
Remove unnecessary load of the `cptr` component when getting the
`__address`. `fir.coordinate_of` operation can be chained so the load is
not needed.
2024-09-10 19:33:33 -07:00
Valentin Clement (バレンタイン クレメン)
cd8229bb4b [flang][cuda] Support c_devptr in c_f_pointer intrinsic (#107470)
This is an extension of CUDA Fortran. The iso_c_binding intrinsic can
accept a `TYPE(c_devptr)` as its first argument. This patch relax the
semantic check to accept it and update the lowering to unwrap the cptr
field from the c_devptr.
2024-09-09 10:32:35 -07:00
Valentin Clement (バレンタイン クレメン)
5bb379f6f0 [flang][cuda] Fix allocation of descriptor for cray pointer (#103474)
The cray pointee descriptor with device attribute was not allocated with
cuf.alloc so it leads to error on deallocation with cuf.free.
2024-08-13 16:52:00 -07:00
khaki3
26d92826a5 [mlir][flang] Add an interface of OpenACC compute regions for further getAllocaBlock support (#100675)
This PR implements `ComputeRegionOpInterface` to define `getAllocaBlock`
of OpenACC loop and compute constructs (parallel/kernels/serial). The
primary objective here is to accommodate local variables in OpenACC
compute regions. The change in `fir::FirOpBuilder::getAllocaBlock`
allows local variable allocation inside loops and kernels.
2024-07-26 13:52:27 -07:00
jeanPerier
1ead51a86c [flang] fix C_PTR function result lowering (#100082)
Functions returning C_PTR were lowered to function returning intptr (i64
on 64bit arch). This caused conflicts when these functions were defined
as returning !fir.ref<none>/llvm.ptr in other compiler generated
contexts (e.g., malloc).

Lower them to return !fir.ref<none>.

This should deal with https://github.com/llvm/llvm-project/issues/97325
and https://github.com/llvm/llvm-project/issues/98644.
2024-07-24 10:24:04 +02:00
jeanPerier
31087c5e4c [flang] handle alloca outside of entry blocks in MemoryAllocation (#98457)
This patch generalizes the MemoryAllocation pass (alloca -> heap) to
handle fir.alloca regardless of their postion in the IR. Currently, it
only dealt with fir.alloca in function entry blocks. The logic is placed
in a utility that can be used to replace alloca in an operation on
demand to whatever kind of allocation the utility user wants via
callbacks (allocmem, or custom runtime calls to instrument the code...).

To do so, a concept of ownership, that was already implied a bit and
used in passes like stack-reclaim, is formalized. Any operation with the
LoopLikeInterface, AutomaticAllocationScope, or IsolatedFromAbove owns
the alloca directly nested inside its regions, and they must not be used
after the operation.

The pass then looks for the exit points of region with such interface,
and use that to insert deallocation. If dominance is not proved, the
pass fallbacks to storing the new address into a C pointer variable
created in the entry of the owning region which allows inserting
deallocation as needed, included near the alloca itself to avoid leaks
when the alloca is executed multiple times due to block CFGs loops.

This should fix https://github.com/llvm/llvm-project/issues/88344.

In a next step, I will try to refactor lowering a bit to introduce
lifetime operation for alloca so that the deallocation points can be
inserted as soon as possible.
2024-07-17 09:15:47 +02:00
jeanPerier
66d5ca2a3d Reland "[flang] add extra component information in fir.type_info" (#97404)
Reland #96746 with the proper Support/CMakelist.txt change.

fir.type does not contain all Fortran level information about
components. For instance, component lower bounds and default initial
value are lost. For correctness purpose, this does not matter because
this information is "applied" in lowering (e.g., when addressing the
components, the lower bounds are reflected in the hlfir.designate).

However, this "loss" of information will prevent the generation of
correct debug info for the type (needs to know about lower bounds). The
initial value could help building some optimization pass to get rid of
initialization runtime calls.

This patch adds lower bound and initial value information into
fir.type_info via a new fir.dt_component operation. This operation is
generated only for component that needs it, which helps keeping the IR
small for "boring" types.

In general, adding Fortran level info in fir.type_info will allow
delaying the generation of "type descriptors" gobals that are very
verbose in FIR and make it hard to work with FIR dumps from applications
with many derived types.
2024-07-02 15:19:49 +02:00
jeanPerier
6a66b8224d Revert "[flang] add extra component information in fir.type_info" (#96937)
Reverts llvm/llvm-project#96746
Breaking shared library buillds:
https://lab.llvm.org/buildbot/#/builders/89/builds/931
2024-06-27 19:22:48 +02:00
jeanPerier
1448ed2000 [flang] add extra component information in fir.type_info (#96746)
fir.type does not contain all Fortran level information about
components. For instance, component lower bounds and default initial
value are lost. For correctness purpose, this does not matter because
this information is "applied" in lowering (e.g., when addressing the
components, the lower bounds are reflected in the hlfir.designate).

However, this "loss" of information will prevent the generation of
correct debug info for the type (needs to know about lower bounds). The
initial value could help building some optimization pass to get rid of
initialization runtime calls.

This patch adds lower bound and initial value information into
fir.type_info via a new fir.dt_component operation. This operation is
generated only for component that needs it, which helps keeping the IR
small for "boring" types.

In general, adding Fortran level info in fir.type_info will allow
delaying the generation of "type descriptors" gobals that are very
verbose in FIR and make it hard to work with FIR dumps from applications
with many derived types.
2024-06-27 18:59:03 +02:00
Kareem Ergawy
d0413438ec [flang][OpenMP] Handle omp.private in FirOpBuilder::getAllocaBlock() (#93927)
Fixes a crash uncovered by
[pr89651](https://github.com/llvm/llvm-test-suite/blob/main/Fortran/gfortran/regression/gomp/pr89651.f90)
in the test suite.

Fixes a crash caused by missing handling of `omp.private` ops in
`FirOpBuilder::getAllocaBlock()`.
2024-06-04 05:03:39 +02:00
Valentin Clement (バレンタイン クレメン)
45daa4fdc6 [flang][cuda] Move CUDA Fortran operations to a CUF dialect (#92317)
The number of operations dedicated to CUF grew and where all still in
FIR. In order to have a better organization, the CUF operations,
attributes and code is moved into their specific dialect and files. CUF
dialect is tightly coupled with HLFIR/FIR and their types.

The CUF attributes are bundled into their own library since some
HLFIR/FIR operations depend on them and the CUF dialect depends on the
FIR types. Without having the attributes into a separate library there
would be a dependency cycle.
2024-05-17 09:37:53 -07:00
Valentin Clement (バレンタイン クレメン)
26060de063 [flang][cuda] Lower device/managed/unified allocation to cuda ops (#90623)
Lower locals allocation of cuda device, managed and unified variables to
fir.cuda_alloc. Add fir.cuda_free in the function context finalization.

@vzakhari For some reason the PR #90526 has been closed when I merged PR
#90525. Just reopening one.
2024-05-02 14:32:53 -07:00
Christian Sigg
fac349a169 Reapply "[mlir] Mark isa/dyn_cast/cast/... member functions depreca… (#90406)
…ted. (#89998)" (#90250)

This partially reverts commit 7aedd7dc75.

This change removes calls to the deprecated member functions. It does
not mark the functions deprecated yet and does not disable the
deprecation warning in TypeSwitch. This seems to cause problems with
MSVC.
2024-04-28 22:01:42 +02:00
dyung
7aedd7dc75 Revert "[mlir] Mark isa/dyn_cast/cast/... member functions deprecated. (#89998)" (#90250)
This reverts commit 950b7ce0b8.

This change is causing build failures on a bot
https://lab.llvm.org/buildbot/#/builders/216/builds/38157
2024-04-26 12:09:13 -07:00
Christian Sigg
950b7ce0b8 [mlir] Mark isa/dyn_cast/cast/... member functions deprecated. (#89998)
See https://mlir.llvm.org/deprecation and
https://discourse.llvm.org/t/preferred-casting-style-going-forward.
2024-04-26 16:28:30 +02:00
Tom Eccles
8cc34fadec [flang][OpenMP] Support reduction of allocatable variables (#88392)
Both arrays and trivial scalars are supported. Both cases must use
by-ref reductions because both are boxed.

My understanding of the standards are that OpenMP says that this should
follow the rules of the intrinsic reduction operators in fortran, and
fortran says that unallocated allocatable variables can only be
referenced to allocate them or test if they are already allocated.
Therefore we do not need a null pointer check in the combiner region.
2024-04-23 10:34:28 +01:00
jeanPerier
8ddfb66903 [flang] Fix MASKR/MASKL lowering for INTEGER(16) (#87496)
The all one masks was not properly created for i128 types because
builder.createIntegerConstant ended-up truncating -1 to something
positive.

Add a builder.createAllOnesInteger/createMinusOneInteger helpers and use
them where createIntegerConstant(..., -1) was used.
Add an assert in createIntegerConstant to catch negative numbers for
i128 type.
2024-04-08 10:18:56 +02:00