As part of OpenMP 5.2, the allocate directive has been deprecated in
favour of the allocators construct for Fortran when an ALLOCATE
statement follows the OpenMP allocate directive.
To enable this in flang, a warning has been added informing the user of
this. Tests to ensure this behaviour is continued are also included.
See also: #110008
Before this change, OmpShared was not always set in shared symbols.
Instead, absence of private flags was interpreted as shared DSA.
The problem was that symbols with no flags, with only a host
association, could also mean "has same DSA as in the enclosing
context". Now shared symbols behave the same as private and can be
treated the same way.
Because of the host association symbols with no flags mentioned
above, it was also incorrect to simply test the flags of a given
symbol to find out if it was private or shared. The function
GetSymbolDSA() was added to fix this. It would be better to avoid
the need of these special symbols, but this would require changes
to how symbols are collected in lowering.
Besides that, some semantic checks need to know if a DSA clause
was used or not. To avoid confusing implicit symbols with DSA
clauses a new flag was added: OmpExplicit. It is now set for all
symbols with explicitly determined data-sharing attributes.
With the changes above, AddToContextObjectWithDSA() and the symbol
to DSA map could probably be removed and the DSA could be obtained
directly from the symbol, but this was not attempted.
Some debug messages were also added, with the "omp" DEBUG_TYPE, to
make it easier to debug the creation of implicit symbols and to
visualize all associations of a given symbol.
Fixes#130533Fixes#140882
I noticed that this testcase was failing in a CI build because it has a
`CHECK-NOT: bbb` while `bbb` could appear in the hashed global variable
name. Improved the test by changing this name to `ggg` which can't
appear in a hex string.
Add a visitor for OmpClause::Uniform to resolve its parameter names.
Add Symbol::Flag::OmpUniform to attach it to the resolved symbols.
Fixes issue #140741.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
In the openmp reduction clause lowering, reduction variables' reference
types were not propagating the volatility of the original variable's
type. Add a test and address cases of this in ReductionProcessor.
This PR extends `genSymbolType` so that the type of an associating
symbol carries the shape of the selector expression, if any. This is a
fix for a bug that triggered when an associating symbol is used in a
locality specifier. For example, given the following input:
```fortran
associate(a => aa(4:))
do concurrent (i = 4:11) local(a)
a(i) = 0
end do
end associate
```
before the changes in the PR, flang would assert that we are casting
between incompatible types. The issue happened since for the associating
symbol (`a`), flang generated its type as `f32` rather than
`!fir.array<8xf32>` as it should be in this case.
PR#142073 introduced a new test that checks the
prefer-vector-width function attribute. This test was not accounting for
target triples that include default function attributes.
This patch updates prefer-vector-width.f90 to ignore extra function
attributes.
---------
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
This PR add functionality to change `flang` command line using
environment variable `FCC_OVERRIDE_OPTIONS`. It is quite similar to what
`CCC_OVERRIDE_OPTIONS` does for clang.
The `FCC_OVERRIDE_OPTIONS` seemed like the most obvious name to me but I
am open to other ideas. The `applyOverrideOptions` now takes an extra
argument that is the name of the environment variable. Previously
`CCC_OVERRIDE_OPTIONS` was hardcoded.
With these enabled we see a 70% performance regression for exchange2_r
on neoverse-v1 (aws graviton 3) using `-mcpu=native -Ofast -flto`. There
is also a smaller regression on neoverse-v2.
This appears to be because function specialization is no longer kicking
in during LTO for digits_2. This can be seen in the output executable:
previously it contained specialized copies of the function with names
like `_QMbrute_forcePdigits_2.specialized.4`. Now there are no names
like this.
The bug is not in flang - instead in the function specialization pass -
but due to the size of the regression I would like to request that this
is disabled until function specialization has been fixed.
This patch implements IR-based Profile-Guided Optimization support in
Flang through the following flags:
- `-fprofile-generate` for instrumentation-based profile generation
- `-fprofile-use=<dir>/file` for profile-guided optimization
Resolves#74216 (implements IR PGO support phase)
**Key changes:**
- Frontend flag handling aligned with Clang/GCC semantics
- Instrumentation hooks into LLVM PGO infrastructure
- LIT tests verifying:
- Instrumentation metadata generation
- Profile loading from specified path
- Branch weight attribution (IR checks)
**Tests:**
- Added gcc-flag-compatibility.f90 test module verifying:
- Flag parsing boundary conditions
- IR-level profile annotation consistency
- Profile input path normalization rules
- SPEC2006 benchmark results will be shared in comments
For details on LLVM's PGO framework, refer to [Clang PGO
Documentation](https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization).
This implementation was developed by [XSCC Compiler
Team](https://github.com/orgs/OpenXiangShan/teams/xscc).
---------
Co-authored-by: ict-ql <168183727+ict-ql@users.noreply.github.com>
Co-authored-by: Tom Eccles <t@freedommail.info>
This patch adds support for the -mprefer-vector-width= command line
option. The parsing of this options is equivalent to Clang's and it is
implemented by setting the "prefer-vector-width" function attribute.
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
This commit adds support for the `__COUNTER__` preprocessor macro, which
works the same as the one found in clang.
It is useful to generate unique names at compile-time.
Even though the spec (version 5.2) prohibits strcuture components from
being specified in `depend` clauses, this restriction is not sensible.
This PR rectifies the issue by lifting that restriction and allowing
structure components in `depend` clauses (which is allowed by OpenMP
6.0).
This helps to disambiguate accesses in the caller and the callee
after LLVM inlining in some apps. I did not see any performance
changes, but this is one step towards enabling other optimizations
in the apps that I am looking at.
The definition of llvm.noalias says:
```
... indicates that memory locations accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. This guarantee only holds for memory locations that are modified, by any means, during the execution of the function.
```
I believe this exactly matches Fortran rules for the dummy arguments
that are modified during their subprogram execution.
I also set llvm.noalias and llvm.nocapture on the !fir.box<> arguments,
because the corresponding descriptors cannot be captured and cannot
alias anything (not based on them) during the execution of the
subprogram.
The check if the arguments are variable list items was missing, leading
to a crash in lowering in some invalid situations.
This fixes the first testcase reported in
https://github.com/llvm/llvm-project/issues/141481
Refactors the utils needed to create privtization/locatization ops for
both the fir and OpenMP dialects into a shared location isolating OpenMP
stuff out of it as much as possible.
This helps to disambiguate accesses in the caller and the callee
after LLVM inlining in some apps. I did not see any performance
changes, but this is one step towards enabling other optimizations
in the apps that I am looking at.
The definition of llvm.noalias says:
```
... indicates that memory locations accessed via pointer values based on the argument or return value are not also accessed, during the execution of the function, via pointer values not based on the argument or return value. This guarantee only holds for memory locations that are modified, by any means, during the execution of the function.
```
I believe this exactly matches Fortran rules for the dummy arguments
that are modified during their subprogram execution.
I also set llvm.noalias and llvm.nocapture on the !fir.box<> arguments,
because the corresponding descriptors cannot be captured and cannot
alias anything (not based on them) during the execution of the
subprogram.
When processing free form source line continuation, the prescanner
treats empty keyword macros as if they were spaces or tabs. After
skipping over them, however, there's code that only works if the skipped
characters ended with an actual space or tab. If the last skipped item
was an empty keyword macro's name, the last character of that name would
end up being the first character of the continuation line. Fix.
A dummy argument with an explicit INTEGER type of non-default kind can
be forward-referenced from a specification expression in many Fortran
compilers. Handle by adding type declaration statements to the initial
pass over a specification part's declaration constructs. Emit an
optional warning under -pedantic.
Fixes https://github.com/llvm/llvm-project/issues/140941.
When a TYPE(*) dummy argument is erroneously used as a component value
in a structure constructor, semantics crashes if the structure
constructor had been initially parsed as a potential function reference.
Clean out stale typed expressions when reanalyzing the reconstructed
parse subtree to ensure that errors are caught the next time around.
FORMAT("J=",I3) is accepted by a few other Fortran compilers as a valid
format for input as well as for output. The character string edit
descriptor "J=" is interpreted as if it had been 2X on input, causing
two characters to be skipped over. The skipped characters don't have to
match the characters in the literal string. An optional warning is
emitted under control of the -pedantic option.
The current semantic check in place is incorrect, this patch fixes this.
Up to 1 **'default'** named mapper should be allowed for each derived
type.
The current semantic check only allows up to 1 **'default'** named
mapper across all derived types.
This also makes sure that declare mappers follow proper scoping rules
for both default and named mappers.
Co-authored-by: Raghu Maddhipatla <Raghu.Maddhipatla@amd.com>
asin, acos, atan, and atan2 were being lowered to libm calls instead of
llvm intrinsics. Add the conversion patterns to handle these intrinsics
and update tests to expect this.
This patch only supports the conversion from `fir.do_loop` to `scf.for`.
This pass is still experimental, and future work will focus on gradually
improving this conversion pass.
Co-authored-by: yanming <ming.yan@terapines.com>
Fix a crash caused by an invalid expression in the atomic capture
clause, due to the `checkForSymbolMatch` function not accounting for
`GetExpr` potentially returning null.
Fix https://github.com/llvm/llvm-project/issues/139884
`cuf.alloc` and `cuf.free` are converted to `fir.alloca` or deleted when
in device context during the CUFOpConversion pass. Do not generate them
in lowering to avoid confusion.
Fixes#136357
The barrier needs to go between the copying into firstprivate variables
and the initialization call for the OpenMP construct (e.g. wsloop).
There is no way of expressing this in MLIR because for delayed
privatization that is all implicit (added in MLIR->LLVMIR conversion).
The previous approach put the barrier immediately before the wsloop (or
similar). For delayed privatization, the firstprivate copy code would
then be inserted after that, opening the possibility for the race
observed in the bug report.
This patch solves the issue by instead setting an attribute on the mlir
operation, which will instruct openmp dialect to llvm ir conversion to
insert a barrier in the correct place.
Memory effects on the volatile memory resource may not be attached to a
particular source, in which case the value of an effect will be null.
This caused this test case to crash in the optimized bufferization
pass's safety analysis because it assumes it can get the SSA value
modified by the memory effect. This is because memory effects on the
volatile resource indicate that the operation must not be reordered with
respect to other volatile operations, but there is not a material ssa
value that can be pointed to.
This patch changes the safety checks such that memory effects which do
not have associated values are not safe for optimized bufferization.
Flang at Ofast usually produces executables that consume more stack that
other Fortran compilers.
This is in part because the alloca created from temporary heap
allocation by the StackArray pass are created at the function scope
level without lifetimes, and LLVM does not/is not able to merge alloca
that do not have overlapping lifetimes.
This patch adds an option to generate LLVM lifetime in the StackArray
pass at the previous heap allocation/free using the LLVM dialect
operation for it.
In #138329, _GNU_SOURCE was added for Cygwin, but when building Clang
standalone against an installed LLVM this definition was not picked up,
resulting in undefined strnlen. Follow the documentation in
https://llvm.org/docs/CMake.html#embedding-llvm-in-your-project and add
the LLVM_DEFINITIONS in standalone projects' cmakes.
Some MPI libraries use character dummies + ignore(TKR) to allow passing
any kind of buffer.
This was meant to already be handled by #108168
However, when the library interface also had an argument requiring an
explicit interface, `builder.convertWithSemantics` was not allowed to properly deal
with the actual/dummy type mismatch and generated bad IR causing errors like:
`'fir.convert' op invalid type conversion'!fir.ref' / '!fir.boxchar\<1\>'`.
This restriction was artificial, lowering should just handle any cases
allowed by semantics. Just remove it.