Fat LTO objects contain both LTO compatible IR, as well as generated
object code. This allows users to defer the choice of whether to use LTO
or not to link-time. This is a feature available in GCC for some time,
and makes the existing -ffat-lto-objects flag functional in the same
way as GCC's.
This patch adds support for that flag in the driver, as well as setting the
necessary codegen options for the backend. Largely, this means we select
the newly added pass pipeline for generating fat objects.
Users are expected to pass -ffat-lto-objects to clang in addition to one
of the -flto variants. Without the -flto flag, -ffat-lto-objects has no
effect.
// Compile and link. Use the object code from the fat object w/o LTO.
clang -fno-lto -ffat-lto-objects -fuse-ld=lld foo.c
// Compile and link. Select full LTO at link time.
clang -flto -ffat-lto-objects -fuse-ld=lld foo.c
// Compile and link. Select ThinLTO at link time.
clang -flto=thin -ffat-lto-objects -fuse-ld=lld foo.c
// Compile and link. Use ThinLTO with the UnifiedLTO pipeline.
clang -flto=thin -ffat-lto-objects -funified-lto -fuse-ld=lld foo.c
// Compile and link. Use full LTO with the UnifiedLTO pipeline.
clang -flto -ffat-lto-objects -funified-lto -fuse-ld=lld foo.c
// Link separately, using ThinLTO.
clang -c -flto=thin -ffat-lto-objects foo.c
clang -flto=thin -fuse-ld=lld foo.o -ffat-lto-objects # pass --lto=thin --fat-lto-objects to ld.lld
// Link separately, using full LTO.
clang -c -flto -ffat-lto-objects foo.c
clang -flto -fuse-ld=lld foo.o # pass --lto=full --fat-lto-objects to ld.lld
Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977
Depends on D146776
Reviewed By: tejohnson, MaskRay
Differential Revision: https://reviews.llvm.org/D146777
Previously we returned i32 on RV32 and i64 on RV64. The instructions
only consume 32 bits and only produce 32 bits. For RV64, the result
is sign extended to 64 bits like *W instructions.
This patch removes this detail from the interface to improve
portability and consistency. This matches the proposal for scalar
intrinsics here https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44
I've included IR autoupgrade support as well.
I'll be doing this for other builtins/intrinsics that currently use
'long' in other patches.
Reviewed By: VincentWu
Differential Revision: https://reviews.llvm.org/D154647
Commit 98390ccb80 fixed late template
instantiation by clearing FP pragma stack before instantiation. This
solution was based on the assumptions:
- FP pragma stack is not used anymore and it is safe to clear it,
- Default FP options are determined by command line options.
Both the assumptions are wrong. When compilation produces precompiled
header file, state of the stack is serialized and then restored when the
precompiled header is used. Delayed template parsing occurs at the end
of translation unit but before serialization, so clearing FP pragma
stack effects serialized representation. When the precompiled file is
loaded, some conditions can be broken and clang crashed, it was
described in https://github.com/llvm/llvm-project/issues/63704. The
crash was observed only in few cases, on most buildbots it was absent.
The violation of expected conditions was caused by violation of the
second assumption. FPEvalMethod can be modified by target, so it is not
possible to deduce it from LangOptions only. So default FP state read
from precompiled header was different from the state in the initialized
Sema, and this was the crash reason.
Only two targets do such modification of default FP options, these are
i386 and AIX. so the problem was hard to reproduce.
Delayed template parsing should occur with empty pragma stack, so it
must be cleared before the instantiation, but the stack now is saved
and restored after the instantiation is done.
This change should fix https://github.com/llvm/llvm-project/issues/63704.
Differential Revision: https://reviews.llvm.org/D155380
Besides being a useful abstraction, this function will help insulate existing clients of the framework from upcoming changes to the API of `StructValue` and `AggregateStorageLocation`.
Depends On D155202
Reviewed By: ymandel, xazax.hun
Differential Revision: https://reviews.llvm.org/D155204
This consolidates the code used in various places to initialize objects (usually for variables) into one central location.
It will also help reduce the number of changes needed when we make the upcoming API changes to `AggregateStorageLocation` and `StructValue`.
Depends On D155074
Reviewed By: ymandel, xazax.hun
Differential Revision: https://reviews.llvm.org/D155075
In https://reviews.llvm.org/D126664, a warning is introduced
warning against the deprecated out of line definition of a
static constexpr member in C++17 and later. Prior to this patch,
the only diagnostic group controlling this diagnostic was -Wdeprecated,
which controls many many diagnostics. This patch creates
a diagnostic group specifically for this warning so it can
be controlled in isolation, while also being included with -Wdeprecated.
Differential Revision: https://reviews.llvm.org/D153881
This patch adds a new option -fkeep-persistent-storage-variables to emit
all variables that have a persistent storage duration, including global,
static and thread-local variables. This could be useful in cases where
the presence of all these variables as symbols in the object file are
required, so that they can be directly addressed.
Reviewed By: hubert.reinterpretcast
Differential Revision: https://reviews.llvm.org/D150221
Unsigned is a better representation for bitmanipulation and cryptography.w
The only exception being the return values for clz and ctz intrinsics is
a signed int. That matches the target independent clz and ctz builtins.
This is consistent with the current scalar crypto proposal
https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44
Reviewed By: VincentWu
Differential Revision: https://reviews.llvm.org/D154616
This removes another use of 'long' to mean xlen from builtins.
I've also converted the types to unsigned as proposed in D154616.
clmul_32 is available to RV64 as its emulation is clmul+sext.w
clmulh_32 and clmulr_32 are not available on RV64 as their emulation
is currently 6 instructions in the worst case.
For a fix-it that inserts the `[[clang::unsafe_buffer_usage]]`
attribute, it will lookup existing macros defined for the attribute
and use the (last defined such) macro directly. Fix-its will use raw
`[[clang::unsafe_buffer_usage]]` if no such macro is defined.
The implementation mimics how a similar machine for the
`[[fallthrough]]` attribute was implemented.
Reviewed by: NoQ (Artem Dergachev)
Differential revision: https://reviews.llvm.org/D150338
In D114095, `HeaderFileInfo::NumIncludes` was moved into `Preprocessor`. This still makes sense, because we want to track this on the granularity of submodules (D112915, D114173), but the way this information is serialized is not ideal. In `ASTWriter`, the set of included files gets deserialized eagerly, issuing lots of calls to `FileManager::getFile()` for input files the PCM consumer might not be interested in.
This patch makes the information part of the header file info table, taking advantage of its lazy deserialization which typically happens when a file is about to be included.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D155131
When -mcpu=native is specified, try detecting GPU
on the system by using amdgpu-arch tool. If it
fails to detect GPU, emit an error about GPU
not detected. If multiple GPUs are detected,
use the first GPU and emit a warning.
Reviewed by: Matt Arsenault, Fangrui Song
Differential Revision: https://reviews.llvm.org/D154531
/data/llvm-project/clang/include/clang/Support/RISCVVIntrinsicUtils.h:390:8: error: private field 'HasFRMRoundModeOp' is not used [-Werror,-Wunused-private-field]
bool HasFRMRoundModeOp;
^
1 error generated.
Depends on D154635
For the cover letter of the patch-set, please checkout D154628.
This is the 8th patch of the patch-set.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154636
Depends on D154634
For the cover letter of the patch-set, please checkout D154628.
This is the 7th patch of the patch-set. This patch includes change to
vfcvt_x_f, vfcvt_xu_f, vfwcvt_x_f, vfwcvt_xu_f, vfncvt_x_f, vfncvt_xu_f
vfcvt_f_x, vfcvt_f_xu, vfncvt_f_x vfncvt_f_xu, vfncvt_f_f
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154635
Depends on D154633
For the cover letter of the patch-set, please checkout D154628.
This is the 6th patch of the patch-set.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154634
Depends on D154632
For the cover letter of the patch-set, please checkout D154628.
This is the 5th patch of the patch-set.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154633
Depends on D154631
For the cover letter of the patch-set, please checkout D154628.
This is the 4th patch of the patch-set.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154632
Depends on D154629
For the cover letter of the patch-set, please checkout D154628.
This is the 3rd patch of the patch-set.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154631
Depends on D154628
For the cover letter of the patch-set, please checkout D154628.
This is the 2nd patch of the patch-set.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154629
Depends on D152996.
This patch-set aims to add a variant for the RVV floating-point
intrinsics that controls the rounding mode (`frm`). The rounding mode
variant appends `_rm` before the policy suffix to distinguish from
those without them.
Specification PR: riscv-non-isa/rvv-intrinsic-doc#226
This is the 1st patch of the patch-set.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D154628
Depends on D152879.
Specification PR: riscv-non-isa/rvv-intrinsic-doc#226
This patch adds variant of `vfadd` that models the rounding mode control.
The added variant has suffix `_rm` appended to differentiate from the
existing ones that does not alternate `frm` and uses whatever is inside.
The value `7` is used to indicate no rounding mode change. Reusing the
semantic from the rounding mode encoding for scalar floating-point
instructions.
Additional data member `HasFRMRoundModeOp` is added so we can append
`_rm` suffix for the fadd variants that models rounding mode control.
Additional data member `IsRVVFixedPoint` is added so we can define
pseudo instructions with rounding mode operand and distinguish the
instructions between fixed-point and floating-point.
Reviewed By: craig.topper, kito-cheng
Differential Revision: https://reviews.llvm.org/D152996
This patch makes the bit-fields wider, and also implement a small optimization for `PseudoObjectExprBitfields`, when there is no result in `PseudoObjectExpr`, we use 32 bits to store the number of subexpressions, otherwise, we use 16 bits to store the number of subexpressions, and use 16 bits to store the result indexes.
Fixes https://github.com/llvm/llvm-project/issues/63169
Reviewed By: aaron.ballman
Differential Revision: https://reviews.llvm.org/D154784
D152023 made ubsan consider __builtin_clz of 0 undefined regardless of
the target. This ensures portability and matches gcc.
This causes the ACLE intrinsics to also be considered to also be
considered to be undefined for 0 since they used the generic builtins
as their implementation.
This patch adds builtins for ARM that ubsan doesn't know about to make
the behavior defined for 0. Alternatively, I could have added a zero
check to the intrinsics, but the dedicated builtin will give better -O0
codegen.
Fixes#63113.
Reviewed By: tmatheson
Differential Revision: https://reviews.llvm.org/D154915
Arm Performance Libraries contain math library which provides
vectorized versions of common math functions.
This patch allows to use it with clang and llvm via -fveclib=ArmPL or
-vector-library=ArmPL, so loops with such calls can be vectorized.
The executable needs to be linked with the amath library.
Arm Performance Libraries are available at:
https://developer.arm.com/Tools%20and%20Software/Arm%20Performance%20Libraries
Reviewed by: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D154508
Fixes two places where we relied on map iteration order when processing
values, which leaked nondeterminism into the generated SAT formulas.
Adds a couple of tests that directly assert that the SAT system is
equivalent on each run.
It's desirable that the formulas are deterministic based on the input:
- our SAT solver is naive and perfermance is sensitive to even simple
semantics-preserving transformations like A|B to B|A.
(e.g. it's likely to choose a different variable to split on).
Timeout failures are bad, but *flaky* ones are terrible to debug.
- similarly when debugging, it's important to have a consistent
understanding of what e.g. "V23" means across runs.
---
Both changes in this patch were isolated from a nullability analysis of
real-world code which was extremely slow, spending ages in the SAT
solver at "random" points that varied on each run.
I've included a reduced version of the code as a regression test.
One of the changes shows up directly as flow-condition nondeterminism
with a no-op analysis, the other relied on bits of the nullability
analysis but I found a synthetic example to show the problem.
Differential Revision: https://reviews.llvm.org/D154948
There's some documentation of this concept at
https://clang.llvm.org/docs/DataFlowAnalysisIntro.html
but it would be nice to have it closer to the code.
I also was laboring under an obvious but wrong mental model that
the flow condition token represented "execution reached this point",
I'd like to explicitly call that out as wrong.
Differential Revision: https://reviews.llvm.org/D154969
Previously, we were including these fields only in the specific instance that was initialized by the `InitListExpr`, but not in other instances of the same type. This is inconsistent and error-prone.
Depends On D154952
Reviewed By: xazax.hun, gribozavr2
Differential Revision: https://reviews.llvm.org/D154961
The unified LTO pipeline creates a single LTO bitcode structure that can
be used by Thin or Full LTO. This means that the LTO mode can be chosen
at link time and that all LTO bitcode produced by the pipeline is
compatible, from an optimization perspective. This makes the behavior of
LTO a bit more predictable by normalizing the set of LTO features
supported by each LTO bitcode file.
Example usage:
# Compile and link. Select regular LTO at link time.
clang -flto -funified-lto -fuse-ld=lld foo.c
# Compile and link. Select ThinLTO at link time.
clang -flto=thin -funified-lto -fuse-ld=lld foo.c
# Link separately, using ThinLTO.
clang -c -flto -funified-lto foo.c # -flto={full,thin} are identical in
terms of compilation actions
clang -flto=thin -fuse-ld=lld foo.o # pass --lto=thin to ld.lld
# Link separately, using regular LTO.
clang -c -flto -funified-lto foo.c
clang -flto -fuse-ld=lld foo.o # pass --lto=full to ld.lld
The RFC discussing the details and rational for this change is here:
https://discourse.llvm.org/t/rfc-a-unified-lto-bitcode-frontend/61774
This restores commit b4a82b6225, reverted
in 3ab7ef28ee because it was thought to
cause a bot failure, which ended up being unrelated to this patch set.
Differential Revision: https://reviews.llvm.org/D154856
Such jumps are not allowed by GCC and allowing them
can lead to situations where we jumps into unevaluated
statements.
Fixes#63682
Reviewed By: aaron.ballman, #clang-language-wg
Differential Revision: https://reviews.llvm.org/D154696
This patch implements `llvm::PointerLikeTraits<FileEntryRef>` and `llvm::PointerLikeTraits<DirectoryEntryRef>`, allowing some simplifications around umbrella header/directory code.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D154905
Close https://github.com/llvm/llvm-project/issues/63544.
Background: We landed std modules in libcxx recently but we haven't
landed the corresponding in-tree tests. According to @Mordante, there
are only 1% libcxx tests failing with std modules. And the major
blocking issue is the lambda expression in the require clauses.
The root cause of the issue is that previously we never consider any
lambda expression as the same. Per [temp.over.link]p5:
> Two lambda-expressions are never considered equivalent.
I thought this is an oversight at first but @rsmith explains that in the
wording, the program is as if there is only a single definition, and a
single lambda-expression. So we don't need worry about this in the spec.
The explanation makes sense. But it didn't reflect to the implementation
directly.
Here is a cycle in the implementation. If we want to merge two
definitions, we need to make sure its implementation are the same. But
according to the explanation above, we need to judge if two
lambda-expression are the same by looking at its parent definitions. So
here is the problem.
To solve the problem, I think we have to profile the lambda expressions
actually to get the accurate information. But we can't do this
universally. So in this patch I tried to modify the interface of
`Stmt::Profile` and only profile the lambda expression during the
process of merging the constraint expressions.
Differential Revision: https://reviews.llvm.org/D153957
Have ToolChain::IsIntegratedAssemblerDefault default to true.
Almost all of the ToolChains are using IAS nowadays. There are a few exceptions like
XCore, some NaCl archs, and NVPTX/XCore in Generic_GCC::IsIntegratedAssemblerDefault.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D154902
Previously the MemProf profile was expected to be in the same profile
file as a normal PGO profile, passed via the usual -fprofile-use=
option, and was matched in the same pass. To simplify profile
preparation, since the raw MemProf profile requires the binary for
symbolization and may be simpler to index separately from the raw PGO
profile, and also to enable providing a MemProf profile for a SamplePGO
build, separate out the MemProf feedback option and matching pass.
This patch adds the -fmemory-profile-use=${file} option, and the
provided file is passed down to LLVM and ultimately used in a new
MemProfUsePass which performs the matching of just the memory profile
contents of that file.
Note that a single profile file containing both normal PGO and MemProf
profile data is still supported, and the relevant profile data is
matched by the appropriate matching pass(es) based on which option(s)
the profile is provided with (the same profile file can be supplied to
both feedback options).
Differential Revision: https://reviews.llvm.org/D154856