Machine function splitting + branch relaxation currently don't properly
handle inline asm goto blocks that conditional branch to cold goto
labels. While such inline asm is technically invalid, machine
function splitting is the only thing that exposes it as such.
Since machine function splitting doesn't help too much in these
circumstances anyway, disable it for asm goto blocks and their targets.
Differential Revision: https://reviews.llvm.org/D158647
Jump tables on AArch64 are label-relative rather than table-relative, so
having jump table destinations that are in different sections causes
problems with relocation. Jump table lookups have a max range of 1MB, so
all destinations must be in the same section as the lookup code. Both of
these restrictions can be mitigated with some careful and complex logic,
but doing so doesn't gain a huge performance benefit.
Efficiently ensuring jump tables are correct and can be compressed on
AArch64 is a TODO item. In the meantime, don't split blocks that can
cause problems.
Differential Revision: https://reviews.llvm.org/D157124
The option was added in github.com/llvm/llvm-project/commit/90ab85a but it doesn't seem to be used. The triple check has been removed so this shouldn't be required going forward.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D158885
Because unconditional branch relaxation on AArch64 grows the stack to
spill a register, splitting a function would cause the red zone to be
overwritten. Explicitly disable MFS for such functions.
Differential Revision: https://reviews.llvm.org/D157127
Reverted in 4c8d056f50 because it broke
buildbot `llvm-clang-x86_64-expensive-checks-debian` due to the AArch64
test generating invalid code. The issue still exists, but it's fixed in
D156767, so the AArch64 test should be added there.
Differential Revision: https://reviews.llvm.org/D158674
If an end section basic block ends in an unconditional branch to its
fallthrough, BasicBlockSections will duplicate the unconditional branch.
This doesn't break x86, but it is a (slight) size optimization and more
importantly prevents AArch64 builds from breaking.
Ex:
```
bb1 (bbsections Hot):
jmp bb2
bb2 (bbsections Cold):
/* do work... */
```
After running sortBasicBlocksAndUpdateBranches():
```
bb1 (bbsections Hot):
jmp bb2
jmp bb2
bb2 (bbsections Cold):
/* do work... */
```
Differential Revision: https://reviews.llvm.org/D158674
This reverts commit 317a0fe5bd.
This reverts commit 30c4b97aec.
See post-commit discussions on https://reviews.llvm.org/D157750 that
we should use a different mechanism to handle the error with --cuda-gpu-arch=
The IR/DiagnosticInfo.cpp, warn_drv_for_elf_only, codegne tests in
clang/test/Driver, and the following driver behavior (downgrading error
to warning) changes are undesired.
```
% clang --target=riscv64 -fsplit-machine-functions -c a.c
warning: -fsplit-machine-functions is not valid for riscv64 [-Wbackend-plugin]
```
This CL includes two changes:
1. moved clang backend-warnings test cases from Driver/ to CodeGen/.
2. removed multiple `cd "$(dirname "%t")"` and replaced with `-o %t`.
Reviewed By: maskray (Fangrui Song)
Differential Revision: https://reviews.llvm.org/D157565
When building a fatbinary, the driver invokes the compiler multiple
times with different "--target". (For example, with "-x cuda
--cuda-gpu-arch=sm_70" flags, clang will be invoded twice, once with
--target=x86_64_...., once with --target=sm_70) If we use
-fsplit-machine-functions or -fno-split-machine-functions for such
invocation, the driver reports an error.
This CL changes the behavior so:
- "-fsplit-machine-functions" is now passed to all targets, for non-X86
targets, the flag is a NOOP and causes a warning.
- "-fno-split-machine-functions" now negates -fsplit-machine-functions (if
-fno-split-machine-functions appears after any -fsplit-machine-functions)
for any target triple, previously, it causes an error.
- "-fsplit-machine-functions -Xarch_device -fno-split-machine-functions"
enables MFS on host but disables MFS for GPUS without warnings/errors.
- "-Xarch_host -fsplit-machine-functions" enables MFS on host but disables
MFS for GPUS without warnings/errors.
Reviewed by: xur, dhoekwater
Differential Revision: https://reviews.llvm.org/D157750
Machine function splitting will become available for AArch64; since MFS
is no longer X86-only, the tests for generic behavior should live
somewhere other than tests/CodeGen/X86.
MFS implementation doesn't vary much across platforms, and most tests
should be identical between X86 and AArch64 besides instruction
selection, so the tests can live together in tests/CodeGen/Generic.
Differential Revision: https://reviews.llvm.org/D157563
Machine function splitting will become available for AArch64; since MFS
is no longer X86-only, the tests for generic behavior should live
somewhere other than tests/CodeGen/X86.
MFS implementation doesn't vary much across platforms, and most tests
should be identical between X86 and AArch64 besides instruction
selection, so the tests can live together in tests/CodeGen/Generic.
Differential Revision: https://reviews.llvm.org/D157563
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
Fixed PR51501 (tests from D112337).
1. Reuse of DISubprogram's 'retainedNodes' to track other function-local
entities together with local variables and labels (this patch cares about
function-local import while D144006 and D144008 use the same approach for
local types and static variables). So, effectively this patch moves ownership
of tracking local import from DICompileUnit's 'imports' field to DISubprogram's
'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout
is considered unsupported (DwarfDebug would assert on such debug metadata).
DICompileUnit's 'imports' field is supposed to track global imported
declarations as it does before.
This addresses various FIXMEs and simplifies the next part of the patch.
2. Postpone emission of function-local imported entities from
`DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`.
While in `DwarfDebug::endFunctionImpl()` we do not have all the
information about a parent subprogram or a referring subprogram
(whether a subprogram inlined or not), so we can't guarantee we emit
an imported entity correctly and place it in a proper subprogram tree.
So now, we just gather needed details about the import itself and its
parent entity (either a Subprogram or a LexicalBlock) during
processing in `DwarfDebug::endFunctionImpl()`, but all the real work is
done in `DwarfDebug::endModule()` when we have all the required
information to make proper emission.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Differential Revision: https://reviews.llvm.org/D144004
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
Fixed PR51501 (tests from D112337).
1. Reuse of DISubprogram's 'retainedNodes' to track other function-local
entities together with local variables and labels (this patch cares about
function-local import while D144006 and D144008 use the same approach for
local types and static variables). So, effectively this patch moves ownership
of tracking local import from DICompileUnit's 'imports' field to DISubprogram's
'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout
is considered unsupported (DwarfDebug would assert on such debug metadata).
DICompileUnit's 'imports' field is supposed to track global imported
declarations as it does before.
This addresses various FIXMEs and simplifies the next part of the patch.
2. Postpone emission of function-local imported entities from
`DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`.
While in `DwarfDebug::endFunctionImpl()` we do not have all the
information about a parent subprogram or a referring subprogram
(whether a subprogram inlined or not), so we can't guarantee we emit
an imported entity correctly and place it in a proper subprogram tree.
So now, we just gather needed details about the import itself and its
parent entity (either a Subprogram or a LexicalBlock) during
processing in `DwarfDebug::endFunctionImpl()`, but all the real work is
done in `DwarfDebug::endModule()` when we have all the required
information to make proper emission.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Differential Revision: https://reviews.llvm.org/D144004
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
Fixed PR51501 (tests from D112337).
1. Reuse of DISubprogram's 'retainedNodes' to track other function-local
entities together with local variables and labels (this patch cares about
function-local import while D144006 and D144008 use the same approach for
local types and static variables). So, effectively this patch moves ownership
of tracking local import from DICompileUnit's 'imports' field to DISubprogram's
'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout
is considered unsupported (DwarfDebug would assert on such debug metadata).
DICompileUnit's 'imports' field is supposed to track global imported
declarations as it does before.
This addresses various FIXMEs and simplifies the next part of the patch.
2. Postpone emission of function-local imported entities from
`DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`.
While in `DwarfDebug::endFunctionImpl()` we do not have all the
information about a parent subprogram or a referring subprogram
(whether a subprogram inlined or not), so we can't guarantee we emit
an imported entity correctly and place it in a proper subprogram tree.
So now, we just gather needed details about the import itself and its
parent entity (either a Subprogram or a LexicalBlock) during
processing in `DwarfDebug::endFunctionImpl()`, but all the real work is
done in `DwarfDebug::endModule()` when we have all the required
information to make proper emission.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Differential Revision: https://reviews.llvm.org/D144004
This is stricter than the default "ieee", and should probably be the
default. This patch leaves the default alone. I can change this in a
future patch.
There are non-reversible transforms I would like to perform which are
legal under IEEE denormal handling, but illegal with flushing zero
behavior. Namely, conversions between llvm.is.fpclass and fcmp with
zeroes.
Under "ieee" handling, it is legal to translate between
llvm.is.fpclass(x, fcZero) and fcmp x, 0.
Under "preserve-sign" handling, it is legal to translate between
llvm.is.fpclass(x, fcSubnormal|fcZero) and fcmp x, 0.
I would like to compile and distribute some math library functions in
a mode where it's callable from code with and without denormals
enabled, which requires not changing the compares with denormals or
zeroes.
If an IEEE function transforms an llvm.is.fpclass call into an fcmp 0,
it is no longer possible to call the function from code with denormals
enabled, or write an optimization to move the function into a denormal
flushing mode. For the original function, if x was a denormal, the
class would evaluate to false. If the function compiled with denormal
handling was converted to or called from a preserve-sign function, the
fcmp now evaluates to true.
This could also be of use for strictfp handling, where code may be
changing the denormal mode.
Alternative name could be "unknown".
Replaces the old AMDGPU custom inlining logic with more conservative
logic which tries to permit inlining for callees with dynamic handling
and avoids inlining other mismatched modes.
With this patch an undefined mask in a shufflevector will be printed as poison.
This change is done to support the new shufflevector semantics
for undefined mask elements.
Differential Revision: https://reviews.llvm.org/D149210
Based off D148215, when expanding a min/max reduction we should be creating min/max intrinsics directly instead of relying on instcombine to fold them back together.
This patch handles integer min/max cases. Hopefully we can add floating point support soon (at least for fastmath/nnan cases) - but we're missing some of the plumbing to pass the correct FMF to the intrinsic at the moment.
Differential Revision: https://reviews.llvm.org/D148221
The addr-label.ll test uses the following setup:
define ptr @test1() nounwind {
entry:
ret ptr blockaddress(@test1b, %test_label)
}
define i32 @test1b() nounwind {
entry:
ret i32 -1
test_label:
br label %ret
ret:
ret i32 -1
}
However, according to the LLVM Reference guide for blockaddress()
"This value only has defined behavior when used as an operand to the
‘indirectbr’ or for comparisons against null." For this test the value
is just returned as a pointer from test1().
On PowerPC this test has unreliable results as the order in which
passes are run can make this test pass or fail. If the %test_label
in test1b() is removed before a number of passes are completed on
test1() then this test will fail on PowerPC.
I have marked this test as UNSUPPORTED on PowerPC.
... when finding all memory uses for an address and make it a
parameter.
Now that we have avoided potentially exponential run time of
`FindAllMemoryUses` in D143893. it'd be beneficial to increase the
limit up from 20.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D143894
Change-Id: I3abdf40332ef65e9b2f819ac32ac60e4200ec51d
The counter of the number of instructions seen in `FindAllMemoryUses`
is reset after returning from a recursive invocation of
`FindAllMemoryUses` to the value it had before the call. In effect,
depending on the shape of the uses graph, the function may scan up to
`2^N-1` instructions where `N` is the scan limit
(`MaxMemoryUsesToScan`). This does not look intuitive or intended.
This patch changes the counting to just count the scanned
instructions, independent of the shape of the references.
Reviewed By: mkazantsev
Differential Revision: https://reviews.llvm.org/D143893
Change-Id: I99f5de55e84843cf2fbea287d6ae4312fa196240
This is a mitigation patch for
https://bugs.chromium.org/p/llvm/issues/detail?id=30, where existing stack
protection is skipped if a function is returned through by an unwinder rather
than the normal call/return path. The recent patch D139254 added the ability to
instrument a visible unwind path, at least in the IR case (I'm working on the
SelectionDAG instrumentation too) but there are still invisible unwinds it
can't reach.
So this patch adds logic to DwarfEHPrepare that goes through a function,
converting any call that might throw into an invoke to a simple resume cleanup,
and adding cleanup clauses to existing landingpads that lack them. Obviously we
don't really want to do this if it's wasted effort, so I also exposed
requiresStackProtector from the actual StackProtector code to skip the extra
paths if they won't be used.
https://reviews.llvm.org/D143637
The Language Reference says that aliases can have available_externally
linkage if their aliasee is an available_externally global value. Using
this kind of aliases resulted in crashes during code generation, filter
them out (the same that the AsmPrinter also filters out GlobalVariables
in emitSpecialLLVMGlobal(); Functions are discarded in the machine pass
infrastructure).
Differential Revision: https://reviews.llvm.org/D142352
Issue #58168 describes the difficulty diagnosing stack size issues
identified by -Wframe-larger-than. For simple code, its easy to
understand the stack layout and where space is being allocated, but in
more complex programs, where code may be heavily inlined, unrolled, and
have duplicated code paths, it is no longer easy to manually inspect the
source program and understand where stack space can be attributed.
This patch implements a machine function pass that emits remarks with a
textual representation of stack slots, and also outputs any available
debug information to map source variables to those slots.
The new behavior can be used by adding `-Rpass-analysis=stack-frame-layout`
to the compiler invocation. Like other remarks the diagnostic
information can be saved to a file in a machine readable format by
adding -fsave-optimzation-record.
Fixes: #58168
Reviewed By: nickdesaulniers, thegameg
Differential Revision: https://reviews.llvm.org/D135488
Issue #58168 describes the difficulty diagnosing stack size issues
identified by -Wframe-larger-than. For simple code, its easy to
understand the stack layout and where space is being allocated, but in
more complex programs, where code may be heavily inlined, unrolled, and
have duplicated code paths, it is no longer easy to manually inspect the
source program and understand where stack space can be attributed.
This patch implements a machine function pass that emits remarks with a
textual representation of stack slots, and also outputs any available
debug information to map source variables to those slots.
The new behavior can be used by adding `-Rpass-analysis=stack-frame-layout`
to the compiler invocation. Like other remarks the diagnostic
information can be saved to a file in a machine readable format by
adding -fsave-optimzation-record.
Fixes: #58168
Reviewed By: nickdesaulniers, thegameg
Differential Revision: https://reviews.llvm.org/D135488
Instcombine prefers this canonical form (see getPreferredVectorIndex),
as does IRBuilder when passing the index as an integer so we may as
well use the prefered form from creation.
NOTE: All test changes are mechanical with nothing else expected
beyond a change of index type from i32 to i64.
Differential Revision: https://reviews.llvm.org/D140983
Over the past day or so, i've took a large swing at our tests,
and reduced the number of tests that were still using the old syntax
from ~1800 to just 200.
Left to handle: (as it is seen in this patch)
* Transforms/LSR
* Transforms/CGP
* Transforms/TypePromotion
* Transforms/HardwareLoops
* Analysis/*
* some misc.
I think this is the right point to start actively refusing
to honor the old syntax, except for the old tests,
to prevent the old syntax from creeping back in.
Thus, let's add temporary default-off flag,
and if it is not passed refuse to accept old syntax.
The tests that still need porting are annotated with this flag.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D139647
When this test was originally added in 991dfedfd7, it didn't pass
`-o` to to llc, causing llc to write a .s file to the source directory.
On the next run, lit would then try to run that as a test.
Make the test auto-cleanup that file for a while.