This is mostly NFC but some output does change due to consistently
inserting into poison rather than undef and using i64 as the index
type for inserts.
Hi,
AMD has its own implementation of vector calls. This patch includes the
changes to enable the use of AMD's math library using -fveclib=AMDLIBM.
Please refer to https://github.com/amd/aocl-libm-ose
---------
Co-authored-by: Rohit Aggarwal <Rohit.Aggarwal@amd.com>
Now that the work embedding PGO information in SHT_LLVM_BB_ADDR_MAP ELF
sections has landed, there is no longer a need to keep around the
mbb-profile-dump flag.
Both RISC-V and AMDGPU (GCN) deploy two VirtRegRewriter passes in their
codegen pipelines. This test stops prematurely at the first one, which
doesn't clean up the virtual register map and causes an assertion failure.
Ideally we could solve this by teaching `-stop-after` how to stop at the
last instance of a pass, but for now we're just marking these two targets
as XFAIL.
c9f3288 introduced unconditional branch deduplication for basic block
sections and machine function splitting, but it didn't add tests for
AArch64 since prior behavior crashed the test.
This change adds tests for AArch64 and has no functional change.
The change is fairly mechanical:
1. Factor code from `FastISel::selectIntrinsicCall`, which converts
debug intrinsics into debug instructions, into functions (NFC).
2. Call those functions for DPValues attached to instructions too.
The test updates look the same as other RemoveDIs changes: re-run the
tests with `--try-experimental-debuginfo-iterators`, which checks the
output is identical using the new debug info format (if it has been
enabled in the cmake configuration).
Depends on #76941 (otherwise some modified tests spuriously fail).
`-debug-only=isel-dump` is the new debug type for printing the SelectionDAG
after each ISel phase. This can be further filtered by
`-filter-print-funcs=<function names>`.
Note that the existing `-debug-only=isel` will take precedence over the
new behavior and print SelectionDAG dumps of every single function
regardless of `-filter-print-funcs`'s values.
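The precedence rule above can be summarized with a small model. This is a sketch only: `should_dump_selection_dag` and its parameters are hypothetical names, not LLVM APIs, and it assumes that an empty `-filter-print-funcs` list means every function is printed.

```python
# Hypothetical model of when SelectionDAG dumps are printed; the function
# and parameter names are illustrative, not LLVM APIs.
def should_dump_selection_dag(function_name, debug_types, filter_funcs):
    """Return True if SelectionDAG dumps are printed for function_name.

    debug_types:  set of values passed via -debug-only
    filter_funcs: set of names passed via -filter-print-funcs (may be empty)
    """
    if "isel" in debug_types:
        # The pre-existing debug type wins and ignores -filter-print-funcs.
        return True
    if "isel-dump" in debug_types:
        # Assumption: an empty filter list means every function is printed.
        return not filter_funcs or function_name in filter_funcs
    return False


# With only the new debug type, the filter excludes "bar"...
print(should_dump_selection_dag("bar", {"isel-dump"}, {"foo"}))          # False
# ...but adding the old debug type dumps every function again.
print(should_dump_selection_dag("bar", {"isel", "isel-dump"}, {"foo"}))  # True
```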
There are some workloads that are negatively impacted by using jump
tables when the number of entries is small. The SPEC2017 perlbench
benchmark is one example of this, where increasing the threshold to
around 13 gives a ~1.5% improvement on neoverse-v1. I chose the minimum
threshold empirically rather than analytically, manually increasing
the threshold until I got the best performance
without impacting other workloads. For neoverse-v1 I saw a ~0.2%
improvement in the SPEC2017 integer geomean, and no overall change for
neoverse-n1. If we find issues with this threshold later on we can
always revisit this.
The most significant SPEC2017 score changes on neoverse-v1 were:
500.perlbench_r: +1.6%
520.omnetpp_r: +0.6%
and the rest saw changes < 0.5%.
I updated CodeGen/AArch64/min-jump-table.ll to reflect the new
threshold. For most of the affected tests I manually set the min number
of entries back to 4 on the RUN line because the tests seem to rely upon
this behaviour.
Remove support for the fptrunc, fpext, fptoui, fptosi, uitofp and sitofp
constant expressions. All places creating them have been removed
beforehand, so this just removes the APIs and uses of these constant
expressions in tests.
With this, the only remaining FP operation that still has constant
expression support is fcmp.
This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
Noticed whilst working on #69494. VP intrinsics whose functional
equivalent is an intrinsic were having their lanes marked as
non-speculatable, even if the underlying intrinsic was speculatable.
This meant that
```llvm
%1 = call <4 x i32> @llvm.vp.umax(<4 x i32> %x, <4 x i32> %y, <4 x i1> %mask, i32 %evl)
```
would be expanded out to
```llvm
%.splatinsert = insertelement <4 x i32> poison, i32 %evl, i64 0
%.splat = shufflevector <4 x i32> %.splatinsert, <4 x i32> poison, <4 x i32> zeroinitializer
%1 = icmp ult <4 x i32> <i32 0, i32 1, i32 2, i32 3>, %.splat
%2 = and <4 x i1> %1, %mask
%3 = call <4 x i32> @llvm.umax.v4i32(<4 x i32> %x, <4 x i32> %y)
```
instead of
```llvm
%1 = call <4 x i32> @llvm.umax.v4i32(<4 x i32> %x, <4 x i32> %y)
```
The cause of this was isSafeToSpeculativelyExecuteWithOpcode checking the
function attributes for the VP instruction itself, not the functional
intrinsic. Since isSafeToSpeculativelyExecuteWithOpcode expects an already
materialized instruction, we can't use it directly for the intrinsic case.
So this fixes it by manually checking the function attributes on the
intrinsic.
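The difference between the old and fixed checks can be modelled with two lookups. This is a hedged sketch, not the actual ValueTracking/VPIntrinsic code: the tables and function names are hypothetical, and only the `llvm.vp.umax` to `llvm.umax` pairing is taken from the example above.

```python
# Hypothetical lookup tables modelling intrinsic properties; only the
# vp.umax -> umax pairing comes from the example above.
FUNCTIONAL_INTRINSIC = {"llvm.vp.umax": "llvm.umax"}
SPECULATABLE = {"llvm.umax"}  # intrinsics carrying the speculatable attribute


def lanes_speculatable_old(vp_name):
    # Old behaviour: looked at the VP intrinsic's own attributes.
    return vp_name in SPECULATABLE


def lanes_speculatable_fixed(vp_name):
    # Fixed behaviour: look at the attributes of the functional intrinsic.
    functional = FUNCTIONAL_INTRINSIC.get(vp_name)
    return functional is not None and functional in SPECULATABLE


print(lanes_speculatable_old("llvm.vp.umax"))    # False
print(lanes_speculatable_fixed("llvm.vp.umax"))  # True
```

When the fixed check returns True, the expansion no longer needs the EVL/mask computation and can emit the plain `llvm.umax` call shown above.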
The goal in #66818 was to capture function entry counts, but those are not the same as the frequency of the entry (machine) basic block. This fixes that, and adds explicit profiles to the test.
We also increase the precision of `MachineBlockFrequencyInfo::getBlockFreqRelativeToEntryBlock` to double. Existing code uses it as a float, so it should be unaffected.
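A quick illustration of why double precision matters here, assuming the relative frequency is computed as the block frequency divided by the entry block frequency. The helper names are hypothetical; `struct` is used only to round a value to single precision for comparison.

```python
import struct


def relative_freq(block_freq, entry_freq):
    """Frequency of a block relative to the entry block, as a double.

    Python floats are IEEE-754 doubles, matching the new return type.
    """
    return block_freq / entry_freq


def to_float32(x):
    """Round a double to the nearest single-precision float."""
    return struct.unpack("f", struct.pack("f", x))[0]


ratio = relative_freq(2**24 + 1, 1)
print(ratio)              # 16777217.0
print(to_float32(ratio))  # 16777216.0: a float cannot represent 2**24 + 1
```

Existing callers that store the result in a `float` see the same rounded value as before.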
This patch moves the MBB Profile Dump to ./llvm/test/CodeGen/Generic
from ./llvm/test/CodeGen/MlRegAlloc as the profile dump doesn't have
anything to do with the ML guided register allocation heuristic.
Machine function splitting + branch relaxation currently don't properly
handle inline asm goto blocks that conditionally branch to cold goto
labels. While such inline asm is technically invalid, machine
function splitting is the only thing that exposes it as such.
Since machine function splitting doesn't help too much in these
circumstances anyway, disable it for asm goto blocks and their targets.
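The exclusion rule can be sketched as a set computation over blocks. This is a toy model with hypothetical names (`blocks_pinned_to_hot`, `split`, and the dict layout are illustrative), not the MachineFunctionSplitter implementation.

```python
def blocks_pinned_to_hot(blocks):
    """Return the blocks that must stay in the hot section.

    blocks maps a block name to a dict with:
      "asm_goto": True if the block ends in an inline asm goto
      "targets":  names of the asm goto's label targets
    """
    pinned = set()
    for name, info in blocks.items():
        if info["asm_goto"]:
            pinned.add(name)
            pinned.update(info["targets"])
    return pinned


def split(blocks, cold):
    """Return the cold blocks that may move to the cold section."""
    pinned = blocks_pinned_to_hot(blocks)
    return sorted(b for b in cold if b not in pinned)


blocks = {
    "entry": {"asm_goto": True, "targets": ["err"]},
    "err": {"asm_goto": False, "targets": []},
    "slow": {"asm_goto": False, "targets": []},
}
print(split(blocks, {"err", "slow"}))  # ['slow']
```

Here "err" is cold but stays with its asm goto block, while "slow" is still split out.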
Differential Revision: https://reviews.llvm.org/D158647
Jump tables on AArch64 are label-relative rather than table-relative, so
having jump table destinations that are in different sections causes
problems with relocation. Jump table lookups have a max range of 1MB, so
all destinations must be in the same section as the lookup code. Both of
these restrictions can be mitigated with some careful and complex logic,
but doing so doesn't gain a huge performance benefit.
Efficiently ensuring jump tables are correct and can be compressed on
AArch64 is a TODO item. In the meantime, don't split blocks that can
cause problems.
Differential Revision: https://reviews.llvm.org/D157124
The option was added in github.com/llvm/llvm-project/commit/90ab85a but it doesn't seem to be used. The triple check has been removed so this shouldn't be required going forward.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D158885
Because unconditional branch relaxation on AArch64 grows the stack to
spill a register, splitting a function would cause the red zone to be
overwritten. Explicitly disable MFS for such functions.
Differential Revision: https://reviews.llvm.org/D157127
Reverted in 4c8d056f50 because it broke
buildbot `llvm-clang-x86_64-expensive-checks-debian` due to the AArch64
test generating invalid code. The issue still exists, but it's fixed in
D156767, so the AArch64 test should be added there.
Differential Revision: https://reviews.llvm.org/D158674
If an end-section basic block ends in an unconditional branch to its
fallthrough, BasicBlockSections will duplicate the unconditional branch.
The duplicate doesn't break x86, but removing it is a (slight) size
optimization and, more importantly, prevents AArch64 builds from breaking.
Ex:
```
bb1 (bbsections Hot):
jmp bb2
bb2 (bbsections Cold):
/* do work... */
```
After running sortBasicBlocksAndUpdateBranches():
```
bb1 (bbsections Hot):
jmp bb2
jmp bb2
bb2 (bbsections Cold):
/* do work... */
```
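The intended effect of the deduplication on the example above can be modelled on a list of instruction strings. This sketch only shows the before/after result; `drop_duplicate_branch` is a hypothetical helper, not the actual BasicBlockSections change, and "jmp <label>" stands in for any unconditional branch.

```python
def drop_duplicate_branch(instrs):
    """Remove repeated trailing unconditional branches from a block.

    instrs is a list of instruction strings; "jmp <label>" stands for an
    unconditional branch (x86 spelling, as in the example above).
    """
    instrs = list(instrs)
    while (len(instrs) >= 2 and instrs[-1].startswith("jmp ")
           and instrs[-1] == instrs[-2]):
        instrs.pop()
    return instrs


print(drop_duplicate_branch(["jmp bb2", "jmp bb2"]))  # ['jmp bb2']
```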
Differential Revision: https://reviews.llvm.org/D158674
This reverts commit 317a0fe5bd.
This reverts commit 30c4b97aec.
See the post-commit discussion on https://reviews.llvm.org/D157750: we
should use a different mechanism to handle the error with --cuda-gpu-arch=
The IR/DiagnosticInfo.cpp changes, warn_drv_for_elf_only, the codegen tests in
clang/test/Driver, and the following driver behavior change (downgrading an
error to a warning) are undesired.
```
% clang --target=riscv64 -fsplit-machine-functions -c a.c
warning: -fsplit-machine-functions is not valid for riscv64 [-Wbackend-plugin]
```
This CL includes two changes:
1. moved clang backend-warnings test cases from Driver/ to CodeGen/.
2. removed multiple `cd "$(dirname "%t")"` commands and replaced them with `-o %t`.
Reviewed By: maskray (Fangrui Song)
Differential Revision: https://reviews.llvm.org/D157565
When building a fatbinary, the driver invokes the compiler multiple
times with different "--target" values. (For example, with "-x cuda
--cuda-gpu-arch=sm_70" flags, clang will be invoked twice, once with
--target=x86_64_...., once with --target=sm_70.) If we use
-fsplit-machine-functions or -fno-split-machine-functions for such
invocations, the driver reports an error.
This CL changes the behavior so:
- "-fsplit-machine-functions" is now passed to all targets; for non-X86
targets, the flag is a no-op and causes a warning.
- "-fno-split-machine-functions" now negates -fsplit-machine-functions (if
-fno-split-machine-functions appears after any -fsplit-machine-functions)
for any target triple; previously, it caused an error.
- "-fsplit-machine-functions -Xarch_device -fno-split-machine-functions"
enables MFS on host but disables MFS for GPUs without warnings/errors.
- "-Xarch_host -fsplit-machine-functions" enables MFS on host but disables
MFS for GPUs without warnings/errors.
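The rules above can be condensed into a per-invocation resolver. This is a hypothetical Python model of the described behavior, not clang's driver code: `resolve_mfs`, `is_host` and `is_x86` are illustrative names, and it assumes the last applicable flag wins.

```python
def resolve_mfs(args, is_host, is_x86):
    """Model the -fsplit-machine-functions handling for one invocation.

    Returns (mfs_enabled, warns). "-Xarch_host"/"-Xarch_device" scope the
    following argument to the host or device invocation; later flags win.
    """
    enabled = False
    i = 0
    while i < len(args):
        arg = args[i]
        applies = True
        if arg in ("-Xarch_host", "-Xarch_device"):
            if i + 1 >= len(args):
                break  # dangling -Xarch_* with nothing to scope
            applies = (arg == "-Xarch_host") == is_host
            i += 1
            arg = args[i]
        if applies and arg == "-fsplit-machine-functions":
            enabled = True
        elif applies and arg == "-fno-split-machine-functions":
            enabled = False
        i += 1
    if enabled and not is_x86:
        # On non-X86 targets the flag is a no-op that only warns.
        return False, True
    return enabled, False


args = ["-fsplit-machine-functions", "-Xarch_device", "-fno-split-machine-functions"]
print(resolve_mfs(args, is_host=True, is_x86=True))    # (True, False)
print(resolve_mfs(args, is_host=False, is_x86=False))  # (False, False)
```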
Reviewed by: xur, dhoekwater
Differential Revision: https://reviews.llvm.org/D157750
Machine function splitting will become available for AArch64; since MFS
is no longer X86-only, the tests for generic behavior should live
somewhere other than tests/CodeGen/X86.
MFS implementation doesn't vary much across platforms, and most tests
should be identical between X86 and AArch64 besides instruction
selection, so the tests can live together in tests/CodeGen/Generic.
Differential Revision: https://reviews.llvm.org/D157563
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
Fixed PR51501 (tests from D112337).
1. Reuse DISubprogram's 'retainedNodes' to track other function-local
entities together with local variables and labels (this patch handles
function-local imports, while D144006 and D144008 use the same approach for
local types and static variables). Effectively, this patch moves ownership
of tracking local imports from DICompileUnit's 'imports' field to DISubprogram's
'retainedNodes' and adjusts the DWARF emitter for the new layout. The old layout
is considered unsupported (DwarfDebug would assert on such debug metadata).
DICompileUnit's 'imports' field is supposed to track global imported
declarations as it did before.
This addresses various FIXMEs and simplifies the next part of the patch.
2. Postpone emission of function-local imported entities from
`DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`.
In `DwarfDebug::endFunctionImpl()` we do not have all the
information about a parent subprogram or a referring subprogram
(whether a subprogram is inlined or not), so we can't guarantee that we emit
an imported entity correctly and place it in the proper subprogram tree.
So now we just gather the needed details about the import itself and its
parent entity (either a Subprogram or a LexicalBlock) during
processing in `DwarfDebug::endFunctionImpl()`, and all the real work is
done in `DwarfDebug::endModule()`, when we have all the information
required to emit it properly.
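The gather-then-emit split in point 2 can be illustrated with a toy model. It deliberately simplifies "missing information" to "the parent scope has not been created yet"; `LocalImportEmitter` and its methods are hypothetical and do not mirror DwarfDebug's data structures.

```python
class LocalImportEmitter:
    """Toy model of deferring function-local import emission to end of module."""

    def __init__(self):
        self.scope_children = {}  # scope name -> entities emitted under it
        self.pending = []         # (imported entity, parent scope name)

    def end_function(self, created_scopes, local_imports):
        for scope in created_scopes:
            self.scope_children.setdefault(scope, [])
        # Only record the import and its parent; do not emit yet.
        self.pending.extend(local_imports)

    def end_module(self):
        # Every scope now exists, so each import can be placed correctly.
        for entity, parent in self.pending:
            self.scope_children[parent].append(entity)
        self.pending.clear()
        return self.scope_children


emitter = LocalImportEmitter()
# The import's parent scope "g" has not been created when "f" finishes...
emitter.end_function(["f"], [("ns::x", "g")])
# ...but it exists by the time the module is finalized.
emitter.end_function(["g"], [])
print(emitter.end_module())  # {'f': [], 'g': ['ns::x']}
```

Emitting eagerly in `end_function` would fail in this model because "g" is unknown at that point, which is the kind of situation the deferral avoids.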
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Differential Revision: https://reviews.llvm.org/D144004