These new debug values get inserted after the place where the spill
happens, which means they won't be reached by the reverse traversal of
basic block instructions. This would crash or fail assertions if they
contained any virtual registers to be replaced. We can manually handle
the new debug values right away to resolve this.
Fixes https://github.com/llvm/llvm-project/issues/59172
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D139590
The dynamic TLS access model is lowered during ISel to an MI
(ADDItlsgdLADDR) that clobbers CTR in the loop, and the post-isel CTR loop
pass then reverts the loop to a normal compare + branch form.
So this clobber check is no longer needed in the hardware loop insertion pass.
Reviewed By: nemanjai
Differential revision: https://reviews.llvm.org/D140367
Passes that run before hardware loop insertion change the loop to a form that
is not a hardware loop candidate (we return early, before checking the CTR clobbers).
The PHI in the loop exit block is also optimized away. This breaks what the
test was checking when it was originally committed. Fix this by running the
test just before the hardware loop insertion pass.
Reviewed By: nemanjai
Differential revision: https://reviews.llvm.org/D140366
This patch also includes:
1: CRRegBank support
2: Some workarounds in PPC TableGen for anyext/setcc pattern
selection.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D140878
Before this patch we only materialized 0.0 and all other floating point
values would be loaded from the TOC. This patch adds materialization for the
floating point values that can be represented as integers in [-16.0, 15.0].
For example we will now materialize 3.0 and -5.0 but not 4.7.
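
The range rule above can be written as a small predicate. This is only a sketch
of the condition described in this message, not the code in the patch; the
function name `can_materialize` is hypothetical, and excluding -0.0 is an
assumption of the sketch (an integer 0 converted to floating point gives +0.0).

```python
import math


def can_materialize(value):
    """Return True if value is an integer-valued float in [-16.0, 15.0]."""
    value = float(value)
    # Assumption of this sketch: -0.0 is not covered, because converting
    # the integer 0 to floating point yields +0.0.
    if value == 0.0 and math.copysign(1.0, value) < 0:
        return False
    # is_integer() is False for NaN and infinities, so they are rejected.
    return value.is_integer() and -16.0 <= value <= 15.0


for candidate in (3.0, -5.0, 4.7):
    print(candidate, can_materialize(candidate))
```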
Reviewed By: nemanjai, lei, #powerpc
Differential Revision: https://reviews.llvm.org/D138844
If the dividend has leading zeros, we can use them to reduce the
size of the multiplier and avoid the fixup cases.
This patch is for scalars only, but we might be able to do this
for vectors in a follow up.
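
To illustrate why leading zeros help, here is a toy sketch using the textbook
multiply-and-shift condition for unsigned division by a constant. It is not the
code in this patch: `find_magic` is a hypothetical helper, and the 8-bit width
is chosen only to keep the numbers small.

```python
def find_magic(divisor, dividend_bits):
    """Return (multiplier, shift) such that (x * multiplier) >> shift equals
    x // divisor for every 0 <= x < 2**dividend_bits.

    Uses the sufficient condition
        2**shift <= multiplier * divisor <= 2**shift + 2**(shift - dividend_bits)
    and returns the first shift that satisfies it.
    """
    assert divisor > 0
    shift = 0
    while True:
        multiplier = -(-(1 << shift) // divisor)  # ceil(2**shift / divisor)
        error = multiplier * divisor - (1 << shift)
        if (error << dividend_bits) <= (1 << shift):
            return multiplier, shift
        shift += 1


# Full 8-bit dividend: the multiplier needs 9 bits, i.e. the fixup case.
print(find_magic(7, 8))
# Two known leading zeros (6 significant bits): the multiplier fits in 8 bits.
print(find_magic(7, 6))
```

In this toy setting, dividing an 8-bit value by 7 needs a multiplier wider than
8 bits, while a dividend known to have two leading zeros gets by with one that
fits in the register width.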
Differential Revision: https://reviews.llvm.org/D140750
Generate brh, brw and brd instructions for byte-swap operations
on P10, and generate a single instruction for a 32-bit swap followed
by a 16-bit right shift.
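
The second case rests on a simple identity: a 32-bit byte swap followed by a
16-bit right shift yields the byte-swapped low halfword. The sketch below only
checks that arithmetic; it does not model the instructions themselves, and the
helper names are hypothetical.

```python
def bswap32(x):
    """Reverse the four bytes of a 32-bit value."""
    return int.from_bytes((x & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def bswap16(x):
    """Reverse the two bytes of a 16-bit value."""
    return int.from_bytes((x & 0xFFFF).to_bytes(2, "little"), "big")


x = 0x11223344
# Both sides give the low halfword 0x3344 with its bytes reversed.
print(hex(bswap32(x) >> 16), hex(bswap16(x & 0xFFFF)))
```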
Reviewed By: stefanp
Differential Revision: https://reviews.llvm.org/D140414
One of these two changes is exposing (or causing) some more miscompiles.
A reproducer is in progress, so reverting until resolved.
This reverts commit 428f36401b.
Address the inconsistency between FLT_ROUNDS_ and SET_ROUNDING SDAG
node. Rename FLT_ROUNDS_ to GET_ROUNDING and add llvm.get.rounding
intrinsic to replace flt.rounds.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D139507
This reverts commit 37b8f09a4b,
and returns commit 1bd0b82e50.
The miscompile was in InstCombine, and it has been addressed.
This tries to approach the problem noted by @arsenm:
terrible codegen for `__builtin_fpclassify()`:
https://godbolt.org/z/388zqdE37
Just because the PHI in the common successor happens to have different
incoming values for these two blocks, doesn't mean we have to give up.
It's quite easy to deal with this, we just need to produce a select:
https://alive2.llvm.org/ce/z/000srb
Now, the cost model for this transform is rather overly strict,
so this will basically never fire. We count all the selects needed
(over all preds) towards NumBonusInsts.
Differential Revision: https://reviews.llvm.org/D139275
The combiner for BUILD_VECTOR that merges consecutive
loads into a wide load had two issues:
- It didn't check that the input loads all have the
same input chain
- It didn't update nodes that are chained to the original
loads to be chained to the new load
This caused issues with bootstrap when
3c4d2a0396 was committed.
This patch fixes the issue so it can unblock this commit.
Differential revision: https://reviews.llvm.org/D140046
Adds support for i64 constant. It uses the same pattern-based
approach as in SDAG (see PPCISelDAGToDAG::selectI64ImmDirect(),
PPCISelDAGToDAG::selectI64Imm()). It does not support the
prefixed instructions.
Reviewed By: arsenm, tschuett
Differential Revision: https://reviews.llvm.org/D140119
Alignment of an alloca in IR can be lower than the preferred alignment
on purpose, but this override essentially treats the preferred
alignment as the minimum alignment.
The patch changes this behavior to always use the specified
alignment. If alignment is not set explicitly in LLVM IR, it is set to
DL.getPrefTypeAlign(Ty) in computeAllocaDefaultAlign.
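
A minimal sketch of the behavior change, with alignments as plain integers.
The function names are hypothetical, and `None` stands for "no explicit
alignment in the IR"; this is not the code in the patch.

```python
def old_alloca_alignment(explicit_align, preferred_align):
    """Old behavior: the preferred alignment acts as a minimum."""
    if explicit_align is None:
        return preferred_align
    return max(explicit_align, preferred_align)


def new_alloca_alignment(explicit_align, preferred_align):
    """New behavior: an explicit alignment is always used as written."""
    if explicit_align is None:
        return preferred_align  # default filled in from the preferred alignment
    return explicit_align


# An alloca deliberately aligned to 4 when the preferred alignment is 8.
print(old_alloca_alignment(4, 8), new_alloca_alignment(4, 8))
```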
Tests are changed as well: explicit alignment is increased to match
the preferred alignment if it changes output, or omitted when it is
hard to determine the right value (e.g. for pointers, some structs, or
weird types).
Differential Revision: https://reviews.llvm.org/D135462
Summary: Currently we get a wrong fixed value for R_RBR relocations when -ffunction-sections is enabled. This patch fixes this.
Reviewed By: DiggerLin, shchenz
Differential Revision: https://reviews.llvm.org/D138982
Over the past day or so, I've taken a large swing at our tests,
and reduced the number of tests that were still using the old syntax
from ~1800 to just 200.
Left to handle: (as it is seen in this patch)
* Transforms/LSR
* Transforms/CGP
* Transforms/TypePromotion
* Transforms/HardwareLoops
* Analysis/*
* some misc.
I think this is the right point to start actively refusing
to honor the old syntax, except for the old tests,
to prevent the old syntax from creeping back in.
Thus, let's add a temporary default-off flag,
and refuse to accept the old syntax if it is not passed.
The tests that still need porting are annotated with this flag.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D139647
Machine combiner supports generic reassociation only of associative and
commutative instructions, for example (A + X) + Y => (X + Y) + A. However, we
can extend this generic support to handle patterns like
(X + A) - Y => (X - Y) + A, where `-` is the inverse of `+`.
This patch adds interface functions to process reassociation patterns of
associative/commutative instructions and their inverse variants with minimal
changes in backends.
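
As a quick check of the inverse pattern, the sketch below compares both forms
using wrapping 64-bit integer arithmetic. The helper names are hypothetical
and this is not the interface added by the patch. The rewrite can pay off when
A becomes available late, since X - Y can then be computed without waiting for it.

```python
MASK = (1 << 64) - 1  # model wrapping 64-bit integer arithmetic


def original(x, a, y):
    """(X + A) - Y: the subtraction has to wait for A."""
    return ((x + a) - y) & MASK


def reassociated(x, a, y):
    """(X - Y) + A: X - Y can be computed before A is available."""
    return ((x - y) + a) & MASK


print(original(10, 3, 4), reassociated(10, 3, 4))
```

The identity holds for wrapping integers; for floating point such a rewrite
changes rounding, so it is only valid when reassociation is permitted.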
Differential Revision: https://reviews.llvm.org/D136754
We've exploited test data class instructions introduced in ISA 3.0.
This change unifies the scalar intrinsics into ppc_test_data_class
and adds support for 128-bit precision float values using xststdcqp.
Vector versions of the intrinsic can't be unified because they return
vector int instead of int.
Reviewed By: shchenz
Differential Revision: https://reviews.llvm.org/D138105
This reverts commit 122efef8ee.
- Patch fixed to not reuse definitions from predecessors in EH landing pads.
- Late review suggestions (by MaskRay) have been addressed.
- M68k/pipeline.ll test updated.
- Init captures added in processBlock() to avoid capturing structured bindings.
- RISCV has this disabled for now.
Original commit message:
A new pass MachineLateInstrsCleanup is added to be run after PEI.
This is a simple pass that removes redundant and identical instructions
whenever found by scanning the MF once while keeping track of register
definitions in a map. These instructions are typically immediate loads
resulting from rematerialization, and address loads emitted by target in
eliminateFrameIndex().
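
The idea can be sketched on a toy instruction list. This is an illustration
only, not the pass itself: it handles a single block, ignores calls and
implicit clobbers, and the `(dest, op, constant_like)` tuple format is made up
for the example. `constant_like` marks instructions such as immediate loads
whose result does not depend on other registers.

```python
def cleanup(block):
    """Drop a constant-like instruction if its destination register already
    holds the result of an identical instruction."""
    known = {}  # register -> constant-like operation that last defined it
    kept = []
    for dest, op, constant_like in block:
        if constant_like and known.get(dest) == op:
            continue  # identical redefinition: redundant
        if constant_like:
            known[dest] = op
        else:
            known.pop(dest, None)  # any other definition invalidates the entry
        kept.append((dest, op, constant_like))
    return kept


block = [
    ("r3", "li 0", True),
    ("r4", "li 8", True),
    ("r3", "li 0", True),         # redundant, removed
    ("r3", "add r3, r4", False),  # redefines r3
    ("r3", "li 0", True),         # needed again after the add
]
for instr in cleanup(block):
    print(instr)
```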
This is enabled by default, but a target could easily disable it by means of
'disablePass(&MachineLateInstrsCleanupID);'.
This late cleanup is naturally not "optimal" in removing instructions as it
is done by looking at phys-regs, but still quite effective. It would be
desirable to improve other parts of CodeGen and avoid these redundant
instructions in the first place, but there are no ideas for this yet.
Differential Revision: https://reviews.llvm.org/D123394
Reviewed By: RKSimon, foad, craig.topper, arsenm, asb
Currently per-function metadata consists of:
(start-pc, size, features)
This adds a new UAR feature and if it's set an additional element:
(start-pc, size, features, stack-args-size)
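
A sketch of the resulting layout, with the entry as a Python tuple. The
`FEATURE_UAR` bit value and the function name are hypothetical; only the shape
of the entry follows the description above.

```python
FEATURE_UAR = 1 << 1  # hypothetical bit value, for illustration only


def function_entry(start_pc, size, features, stack_args_size=0):
    """Build a per-function metadata entry; the fourth element is present
    only when the UAR feature bit is set."""
    entry = (start_pc, size, features)
    if features & FEATURE_UAR:
        entry += (stack_args_size,)
    return entry


print(function_entry(0x1000, 64, 0))
print(function_entry(0x1000, 64, FEATURE_UAR, stack_args_size=16))
```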
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D136078
Add support for fptosi, fptoui, sitofp and uitofp.
For now, only handle 64-bit integers so that this does not depend on
any other patches. 32-bit integers need handling for G_SEXT/G_ZEXT.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D139174
We added a new post-isel CTRLoop pass in D122125. That pass expands
the hardware-loop-related intrinsics to a CTR loop or a normal loop based
on the loop context. So we no longer need to conservatively check for CTR
clobbers at the IR level.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D135847
Init captures added in processBlock() to avoid capturing structured bindings,
which caused the build problems (with clang).
RISCV has this disabled for now until problems relating to post RA pseudo
expansions are resolved.
Tail duplication may change the loop to a "non-canonical" form
that the CTR loop pass cannot recognize. We fixed one such issue in D135846,
but found that in some other cases the loop is changed to an irreducible form.
That is hard to fix in the CTR loop pass, so instead we move the
CTR loop pass before the tail duplication pass, just after the finalize-isel
pass, to avoid any unexpected change to the loop form.
Reviewed By: lkail
Differential Revision: https://reviews.llvm.org/D138265