Commit Graph

1751 Commits

Author SHA1 Message Date
Florian Hahn
b1a5ee1feb [ARM] Check all terms in emitPopInst when clearing Restored for LR. (#75527)
emitPopInst checks a single function exit MBB. If other paths also exit
the function and any of there terminators uses LR implicitly, it is not
save to clear the Restored bit.

Check all terminators for the function before clearing Restored.

This fixes a mis-compile in outlined-fn-may-clobber-lr-in-caller.ll
where the machine-outliner previously introduced BLs that clobbered LR
which in turn is used by the tail call return.

Alternative to #73553
2023-12-20 16:56:15 +01:00
Philip Reames
ffb2af3ed6 [SCEVExpander] Attempt to reinfer flags dropped due to CSE (#72431)
LSR uses SCEVExpander to generate induction formulas. The expander
internally tries to reuse existing IR expressions. To do that, it needs
to strip any poison generating flags (nsw, nuw, exact, nneg, etc..)
which may not be valid for the newly added users.

This is conservatively correct, but has the effect that LSR will strip
nneg flags on zext instructions involved in trip counts in loop
preheaders. To avoid this, this patch adjusts the expanded to reinfer
the flags on the CSE candidate if legal for all possible users.

This should fix the regression reported in
https://github.com/llvm/llvm-project/issues/71200.

This should arguably be done inside canReuseInstruction instead, but
doing it outside is more conservative compile time wise. Both
canReuseInstruction and isGuaranteedNotToBePoison walk operand lists, so
right now we are performing work which is roughly O(N^2) in the size of
the operand graph. We should fix that before making the per operand step
more expensive. My tenative plan is to land this, and then rework the
code to sink the logic into more core interfaces.
2023-12-07 13:20:36 -08:00
Nikita Popov
eecb99c5f6 [Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
2023-12-05 14:09:36 +01:00
Florian Hahn
20f634f275 [Thumb] Add test case where the machine-outliner clobbers LR.
Add ad test case where `bl OUTLINED_FUNCTION_0` clobbers LR, which in
turn is used the later call to memcpy to return to the caller.
2023-11-24 20:27:43 +00:00
Simon Pilgrim
de41396895 [DAG] foldABSToABD - add support for abs(sub(sign_extend_inreg(),sign_extend_inreg())) patterns
Partial fix for ABDS regressions on D152928
2023-11-15 15:49:30 +00:00
Matthias Braun
e3cf80c5c1 BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreads
BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that:

* Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiply frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room.
* Spread the difference between hottest/coldest block as much as possible to increase precision.
* If the hot/cold spread cannot be represented loose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.
2023-10-24 20:27:39 -07:00
David Green
8a701024f3 [ARM] Lower i1 concat via MVETRUNC
The MVETRUNC operation can perform the same truncate of two vectors, without
requiring lane inserts/extracts from every vector lane. This moves the concat
i1 lowering to use it for v8i1 and v16i1 result types, trading a bit of extra
stack space for less instructions.
2023-10-18 19:40:11 +01:00
David Green
c060757bcc [ARM] Correct v2i1 concat extract types.
For two v2i1 concat into a v4i1, we cannot extract each i64 element as an i32.
This casts to a v4i32 instead and extracts the correct vector lanes.
2023-10-18 13:40:38 +01:00
Jay Foad
7b3bbd83c0 Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3.

Reverted due to various buildbot failures.
2023-10-09 12:31:32 +01:00
Jay Foad
2501ae58e3 [CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
2023-10-09 11:44:41 +01:00
JP Lehr
e816c89c84 Revert "InlineSpiller: Consider if all subranges are the same when avoiding redundant spills"
This reverts commit d8127b2ba8.
2023-10-02 06:26:33 -05:00
Matt Arsenault
d8127b2ba8 InlineSpiller: Consider if all subranges are the same when avoiding redundant spills
This avoids some redundant spills of subranges, and avoids a compile failure.
This greatly reduces the numbers of spills in a loop.

The main range is not informative when multiple instructions are needed to fully define
a register. A common scenario is a lowered reg_sequence where every subregister
is sequentially defined, but each def changes the main range's value number. If
we look at specific lanes at the use index, we can see the value is actually the
same.

In this testcase, there are a large number of materialized 64-bit constant defs
which are hoisted outside of the loop by MachineLICM. These are feeding REG_SEQUENCES,
which is not considered rematerializable inside the loop. After coalescing, the split
constant defs produce main ranges with an apparent phi def. There's no phi def if you look
at each individual subrange, and only half of the register is really redefined to a constant.

Fixes: SWDEV-380865

https://reviews.llvm.org/D147079
2023-10-01 11:37:53 +03:00
Matt Arsenault
7252787dd9 RegAllocGreedy: Fix detection of lanes read by a bundle
SplitKit creates questionably formed bundles of copies
when it needs to copy a subset of live lanes and can't do
it with a single subregister index. These are merely marked
as part of a bundle, and don't start with a BUNDLE instruction.
Queries for the slot index would give the first copy in the
bundle, and we need to inspect the operands of all the other
bundled copies.

Also fix and simplify detection of read lane subsets. This causes
some RISCV test regressions, but these look like accidentally beneficial
splits. I don't see a subrange based reason to perform these splits.

Avoids some really ugly regressions in a future patch.

https://reviews.llvm.org/D146859
2023-10-01 11:37:48 +03:00
Jingu Kang
ff68e43c81 [MachineLICM] Handle Subloops
It is a re-commit from reverted commit 3454cf67bd.

Following discussion on https://reviews.llvm.org/D154205, make MachineLICM pass
handle subloops with only visiting outermost loop's blocks once.

Differential Revision: https://reviews.llvm.org/D154205
2023-09-26 14:25:11 +01:00
David Green
54e5de08d4 [ARM][LSR] Exclude uses outside the loop when favoring postinc. (#67090)
Extra uses for variables outside the loop can mess with the generation
of postinc variables. This patch alters the collection of loop invariant
fixups in LSR when the target is optimizing for PostInc, to exclude the
collection of these extra uses. It is expected that the variable can be
rematerialized, which will lead to a more optimal sequence of
instructions in the loop.
2023-09-25 10:09:36 +01:00
David Green
22f423aa46 [ARM] Add some extra testing for MVE postinc loops. NFC 2023-09-22 07:08:49 +01:00
Jay Foad
e0919b189b [CodeGen] Renumber slot indexes before register allocation (#66334)
RegAllocGreedy uses SlotIndexes::getApproxInstrDistance to approximate
the length of a live range for its heuristics. Renumbering all slot
indexes with the default instruction distance ensures that this estimate
will be as accurate as possible, and will not depend on the history of
how instructions have been added to and removed from SlotIndexes's maps.

This also means that enabling -early-live-intervals, which runs the
SlotIndexes analysis earlier, will not cause large amounts of churn due
to different register allocator decisions.
2023-09-19 11:18:12 +01:00
Jay Foad
d8d0588f66 [TwoAddressInstruction] Update LiveIntervals after INSERT_SUBREG with undef read (#66211)
Update LiveIntervals after rewriting:
  %reg = INSERT_SUBREG undef %reg, %subreg, subidx
to:
  undef %reg:subidx = COPY %subreg

D113044 implemented this for the non-undef case.
2023-09-18 14:51:58 +01:00
Guozhi Wei
cbdccb30c2 [RA] Split a virtual register in cold blocks if it is not assigned preferred physical register
If a virtual register is not assigned preferred physical register, it means some
COPY instructions will be changed to real register move instructions. In this
case we can try to split the virtual register in colder blocks, if success, the
original COPY instructions can be deleted, and the new COPY instructions in
colder blocks will be generated as register move instructions. It results in
fewer dynamic register move instructions executed.

The new test case split-reg-with-hint.ll gives an example, the hot path contains
24 instructions without this patch, now it is only 4 instructions with this
patch.

Differential Revision: https://reviews.llvm.org/D156491
2023-09-15 19:52:50 +00:00
Benjamin Kramer
3454cf67bd Revert "[MachineLICM] Handle Subloops"
This reverts commit 5ec9699c4d. It
accesses MI after it has been hoisted.
2023-09-15 13:20:31 +02:00
Jingu Kang
5ec9699c4d [MachineLICM] Handle Subloops
Following discussion on https://reviews.llvm.org/D154205, make MachineLICM pass
handle subloops with only visiting outermost loop's blocks once.

Differential Revision: https://reviews.llvm.org/D154205
2023-09-14 18:07:31 +01:00
Simon Pilgrim
e6b85c3027 [DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case (REAPPLIED)
Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction

As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.

Reapplied after reversion at e1e3c75c7d with a tweak to the pseudo-probe-peep.ll test

Differential Revision: https://reviews.llvm.org/D158068
2023-09-13 12:33:39 +01:00
Simon Pilgrim
e1e3c75c7d Revert rG6c56cf71ee82ec3a28e0dfc2b751bd10c16929da "[DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case"
Need to address a missed test change
2023-09-13 11:27:47 +01:00
Simon Pilgrim
6c56cf71ee [DAG] FoldSetCC - add missing icmp(X,undef) -> isTrueWhenEqual case
Followup to D59363 which failed to handle the icmp(X,undef) -> isTrueWhenEqual case - similar to llvm::ConstantFoldCompareInstruction

As discussed on the review, this is affecting some previously reduced test cases, but will also prevent reductions from relying on this inconsistent behaviour in the future.

Differential Revision: https://reviews.llvm.org/D158068
2023-09-13 11:01:58 +01:00
John Brawn
fae3f9ec4f [ARM] Fix prologue/epilogue for pacbti-m leaf functions
R12 is callee-saved in functions with pacbti-m enabled, but this is
done in assignCalleeSavedSpillSlots, meaning that in
determineCalleeSaves we have to manually set CanEliminateFrame.

This fixes a bug where in leaf functions with no other callee-saved
registers the aut instruction wouldn't be emitted and stack offsets
of arguments passed on the stack would be incorrect.

Differential Revision: https://reviews.llvm.org/D157865
2023-09-04 13:46:01 +01:00
Serguei Katkov
a701b7e368 [CGP] Remove dead PHI nodes before elimination of mostly empty blocks
Before elimination of mostly empty block it makes sense to remove dead PHI nodes.
It open more opportunity for elimination plus eliminates dead code itself.

It appeared that change results in failing many unit tests and some of
them I've updated and for another one I disable this optimization.
The pattern I observed in the tests is that there is a infinite loop
without side effects. As a result after elimination of dead phi node all other
related instruction are also removed and tests stops to check what it is expected.

Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D158503
2023-08-29 04:35:06 +00:00
Noah Goldstein
7c9fe735d4 [ValueTracking] Strengthen analysis in computeKnownBits of phi
Use the comparison based analysis to strengthen the standard
knownbits analysis rather than choosing either/or.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D157807
2023-08-22 10:59:03 -05:00
Nikita Popov
1c6e6432ca [SCEVExpander] Fix incorrect reuse of more poisonous instructions (PR63763)
SCEVExpander tries to reuse existing instruction with the same
SCEV expression. However, doing this replacement blindly is not
safe, because the instruction might be more poisonous.

What we were already doing is to drop poison-generating flags on
the reused instruction. But this is not the only way that more
poison can be introduced. The poison-generating flag might not
be directly on the reused instruction, or the poison contribution
might come from something like 0 * %var, which folds to 0 but can
still introduce poison.

This patch fixes the issue in a principled way, by determining which
values can contribute poison to the SCEV expression, and then
checking whether any additional values can contribute poison to the
instruction being reused. Poison-generating flags are dropped if
doing that enables reuse.

This is a pretty big hammer and does cause some regressions in
tests, but less than I would have expected. I wasn't able to come
up with a less intrusive fix that still satisfies the correctness
requirements.

Fixes https://github.com/llvm/llvm-project/issues/63763.
Fixes https://github.com/llvm/llvm-project/issues/63926.
Fixes https://github.com/llvm/llvm-project/issues/64333.
Fixes https://github.com/llvm/llvm-project/issues/63727.

Differential Revision: https://reviews.llvm.org/D158181
2023-08-22 09:27:07 +02:00
Jay Foad
56d92c1758 [MachineScheduler] Track physical register dependencies per-regunit
Change the scheduler's physical register dependency tracking from
registers-and-their-aliases to regunits. This has a couple of advantages
when subregisters are used:

- The dependency tracking is more accurate and creates fewer useless
  edges in the dependency graph. An AMDGPU example, edited for clarity:

    SU(0): $vgpr1 = V_MOV_B32 $sgpr0
    SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
    SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0

  There is a data dependency on $vgpr1 from SU(0) to SU(1) and from
  SU(1) to SU(2). But the old dependency tracking code also added a
  useless edge from SU(0) to SU(2) because it thought that SU(0)'s def
  of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1.

- On targets like AMDGPU that make heavy use of subregisters, each
  register can have a huge number of aliases - it can be quadratic in
  the size of the largest defined register tuple. There is a much lower
  bound on the number of regunits per register, so iterating over
  regunits is faster than iterating over aliases.

The LLVM compile-time tracker shows a tiny overall improvement of 0.03%
on X86. I expect a larger compile-time improvement on targets like
AMDGPU.

Recommit after fixing AggressiveAntiDepBreaker in D156880.

Differential Revision: https://reviews.llvm.org/D156552
2023-08-07 15:41:40 +01:00
Simon Tatham
60b98363c7 Retain all jump table range checks when using BTI.
This modifies the switch-statement generation in SelectionDAGBuilder,
specifically the part that generates case clusters of type CC_JumpTable.

A table-based branch of any kind is at risk of being a JOP gadget, if
it doesn't range-check the offset into the table. For some types of
table branch, such as Arm TBB/TBH, the impact of this is limited
because the value loaded from the table is a relative offset of
limited size; for others, such as a MOV PC,Rn computed branch into a
table of further branch instructions, the gadget is fully general.

When compiling for branch-target enforcement via Arm's BTI system,
many of these table branch idioms use branch instructions of types
that do not require a BTI instruction at the branch destination. This
avoids the need to put a BTI at the start of each case handler,
reducing the number of available gadgets //with// BTIs (i.e. ones
which could be used by a JOP attack in spite of the BTI system). But
without a range check, the use of a non-BTI-requiring branch also
opens up a larger range of followup gadgets for an attacker's use.

A defence against this is to avoid optimising away the range check on
the table offset, even if the compiler believes that no out-of-range
value should be able to reach the table branch. (Rationale: that may
be true for values generated legitimately by the program, but not
those generated maliciously by attackers who have already corrupted
the control flow.)

The effect of keeping the range check and branching to an unreachable
block is that no actual code is generated at that block, so it will
typically point at the end of the function. That may still cause some
kind of unpredictable code execution (such as executing data as code,
or falling through to the next function in the code section), but even
if so, there will only be //one// possible invalid branch target,
rather than giving an attacker the choice of many possibilities.

This defence is enabled only when branch target enforcement is in use.
Without branch target enforcement, the range check is easily bypassed
anyway, by branching in to a location just after it. But with
enforcement, the attacker will have to enter the jump table dispatcher
at the initial BTI and then go through the range check. (Or, if they
don't, it's because they //already// have a general BTI-bypassing
gadget.)

Reviewed By: MaskRay, chill

Differential Revision: https://reviews.llvm.org/D155485
2023-07-31 10:39:50 +01:00
Jay Foad
e2e3f06813 Revert "[MachineScheduler] Track physical register dependencies per-regunit"
This reverts commit 1a54671d54.

It was causing lit test failures in a LLVM_ENABLE_EXPENSIVE_CHECKS
build.
2023-07-29 18:05:25 +01:00
Jay Foad
1a54671d54 [MachineScheduler] Track physical register dependencies per-regunit
Change the scheduler's physical register dependency tracking from
registers-and-their-aliases to regunits. This has a couple of advantages
when subregisters are used:

- The dependency tracking is more accurate and creates fewer useless
  edges in the dependency graph. An AMDGPU example, edited for clarity:

    SU(0): $vgpr1 = V_MOV_B32 $sgpr0
    SU(1): $vgpr1 = V_ADDC_U32 0, $vgpr1
    SU(2): $vgpr0_vgpr1 = FLAT_LOAD_DWORDX2 $vgpr0_vgpr1, 0, 0

  There is a data dependency on $vgpr1 from SU(0) to SU(1) and from
  SU(1) to SU(2). But the old dependency tracking code also added a
  useless edge from SU(0) to SU(2) because it thought that SU(0)'s def
  of $vgpr1 aliased with SU(2)'s use of $vgpr0_vgpr1.

- On targets like AMDGPU that make heavy use of subregisters, each
  register can have a huge number of aliases - it can be quadratic in
  the size of the largest defined register tuple. There is a much lower
  bound on the number of regunits per register, so iterating over
  regunits is faster than iterating over aliases.

The LLVM compile-time tracker shows a tiny overall improvement of 0.03%
on X86. I expect a larger compile-time improvement on targets like
AMDGPU.

Differential Revision: https://reviews.llvm.org/D156552
2023-07-29 15:34:53 +01:00
Jingu Kang
351b4c17dd Revert "[MachineLICM] Handle Subloops"
This reverts commit 50dd383d08.
2023-07-20 17:12:25 +01:00
Jingu Kang
50dd383d08 [MachineLICM] Handle Subloops
Following discussion on https://reviews.llvm.org/D154205, make MachineLICM pass
handle subloops with only visiting outmost loop's blocks once.

Differential Revision: https://reviews.llvm.org/D154205
2023-07-20 16:39:13 +01:00
John Brawn
1b12b1a335 [ARM] Restructure MOVi32imm expansion to not do pointless instructions
The expansion of the various MOVi32imm pseudo-instructions works by
splitting the operand into components (either halfwords or bytes) and
emitting instructions to combine those components into the final
result. When the operand is an immediate with some components being
zero this can result in pointless instructions that just add zero.

Avoid this by restructuring things so that a separate function handles
splitting the operand into components, then don't emit the component
if it is a zero immediate. This is straightforward for movw/movt,
where we just don't emit the movt if it's zero, but the thumb1
expansion using mov/add/lsl is more complex, as even when we don't
emit a given byte we still need to get the shift correct.

Differential Revision: https://reviews.llvm.org/D154943
2023-07-19 13:56:36 +01:00
Jingu Kang
62ed3ff4bb Revert "[MachineLICM] Handle Subloops"
This reverts commit 33e60484d7.
2023-07-19 10:30:50 +01:00
John Brawn
343e204a52 [ARM] Replace TransferImpOps with copyImplicitOps
In most places where TransferImpOps is currently used we just have one
machine instruction, so it's doing the same thing as copyImplicitOps
anyway. In those cases where we have more than one machine
instruction the destination is written to in each instruction so any
implicit defs should appear on all of them (and we shouldn't see any
implicit refs as these pseudo-instruction don't have any register
inputs), meaning the current use of TransferImpOps is incorrect and
we should be using copyImplicitOps on all of the generated
instructions.

Differential Revision: https://reviews.llvm.org/D155301
2023-07-18 14:01:04 +01:00
Maurice Heumann
a1cdb323e2 [ARM] Adjust strd/ldrd codegen alignment requirements
In change https://reviews.llvm.org/D152790, it was discovered that the
alignment requirement calculation for LDRD/STRD codegen was suboptimal
and the calculation for volatile loads and stores was adjusted.

This change here adopts the calculation for the remaining non-volatile
occurances.

Recommitting after undefined behavior fix in D155093.

Differential Revision: https://reviews.llvm.org/D153800
2023-07-14 12:54:18 -07:00
Caslyn Tonelli
b11559122e Revert "[ARM] Restructure MOVi32imm expansion to not do pointless instructions"
This reverts commit 647aff2855.

Differential Revision: https://reviews.llvm.org/D155122
2023-07-12 23:29:15 +00:00
Jingu Kang
33e60484d7 [MachineLICM] Handle Subloops
MachineLICM pass handles inner loops only when outmost loop does not have unique
predecessor. If the loop has preheader and there is loop invariant code, the
invariant code can be hoisted to the preheader in general. This patch makes the
pass handle inner loops in general.

Differential Revision: https://reviews.llvm.org/D154205
2023-07-12 16:32:14 +01:00
Nikita Popov
edb2fc6dab [llvm] Remove explicit -opaque-pointers flag from tests (NFC)
Opaque pointers mode is enabled by default, no need to explicitly
enable it.
2023-07-12 14:35:55 +02:00
John Brawn
647aff2855 [ARM] Restructure MOVi32imm expansion to not do pointless instructions
The expansion of the various MOVi32imm pseudo-instructions works by
splitting the operand into components (either halfwords or bytes) and
emitting instructions to combine those components into the final
result. When the operand is an immediate with some components being
zero this can result in pointless instructions that just add zero.

Avoid this by restructuring things so that a separate function handles
splitting the operand into components, then don't emit the component
if it is a zero immediate. This is straightforward for movw/movt,
where we just don't emit the movt if it's zero, but the thumb1
expansion using mov/add/lsl is more complex, as even when we don't
emit a given byte we still need to get the shift correct.

Differential Revision: https://reviews.llvm.org/D154943
2023-07-12 11:48:01 +01:00
Nikita Popov
d69033d245 [SCEVExpander] Fix GEP IV inc reuse logic for opaque pointers
Instead of checking the pointer type, check the element type of
the GEP.

Previously we ended up reusing GEP increments that were not in
expanded form, thus not respecting LSRs choice of representation.

The change in 2011-10-06-ReusePhi.ll recovers a regression that
appeared when converting that test to opaque pointers.

Changes in various Thumb tests now compute the step outside the
loop instead of using add.w inside the loop, which is LSR's
preferred representation for this target.
2023-07-12 11:32:13 +02:00
Ties Stuij
f0ae3c23b5 [ARM] in LowerConstantFP, make sure we cover armv6-m execute-only
Currently in LowerConstantFP, when we compile for execute-only (XO) we don't
check what architecture we're compiling for (v6m=< or >v6m). We shouldn't get
here for v6m, so put in an assert.

Reviewed By: simonwallis2, dmgreen

Differential Revision: https://reviews.llvm.org/D154506
2023-07-11 10:42:15 +01:00
David Green
758c4640c9 [CGP] Enable CodeGenPrepares phi type convertion.
This is a recommit of 67121d7, enabling the CodeGenPrepare OptimizePhiTypes
option that can help with the type of phi instructions into ISel.
2023-07-09 10:32:11 +01:00
Yashwant Singh
b7836d8562 [CodeGen]Allow targets to use target specific COPY instructions for live range splitting
Replacing D143754. Right now the LiveRangeSplitting during register allocation uses
TargetOpcode::COPY instruction for splitting. For AMDGPU target that creates a
problem as we have both vector and scalar copies. Vector copies perform a copy over
a vector register but only on the lanes(threads) that are active. This is mostly sufficient
however we do run into cases when we have to copy the entire vector register and
not just active lane data. One major place where we need that is live range splitting.

Allowing targets to use their own copy instructions(if defined) will provide a lot of
flexibility and ease to lower these pseudo instructions to correct MIR.

- Introduce getTargetCopyOpcode() virtual function and use if to generate copy in Live range
 splitting.
- Replace necessary MI.isCopy() checks with TII.isCopyInstr() in register allocator pipeline.

Reviewed By: arsenm, cdevadas, kparzysz

Differential Revision: https://reviews.llvm.org/D150388
2023-07-07 22:29:50 +05:30
David Spickett
ab3bb86d44 Revert "[ARM] Adjust strd/ldrd codegen alignment requirements"
This reverts commit 92a9c30c61.

This has caused a test failure in the 2nd stage of Linaro's
Arm 32 bit buildbots.

LLVM::simplified-template-names.s

            7: error: Simplified template DW_AT_name could not be reconstituted:
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            8:  original: f3<unsigned char, (unsigned char)'\x00'>
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            9:  reconstituted: f3<unsigned char, (unsigned char)'\x7f'>
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I suspect a load/store is slightly off.
2023-07-03 14:05:49 +00:00
Maurice Heumann
92a9c30c61 [ARM] Adjust strd/ldrd codegen alignment requirements
In change https://reviews.llvm.org/D152790, it was discovered that the
alignment requirement calculation for LDRD/STRD codegen was suboptimal
and the calculation for volatile loads and stores was adjusted.

This change here adopts the calculation for the remaining non-volatile
occurances.

Differential Revision: https://reviews.llvm.org/D153800
2023-07-02 14:25:25 -07:00
Nikita Popov
bb3763e497 Revert "[SimplifyCFG] Allow dropping block that only contains ephemeral values"
This reverts commit 20f0c68fd8.

https://reviews.llvm.org/D153966#4464594 reports an optimization
regression in Rust.

Additionally this change has caused an unexpected 0.3% compile-time
regression.
2023-06-30 21:24:05 +02:00
Nikita Popov
20f0c68fd8 [SimplifyCFG] Allow dropping block that only contains ephemeral values
Perform the TryToSimplifyUncondBranchFromEmptyBlock() transform if
the block is empty except for ephemeral values. The ephemeral values
will be dropped in that case.

This makes sure that assumes don't block this transforms, as reported
in https://discourse.llvm.org/t/llvm-assume-blocks-optimization/71609.

Differential Revision: https://reviews.llvm.org/D153966
2023-06-30 15:24:01 +02:00