Commit Graph

4833 Commits

Author SHA1 Message Date
Serge Pavlov
2f81788067 [ARM][FPEnv] Lowering of fpmode intrinsics (#74054)
LLVM intrinsics `get_fpmode`, `set_fpmode` and `reset_fpmode` operate
control modes, the bits of FP environment that affect FP operations. On
ARM these bits are in FPSCR together with the status bits. The
implementation of these intrinsics produces code close to that of
functions `fegetmode` and `fesetmode` from GLIBC.

Pull request: https://github.com/llvm/llvm-project/pull/74054
2023-12-18 18:57:36 +07:00
ostannard
4888218d03 [ARM] Do not emit unwind tables when saving LR around outlined call (#69611)
In some cases, the machine outliner needs to preserve LR across an
outlined call by pushing it onto the stack. Previously, this also
generated unwind table instructions, which is incorrect because EHABI
unwind tables cannot represent different stack frames a different points
in the function, so the extra unwind info applied to the entire
function.

The outliner code already avoided generating CFI instructions, but EHABI
unwind data is generated later from the actual instructions, so we need
to avoid using the FrameSetup and FrameDestroy flags to prevent unwind
data being generated.
2023-12-14 14:46:13 +00:00
Shih-Po Hung
b97c5a9554 [VPlan] Add a test for testing unused interleave recipes (#75026)
- Precommit of tests from #71360.
- Replace `undef` pointer operands and add stores to avoid the loads
   being optmized away.
2023-12-14 21:16:11 +08:00
Simon Pilgrim
b7fc78255e Revert rG2047ab00eaf0a17e71ce5e8a5b27a8c90f034c3d "[VPlan] Add a test for testing unused interleave recipes (#75026)"
vplan-unused-interleave-group.ll is causing buildbot failures
2023-12-14 10:25:41 +00:00
Shih-Po Hung
2047ab00ea [VPlan] Add a test for testing unused interleave recipes (#75026)
- Precommit of tests from #71360.
   - Replace `undef` pointer operands and add stores to avoid the loads
      being optmized away.
2023-12-14 17:36:58 +08:00
paperchalice
b0cc42ae0f [CodeGen] Port SjLjEHPrepare to new pass manager (#75023)
`doInitialization` in `SjLjEHPrepare` is trivial.

This is the last pass suffix with `ehprepare`.
2023-12-12 16:07:26 +08:00
Simon Pilgrim
faecc736e2 [DAG] isSplatValue - node is a splat if all demanded elts have the same whole constant value (#74443) 2023-12-08 10:53:51 +00:00
Simon Pilgrim
22df0886a1 [DAG] Don't split f64 constant stores if the fp imm is legal (#74622)
If the target can generate a specific fp immediate constant, then don't split the store into 2 x i32 stores

Another cleanup step for #74304
2023-12-07 10:33:03 +00:00
Simon Pilgrim
609d980b3f [ARM] Regenerate aapcs-hfa-code.ll 2023-12-06 12:09:30 +00:00
Nikita Popov
eecb99c5f6 [Tests] Add disjoint flag to some tests (NFC)
These tests rely on SCEV looking recognizing an "or" with no common
bits as an "add". Add the disjoint flag to relevant or instructions
in preparation for switching SCEV to use the flag instead of the
ValueTracking query. The IR with disjoint flag matches what
InstCombine would produce.
2023-12-05 14:09:36 +01:00
simpal01
74cdb8e6f8 [llvm][ARM] Emit MVE .arch_extension after .fpu directive if it does not include MVE features (#71545)
The floating-point and MVE features together specify the MVE
functionality that is supported on the Cortex-M85 processor. But the FPU
extension for the underlying architecture(armv8.1-m.main) is FPV5 which
does not include MVE-F. So Compiler's -S output and `-save-temps=obj`
loses MVE feature which leads to assembler error. What happening here is
.fpu directive overrides any previously set features by .cpu directive.
Since the the corresponding .fpu generated (.fpu fpv5-d16) does not
include MVE-F, it overrides those features even though it is supported
and set by the .cpu directive. Looks like .fpu is supposed to do this.

In this case, there should be an .arch_extension directive re-enabling
the relevant extensions after .fpu if the goal is to keep these
extensions enabled. GCC also does the same.

So this patch enables the MVE features by emitting the below arch
extension:
  .fpu fpv5-d16
  .arch_extension mve.fp

---------

Co-authored-by: Simi Pallipurath <simi.pallipurath.com>
2023-11-22 09:16:58 +00:00
Serge Pavlov
a2e1de1934 [ARM][FPEnv] Lowering of fpenv intrinsics
The change implements lowering of `get_fpenv`, `set_fpenv` and
`reset_fpenv`.

Differential Revision: https://reviews.llvm.org/D81843
2023-11-20 15:08:25 +07:00
Tavian Barnes
75cf672b12 [SDAG] Simplify is-power-of-2 codegen (#72275)
When x is not known to be nonzero, ctpop(x) == 1 is expanded to

    x != 0 && (x & (x - 1)) == 0

resulting in codegen like

    leal    -1(%rdi), %eax
    testl   %eax, %edi
    sete    %cl
    testl   %edi, %edi
    setne   %al
    andb    %cl, %al

But another expression that works is

    (x ^ (x - 1)) > x - 1

which has nicer codegen:

    leal    -1(%rdi), %eax
    xorl    %eax, %edi
    cmpl    %eax, %edi
    seta    %al
2023-11-15 22:26:34 +09:00
Serge Pavlov
5b0f703918 Revert "[ARM][FPEnv] Lowering of fpenv intrinsics"
This reverts commit d62f040418.
Some cuda buildbots start failing.
2023-11-10 16:24:51 +07:00
Serge Pavlov
d62f040418 [ARM][FPEnv] Lowering of fpenv intrinsics
The change implements lowering of `get_fpenv`, `set_fpenv` and
`reset_fpenv`.

Differential Revision: https://reviews.llvm.org/D81843
2023-11-10 16:06:33 +07:00
Nikita Popov
e4a4122eb6 [IR] Remove zext and sext constant expressions (#71040)
Remove support for zext and sext constant expressions. All places
creating them have been removed beforehand, so this just removes the
APIs and uses of these constant expressions in tests.

There is some additional cleanup that can be done on top of this, e.g.
we can remove the ZExtInst vs ZExtOperator footgun.

This is part of
https://discourse.llvm.org/t/rfc-remove-most-constant-expressions/63179.
2023-11-03 10:46:07 +01:00
Tobias Stadler
373c343a77 Reland: [GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND
Reland 3686a0b after fixing an exposed miscompile in #68840

Differential Revision: https://reviews.llvm.org/D159140
2023-11-02 00:18:19 +01:00
Fangrui Song
5888dee7d0 [ARM,ELF] Fix access to dso_preemptable __stack_chk_guard with static relocation model (#70014)
The ELF code from https://reviews.llvm.org/D112811 emits LDRLIT_ga_pcrel
when `TM.isPositionIndependent()` but uses a different condition
`Subtarget.isGVIndirectSymbol(GV)` (aka dso_preemptable on ELF targets).
This would cause incorrect access for dso_preemptable
`__stack_chk_guard` with the static relocation model.

Regarding whether `__stack_chk_guard` gets the dso_local specifier,
https://reviews.llvm.org/D150841 switched to
`M.getDirectAccessExternalData()` (implied by "PIC Level") instead of
`TM.getRelocationModel() == Reloc::Static`.

The result is that when non-zero "PIC Level" is used with static
relocation model (e.g. -fPIE/-fPIC LTO compiles with -no-pie linking),
`__stack_chk_guard` accesses are incorrect.
```
        ldr     r0, .LCPI0_0
        ldr     r0, [r0]
        ldr     r0, [r0]   // incorrectly dereferences __stack_chk_guard
...
.LCPI0_0:
        .long   __stack_chk_guard
```

To fix this, for dso_preemptable `__stack_chk_guard`, emit a GOT PIC
code sequence like for -fpic using `LDRLIT_ga_pcrel`:
```
        ldr     r0, .LCPI0_0
.LPC0_0:
        add     r0, pc, r0
        ldr     r0, [r0]
        ldr     r0, [r0]
...
LCPI0_0:
.Ltmp0:
        .long   __stack_chk_guard(GOT_PREL)-((.LPC0_0+8)-.Ltmp0)
```

Technically, `LDRLIT_ga_abs` with `R_ARM_GOT_ABS` could be used, but
`R_ARM_GOT_ABS` does not have GNU or integrated assembler support.
(Note, `.LCPI0_0: .long __stack_chk_guard@GOT` produces an
`R_ARM_GOT_BREL`, which is not desired).

This patch fixes #6499 while not changing behavior for the following
configurations:
```
run arm.linux.nopic --target=arm-linux-gnueabi -fno-pic
run arm.linux.pie --target=arm-linux-gnueabi -fpie
run arm.linux.pic --target=arm-linux-gnueabi -fpic
run armv6.darwin.nopic --target=armv6-apple-darwin -fno-pic
run armv6.darwin.dynamicnopic --target=armv6-apple-darwin -mdynamic-no-pic
run armv6.darwin.pic --target=armv6-apple-darwin -fpic
run armv7.darwin.nopic --target=armv7-apple-darwin -mcpu=cortex-a8 -fno-pic
run armv7.darwin.dynamicnopic --target=armv7-apple-darwin -mcpu=cortex-a8 -mdynamic-no-pic
run armv7.darwin.pic --target=armv7-apple-darwin -mcpu=cortex-a8 -fpic
run arm64.darwin.pic --target=arm64-apple-darwin
```
2023-10-31 15:37:26 -07:00
Fangrui Song
6ae7b735db [ARM][test] Improve stack-protector tests
llvm/test/LTO/ARM/ssp-static-reloc.ll is more about using the static
relocation model with "PIC Level" and unrelated to the LTO infrastructure.
Move the test. Update stack_guard_remat.ll to clearly test "PIC Level"
with the relevant relocation models.
2023-10-31 15:30:08 -07:00
XChy
fc6bdb8549 [SimplifyCFG] Reland transform for redirecting phis between unmergeable BB and SuccBB (#68473)
Reland #67275 with #68953 resolved.
2023-10-28 17:10:20 +08:00
Matthias Braun
e3cf80c5c1 BlockFrequencyInfoImpl: Avoid big numbers, increase precision for small spreads
BlockFrequencyInfo calculates block frequencies as Scaled64 numbers but as a last step converts them to unsigned 64bit integers (`BlockFrequency`). This improves the factors picked for this conversion so that:

* Avoid big numbers close to UINT64_MAX to avoid users overflowing/saturating when adding multiply frequencies together or when multiplying with integers. This leaves the topmost 10 bits unused to allow for some room.
* Spread the difference between hottest/coldest block as much as possible to increase precision.
* If the hot/cold spread cannot be represented loose precision at the lower end, but keep the frequencies at the upper end for hot blocks differentiable.
2023-10-24 20:27:39 -07:00
David Green
8a701024f3 [ARM] Lower i1 concat via MVETRUNC
The MVETRUNC operation can perform the same truncate of two vectors, without
requiring lane inserts/extracts from every vector lane. This moves the concat
i1 lowering to use it for v8i1 and v16i1 result types, trading a bit of extra
stack space for less instructions.
2023-10-18 19:40:11 +01:00
Nikita Popov
a72d88fb4f Revert "Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size"
This reverts commit 8840da2db2.

This results in verifier failures during LTO, see #68929.
2023-10-16 12:17:24 +02:00
weiguozhi
b6043f9867 [RA] Disable split around hint register if optimize for size (#68619)
Split a virtual register with hint may generate COPY instructions in
multiple cold basic blocks, and increase code size. So disable this
split when the function is optimized for size.
2023-10-11 14:57:15 -07:00
Nikita Popov
8840da2db2 Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size
Reapply now that generation of incorrect debuginfo for FnDef
in rustc has been fixed.

-----

Add a check that the DILocalVariable fragment size in dbg.declare
does not exceed the size of the alloca.

This would have caught the invalid debuginfo regenerated by rustc
in https://github.com/llvm/llvm-project/issues/64149.

Differential Revision: https://reviews.llvm.org/D158743
2023-10-09 14:22:12 +02:00
Jay Foad
7b3bbd83c0 Revert "[CodeGen] Really renumber slot indexes before register allocation (#67038)"
This reverts commit 2501ae58e3.

Reverted due to various buildbot failures.
2023-10-09 12:31:32 +01:00
Jay Foad
2501ae58e3 [CodeGen] Really renumber slot indexes before register allocation (#67038)
PR #66334 tried to renumber slot indexes before register allocation, but
the numbering was still affected by list entries for instructions which
had been erased. Fix this to make the register allocator's live range
length heuristics even less dependent on the history of how instructions
have been added to and removed from SlotIndexes's maps.
2023-10-09 11:44:41 +01:00
Fangrui Song
d20190e684 [test] Change llc -march=aarch64|arm64 to -mtriple=aarch64|arm64
Similar to commit 806761a762 to avoid issues due
to object file format differences. These tests are currently benign.
2023-09-29 10:13:06 -07:00
Tobias Stadler
305fbc1b32 Revert "[GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND"
This reverts commit 3686a0b611.
This seems to have broken some sanitizer tests:
https://lab.llvm.org/buildbot/#/builders/184/builds/7721
2023-09-29 03:35:40 +02:00
Tobias Stadler
3686a0b611 [GlobalISel] LegalizationArtifactCombiner: Elide redundant G_AND
The legalizer currently generates lots of G_AND artifacts.
For example between boolean uses and defs there is always a G_AND with a mask of 1, but when the target uses ZeroOrOneBooleanContents, this is unnecessary.
Currently these artifacts have to be removed using post-legalize combines.
Omitting these artifacts at their source in the artifact combiner has a few advantages:
- We know that the emitted G_AND is very likely to be useless, so our KnownBits call is likely worth it.
- The G_AND and G_CONSTANT can interrupt e.g. G_UADDE/... sequences generated during legalization of wide adds which makes it harder to detect these sequences in the instruction selector (e.g. useful to prevent unnecessary reloading of AArch64 NZCV register).
- This cleans up a lot of legalizer output and even improves compilation-times.
AArch64 CTMark geomean: `O0` -5.6% size..text; `O0` and `O3` ~-0.9% compilation-time (instruction count).

Since this introduces KnownBits into code-paths used by `O0`, I reduced the default recursion depth.
This doesn't seem to make a difference in CTMark, but should prevent excessive recursive calls in the worst case.

Reviewed By: aemerson

Differential Revision: https://reviews.llvm.org/D159140
2023-09-29 02:11:57 +02:00
Jay Foad
fb32baf0ec [ARM] Make some test checks more robust
This makes some tests robust against minor codegen differences
that will be caused by PR #67038.
2023-09-28 14:26:13 +01:00
Douglas Yung
6716d3dd77 Move test split-deadloop.mir that was added in e3d714f to AArch64 directory instead of ARM. 2023-09-26 09:51:47 -07:00
weiguozhi
31f81e96a4 [RA] Don't split a register generated from another split (#67351)
Split a register generated from another split usually doesn't bring us
too much benefit. It may also cause dead loop as pr67188 shows if the
heuristic cost always satisfy the split condition. So prevent such
splitting.

It fixed pr67188.
2023-09-26 08:38:18 -07:00
Muhammad Omair Javaid
431969ede1 Revert "[SimplifyCFG] Transform for redirecting phis between unmergeable BB and SuccBB (#67275)"
This reverts commit fc86d031fe.

This change breaks LLVM buildbot clang-aarch64-sve-vls-2stage
https://lab.llvm.org/buildbot/#/builders/176/builds/5474
I am going to revert this patch as the bot has been failing for more than a day without a fix.
2023-09-26 15:47:16 +05:00
XChy
fc86d031fe [SimplifyCFG] Transform for redirecting phis between unmergeable BB and SuccBB (#67275)
This patch extends function TryToSimplifyUncondBranchFromEmptyBlock to
handle the similar cases below.

```llvm
define i8 @src(i8 noundef %arg) {
start:
  switch i8 %arg, label %unreachable [
    i8 0, label %case012
    i8 1, label %case1
    i8 2, label %case2
    i8 3, label %end
  ]

unreachable:
  unreachable

case1:
  br label %case012

case2:
  br label %case012

case012:
  %phi1 = phi i8 [ 3, %case2 ], [ 2, %case1 ], [ 1, %start ]
  br label %end

end:
  %phi2 = phi i8 [ %phi1, %case012 ], [ 4, %start ]
  ret i8 %phi2
}
```
The phis here should be merged into one phi, so that we can better
optimize it:

```llvm
define i8 @tgt(i8 noundef %arg) {
start:
  switch i8 %arg, label %unreachable [
    i8 0, label %end
    i8 1, label %case1
    i8 2, label %case2
    i8 3, label %case3
  ]

unreachable:
  unreachable

case1:
  br label %end

case2:
  br label %end

case3:
  br label %end

end:
  %phi = phi i8 [ 4, %case3 ], [ 3, %case2 ], [ 2, %case1 ], [ 1, %start ]
  ret i8 %phi
}
```
Proof:
[normal](https://alive2.llvm.org/ce/z/vAWi88)
[multiple stages](https://alive2.llvm.org/ce/z/DDBQqp)
[multiple stages 2](https://alive2.llvm.org/ce/z/nGkeqN)
[multiple phi combinations](https://alive2.llvm.org/ce/z/VQeEdp)

And lookup table optimization should convert it into add %arg 1.
This patch just match similar CFG structure and merge the phis in
different cases.

Maybe such transform can be applied to other situations besides switch,
but I'm not sure whether it's better than not merging. Therefore, I only
try it in switch,

Related issue:
#63876

[Migrated](https://reviews.llvm.org/D155940)
2023-09-25 10:13:45 +08:00
Matt Harding
64d1ceaa38 Add command line option --no-trap-after-noreturn (#67051)
Add the command line option --no-trap-after-noreturn, which exposes the
pre-existing TargetOption `NoTrapAfterNoreturn`.

This pull request was split off from this one:
https://github.com/llvm/llvm-project/pull/65876
2023-09-22 22:03:21 +02:00
Jon Roelofs
83e6d2edfc Revert "[ARM] Always lower direct calls as direct when the outliner is enabled (#66434)"
This reverts commit 003bcad9a8.

ARM folks say it regresses some of their benchmarks:
https://github.com/llvm/llvm-project/pull/66434#issuecomment-1722424162
2023-09-18 09:45:46 -07:00
Nikita Popov
38c59b9f53 Revert "Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size"
This reverts commit 47324cfd7d.

This exposed incorrect debuginfo in rustc. Revert the verification
until this has been fixed.
2023-09-18 17:24:53 +02:00
Guozhi Wei
cbdccb30c2 [RA] Split a virtual register in cold blocks if it is not assigned preferred physical register
If a virtual register is not assigned preferred physical register, it means some
COPY instructions will be changed to real register move instructions. In this
case we can try to split the virtual register in colder blocks, if success, the
original COPY instructions can be deleted, and the new COPY instructions in
colder blocks will be generated as register move instructions. It results in
fewer dynamic register move instructions executed.

The new test case split-reg-with-hint.ll gives an example, the hot path contains
24 instructions without this patch, now it is only 4 instructions with this
patch.

Differential Revision: https://reviews.llvm.org/D156491
2023-09-15 19:52:50 +00:00
Jon Roelofs
003bcad9a8 [ARM] Always lower direct calls as direct when the outliner is enabled (#66434)
The indirect lowering hinders the outliner's ability to see that
sequences are in fact common, since the sequence similarity is rendered
opaque by the register callee. The size savings from making them
indirect seems to be dwarfed by the outliner's savings from
de-duplication.

rdar://115178034
rdar://115459865
2023-09-15 10:04:56 -07:00
Nikita Popov
47324cfd7d Reapply [Verifier] Sanity check alloca size against DILocalVariable fragment size
Reapply after fixing a clang bug this exposed in D158972 and
adjusting a number of tests that failed for 32-bit targets.

-----

Add a check that the DILocalVariable fragment size in dbg.declare
does not exceed the size of the alloca.

This would have caught the invalid debuginfo regenerated by rustc
in https://github.com/llvm/llvm-project/issues/64149.

Differential Revision: https://reviews.llvm.org/D158743
2023-09-15 14:51:50 +02:00
Allen
347b3f1209 [ARM][ISel] Fix crash of ISD::FMINNUM/FMAXNUM (#65849)
The instruction of ISD::FMINNUM/FMAXNUM should be legal if HasFPARMv8 &&
HasNEON.
For the combination of armv7+fp-armv8, armv7 imply the feature HasNEON
on, and fp-armv8 matchs the feature HasFPARMv8, so it is legal

Fixes https://github.com/llvm/llvm-project/issues/65820
2023-09-14 10:35:07 +08:00
David Green
a82c106e57 [ARM] Change CRC predicate to just HasCRC
This removes the backend requirement for crc instructions on HasV8, relying on
just HasCRC instead. This should allow them to be selected with ArmV7 + crc,
making them more usable whilst hopefully not making them incorrectly generated
(they only come from intrinsics, and HasCRC usually requires HasV8). This is
how most other instructions are specified.
2023-09-08 09:02:15 +01:00
Matt Arsenault
b14e83d1a4 IR: Add llvm.exp10 intrinsic
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10
to fix this asymmetry. AMDGPU already has most of the code for f32
exp10 expansion implemented alongside exp, so the current
implementation is duplicating nearly identical effort between the
compiler and library which is inconvenient.

https://reviews.llvm.org/D157871
2023-09-01 19:45:03 -04:00
Nikita Popov
98cf20f890 Revert "[Verifier] Sanity check alloca size against DILocalVariable fragment size"
This reverts commit 183f49c3e0.

The lang/cpp/trivial_abi/TestTrivialABI.py lldb test fails on
buildbots.
2023-08-28 09:44:51 +02:00
Nikita Popov
183f49c3e0 [Verifier] Sanity check alloca size against DILocalVariable fragment size
Add a check that the DILocalVariable fragment size in dbg.declare
does not exceed the size of the alloca.

This would have caught the invalid debuginfo regenerated by rustc
in https://github.com/llvm/llvm-project/issues/64149.

Differential Revision: https://reviews.llvm.org/D158743
2023-08-28 09:16:33 +02:00
Oliver Stannard
40614e1c14 [ARM] Save and restore CPSR around tMOVimm32
When resolving a frame index with a large offset for v6M execute-only,
we emit a tMOVimm32 pseudo-instruction, which later gets lowered to a
sequence of instructions, all of which are flag-setting. However, a
frame index may be generated for a register spill or reload instruction,
which can be inserted at a point where CPSR is live. This patch inserts
MRS and MSR instructions around the tMOVimm32 to save and restore the
value of CPSR, if CPSR is live at that point.

This may need up to two virtual registers (one to build the immediate
value, one to save CPSR) during frame index lowering, which happens
after register allocation, so we need to ensure two spill slots are
avilable to the register scavenger to ensure it can free up enough
registers for this.

There is no test for the emission (or not) of the MRS/MSR pair, because
it requires a spill or reload to be inserted at a point where CPSR is
live, which requires a large, complex function and is fragile enough
that any optimisation changes will break the test. This bug was easily
found by csmith with -verify-machineinstrs, which I now run regularly on
v6M execute-only (and many other combinations).

Patch by John Brawn and myself.

Reviewed By: stuij

Differential Revision: https://reviews.llvm.org/D158404
2023-08-24 14:15:02 +01:00
Nikita Popov
69bd66b3ce [Tests] Remove some and/or constant expressions in tests (NFC)
In preparation for their removal in D158081.
2023-08-21 12:05:32 +02:00
Keith Walker
2d9c6e699a [Thumb1] Use callee-saved register to adjust stack pointer
When adjusting the Stack Pointer at the end of the function epilogue,
use a callee-saved register, rather than explicitly using R4 which may
not have been saved.

Differential Revision: https://reviews.llvm.org/D157500
2023-08-17 18:29:50 +01:00
Nicholas Guy
d65feccb12 [ARM] Set preferred function alignment
Aligning functions yields small performance gains on
embedded cores, moreso with numerous small function calls.
Similar to aligning loops, if the function can fit within
a single cache line then the performance overhead of
fetching more instructions can be limited.

Differential Revision: https://reviews.llvm.org/D157514
2023-08-16 17:31:21 +01:00