Commit Graph

4790 Commits

Author SHA1 Message Date
Matt Arsenault
b14e83d1a4 IR: Add llvm.exp10 intrinsic
We currently have log, log2, log10, exp and exp2 intrinsics. Add exp10
to fix this asymmetry. AMDGPU already has most of the code for f32
exp10 expansion implemented alongside exp, so the current
implementation is duplicating nearly identical effort between the
compiler and library, which is inconvenient.
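
For illustration, a minimal use of the new intrinsic (shown here for the
f32 overload; the function and value names are illustrative) looks like:

  declare float @llvm.exp10.f32(float)

  define float @ten_to(float %x) {
    ; computes 10^x via the new intrinsic
    %r = call float @llvm.exp10.f32(float %x)
    ret float %r
  }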

https://reviews.llvm.org/D157871
2023-09-01 19:45:03 -04:00
Nikita Popov
98cf20f890 Revert "[Verifier] Sanity check alloca size against DILocalVariable fragment size"
This reverts commit 183f49c3e0.

The lang/cpp/trivial_abi/TestTrivialABI.py lldb test fails on
buildbots.
2023-08-28 09:44:51 +02:00
Nikita Popov
183f49c3e0 [Verifier] Sanity check alloca size against DILocalVariable fragment size
Add a check that the DILocalVariable fragment size in dbg.declare
does not exceed the size of the alloca.

This would have caught the invalid debuginfo generated by rustc
in https://github.com/llvm/llvm-project/issues/64149.

Differential Revision: https://reviews.llvm.org/D158743
2023-08-28 09:16:33 +02:00
Oliver Stannard
40614e1c14 [ARM] Save and restore CPSR around tMOVimm32
When resolving a frame index with a large offset for v6M execute-only,
we emit a tMOVimm32 pseudo-instruction, which later gets lowered to a
sequence of instructions, all of which are flag-setting. However, a
frame index may be generated for a register spill or reload instruction,
which can be inserted at a point where CPSR is live. This patch inserts
MRS and MSR instructions around the tMOVimm32 to save and restore the
value of CPSR, if CPSR is live at that point.

This may need up to two virtual registers (one to build the immediate
value, one to save CPSR) during frame index lowering, which happens
after register allocation, so we need to make two spill slots
available to the register scavenger so that it can free up enough
registers for this.

There is no test for the emission (or not) of the MRS/MSR pair, because
it requires a spill or reload to be inserted at a point where CPSR is
live, which requires a large, complex function and is fragile enough
that any optimisation changes will break the test. This bug was easily
found by csmith with -verify-machineinstrs, which I now run regularly on
v6M execute-only (and many other combinations).

Patch by John Brawn and myself.

Reviewed By: stuij

Differential Revision: https://reviews.llvm.org/D158404
2023-08-24 14:15:02 +01:00
Nikita Popov
69bd66b3ce [Tests] Remove some and/or constant expressions in tests (NFC)
In preparation for their removal in D158081.
2023-08-21 12:05:32 +02:00
Keith Walker
2d9c6e699a [Thumb1] Use callee-saved register to adjust stack pointer
When adjusting the Stack Pointer at the end of the function epilogue,
use a callee-saved register, rather than explicitly using R4 which may
not have been saved.

Differential Revision: https://reviews.llvm.org/D157500
2023-08-17 18:29:50 +01:00
Nicholas Guy
d65feccb12 [ARM] Set preferred function alignment
Aligning functions yields small performance gains on
embedded cores, more so with numerous small function calls.
Similar to aligning loops, if the function can fit within
a single cache line then the performance overhead of
fetching more instructions can be limited.

Differential Revision: https://reviews.llvm.org/D157514
2023-08-16 17:31:21 +01:00
Matt Arsenault
c8cac15613 PreISelIntrinsicLowering: Check RuntimeLibcalls instead of TLI for memory functions
We need a better mechanism for expressing which calls you are allowed
to emit and which calls are recognized. This should be applied to the
17 branch.
2023-08-10 16:40:04 -04:00
John Brawn
f83ab2b3be [ARM] Improve generation of thumb stack accesses
Currently when a stack access is out of range of an sp-relative ldr or
str then we jump straight to generating the offset with a literal pool
load or mov32 pseudo-instruction. This patch improves that in two
ways:
 * If the offset is within range of sp-relative add plus an ldr then
   use that.
 * When we use the mov32 pseudo-instruction, if putting part of the
   offset into the ldr will simplify the expansion of the mov32 then
   do so.
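
As a rough illustration (the exact offsets depend on the final frame
layout), a slot about 1100 bytes above sp is out of range of a single
sp-relative ldr (limit 1020), but is reachable with an sp-relative add of
1020 followed by an ldr with offset 80; the function and buffer names
below are made up:

  define i32 @far_slot() {
    %buf = alloca [280 x i32], align 4
    ; element 275 sits 1100 bytes into the buffer
    %p = getelementptr inbounds [280 x i32], ptr %buf, i32 0, i32 275
    %v = load i32, ptr %p
    ret i32 %v
  }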

Differential Revision: https://reviews.llvm.org/D156875
2023-08-07 17:53:32 +01:00
Francesco Petrogalli
cd921e0fd7 [MISched] Do not erase resource booking history for subunits.
When dealing with the subunits of a resource group, we should reset
the subunits' availability at the first available cycle of the resource
that contains the subunits. Previously, the reset operation was
returning cycle 0, effectively erasing the booking history of the
subunits.

Without this change, when using intervals for models that make use of
subunits, erasing the resource booking for subunits can raise the
assertion "A resource is being overwritten" in
`ResourceSegments::add`. The test added in this patch is one such
case.

Reviewed By: andreadb

Differential Revision: https://reviews.llvm.org/D156530
2023-08-01 14:00:37 +02:00
John Brawn
8336d38be9 [ARM] Correctly handle combining segmented stacks with execute-only
Using segmented stacks with execute-only mostly works, but we need to
use the correct movi32 opcode in 6-M, and there's one place where for
thumb1 (i.e. 6-M and 8-M.base) a constant pool was unconditionally
used which needed to be fixed.

Differential Revision: https://reviews.llvm.org/D156339
2023-07-28 10:37:40 +01:00
Fangrui Song
845d83d85f [test] Add --show-all-symbols to some llvm-objdump -d commands
llvm-objdump -d will be changed to not display mapping symbols by
default (D156190).
Add --show-all-symbols to make the intent clearer and prevent test
adjustment with the new behavior.
2023-07-27 19:33:51 -07:00
Jay Foad
2dcf051259 [CodeGen] Store call frame size in MachineBasicBlock
Record the call frame size on entry to each basic block. This is usually
zero except when a basic block has been split in the middle of a call
sequence.

This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.

Differential Revision: https://reviews.llvm.org/D156113
2023-07-27 10:32:00 +01:00
Jay Foad
6c8f4472b4 [ARM] Extend regression test for D154281
Add a test case with a larger call frame which does not satisfy
ARMFrameLowering::hasReservedCallFrame.
2023-07-21 15:48:45 +01:00
Momchil Velikov
4c95f79cce [CodeGenPrepare] Refactor optimizeSelectInst (NFC)
Refactor to use BasicBlockUtils functions and make life easier for
a subsequent patch for updating the dominator tree.

Reviewed By: dmgreen

Differential Revision: https://reviews.llvm.org/D154053
2023-07-19 18:56:44 +01:00
John Brawn
cee7e7b245 [ARM] Correctly handle execute-only in EmitStructByval
Currently when compiling for an execute-only target without movt then
EmitStructByval will generate a constant pool load which isn't
compatible with execute-only. Handle this by emitting tMOVi32imm,
and also simplify the existing movt handling by emitting t2MOVi32imm
or MOVi32imm.
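
A hedged sketch of IR that reaches this path: a call with a large byval
argument, whose copy is expanded by EmitStructByval and needs the copy
size materialized in a register (the struct size and names below are
illustrative):

  %struct.S = type { [4096 x i8] }

  declare void @callee(ptr byval(%struct.S) align 4)

  define void @caller(ptr %p) {
    ; the byval copy of %p is expanded by EmitStructByval
    call void @callee(ptr byval(%struct.S) align 4 %p)
    ret void
  }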

Differential Revision: https://reviews.llvm.org/D154944
2023-07-19 13:56:36 +01:00
John Brawn
1b12b1a335 [ARM] Restructure MOVi32imm expansion to not do pointless instructions
The expansion of the various MOVi32imm pseudo-instructions works by
splitting the operand into components (either halfwords or bytes) and
emitting instructions to combine those components into the final
result. When the operand is an immediate with some components being
zero this can result in pointless instructions that just add zero.

Avoid this by restructuring things so that a separate function handles
splitting the operand into components, then don't emit the component
if it is a zero immediate. This is straightforward for movw/movt,
where we just don't emit the movt if it's zero, but the thumb1
expansion using mov/add/lsl is more complex, as even when we don't
emit a given byte we still need to get the shift correct.
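
As a hedged example, on a configuration that uses the byte-wise thumb1
expansion (for instance thumbv6m with execute-only), a constant with zero
bytes no longer gets explicit add-zero steps, but the shifts still have to
account for the skipped bytes (function name is illustrative):

  define i32 @sparse_imm() {
    ; 0x00FF00FF: the zero bytes are skipped, but the shift past them
    ; must still happen
    ret i32 16711935
  }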

Differential Revision: https://reviews.llvm.org/D154943
2023-07-19 13:56:36 +01:00
Jay Foad
496766840f [ARM] Add a regression test for D154281
This is a reduced version of one of the tests that was broken by the
original commit of D154281 "[CodeGen] Store SP adjustment in
MachineBasicBlock. NFCI.".

Differential Revision: https://reviews.llvm.org/D155471
2023-07-19 10:32:21 +01:00
John Brawn
343e204a52 [ARM] Replace TransferImpOps with copyImplicitOps
In most places where TransferImpOps is currently used we just have one
machine instruction, so it's doing the same thing as copyImplicitOps
anyway. In those cases where we have more than one machine
instruction the destination is written to in each instruction so any
implicit defs should appear on all of them (and we shouldn't see any
implicit refs as these pseudo-instructions don't have any register
inputs), meaning the current use of TransferImpOps is incorrect and
we should be using copyImplicitOps on all of the generated
instructions.

Differential Revision: https://reviews.llvm.org/D155301
2023-07-18 14:01:04 +01:00
Maurice Heumann
a1cdb323e2 [ARM] Adjust strd/ldrd codegen alignment requirements
In change https://reviews.llvm.org/D152790, it was discovered that the
alignment requirement calculation for LDRD/STRD codegen was suboptimal
and the calculation for volatile loads and stores was adjusted.

This change adopts that calculation for the remaining non-volatile
occurrences.

Recommitting after undefined behavior fix in D155093.

Differential Revision: https://reviews.llvm.org/D153800
2023-07-14 12:54:18 -07:00
Oliver Stannard
aea8db8eb9 Revert "[CodeGen] Store SP adjustment in MachineBasicBlock. NFCI."
This reverts commit 58d1eaa3b6.
2023-07-13 14:25:39 +01:00
Caslyn Tonelli
b11559122e Revert "[ARM] Restructure MOVi32imm expansion to not do pointless instructions"
This reverts commit 647aff2855.

Differential Revision: https://reviews.llvm.org/D155122
2023-07-12 23:29:15 +00:00
Jay Foad
58d1eaa3b6 [CodeGen] Store SP adjustment in MachineBasicBlock. NFCI.
Record the SP adjustment on entry to each basic block. This is almost
always zero except on targets like ARM which can split a basic block in
the middle of a call sequence.

This simplifies PEI::replaceFrameIndices which previously had to visit
basic blocks in a specific order and had special handling for
unreachable blocks. More importantly it paves the way for an equally
simple implementation of a backwards version of replaceFrameIndices,
which is required to fully convert PrologEpilogInserter to backwards
register scavenging, which is preferred because it does not rely on
accurate kill flags.

Differential Revision: https://reviews.llvm.org/D154281
2023-07-12 14:29:26 +01:00
Nikita Popov
edb2fc6dab [llvm] Remove explicit -opaque-pointers flag from tests (NFC)
Opaque pointers mode is enabled by default, no need to explicitly
enable it.
2023-07-12 14:35:55 +02:00
John Brawn
210f61cbdd [ARM] Correctly handle execute-only in EmitStructByval
Currently when compiling for an execute-only target without movt then
EmitStructByval will generate a constant pool load which isn't
compatible with execute-only. Handle this by emitting tMOVi32imm,
and also simplify the existing movt handling by emitting t2MOVi32imm
or MOVi32imm.

Differential Revision: https://reviews.llvm.org/D154944
2023-07-12 11:48:01 +01:00
John Brawn
647aff2855 [ARM] Restructure MOVi32imm expansion to not do pointless instructions
The expansion of the various MOVi32imm pseudo-instructions works by
splitting the operand into components (either halfwords or bytes) and
emitting instructions to combine those components into the final
result. When the operand is an immediate with some components being
zero this can result in pointless instructions that just add zero.

Avoid this by restructuring things so that a separate function handles
splitting the operand into components, then don't emit the component
if it is a zero immediate. This is straightforward for movw/movt,
where we just don't emit the movt if it's zero, but the thumb1
expansion using mov/add/lsl is more complex, as even when we don't
emit a given byte we still need to get the shift correct.

Differential Revision: https://reviews.llvm.org/D154943
2023-07-12 11:48:01 +01:00
Simon Wallis
82458ce69e [ARM] mark tMOVi32imm as killing flags
Mark the tMOVi32imm pseudo instr as killing the flags register.

The pseudo instruction expands to a sequence of 7 movs/lsls/adds
instructions, which are all Thumb-1 flag setting instructions.

For a test case, take an existing arm test which checks for
"Don't CSE a cmp across a call that clobbers CPSR."
and retarget it at thumbv6m execute-only.

Reviewed By: stuij

Differential Revision: https://reviews.llvm.org/D154845

Change-Id: I8f8209fbc40a833f8875629937b9606c1e2c021d
2023-07-11 14:42:07 +01:00
Ties Stuij
f0ae3c23b5 [ARM] in LowerConstantFP, make sure we cover armv6-m execute-only
Currently in LowerConstantFP, when we compile for execute-only (XO) we don't
check what architecture we're compiling for (<=v6m or >v6m). We shouldn't get
here for v6m, so put in an assert.

Reviewed By: simonwallis2, dmgreen

Differential Revision: https://reviews.llvm.org/D154506
2023-07-11 10:42:15 +01:00
Ties Stuij
d145abcfb3 [ARM] fix typo in large-stack.ll introduced when fixing another typo 2023-07-04 11:23:24 +01:00
Ties Stuij
61bcaae7ab [ARM] fix typo in large-stack.ll test
In llvm/test/CodeGen/ARM/large-stack.ll, the C in FileCheck wasn't
uppercased. This wasn't spotted during development, as macOS's HFS+ filesystem
is apparently often configured to be case-insensitive.
2023-07-04 11:18:25 +01:00
Ties Stuij
112d769e5e [ARM] generate correct code for armv6-m XO big stack operations
The ARM backend codebase is dotted with places where armv6-m will generate
constant pools. Now that we can generate execute-only code for armv6-m, we need
to make sure we use the movs/lsls/adds/lsls/adds/lsls/adds pattern instead of
these.

Big stacks are one of the obvious places. In this patch we take care of two
sites:
1. take care of big stacks in prologue/epilogue
2. take care of save/tSTRspi nodes, which implicitly fixes
   emitThumbRegPlusImmInReg which is used in several frame lowering fns
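
A hedged sketch of an input hitting site 1: a frame too large for the
immediate forms of Thumb1 sp-adjusting add/sub, so the prologue/epilogue
adjustment needs a materialized 32-bit constant (the buffer size and
names are illustrative):

  define void @big_frame() {
    %buf = alloca [100000 x i8], align 4
    call void @use(ptr %buf)
    ret void
  }

  declare void @use(ptr)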

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D154233
2023-07-04 10:40:06 +01:00
David Spickett
ab3bb86d44 Revert "[ARM] Adjust strd/ldrd codegen alignment requirements"
This reverts commit 92a9c30c61.

This has caused a test failure in the 2nd stage of Linaro's
Arm 32 bit buildbots.

LLVM::simplified-template-names.s

            7: error: Simplified template DW_AT_name could not be reconstituted:
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            8:  original: f3<unsigned char, (unsigned char)'\x00'>
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            9:  reconstituted: f3<unsigned char, (unsigned char)'\x7f'>
check:10'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I suspect a load/store is slightly off.
2023-07-03 14:05:49 +00:00
Maurice Heumann
92a9c30c61 [ARM] Adjust strd/ldrd codegen alignment requirements
In change https://reviews.llvm.org/D152790, it was discovered that the
alignment requirement calculation for LDRD/STRD codegen was suboptimal
and the calculation for volatile loads and stores was adjusted.

This change adopts that calculation for the remaining non-volatile
occurrences.

Differential Revision: https://reviews.llvm.org/D153800
2023-07-02 14:25:25 -07:00
Fangrui Song
afd20587f9 MachineFunction: -fsanitize={function,kcfi}: ensure 4-byte alignment
Fix https://github.com/llvm/llvm-project/issues/63579
```
% cat a.c
void foo() {}
% clang --target=arm-none-eabi -mthumb -mno-unaligned-access -fsanitize=kcfi a.c -S -o - | grep p2align
        .p2align        1
% clang --target=armv6m-none-eabi -fsanitize=function a.c -S -o - | grep p2align
        .p2align        1
```

Ensure that -fsanitize={function,kcfi} instrumented functions are aligned by at
least 4, so that loading the type hash before the function label will not cause
a misaligned access. This is especially important for -mno-unaligned-access
configurations that don't set `setMinFunctionAlignment` to 4 or greater.

With this patch, the generated assembly for the examples above will contain `.p2align 2`
before the type hash.

If `__attribute__((aligned(N)))` or `-falign-functions=N` is specified, the
larger alignment will be used.

Reviewed By: simon_tatham, samitolvanen

Differential Revision: https://reviews.llvm.org/D154125
2023-06-30 09:13:19 -07:00
Matt Arsenault
160d7227e0 DAG: Fix libcall expansion for frexp on ARM
The ExpandLibcallResult result was a bitcast and not the direct call
result, so we couldn't find the chain. Use the new separate chain
return value instead.
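
For reference, the node in question comes from the llvm.frexp intrinsic,
which ARM expands to the frexpf/frexp libcalls; a minimal IR use
(assuming the f32/i32 overload naming) is:

  declare { float, i32 } @llvm.frexp.f32.i32(float)

  define { float, i32 } @split(float %x) {
    ; returns the fraction and the exponent of %x
    %r = call { float, i32 } @llvm.frexp.f32.i32(float %x)
    ret { float, i32 } %r
  }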
2023-06-30 09:03:45 -04:00
Luke Lau
742fb8b5c7 [DAGCombine] Fold (store (insert_elt (load p)) x p) -> (store x)
If we have a store of a load with no other uses in between, the store is
considered dead and is removed. So sometimes when legalizing a fixed
length vector store of an insert, we end up producing better code
through scalarization than without.
An example is the following:

  %a = load <4 x i64>, ptr %x
  %b = insertelement <4 x i64> %a, i64 %y, i32 2
  store <4 x i64> %b, ptr %x

If this is scalarized, then DAGCombine successfully removes 3 of the 4
stores which are considered dead, and on RISC-V we get:

  sd a1, 16(a0)

However if we make the vector type legal (-mattr=+v), then we lose the
optimisation because we don't scalarize it.

This patch attempts to recover the optimisation for vectors by
identifying patterns where we store a load with a single insert
in between, replacing it with a scalar store of the inserted element.
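
After the fold, the example above is roughly equivalent to a single
scalar store of the inserted element at the corresponding offset:

  %p2 = getelementptr inbounds i64, ptr %x, i64 2
  store i64 %y, ptr %p2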

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D152276
2023-06-28 22:45:04 +01:00
John Brawn
4fb0e0114f [ARM] Generate out-of-line jump tables for XO without 32-bit branch
When we only have a 16-bit pc-relative branch instruction we generate
a table of addresses for a jump table. Currently this is placed inline,
but this won't work with execute-only memory. In this case generate
the jump table out-of-line.

Differential Revision: https://reviews.llvm.org/D153774
2023-06-28 13:30:39 +01:00
Ties Stuij
4f19c6a7c7 [ARM] allow long-call codegen for armv6-M eXecute Only (XO)
Recently eXecute Only (XO) codegen was also allowed for armv6-M. Previously this
was only implemented for ~armv7+, effectively if MOVW/MOVT is
available. Regarding long calls, we remove the check for MOVW/MOVT when
generating code for XO; it was already redundant, as the subtarget
initialization already checks whether XO is valid for the target. And targets
that generate valid XO code should be able to handle the (wrapper
globaladdress) node.

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D153782
2023-06-28 10:50:24 +01:00
Ties Stuij
03db28edbb [ARM] in ExpandTMOV32BitImm, CPSR register ops should be Defined
The CPSR register ops of the instructions constructed in ExpandTMOV32BitImm
were marked as kill, instead of define. Best to use the pre-existing
t1CondCodeOp fn to construct CPSRs.

Reviewed By: simonwallis2

Differential Revision: https://reviews.llvm.org/D153763
2023-06-27 14:58:22 +01:00
Matthias Braun
02ba5b8c6b Ignore load/store until stack address computation
No longer conservatively assume a load/store accesses the stack when we
can prove that we did not compute any stack-relative address up to this
point in the program.

We do this in a cheap not-quite-a-dataflow-analysis: Assume
`NoStackAddressUsed` when all predecessors of a block already guarantee
it. Process blocks in reverse post order to guarantee that except for
loop headers we have processed all predecessors of a block before
processing the block itself. For loops we accept the conservative answer
as they are unlikely to be shrink-wrappable anyway.

Differential Revision: https://reviews.llvm.org/D152213
2023-06-26 13:50:36 -07:00
Matthias Braun
759b217626 Switch tests to use update_llc_test_checks
Switch and update some tests to use `update_llc_test_checks` to reduce
clutter in an upcoming change.

Differential Revision: https://reviews.llvm.org/D152215
2023-06-26 13:50:36 -07:00
Maurice Heumann
249bd9eab0 [ARM] Fix codegen of unaligned volatile load/store of i64
Volatile loads/stores of i64 are lowered to LDRD/STRD on ARMv5TE.
However, these instructions require the addresses to be aligned.
Unaligned loads/stores therefore should be ignored by this handling.
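
A minimal example of an access now excluded from the LDRD/STRD path, an
under-aligned volatile i64 load (function name is illustrative):

  define i64 @load_unaligned(ptr %p) {
    %v = load volatile i64, ptr %p, align 1
    ret i64 %v
  }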

Differential Revision: https://reviews.llvm.org/D152790
2023-06-26 10:45:41 -07:00
Eli Friedman
bc7f11ccb0 [SelectionDAG] Improve expansion of wide min/max
The current implementation tries to handle the high and low halves
separately, but that's less efficient in most cases; use a wide SETCC
instead.
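
For example, a min/max wider than the legal integer type, such as the
i128 smax below, is affected by this expansion (function name is
illustrative):

  declare i128 @llvm.smax.i128(i128, i128)

  define i128 @smax128(i128 %a, i128 %b) {
    %m = call i128 @llvm.smax.i128(i128 %a, i128 %b)
    ret i128 %m
  }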

Differential Revision: https://reviews.llvm.org/D151358
2023-06-26 10:45:41 -07:00
Amaury Séchet
8412a17b79 [NFC] Autogenerate CodeGen/ARM/2013-07-29-vector-or-combine.ll 2023-06-25 01:05:21 +00:00
Amaury Séchet
7457acb842 [NFC] Autogenerate CodeGen/ARM/2011-03-15-LdStMultipleBug.ll 2023-06-25 01:02:49 +00:00
Amaury Séchet
e271a539c5 [NFC] Autogenerate CodeGen/ARM/and-sext-combine.ll 2023-06-25 00:55:03 +00:00
Amaury Séchet
78c1985f99 [NFC] Autogenerate CodeGen/ARM/machine-cse-cmp.ll 2023-06-25 00:44:30 +00:00
Amaury Séchet
2e8111d4c4 [NFC] Autogenerate CodeGen/ARM/pr35103.ll 2023-06-25 00:29:14 +00:00
Ties Stuij
5ddd561cb5 disable execute-only tests which are failing with expensive checks
Temporarily disabling the execute-only tests. We recently added codegen for
armv6-m, which is still in heavy development (D152795).

Disabling the tests while we're figuring out what's going on is probably the
least disruptive option, as a patch dependent on it also already landed.
2023-06-23 16:35:24 +01:00
Ties Stuij
2273741ea2 [ARM] generate armv6m eXecute Only (XO) code
[ARM] generate armv6m eXecute Only (XO) code for immediates, globals

Previously eXecute Only (XO) support was implemented for targets that support
MOVW/MOVT (~armv7+). See: https://reviews.llvm.org/D27449

XO prevents the compiler from generating data accesses to code sections. This
patch implements XO codegen for armv6-M, which does not support MOVW/MOVT, and
must resort to the following general pattern to avoid loads:

    movs    r3, :upper8_15:foo
    lsls    r3, #8
    adds    r3, :upper0_7:foo
    lsls    r3, #8
    adds    r3, :lower8_15:foo
    lsls    r3, #8
    adds    r3, :lower0_7:foo
    ldr     r3, [r3]

This is equivalent to the code pattern generated by GCC.
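
A simple global load like the following, compiled for thumbv6m with
execute-only enabled, is the kind of input that now produces the sequence
above instead of a constant-pool load (the global matches the foo used in
the pattern; the function name is illustrative):

  @foo = external global i32

  define i32 @read_foo() {
    %v = load i32, ptr @foo
    ret i32 %v
  }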

The above relocations are new to LLVM and have been implemented in a parent
patch: https://reviews.llvm.org/D149443.

This patch limits itself to implementing codegen for this pattern and enabling
XO for armv6-M in the backend.

Separate patches will follow for:
- switch tables
- replacing specific loads from constant islands which are spread out over the
  ARM backend codebase. Amongst others: FastISel, call lowering, stack frames.

Reviewed By: john.brawn

Differential Revision: https://reviews.llvm.org/D152795
2023-06-23 10:50:47 +01:00