Code generation of VLD3, VLD4, VST3 and VST4 with register writeback is
broken due to 2 separate bugs:
1) VLD1d64TPseudoWB_register and VLD1d64QPseudoWB_register are missing
rules to expand them to non pseudo MIR. These are selected for
ARMISD::VLD3_UPD/VLD4_UPD with v1i64 vectors in SelectVLD.
2) Selection of the right VLD/VST instruction is broken for load and
store of 3 and 4 v1i64 vectors. SelectVLD and SelectVST are called
with MIR opcode for fixed writeback (ie increment is access size)
and call getVLDSTRegisterUpdateOpcode() to select an opcode with
register writeback if base register update is of a different size.
Since getVLDSTRegisterUpdateOpcode() only knows about
VLD1/VLD2/VST1/VST2 the call is currently conditional on the number
of element in the vector.
However, VLD1/VST1 is selected by SelectVLD/SelectVST's caller for
load and stores of 3 or 4 v1i64 vectors. Therefore the opcode is not
updated which later lead to a fixed writeback instruction being
constructed with an extra operand for the register writeback.
This patch addresses the two issues as follows:
- it adds the necessary mapping from VLD1d64TPseudoWB_register and
VLD1d64QPseudoWB_register to VLD1d64Twb_register and
VLD1d64Qwb_register respectively. Like for the existing _fixed
variants, the cost of these is bumped for unaligned access.
- it changes the logic in SelectVLD and SelectVSD to call isVLDfixed
and isVSTfixed respectively to decide whether the opcode should be
updated. It also reworks the logic and comments for pushing the
writeback offset operand and r0 operand to clarify the logic:
writeback offset needs to be pushed if it's a register writeback,
r0 needs to be pushed if not and the instruction is a
VLD1/VLD2/VST1/VST2.
Reviewers: rengolin, t.p.northover, samparker
Reviewed By: samparker
Patch by Thomas Preud'homme <thomas.preudhomme@arm.com>
Differential Revision: https://reviews.llvm.org/D42970
llvm-svn: 326570
Emulated TLS is enabled by llc flag -emulated-tls,
which is passed by clang driver.
When llc is called explicitly or from other drivers like LTO,
missing -emulated-tls flag would generate wrong TLS code for targets
that supports only this mode.
Now use useEmulatedTLS() instead of Options.EmulatedTLS to decide whether
emulated TLS code should be generated.
Unit tests are modified to run with and without the -emulated-tls flag.
Differential Revision: https://reviews.llvm.org/D42999
llvm-svn: 326341
Summary:
Expressions of the form x < 0 ? 0 : x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves
In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before.
Patch by Martin Svanfeldt
Reviewers: fhahn, pbarrio, rogfer01
Reviewed By: pbarrio, rogfer01
Subscribers: chrib, yroux, eugenis, efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42574
llvm-svn: 326333
We were always setting the block alignment to 2 bytes in Thumb mode
and 4-bytes in ARM mode (r325754, and r325012), but this could cause
reducing the block alignment when it already had been aligned (e.g.
in Thumb mode when the block is a CPE that was already 4-byte aligned).
Patch by Momchil Velikov, I've only added a test.
Differential Revision: https://reviews.llvm.org/D43777
llvm-svn: 326232
Re-enable commit r323991 now that r325931 has been committed to make
MachineOperand::isRenamable() check more conservative w.r.t. code
changes and opt-in on a per-target basis.
llvm-svn: 326208
In r322867, we introduced IsStandalone when printing MIR in -debug
output. The default behaviour for that was:
1) If any of MBB, MI, or MO are -debug-printed separately, don't omit any
redundant information.
2) When -debug-printing a MF entirely, don't print any redundant
information.
3) When printing MIR, don't print any redundant information.
I'd like to change 2) to:
2) When -debug-printing a MF entirely, don't omit any redundant information.
Differential Revision: https://reviews.llvm.org/D43337
llvm-svn: 326094
This is a follow up of r325012, that allowed half types in constant pools.
Proper alignment was enforced when a big basic block was split up, but not when
a CPE was placed before/after a block; the successor block had the wrong
alignment.
Differential Revision: https://reviews.llvm.org/D43580
llvm-svn: 325754
Follow up of Clang commit r325351; this adds the LLVM tests, which
were also missing.
Differential Revision: https://reviews.llvm.org/D43395
llvm-svn: 325443
This reverts commit r323991.
This commit breaks target that don't model all the register constraints
in TableGen. So far the workaround was to set the
hasExtraXXXRegAllocReq, but it proves that it doesn't cover all the
cases.
For instance, when mutating an instruction (like in the lowering of
COPYs) the isRenamable flag is not properly updated. The same problem
will happen when attaching machine operand from one instruction to
another.
Geoff Berry is working on a fix in https://reviews.llvm.org/D43042.
llvm-svn: 325421
Enable multiple COPY hints to eliminate more COPYs during register allocation.
Note that this is something all targets should do, see
https://reviews.llvm.org/D38128.
Review: Eli Friedman
llvm-svn: 325327
Summary:
Currently when expanding a SETCC node into a SELECT_CC, LLVM uses
an incorrect type for determining BooleanContent of the result. This
patch fixes the issue.
Fixes PR36079.
Reviewers: rogfer01, javed.absar, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D43282
llvm-svn: 325325
This patch combines some cases of ARMISD::CMOV for integers that arise in comparisons of the form
a != b ? x : 0
a == b ? 0 : x
and that currently (e.g. in Thumb1) are emitted as branches.
Differential Revision: https://reviews.llvm.org/D34515
llvm-svn: 325323
Summary:
In LLVM, 't' selects a floating-point/SIMD register and only supports
32-bit values. This is appropriately documented in the LLVM Language
Reference Manual. However, this behaviour diverges from that of GCC, where
't' selects the s0-s31 registers and its qX and dX variants depending on
additional operand modifiers (q/P).
For example, the following C code:
#include <arm_neon.h>
float32x4_t a, b, x;
asm("vadd.f32 %0, %1, %2" : "=t" (x) : "t" (a), "t" (b))
results in the following assembly if compiled with GCC:
vadd.f32 s0, s0, s1
whereas LLVM will show "error: couldn't allocate output register for
constraint 't'", since a, b, x are 128-bit variables, not 32-bit.
This patch extends the use of 't' to mean that of GCC, thus allowing
selection of the lower Q vector regs and their D/S variants. For example,
the earlier code will now compile as:
vadd.f32 q0, q0, q1
This behaviour still differs from that of GCC but I think it is actually
more correct, since LLVM picks up the right register type based on the
datatype of x, while GCC would need an extra operand modifier to achieve
the same result, as follows:
asm("vadd.f32 %q0, %q1, %q2" : "=t" (x) : "t" (a), "t" (b))
Since this is only an extension of functionality, existing code should not
be affected by this change. Note that operand modifiers q/P are already
supported by LLVM, so this patch should suffice to support inline
assembly with constraint 't' originally built for GCC.
Reviewers: grosbach, rengolin
Reviewed By: rengolin
Subscribers: rogfer01, efriedma, olista01, aemerson, javed.absar, eraman, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42962
llvm-svn: 325244
Change ARMConstantIslandPass to:
- accept f16 literals as litpool entries,
- if the litpool needs to be inserted in the middle of a big block, then we
need to 4-byte align the next instruction in ARM mode.
Differential Revision: https://reviews.llvm.org/D42784
llvm-svn: 325012
If merging them, the dllexport attribute needs to be brought along
to the new GlobalAlias.
Differential Revision: https://reviews.llvm.org/D43192
llvm-svn: 324937
Add a common -trap-unreachable option, similar to the target
specific hexagon equivalent, which has been replaced. This
turns unreachable instructions into traps, which is useful for
debugging.
Differential Revision: https://reviews.llvm.org/D42965
llvm-svn: 324880
* Use uleb128 for code offsets in the LSDA call site table.
* Omit the TTBase offset if the type table is empty.
This change can reduce the size of the DWARF/Itanium LSDA by about half.
Patch by Ryan Prichard!
llvm-svn: 324750
Rely on the assembler to finalize the layout of the DWARF/Itanium
exception-handling LSDA. Rather than calculate the exact size of each
thing in the LSDA, use assembler directives:
To emit the offset to the TTBase label:
.uleb128 .Lttbase0-.Lttbaseref0
.Lttbaseref0:
To emit the size of the call site table:
.uleb128 .Lcst_end0-.Lcst_begin0
.Lcst_begin0:
... call site table entries ...
.Lcst_end0:
To align the type info table:
... action table ...
.balign 4
.long _ZTIi
.long _ZTIl
.Lttbase0:
Using assembler directives simplifies the compiler and allows switching
the encoding of offsets in the call site table from udata4 to uleb128 for
a large code size savings. (This commit does not change the encoding.)
The combination of the uleb128 followed by a balign creates an unfortunate
dependency cycle that the assembler must sometimes resolve either by
padding an LEB or by inserting zero padding before the type table. See
PR35809 or GNU as bug 4029.
Patch by Ryan Prichard!
llvm-svn: 324749
Instead of:
%bb.1: derived from LLVM BB %for.body
print:
bb.1.for.body:
Also use MIR syntax for MBB attributes like "align", "landing-pad", etc.
llvm-svn: 324563
This is a follow up of r324321, adding a match pattern for mov with a FP16
immediate (also fixing operand vfp_f16imm that wasn't even compiling).
Differential Revision: https://reviews.llvm.org/D42973
llvm-svn: 324456
Following up on the discussion from
http://lists.llvm.org/pipermail/llvm-dev/2017-April/112305.html, undef
values are now placed in the .bss as well as null values. This prevents
undef global values taking up potentially huge amounts of space in the
.data section.
The following two lines now both generate equivalent .bss data:
@vals1 = internal unnamed_addr global [20000000 x i32] zeroinitializer, align 4
@vals2 = internal unnamed_addr global [20000000 x i32] undef, align 4 ; previously unaccounted for
This is primarily motivated by the corresponding issue in the Rust
compiler (https://github.com/rust-lang/rust/issues/41315).
Differential Revision: https://reviews.llvm.org/D41705
Patch by varkor!
llvm-svn: 324424
See D42509 for the original version of this.
Basically, there are two significant changes to behavior here:
- addLiveOuts always adds all pristine registers (even if a block has
no successors).
- addLiveOuts and addLiveOutsNoPristines always add all callee-saved
registers for return blocks (including conditional return blocks).
I cleaned up the functions a bit to make it clear these properties hold.
Differential Revision: https://reviews.llvm.org/D42655
llvm-svn: 324422
This is a follow up of r324321, adding f16 <-> f32 and f16 <-> f64 conversion
match patterns.
Differential Revision: https://reviews.llvm.org/D42954
llvm-svn: 324360
This adds most of the FP16 codegen support, but these areas need further work:
- FP16 literals and immediates are not properly supported yet (e.g. literal
pool needs work),
- Instructions that are generated from intrinsics (e.g. vabs) haven't been
added.
This will be addressed in follow-up patches.
Differential Revision: https://reviews.llvm.org/D42849
llvm-svn: 324321
Example situation:
```
BB0:
%0 = ...
use %0
; ...
condjump BB1
jmp BB2
BB1:
%0 = ... ; rematerialized def from above (from earlier split step)
jmp BB2
BB2:
; ...
use %0
```
%0 will have a live interval with 3 value numbers (for the BB0, BB1 and
BB2 parts). Now SplitKit tries and succeeds in rematerializing the value
number in BB2 (This only works because it is a secondary split so
SplitKit is can trace this back to a single original def).
We need to recompute all live ranges affected by a value number that we
rematerialize. The case that we missed before is that when the value
that is rematerialized is at a join (Phi VNI) then we also have to
recompute liveness for the predecessor VNIs.
rdar://35699130
Differential Revision: https://reviews.llvm.org/D42667
llvm-svn: 324039
Summary:
This change extends MachineCopyPropagation to do COPY source forwarding
and adds an additional run of the pass to the default pass pipeline just
after register allocation.
This version of this patch uses the newly added
MachineOperand::isRenamable bit to avoid forwarding registers is such a
way as to violate constraints that aren't captured in the
Machine IR (e.g. ABI or ISA constraints).
This change is a continuation of the work started in D30751.
Reviewers: qcolombet, javed.absar, MatzeB, jonpa, tstellar
Subscribers: tpr, mgorny, mcrosier, nhaehnle, nemanjai, jyknight, hfinkel, arsenm, inouehrs, eraman, sdardis, guyblank, fedor.sergeev, aheejin, dschuff, jfb, myatsina, llvm-commits
Differential Revision: https://reviews.llvm.org/D41835
llvm-svn: 323991
Commit r323512 introduced an optimisation in LowerReturn for half-precision
return values. A missing check caused a crash when the return value is "undef"
(i.e. a node that has no operands).
Differential Revision: https://reviews.llvm.org/D42743
llvm-svn: 323968
Discussed here:
http://lists.llvm.org/pipermail/llvm-dev/2018-January/120320.html
In preparation for adding support for named vregs we are changing the sigil for
physical registers in MIR to '$' from '%'. This will prevent name clashes of
named physical register with named vregs.
llvm-svn: 323922
Summary:
Expressions of the form x < 0 ? 0 : x; and x < -1 ? -1 : x can be lowered using bit-operations instead of branching or conditional moves
In thumb-mode this results in a two-instruction sequence, a shift followed by a bic or or while in ARM/thumb2 mode that has flexible second operand the shift can be folded into a single bic/or instructions. In most cases this results in smaller code and possibly less branches, and in no case larger than before.
Patch by Marten Svanfeldt.
Reviewers: fhahn, pbarrio
Reviewed By: pbarrio
Subscribers: efriedma, rogfer01, aemerson, javed.absar, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D42574
llvm-svn: 323869
Half-precision arguments and return values are passed as if it were an int or
float for ARM. This results in truncates and bitcasts to/from i16 and f16
values, which are legalized very early to stack stores/loads. When FullFP16 is
enabled, we want to avoid codegen for these bitcasts as it is unnecessary and
inefficient.
Differential Revision: https://reviews.llvm.org/D42580
llvm-svn: 323861
Legal if we have hardware support for floating point, libcalls
otherwise.
Also add the necessary support for libcalls in the legalizer helper.
llvm-svn: 323726