Commit Graph

4102 Commits

Author SHA1 Message Date
Florian Hahn
eb5c7f6b8f [ARM] Add test with tcreturn and debug value.
In the attached test case, a non-terminator instruction (DBG_VALUE) is
inserted after a terminator, producing an invalid MBB.
2020-07-10 16:32:21 +01:00
Puyan Lotfi
7e169cec74 [NFC][test] Adding fastcc test case for promoted 16-bit integer bitcasts.
The following: https://reviews.llvm.org/D82552

fixed an assert in the SelectionDag ISel legalizer for some CCs on armv7.

I noticed that this fix also fixes the assert when using fastcc, so I am
adding a fastcc regression test here.

Differential Revision: https://reviews.llvm.org/D82443
2020-07-09 11:38:49 -07:00
Lucas Prates
fc39a9ca0e [CodeGen] Matching promoted type for 16-bit integer bitcasts from fp16 operand
Summary:
When legalizing a biscast operation from an fp16 operand to an i16 on a
target that requires both input and output types to be promoted to
32-bits, an assertion can fail when building the new node due to a
mismatch between the the operation's result size and the type specified to
the node.

This patches fix the issue by making sure the bit width of the types
match for the FP_TO_FP16 node, covering the difference with an extra
ANYEXTEND operation.

Reviewers: ostannard, efriedma, pirama, jmolloy, plotfi

Reviewed By: efriedma

Subscribers: hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82552
2020-07-09 09:46:17 +01:00
David Green
74ca67c109 [ARM] Remove hasSideEffects from FP converts
Whether an instruction is deemed to have side effects in determined by
whether it has a tblgen pattern that emits a single instruction.
Because of the way a lot of the the vcvt instructions are specified
either in dagtodag code or with patterns that emit multiple
instructions, they don't get marked as not having side effects.

This just marks them as not having side effects manually. It can help
especially with instruction scheduling, to not create artificial
barriers, but one of these tests also managed to produce fewer
instructions.

Differential Revision: https://reviews.llvm.org/D81639
2020-07-05 16:23:24 +01:00
Nicholas Guy
dc8e4d8566 [ARM] Rearrange SizeReduction when using -Oz
Move the Thumb2SizeReduce pass to before IfConversion when optimising
for minimal code size.

Running the Thumb2SizeReduction pass before IfConversionallows T1
instructions to propagate to the final output, rather than the
ifConverter modifying T2 instructions and preventing them from being
reduced later.

This change does introduce a regression regarding execution time, so
it's only applied when optimising for size.

Running the LLVM Test Suite with this change produces a geomean
difference of -0.1% for the size..text metric.

Differential Revision: https://reviews.llvm.org/D82439
2020-07-02 09:19:38 +01:00
James Y Knight
4b0aa5724f Change the INLINEASM_BR MachineInstr to be a non-terminating instruction.
Before this instruction supported output values, it fit fairly
naturally as a terminator. However, being a terminator while also
supporting outputs causes some trouble, as the physreg->vreg COPY
operations cannot be in the same block.

Modeling it as a non-terminator allows it to be handled the same way
as invoke is handled already.

Most of the changes here were created by auditing all the existing
users of MachineBasicBlock::isEHPad() and
MachineBasicBlock::hasEHPadSuccessor(), and adding calls to
isInlineAsmBrIndirectTarget or mayHaveInlineAsmBr, as appropriate.

Reviewed By: nickdesaulniers, void

Differential Revision: https://reviews.llvm.org/D79794
2020-07-01 12:51:50 -04:00
Guillaume Chatelet
3500d9ec95 Fix invalid alignment in DAGCombiner::isLegalNarrowLdSt
`ShAmt / 8` can be a non power of two, this can lead to an invalid alignment.
context: https://reviews.llvm.org/D41350#inline-749165

Differential Revision: https://reviews.llvm.org/D82565
2020-06-29 09:22:15 +00:00
David Green
d428f88152 [ARM] VCVTT fpround instruction selection
Similar to the recent patch for fpext, this adds vcvtb and vcvtt with
insert into vector instruction selection patterns for fptruncs. This
helps clear up a lot of register shuffling that we would otherwise do.

Differential Revision: https://reviews.llvm.org/D81637
2020-06-26 10:24:06 +01:00
David Green
76e0e1a55d [ARM] VCVTT instruction selection
We current extract and convert from a top lane of a f16 vector using a
VMOVX;VCVTB pair. We can simplify that to use a single VCVTT. The
pattern is mostly copied from a vector extract pattern, but produces a
VCVTTHS f32 directly.

This had to move some code around so that ARMInstrVFP had access to the
required pattern frags that were previously part of ARMInstrNEON.

Differential Revision: https://reviews.llvm.org/D81556
2020-06-26 08:58:55 +01:00
Simon Tatham
b769eb02b5 [ARM][BFloat] Legalize bf16 type even without fullfp16.
Summary:
This change permits scalar bfloats to be loaded, stored, moved and
used as function call arguments and return values, whenever the bf16
feature is supported by the subtarget.

Previously that was only supported in the presence of the fullfp16
feature, because the code generation strategy depended on instructions
from that extension. This change adds alternative code generation
strategies so that those operations can be done even without fullfp16.

The strategy for loads and stores is to replace VLDRH/VSTRH with
integer LDRH/STRH plus a move between register classes. I've written
isel patterns for those, conditional on //not// having the fullfp16
feature (so that in the fullfp16 case, the existing patterns will
still be used).

For function arguments and returns, instead of writing isel patterns
to match `VMOVhr` and `VMOVrh`, I've avoided generating those SDNodes
in the first place, by factoring out the code that constructs them
into helper functions `MoveToHPR` and `MoveFromHPR` which have a
fallback for non-fullfp16 subtargets.

The current output code is not especially pretty: in the new test file
you can see unnecessary store/load pairs implementing no-op bitcasts,
and lots of pointless moves back and forth between FP registers and
GPRs. But it at least works, which is an improvement on the previous
situation.

Reviewers: dmgreen, SjoerdMeijer, stuij, chill, miyuki, labrinea

Reviewed By: dmgreen, labrinea

Subscribers: labrinea, kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D82372
2020-06-24 09:36:26 +01:00
Momchil Velikov
adf7973fd3 [ARM] Describe defs/uses of VLLDM and VLSTM
The VLLDM and VLSTM instructions are incompletely specified.  They
(potentially) write (or read, respectively) registers Q0-Q7, VPR, and
FPSCR, but the compiler is unaware of it.

In the new test case `cmse-vlldm-no-reorder.ll` case the compiler
missed an anti-dependency and reordered a `VLLDM` ahead of the
instruction, which stashed the return value from the non-secure call,
effectively clobbering said value.

This test case does not fail with upstream LLVM, because of scheduling
differences and I couldn't find a test case for the VLSTM either.

Differential Revision: https://reviews.llvm.org/D81586
2020-06-23 16:04:23 +01:00
Mikhail Maltsev
3f353a2e5a [BFloat] Add convert/copy instrinsic support
This patch is part of a series implementing the Bfloat16 extension of the Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

Specifically it adds intrinsic support in clang and llvm for Arm and AArch64.

The bfloat type, and its properties are specified in the Arm Architecture Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

The following people contributed to this patch:
  - Alexandros Lamprineas
  - Luke Cheeseman
  - Mikhail Maltsev
  - Momchil Velikov
  - Luke Geeson

Differential Revision: https://reviews.llvm.org/D80928
2020-06-23 14:27:05 +00:00
Mikhail Maltsev
9c579540ff [ARM] BFloat MatMul Intrinsics&CodeGen
Summary:
This patch adds support for BFloat Matrix Multiplication Intrinsics
and Code Generation from __bf16 to AArch32. This includes IR intrinsics. Tests are
provided as needed.

This patch is part of a series implementing the Bfloat16 extension of
the
Armv8.6-a architecture, as detailed here:

https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/arm-architecture-developments-armv8-6-a

The bfloat type and its properties are specified in the Arm
Architecture
Reference Manual:

https://developer.arm.com/docs/ddi0487/latest/arm-architecture-reference-manual-armv8-for-armv8-a-architecture-profile

The following people contributed to this patch:

 - Luke Geeson
 - Momchil Velikov
 - Mikhail Maltsev
 - Luke Cheeseman
 - Simon Tatham

Reviewers: stuij, t.p.northover, SjoerdMeijer, sdesmalen, fpetrogalli, LukeGeeson, simon_tatham, dmgreen, MarkMurrayARM

Reviewed By: MarkMurrayARM

Subscribers: MarkMurrayARM, danielkiss, kristof.beyls, hiraditya, cfe-commits, llvm-commits, chill, miyuki

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D81740
2020-06-23 12:06:37 +00:00
Mikhail Maltsev
490f78c038 [ARM][BFloat] Implement lowering of bf16 load/store intrinsics
Reviewers: labrinea, dmgreen, pratlucas, LukeGeeson

Reviewed By: dmgreen

Subscribers: kristof.beyls, hiraditya, danielkiss, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81486
2020-06-19 14:02:35 +00:00
Mikhail Maltsev
7526881246 [ARM][BFloat] Lowering of create/get/set/dup intrinsics
This patch adds codegen for the following BFloat
operations to the ARM backend:
* concatenation of bf16 vectors
* bf16 vector element extraction
* bf16 vector element insertion
* duplication of a bf16 value into each lane of a vector
* duplication of a bf16 vector lane into each lane

Differential Revision: https://reviews.llvm.org/D81411
2020-06-19 12:52:40 +00:00
Alexandros Lamprineas
ecdf48f15b [ARM] Basic bfloat support
This patch adds basic support for BFloat in the Arm backend.
For now the code generation relies on fullfp16 being present.

Briefly:
* adds the bfloat scalar and vector types in the necessary register classes,
* adjusts the calling convention to cope with bfloat argument passing and return,
* adds codegen patterns for moves, loads and stores.

It's tested mostly by the intrinsic patches that depend on it (load/store, convert/copy).

The following people contributed to this patch:

 * Alexandros Lamprineas
 * Ties Stuij

Differential Revision: https://reviews.llvm.org/D81373
2020-06-18 17:26:24 +01:00
Lucas Prates
92ad6d57c2 [ARM] Moving CMSE handling of half arguments and return to the backend
Summary:
As half-precision floating point arguments and returns were previously
coerced to either float or int32 by clang's codegen, the CMSE handling
of those was also performed in clang's side by zeroing the unused MSBs
of the coercer values.

This patch moves this handling to the backend's calling convention
lowering, making sure the high bits of the registers used by
half-precision arguments and returns are zeroed.

Reviewers: chill, rjmccall, ostannard

Reviewed By: ostannard

Subscribers: kristof.beyls, hiraditya, danielkiss, cfe-commits, llvm-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D81428
2020-06-18 13:16:29 +01:00
Lucas Prates
a255931c40 [ARM] Supporting lowering of half-precision FP arguments and returns in AArch32's backend
Summary:
Half-precision floating point arguments and returns are currently
promoted to either float or int32 in clang's CodeGen and there's
no existing support for the lowering of `half` arguments and returns
from IR in AArch32's backend.

Such frontend coercions, implemented as coercion through memory
in clang, can cause a series of issues in argument lowering, as causing
arguments to be stored on the wrong bits on big-endian architectures
and incurring in missing overflow detections in the return of certain
functions.

This patch introduces the handling of half-precision arguments and returns in
the backend using the actual "half" type on the IR. Using the "half"
type the backend is able to properly enforce the AAPCS' directions for
those arguments, making sure they are stored on the proper bits of the
registers and performing the necessary floating point convertions.

Reviewers: rjmccall, olista01, asl, efriedma, ostannard, SjoerdMeijer

Reviewed By: ostannard

Subscribers: stuij, hiraditya, dmgreen, llvm-commits, chill, dnsampaio, danielkiss, kristof.beyls, cfe-commits

Tags: #clang, #llvm

Differential Revision: https://reviews.llvm.org/D75169
2020-06-18 13:15:13 +01:00
Yvan Roux
ffe8f6d33b [ARM][MachineOutliner] Fix no-lr-save testcase.
Now that saving LR into a register is handled, some register constraints
are needed to keep machine-outliner-no-lr-save.mir meaningful.
2020-06-15 16:09:31 +02:00
Yvan Roux
669066de65 [ARM][MachineOutliner] Add LR RegSave mode.
Outline chunks of code which need to save and restore the link register
when a spare register can be used to it.

Differential Revision: https://reviews.llvm.org/D80127
2020-06-15 15:22:08 +02:00
Michael Liao
e7b920e6fe [DAGCombine] Generalize the case (add (or x, c1), c2) -> (add x, (c1 + c2))
Reviewers: arsenm

Subscribers: sdardis, wdng, hiraditya, asb, rbar, johnrusso, simoncook, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, Jim, lenary, s.egerton, pzheng, sameer.abuasal, apazos, luismarques, ecnelises, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81708
2020-06-12 13:53:08 -04:00
Yvan Roux
6b8628a1f0 [ARM][MachineOutliner] Add NoLRSave mode.
Outline chunks of code which don't need a save/restore mechanism of the
link register.

Differential Revision: https://reviews.llvm.org/D80125
2020-06-11 08:45:46 +02:00
David Green
61ef2d27c4 [ARM] Update fp16-insert-extract.ll test checks. NFC 2020-06-10 17:50:27 +01:00
Simon Wallis
4dba59689d [ARM] prologue instructions emitted for naked function with >64 byte argument
Summary:

The naked function attribute is meant to suppress all function
prologue/epilogue instructions.

On ARM, some are still emitted if an argument greater than 64 bytes in size
(the threshold for using the byval attribute in IR) is passed partially
in registers.

Perform the check for Attribute::Naked and early exit in
SelectionDAGISel::LowerArguments().

Checking in ARMFrameLowering::determineCalleeSaves() is too late.

A test case is included.

Reviewers: llvm-commits, olista01, danielkiss

Reviewed By: danielkiss

Subscribers: kristof.beyls, hiraditya, danielkiss

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80715

Change-Id: Icedecf2a4ad31bc3c35ab0df7489a9d346e1f7cc
2020-06-09 11:33:03 +01:00
Eli Friedman
7e58d0ded0 Revert "[arm][darwin] Don't generate libcalls for wide shifts on Darwin"
This reverts commit 2ba016cd5c.

This is causing a failure on the clang-cmake-armv7-full bot, and there
are outstanding review comments.
2020-06-08 16:37:29 -07:00
Simon Wallis
7432fb2c78 [ARM][XO] Execute-only miscompiles double literals for big-endian
Summary:
With -mbig-endian -mexecute-only and targeting an fpu,
an incorrect sequence of movw/movt was generated to construct a double literal.
The test suite was hardwired to check these wrong values.

The fault was caused by the explicit word swap in LowerConstantFP().

With -mbig-endian -mexecute-only -mfpu=none, a correct sequence of
movw/movt is generated to construct a double literal.
The test suite did not test this no fpu case.

The test suite expected values have been corrected.
The test file is updated to add testing of fpu=none case

Reviewers: christof, llvm-commits, dmgreen

Reviewed By: dmgreen

Subscribers: dmgreen, kristof.beyls, hiraditya, danielkiss

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D81259

Change-Id: Ia3737df243218c89c82f02b7f9f4032ecd5a3917
2020-06-08 08:13:08 +01:00
Alex Lorenz
2ba016cd5c [arm][darwin] Don't generate libcalls for wide shifts on Darwin
Similar to ceb801612a.

Darwin doesn't always use compiler-rt, and so we can't assume that these
functions are available on arm.
2020-06-05 15:41:23 -07:00
Matt Arsenault
66251f7e1d RegAllocFast: Record internal state based on register units
Record internal state based on register units. This is often more
efficient as there are typically fewer register units to update
compared to iterating over all the aliases of a register.

Original patch by Matthias Braun, but I've been rebasing and fixing it
for almost 2 years and fixed a few bugs causing intermediate failures
to make this patch independent of the changes in
https://reviews.llvm.org/D52010.
2020-06-03 16:51:46 -04:00
Matt Arsenault
056a375b7c ARM: Reduce debug info testcase
This had multiple functions and only one vague check. Reduce it.
2020-06-03 10:33:32 -04:00
Zequan Wu
80e107ccd0 Add NoMerge MIFlag to avoid MIR branch folding
Let the codegen recognized the nomerge attribute and disable branch folding when the attribute is given

Differential Revision: https://reviews.llvm.org/D79537
2020-05-29 12:31:06 -07:00
Victor Campos
c010d4d195 [ARM] Improve codegen of volatile load/store of i64
Summary:
Instead of generating two i32 instructions for each load or store of a volatile
i64 value (two LDRs or STRs), now emit LDRD/STRD.

These improvements cover architectures implementing ARMv5TE or Thumb-2.

The code generation explicitly deviates from using the register-offset
variant of LDRD/STRD. In this variant, the register allocated to the
register-offset cannot be reused in any of the remaining operands. Such
restriction seems to be non-trivial to implement in LLVM, thus it is
left as a to-do.

Differential Revision: https://reviews.llvm.org/D70072
2020-05-28 10:52:43 +01:00
Juneyoung Lee
54b6457240 [TargetPassConfig] Add CanonicalizeFreezeInLoops before LSR
Summary:
This patch adds CanonicalizeFreezeInLoops before LSR.
Relevant patch: https://reviews.llvm.org/D77523

Reviewers: spatel, efriedma, jdoerfert, fhahn, nikic, reames, xbolva00

Reviewed By: nikic

Subscribers: xbolva00, nikic, lebedev.ri, hiraditya, llvm-commits, sanwou01, nlopes

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77524
2020-05-28 05:21:12 +09:00
Nikita Popov
0c6bba71e3 [TargetPassConfig] Don't add alias analysis at optnone
When performing codegen at optnone, don't add alias analysis to
the pipeline. We don't need it, but it causes an unnecessary
dominator tree calculation.

I've also moved the module verifier call to the top so that a bunch
of disabled-at-optnone passes group more nicely.

Differential Revision: https://reviews.llvm.org/D80378
2020-05-23 10:35:03 +02:00
Jean-Michel Gorius
65cd2c7a80 Revert "[CodeGen] Add support for multiple memory operands in MachineInstr::mayAlias"
This temporarily reverts commit 7019cea26d.

It seems that, for some targets, there are instructions with a lot of memory operands (probably more than would be expected). This causes a lot of buildbots to timeout and notify failed builds. While investigations are ongoing to find out why this happens, revert the changes.
2020-05-22 21:26:46 +02:00
Jon Roelofs
5a8db275f8 Revert "[llvm][test] Add COM: directives before colon-less non-CHECKs in comments. NFC"
This reverts commit 183d6af081.

Revert pending further consensus building: https://reviews.llvm.org/D79963#2050521
2020-05-22 05:36:15 -06:00
Victor Campos
872ee78f65 Revert "[ARM] Improve codegen of volatile load/store of i64"
This reverts commit 8a12553223.

A bug has been found when generating code for Thumb2. In some very
specific cases, the prologue/epilogue emitter generates erroneous stack
offsets for the new LDRD instructions that access the stack.

This bug does not seem to be caused by the reverted patch though. Likely
the latter has made an undiscovered issue emerge in the
prologue/epilogue emission pass. Nevertheless, this reversion is
necessary since it is blocking users of the ARM backend.
2020-05-22 11:01:57 +01:00
Jean-Michel Gorius
7019cea26d [CodeGen] Add support for multiple memory operands in MachineInstr::mayAlias
Summary:
To support all targets, the mayAlias member function needs to support instructions with multiple operands.

This revision also changes the order of the emitted instructions in some test cases.

Reviewers: efriedma, hfinkel, craig.topper, dmgreen

Reviewed By: efriedma

Subscribers: MatzeB, dmgreen, hiraditya, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D80161
2020-05-21 23:02:54 +02:00
Jon Roelofs
183d6af081 [llvm][test] Add COM: directives before colon-less non-CHECKs in comments. NFC
Differential Revision: https://reviews.llvm.org/D79963
2020-05-21 09:29:27 -06:00
Diogo Sampaio
6c68f75ee4 Prevent register coalescing in functions whith setjmp
Summary:
In the the given example, a stack slot pointer is merged
between a setjmp and longjmp. This pointer is spilled,
so it does not get correctly restored, addinga undefined
behaviour where it shouldn't.

Change-Id: I60ec010844f2a24ce01ceccf12eb5eba5ab94abb

Reviewers: eli.friedman, thanm, efriedma

Reviewed By: efriedma

Subscribers: MatzeB, qcolombet, tpr, rnk, efriedma, hiraditya, llvm-commits, chill

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D77767
2020-05-16 00:36:34 +01:00
Yvan Roux
0e4827aa4e [ARM][MachineOutliner] Add Machine Outliner support for ARM.
Enables Machine Outlining for ARM and Thumb2 modes.  This is the first
patch of the series which adds all the basic logic for the support, and
only handles tail-calls and thunks.

The outliner can be turned on by using clang -moutline option or -mllvm
-enable-machine-outliner one (like AArch64).

Differential Revision: https://reviews.llvm.org/D76066
2020-05-15 08:44:23 +02:00
Eli Friedman
4532a50899 Infer alignment of unmarked loads in IR/bitcode parsing.
For IR generated by a compiler, this is really simple: you just take the
datalayout from the beginning of the file, and apply it to all the IR
later in the file. For optimization testcases that don't care about the
datalayout, this is also really simple: we just use the default
datalayout.

The complexity here comes from the fact that some LLVM tools allow
overriding the datalayout: some tools have an explicit flag for this,
some tools will infer a datalayout based on the code generation target.
Supporting this properly required plumbing through a bunch of new
machinery: we want to allow overriding the datalayout after the
datalayout is parsed from the file, but before we use any information
from it. Therefore, IR/bitcode parsing now has a callback to allow tools
to compute the datalayout at the appropriate time.

Not sure if I covered all the LLVM tools that want to use the callback.
(clang? lli? Misc IR manipulation tools like llvm-link?). But this is at
least enough for all the LLVM regression tests, and IR without a
datalayout is not something frontends should generate.

This change had some sort of weird effects for certain CodeGen
regression tests: if the datalayout is overridden with a datalayout with
a different program or stack address space, we now parse IR based on the
overridden datalayout, instead of the one written in the file (or the
default one, if none is specified). This broke a few AVR tests, and one
AMDGPU test.

Outside the CodeGen tests I mentioned, the test changes are all just
fixing CHECK lines and moving around datalayout lines in weird places.

Differential Revision: https://reviews.llvm.org/D78403
2020-05-14 13:03:50 -07:00
Momchil Velikov
bc2e572f51 Re-commit: [ARM] CMSE code generation
This patch implements the final bits of CMSE code generation:

* emit special linker symbols

* restrict parameter passing to no use memory

* emit BXNS and BLXNS instructions for returns from non-secure entry
  functions, and non-secure function calls, respectively

* emit code to save/restore secure floating-point state around calls
  to non-secure functions

* emit code to save/restore non-secure floating-pointy state upon
  entry to non-secure entry function, and return to non-secure state

* emit code to clobber registers not used for arguments and returns

* when switching to no-secure state

Patch by Momchil Velikov, Bradley Smith, Javed Absar, David Green,
possibly others.

Differential Revision: https://reviews.llvm.org/D76518
2020-05-14 16:46:16 +01:00
Craig Topper
de92dc2850 [Statepoint] Mark FixupStatepointCallerSaved as preserving the CFG
I'm hoping this will restore some compile time lost by D75936 and D75937.

Differential Revision: https://reviews.llvm.org/D79813
2020-05-13 10:59:44 -07:00
Eli Friedman
c9c930ae67 [SelectionDAG] Don't promote the alignment of allocas beyond the stack alignment.
allocas in LLVM IR have a specified alignment. When that alignment is
specified, the alloca has at least that alignment at runtime.

If the specified type of the alloca has a higher preferred alignment,
SelectionDAG currently ignores that specified alignment, and increases
the alignment. It does this even if it would trigger stack realignment.
I don't think this makes sense, so this patch changes that.

I was looking into this for SVE in particular: for SVE, overaligning
vscale'ed types is extra expensive because it requires realigning the
stack multiple times, or using dynamic allocation. (This currently isn't
implemented.)

I updated the expected assembly for a couple tests; in particular, for
arg-copy-elide.ll, the optimization in question does not increase the
alignment the way SelectionDAG normally would. For the rest, I just
increased the specified alignment on the allocas to match what
SelectionDAG was inferring.

Differential Revision: https://reviews.llvm.org/D79532
2020-05-11 17:39:00 -07:00
James Y Knight
7af9d386da Correctly modify the CFG in IfConverter, and then remove the
CorrectExtraCFGEdges function.

The latter was a workaround for "Various pieces of code" leaving bogus
extra CFG edges in place. Where by "various" it meant only
IfConverter::MergeBlocks, which failed to clear all of the successors
of dead blocks it emptied out. This wouldn't matter a whole lot,
except that the dead blocks remained listed as predecessors of
still-useful blocks, inhibiting optimizations.

This fix slightly changed two thumb tests, because the correct CFG
successors allowed for the "diamond" if-conversion pattern to be
detected, when it could only use "simple" before.

Additionally, the removal of a now-redundant call to analyzeBranch
(with AllowModify=true) in BranchFolder::OptimizeFunction caused a
later check for an empty block in BranchFolder::OptimizeBlock to
fail. Correct this by moving the call to analyzeBranch in
OptimizeBlock higher.

Differential Revision: https://reviews.llvm.org/D79527
2020-05-07 18:17:07 -04:00
Momchil Velikov
fb18dffaeb Revert "[ARM] CMSE code generation"
This reverts commit 7cbbf89d23.

The regression tests fail with the expensive checks.
2020-05-05 19:05:40 +01:00
Momchil Velikov
7cbbf89d23 [ARM] CMSE code generation
This patch implements the final bits of CMSE code generation:

* emit special linker symbols

* restrict parameter passing to not use memory

* emit BXNS and BLXNS instructions for returns from non-secure entry
  functions, and non-secure function calls, respectively

* emit code to save/restore secure floating-point state around calls
  to non-secure functions

* emit code to save/restore non-secure floating-pointy state upon
  entry to non-secure entry function, and return to non-secure state

* emit code to clobber registers not used for arguments and returns
  when switching to no-secure state

Patch by Momchil Velikov, Bradley Smith, Javed Absar, David Green,
possibly others.

Differential Revision: https://reviews.llvm.org/D76518
2020-05-05 18:23:28 +01:00
Eli Friedman
1eb160fe8d [ARM] Fix tail call validity checking for varargs calls.
If a varargs function is calling a non-varargs function, or vice versa,
make sure we use the correct "varargs" bit for each.

Fixes https://bugs.llvm.org/show_bug.cgi?id=45234

Differential Revision: https://reviews.llvm.org/D79199
2020-05-04 12:34:14 -07:00
Evgeniy Brevnov
3e68a66704 [BPI][NFC] Reuse post dominantor tree from analysis manager when available
Summary: Currenlty BPI unconditionally creates post dominator tree each time. While this is not incorrect we can save compile time by reusing existing post dominator tree (when it's valid) provided by analysis manager.

Reviewers: skatkov, taewookoh, yrouban

Reviewed By: skatkov

Subscribers: hiraditya, steven_wu, dexonsmith, llvm-commits

Tags: #llvm

Differential Revision: https://reviews.llvm.org/D78987
2020-04-30 11:31:03 +07:00
LemonBoy
f30416fdde [AsmPrinter] Fix emission of non-standard integer constants for BE targets
The code assumed that zero-extending the integer constant to the
designated alloc size would be fine even for BE targets, but that's not
the case as that pulls in zeros from the MSB side while we actually
expect the padding zeros to go after the LSB.

I've changed the codepath handling the constant integers to use the
store size for both small(er than u64) and big constants and then add
zero padding right after that.

Differential Revision: https://reviews.llvm.org/D78011
2020-04-27 14:57:29 -07:00