Commit Graph

3501 Commits

Author SHA1 Message Date
Sam Parker
aaec3c6260 [ARM] Allow truncs as sources in ARM CGP
We previously only allowed truncs as sinks, but now allow them as
sources too. We do this by checking that the result type is the
narrow type that we're trying to optimise for.

Differential Revision: https://reviews.llvm.org/D51978

llvm-svn: 342141
2018-09-13 15:14:12 +00:00
Sam Parker
96f77f142b [ARM] Fix FixConst for ARMCodeGenPrepare
Part of FixConsts wrongly assumes either a 8- or 16-bit constant
which can result in the wrong constants being generated during
promotion.

Differential Revision: https://reviews.llvm.org/D52032

llvm-svn: 342140
2018-09-13 14:48:10 +00:00
Tim Northover
c15d47bb01 ARM: align loops to 4 bytes on Cortex-M3 and Cortex-M4.
The Technical Reference Manuals for these two CPUs state that branching
to an unaligned 32-bit instruction incurs an extra pipeline reload
penalty. That's bad.

This also enables the optimization at -Os since it costs on average one
byte per loop in return for 1 cycle per iteration, which is pretty good
going.

llvm-svn: 342127
2018-09-13 10:28:05 +00:00
Diogo N. Sampaio
01b916e188 [ARM] Tighten f64<->f16 conversion requirements
Fix missing Requires fields.

Patch by Bernard Ogden (bogden)

Reviewers: SjoerdMeijer, javed.absar, t.p.northover	

Reviewed By: t.p.northover

Differential Revision: https://reviews.llvm.org/D51631

llvm-svn: 342061
2018-09-12 16:24:43 +00:00
Sam Parker
a023c7a9cb [ARM] Exchange MAC operands in ARMParallelDSP
SMLAD and SMLALD instructions also come in the form of SMLADX and
SMLALDX which perform an exchange on their second operand. To support
this, more of the loads in the MAC candidates are compared for
sequential access and a boolean value has been added to BinOpChain.

AddMACCandiate has been refactored into a small pattern matching
state machine to reduce the amount of duplicated code, but also to
enable the matching to be more flexible. CreateParallelMACPairs now
iterates through all the candidates to find parallel ones.

Differential Revision: https://reviews.llvm.org/D51424

llvm-svn: 342033
2018-09-12 09:17:44 +00:00
Sam Parker
569b24549e [ARM] Allow bitcasts in ARMCodeGenPrepare
Allow bitcasts in the use-def chains, treating them as sources.

Differential Revision: https://reviews.llvm.org/D50758

llvm-svn: 342032
2018-09-12 09:11:48 +00:00
Sam Parker
01db2983cd [ARM] Add smlald support in ARMParallelDSP
Search from i64 reducing phis, as well as i32, to allow the
generation of smlald instructions.

Differential Revision: https://reviews.llvm.org/D51101

llvm-svn: 341941
2018-09-11 14:01:22 +00:00
Tim Northover
bb7d7b3d33 ARM: fix Thumb2 CodeGen for ldrex with folded frame-index.
Because t2LDREX (& t2STREX) were marked as AddrModeNone, but did allow a
FrameIndex operand, rewriteT2FrameIndex asserted. This gives them a
proper addressing-mode and tells the rewriter about it so that encodable
offsets are exploited and others are rejected.

Should fix PR38828.

llvm-svn: 341642
2018-09-07 09:21:25 +00:00
Eric Christopher
fe83270ee9 The initial .text section generated in object files was missing the
SHF_ARM_PURECODE flag when being built with the -mexecute-only flag.
All code sections of an ELF must have the flag set for the final .text
section to be execute-only, otherwise the flag gets removed.

A HasData flag is added to MCSection to aid in the determination that
the section is empty. A virtual setTargetSectionFlags is added to
MCELFObjectTargetWriter to allow subclasses to set target specific
section flags to be added to sections which we then use in the ARM
backend to set SHF_ARM_PURECODE.

Patch by Ivan Lozano!

Reviewed By: echristo

Differential Revision: https://reviews.llvm.org/D48792

llvm-svn: 341593
2018-09-06 22:09:31 +00:00
Sanjay Patel
dbf52837fe [DAGCombiner] try to convert pow(x, 0.25) to sqrt(sqrt(x))
This was proposed as an IR transform in D49306, but it was not clearly justifiable as a canonicalization. 
Here, we only do the transform when the target tells us that sqrt can be lowered with inline code.

This is the basic case. Some potential enhancements are in the TODO comments:

1. Generalize the transform for other exponents (allow more than 2 sqrt calcs if that's really cheaper).
2. If we have less fast-math-flags, generate code to avoid -0.0 and/or INF.
3. Allow the transform when optimizing/minimizing size (might require a target hook to get that right).

Note that by default, x86 converts single-precision sqrt calcs into sqrt reciprocal estimate with 
refinement. That codegen is controlled by CPU attributes and can be manually overridden. We have plenty 
of test coverage for that already, so I didn't bother to include extra testing for that here. AArch uses 
its full-precision ops in all cases (not sure if that's the intended behavior or not, but that should 
also be covered by existing tests).

Differential Revision: https://reviews.llvm.org/D51630 

llvm-svn: 341481
2018-09-05 17:01:56 +00:00
Martin Storsjo
2dcaa41e1e [MinGW] [ARM] Add stubs for potential automatic dllimported variables
The runtime pseudo relocations can't handle the ARM format embedded
addresses in movw/movt pairs. By using stubs, the potentially
dllimported addresses can be touched up by the runtime pseudo relocation
framework.

Differential Revision: https://reviews.llvm.org/D51450

llvm-svn: 341176
2018-08-31 08:00:25 +00:00
Ties Stuij
9c16d809d2 [CodeGen] emit inline asm clobber list warnings for reserved (cont)
Summary:
This is a continuation of https://reviews.llvm.org/D49727
Below the original text, current changes in the comments:

Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results.

For example:

  extern int bar(int[]);
  
  int foo(int i) {
    int a[i]; // VLA
    asm volatile(
        "mov r7, #1"
      :
      :
      : "r7"
    );
  
    return 1 + bar(a);
  }

Compiled for thumb, this gives:

  $ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb
  ...
  foo:
          .fnstart
  @ %bb.0:                                @ %entry
          .save   {r4, r5, r6, r7, lr}
          push    {r4, r5, r6, r7, lr}
          .setfp  r7, sp, #12
          add     r7, sp, #12
          .pad    #4
          sub     sp, #4
          movs    r1, #7
          add.w   r0, r1, r0, lsl #2
          bic     r0, r0, #7
          sub.w   r0, sp, r0
          mov     sp, r0
          @APP
          mov.w   r7, #1
          @NO_APP
          bl      bar
          adds    r0, #1
          sub.w   r4, r7, #12
          mov     sp, r4
          pop     {r4, r5, r6, r7, pc}
  ...

r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block.

This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward. Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now.

The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter.

If we find a reserved register, we print a warning:

  repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm]
        "mov r7, #1"
        ^

Reviewers: efriedma, olista01, javed.absar

Reviewed By: efriedma

Subscribers: eraman, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D51165

llvm-svn: 341062
2018-08-30 12:52:35 +00:00
Florian Hahn
521dc4dda4 Fix "Q" and "R" inline assembly template modifiers for big-endian Arm
Consider the endianness of the target when printing register names.  This is in line with the documentation at http://llvm.org/docs/LangRef.html#asm-template-argument-modifiers

Patch by Jackson Woodruff <jackson.woodruff@arm.com>

Reviewers: t.p.northover, echristo, javed.absar, efriedma

Reviewed By: efriedma

Differential Revision: https://reviews.llvm.org/D49778

llvm-svn: 341052
2018-08-30 10:28:23 +00:00
Huihui Zhang
2f4106592d [GlobalMerge] Fix GlobalMerge on bss external global variables.
Summary:
Global variables that are external and zero initialized are
supposed to be merged with global variables in the bss section
rather than the data section.

Reviewers: efriedma, rengolin, t.p.northover, javed.absar, asl, john.brawn, pcc

Reviewed By: efriedma

Subscribers: dmgreen, llvm-commits

Differential Revision: https://reviews.llvm.org/D51379

llvm-svn: 341008
2018-08-30 00:49:50 +00:00
Eli Friedman
96e3cd85bd [ARM] Lower llvm.ctlz.i32 to a libcall when clz is not available.
The inline sequence is very long (about 70 bytes on Thumb1), so it's
not really a good idea to inline it, especially when optimizing for
size.

Differential Revision: https://reviews.llvm.org/D47917

llvm-svn: 340458
2018-08-22 21:47:14 +00:00
Martin Storsjo
5ab1d107bb [ARM] Avoid injecting constant islands in movw+movt pairs on Windows
On Windows, movw+movt pairs with relocations are handled with a single
relocation that covers them both. Therefore we can't inject anything
between these instructions, otherwise the relocation (which in LLVM
only is treated as the movw instruction's relocation, while the movt
instruction's relocation is dropped) will end up bogus.

These instructions are bundled up until right before the constant
islands pass, making this effectively the only place that can split
them apart.

Differential Revision: https://reviews.llvm.org/D51032

llvm-svn: 340451
2018-08-22 20:34:12 +00:00
Eli Friedman
c11e2b9470 [ARM] Handle all-ones mask explicitly in targetShrinkDemandedConstant.
This avoids a potential infinite loop setting and unsetting bits in the
mask.

Reduced from a failure on the polly-aosp bot.

Differential Revision: https://reviews.llvm.org/D51066

llvm-svn: 340446
2018-08-22 20:13:45 +00:00
Sam Parker
4d519fc3b5 [ARM] Rotated operand patterns for *xtb16
Add intrinsic isel patterns for sxtb16, sxtab16, uxtb16 and uxtab16
so that they can perform a ror.

Differential Revision: https://reviews.llvm.org/D51034

llvm-svn: 340405
2018-08-22 12:58:36 +00:00
Sam Parker
597811e7a7 [DAGCombiner] Reduce load widths of shifted masks
During combining, ReduceLoadWdith is used to combine AND nodes that
mask loads into narrow loads. This patch allows the mask to be a
shifted constant. This results in a narrow load which is then left
shifted to compensate for the new offset.

Differential Revision: https://reviews.llvm.org/D50432

llvm-svn: 340261
2018-08-21 10:26:59 +00:00
Eli Friedman
73e8a784e6 [SelectionDAG] Improve the legalisation lowering of UMULO.
There is no way in the universe, that doing a full-width division in
software will be faster than doing overflowing multiplication in
software in the first place, especially given that this same full-width
multiplication needs to be done anyway.

This patch replaces the previous implementation with a direct lowering
into an overflowing multiplication algorithm based on half-width
operations.

Correctness of the algorithm was verified by exhaustively checking the
output of this algorithm for overflowing multiplication of 16 bit
integers against an obviously correct widening multiplication. Baring
any oversights introduced by porting the algorithm to DAG, confidence in
correctness of this algorithm is extremely high.

Following table shows the change in both t = runtime and s = space. The
change is expressed as a multiplier of original, so anything under 1 is
“better” and anything above 1 is worse.

+-------+-----------+-----------+-------------+-------------+
| Arch  | u64*u64 t | u64*u64 s | u128*u128 t | u128*u128 s |
+-------+-----------+-----------+-------------+-------------+
|   X64 |     -     |     -     |    ~0.5     |    ~0.64    |
|  i686 |   ~0.5    |   ~0.6666 |    ~0.05    |    ~0.9     |
| armv7 |     -     |   ~0.75   |      -      |    ~1.4     |
+-------+-----------+-----------+-------------+-------------+

Performance numbers have been collected by running overflowing
multiplication in a loop under `perf` on two x86_64 (one Intel Haswell,
other AMD Ryzen) based machines. Size numbers have been collected by
looking at the size of function containing an overflowing multiply in
a loop.

All in all, it can be seen that both performance and size has improved
except in the case of armv7 where code size has regressed for 128-bit
multiply. u128*u128 overflowing multiply on 32-bit platforms seem to
benefit from this change a lot, taking only 5% of the time compared to
original algorithm to calculate the same thing.

The final benefit of this change is that LLVM is now capable of lowering
the overflowing unsigned multiply for integers of any bit-width as long
as the target is capable of lowering regular multiplication for the same
bit-width. Previously, 128-bit overflowing multiply was the widest
possible.

Patch by Simonas Kazlauskas!

Differential Revision: https://reviews.llvm.org/D50310

llvm-svn: 339922
2018-08-16 18:39:39 +00:00
Sam Parker
0d51197051 [ARM] Ignore GEPs in ARMCodeGenPrepare
While searching through the use-def tree, ignore GetElementPtrInst
instructions because they don't need promoting and neither do their
indices. Otherwise, the wide indices prevent the transformation from
happening.

Differential Revision: https://reviews.llvm.org/D50762

llvm-svn: 339871
2018-08-16 12:24:40 +00:00
Sam Parker
0e2f0bd48e [ARM] Allow zext in ARMCodeGenPrepare
Treat zext instructions as roots, like we do for truncs.

Differential Revision: https://reviews.llvm.org/D50759

llvm-svn: 339868
2018-08-16 11:54:09 +00:00
Sam Parker
13567dbbd8 [ARM] Allow signed icmps in ARMCodeGenPrepare
Originally committed in r339755 which was reverted in r339806 due to
an asan issue. The issue was caused by my assumption that operands to
a CallInst mapped to the FunctionType Params. CallInsts are now
handled by iterating over their ArgOperands instead of Operands.
    
Original Message:
  Treat signed icmps as 'sinks', allowing them to be in the use-def
  tree, enabling more promotions to be performed. As a sink, any
  promoted incoming values need to be truncated before being used by
  the signed icmp.
    
  Differential Revision: https://reviews.llvm.org/D50067

llvm-svn: 339858
2018-08-16 10:05:39 +00:00
Vitaly Buka
ed4239f482 Revert "[ARM] Allow signed icmps in ARMCodeGenPrepare"
use-after-poison in check-llvm under asan

This reverts commit r339755.

llvm-svn: 339806
2018-08-15 20:09:35 +00:00
Sam Parker
fabf7fe5f8 [ARM] TypeSize lower bound for ARMCodeGenPrepare
We only try to promote types with are smaller than 16-bits, but we
also need to check that the type is not less than 8-bits.

Differential Revision: https://reviews.llvm.org/D50769

llvm-svn: 339770
2018-08-15 13:29:50 +00:00
Sam Parker
6548cd3905 [ARM] Allow signed icmps in ARMCodeGenPrepare
Treat signed icmps as 'sinks', allowing them to be in the use-def
tree, enabling more promotions to be performed. As a sink, any
promoted incoming values need to be truncated before being used by
the signed icmp.

Differential Revision: https://reviews.llvm.org/D50067

llvm-svn: 339755
2018-08-15 08:23:03 +00:00
Sam Parker
7def86bbdb [ARM] Allow pointer values in ARMCodeGenPrepare
Add pointers to the list of allowed types, but don't try to promote
them. Also fixed a bug with the promotion of undef values, so a new
value is now created instead of mutating in place. We also now only
promote if there's an instruction in the use-def chains other than
the icmp, sinks and sources.

Differential Revision: https://reviews.llvm.org/D50054

llvm-svn: 339754
2018-08-15 07:52:35 +00:00
Luke Geeson
4ce41d2bb7 [ARM] Added FP16 VREV Vector Instrinsic CodeGen support
llvm-svn: 339546
2018-08-13 08:37:41 +00:00
Eli Friedman
e1687a89e8 [ARM] Adjust AND immediates to make them cheaper to select.
LLVM normally prefers to minimize the number of bits set in an AND
immediate, but that doesn't always match the available ARM instructions.
In Thumb1 mode, prefer uxtb or uxth where possible; otherwise, prefer
a two-instruction sequence movs+ands or movs+bics.

Some potential improvements outlined in
ARMTargetLowering::targetShrinkDemandedConstant, but seems to work
pretty well already.

The ARMISelDAGToDAG fix ensures we don't generate an invalid UBFX
instruction due to a larger-than-expected mask. (It's orthogonal, in
some sense, but as far as I can tell it's either impossible or nearly
impossible to reproduce the bug without this change.)

According to my testing, this seems to consistently improve codesize by
a small amount by forming bic more often for ISD::AND with an immediate.

Differential Revision: https://reviews.llvm.org/D50030

llvm-svn: 339472
2018-08-10 21:21:53 +00:00
Sam Parker
8c4b964c5a [ARM] Disallow zexts in ARMCodeGenPrepare
Enabling ARMCodeGenPrepare by default caused a whole load of
failures. This is due to zexts and truncs not being handled properly.
ZExts are messy so it's just easier to disable for now and truncs
are allowed only as 'sinks'. I still need to figure out why allowing
them as 'sources' causes so many failures. The other main changes are
that we are explicit in the types that we converting to, it's now
always 'TypeSize'. Type support is also now performed while checking
for valid opcodes as it unnecessarily complicated having the checks
are different stages.
    
I've moved the tests around too, so we have the zext and truncs in
their own file as well as the overflowing opcode tests.

Differential Revision: https://reviews.llvm.org/D50518

llvm-svn: 339432
2018-08-10 13:57:13 +00:00
Evandro Menezes
9a92fe0c9e [ARM] Replace processor check with feature
Add new feature, `FeatureUseWideStrideVFP`, that replaces the need for a
processor check.  Otherwise, NFC.

llvm-svn: 339354
2018-08-09 16:13:24 +00:00
Sjoerd Meijer
806f70d229 [ARM] FP16: codegen support for VTRN
Differential Revision: https://reviews.llvm.org/D50454

llvm-svn: 339340
2018-08-09 12:45:09 +00:00
Petr Hosek
7b27454477 [ADT] Normalize empty triple components
LLVM triple normalization is handling "unknown" and empty components
differently; for example given "x86_64-unknown-linux-gnu" and
"x86_64-linux-gnu" which should be equivalent, triple normalization
returns "x86_64-unknown-linux-gnu" and "x86_64--linux-gnu". autoconf's
config.sub returns "x86_64-unknown-linux-gnu" for both
"x86_64-linux-gnu" and "x86_64-unknown-linux-gnu". This changes the
triple normalization to behave the same way, replacing empty triple
components with "unknown".

This addresses PR37129.

Differential Revision: https://reviews.llvm.org/D50219

llvm-svn: 339294
2018-08-08 22:23:57 +00:00
Eli Friedman
5b45a39056 [ARM] Avoid spilling lr with Thumb1 tail calls.
Normally, if any registers are spilled, we prefer to spill lr on Thumb1
so we can fold the "bx lr" into the "pop".  However, if there are tail
calls involved, restoring lr is expensive, so skip the optimization in
that case.

The spill of r7 in the new test also isn't necessary, but that's
mostly orthogonal to this patch. (It's the same code in
ARMFrameLowering, but it's not related to tail calls.)

Differential Revision: https://reviews.llvm.org/D49459

llvm-svn: 339283
2018-08-08 20:03:10 +00:00
Ties Stuij
0244aa67d6 revert tests of '[CodeGen] emit inline asm clobber list warnings for reserved'
llvm-svn: 339276
2018-08-08 17:19:32 +00:00
Ties Stuij
52f3631f4b [CodeGen] emit inline asm clobber list warnings for reserved
Summary:
Currently, in line with GCC, when specifying reserved registers like sp or pc on an inline asm() clobber list, we don't always preserve the original value across the statement. And in general, overwriting reserved registers can have surprising results.

For example:


```
extern int bar(int[]);

int foo(int i) {
  int a[i]; // VLA
  asm volatile(
      "mov r7, #1"
    :
    :
    : "r7"
  );

  return 1 + bar(a);
}
```

Compiled for thumb, this gives:
```
$ clang --target=arm-arm-none-eabi -march=armv7a -c test.c -o - -S -O1 -mthumb
...
foo:
        .fnstart
@ %bb.0:                                @ %entry
        .save   {r4, r5, r6, r7, lr}
        push    {r4, r5, r6, r7, lr}
        .setfp  r7, sp, #12
        add     r7, sp, #12
        .pad    #4
        sub     sp, #4
        movs    r1, #7
        add.w   r0, r1, r0, lsl #2
        bic     r0, r0, #7
        sub.w   r0, sp, r0
        mov     sp, r0
        @APP
        mov.w   r7, #1
        @NO_APP
        bl      bar
        adds    r0, #1
        sub.w   r4, r7, #12
        mov     sp, r4
        pop     {r4, r5, r6, r7, pc}
...
```

r7 is used as the frame pointer for thumb targets, and this function needs to restore the SP from the FP because of the variable-length stack allocation a. r7 is clobbered by the inline assembly (and r7 is included in the clobber list), but LLVM does not preserve the value of the frame pointer across the assembly block.

This type of behavior is similar to GCC's and has been discussed on the bugtracker: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=11807 . No consensus seemed to have been reached on the way forward.  Clang behavior has briefly been discussed on the CFE mailing (starting here: http://lists.llvm.org/pipermail/cfe-dev/2018-July/058392.html). I've opted for following Eli Friedman's advice to print warnings when there are reserved registers on the clobber list so as not to diverge from GCC behavior for now.

The patch uses MachineRegisterInfo's target-specific knowledge of reserved registers, just before we convert the inline asm string in the AsmPrinter.

If we find a reserved register, we print a warning:
```
repro.c:6:7: warning: inline asm clobber list contains reserved registers: R7 [-Winline-asm]
      "mov r7, #1"
      ^
```

Reviewers: eli.friedman, olista01, javed.absar, efriedma

Reviewed By: efriedma

Subscribers: efriedma, eraman, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D49727

llvm-svn: 339257
2018-08-08 15:15:59 +00:00
Sjoerd Meijer
1919ecfd0b [ARM][NFC] Replaced tab-characters in test file vtrn.ll
llvm-svn: 339251
2018-08-08 14:42:11 +00:00
Sjoerd Meijer
f8c394f0f5 [ARM] FP16: codegen support for VEXT
Differential Revision: https://reviews.llvm.org/D50427

llvm-svn: 339241
2018-08-08 13:26:38 +00:00
Sjoerd Meijer
db5908deb9 [ARM] FP16: vector vmov and vdup support
This adds codegen support for the vmov_n_f16 and vdup_n_f16 variants.

Differential Revision: https://reviews.llvm.org/D50329

llvm-svn: 339238
2018-08-08 13:11:31 +00:00
Sjoerd Meijer
920a453485 [ARM] FP16: vector VMUL variants
This adds codegen support for the vmul_lane_f16 and vmul_n_f16 variants.

Differential Revision: https://reviews.llvm.org/D50326

llvm-svn: 339232
2018-08-08 10:27:34 +00:00
Sjoerd Meijer
b33a4c02cc [ARM] FP16: support vector INT_TO_FP and FP_TO_INT
This adds codegen support for the different vcvt_f16 variants.

Differential Revision: https://reviews.llvm.org/D50393

llvm-svn: 339227
2018-08-08 09:45:34 +00:00
Thomas Preud'homme
4107b31df2 Support inline asm with multiple 64bit output in 32bit GPR
Summary: Extend fix for PR34170 to support inline assembly with multiple output operands that do not naturally go in the register class it is constrained to (eg. double in a 32-bit GPR as in the PR).

Reviewers: bogner, t.p.northover, lattner, javed.absar, efriedma

Reviewed By: efriedma

Subscribers: efriedma, tra, eraman, javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D45437

llvm-svn: 339225
2018-08-08 09:35:26 +00:00
Sjoerd Meijer
b264944ed5 [ARM] FP16: support the vector vmin and vmax variants
Differential Revision: https://reviews.llvm.org/D50238

llvm-svn: 339221
2018-08-08 07:20:15 +00:00
Sjoerd Meijer
b39cd886b9 [ARM] FP16: codegen support for VACGT
Differential Revision: https://reviews.llvm.org/D50236

llvm-svn: 339148
2018-08-07 15:11:47 +00:00
Sjoerd Meijer
a2ddddfd3e [ARM][NFC] Replaced tab characters in test file vfcmp.ll.
llvm-svn: 339111
2018-08-07 08:05:15 +00:00
Sjoerd Meijer
d62c5ec2fe [ARM] FP16: support vector zip and unzip
This is addressing PR38404.

Differential Revision: https://reviews.llvm.org/D50186

llvm-svn: 338835
2018-08-03 09:24:29 +00:00
Sjoerd Meijer
9b30213828 [ARM] FP16: support VFMA
This is addressing PR38404.

llvm-svn: 338830
2018-08-03 09:12:56 +00:00
Sjoerd Meijer
8e7fab0443 [ARM][NFC] Follow up of r338568
I disabled more tests than necessary, this enables them.

llvm-svn: 338717
2018-08-02 14:04:48 +00:00
Alexander Ivchenko
49168f6778 [GlobalISel] Rewrite CallLowering::lowerReturn to accept multiple VRegs per Value
This is logical continuation of https://reviews.llvm.org/D46018 (r332449)

Differential Revision: https://reviews.llvm.org/D49660

llvm-svn: 338685
2018-08-02 08:33:31 +00:00
Sjoerd Meijer
590e4e8dde [ARM] Armv8.2-A FP16 vector intrinsics tests
Clang support for the Armv8.2-A FP16 vector intrinsic was committed in
rC328277, but this was never followed up, i.e. the LLVM part is missing.

I've raised PR38404, and this is the first step to address this. I.e.,
this adds tests for the Armv8.2-A FP16 vector intrinsic, and thus shows
which intrinsics already work, and which need further work.

Differential Revision: https://reviews.llvm.org/D50142

llvm-svn: 338568
2018-08-01 14:43:59 +00:00