Commit Graph

21087 Commits

Author SHA1 Message Date
Taewook Oh
424c88c381 Move unit test to the proper location
Summary: Move test/CodeGen/AArch64/reg-bank-128bit.mir to test/CodeGen/AArch64/GlobalISel/reg-bank-128bit.mir so that the test is executed only when global-isel is enabled. lit.local.cfg under test/CodeGen/AArch64/GlobalISel checks if 'global-isel' is in the available_features while the same file under test/CodeGen/AArch64 doesn't.

Reviewers: qcolombet, davide

Reviewed By: davide

Subscribers: davide, aemerson, javed.absar, igorb, kristof.beyls, llvm-commits

Differential Revision: https://reviews.llvm.org/D36282

llvm-svn: 309985
2017-08-03 21:07:12 +00:00
Connor Abbott
82267a553d test commit
llvm-svn: 309981
2017-08-03 20:22:30 +00:00
Simon Pilgrim
99d0ab38bc [X86] Adding a test for vector shuffle extractions.
When both the vector inputs of the shuffle vector is comprising of same vector or shuffle mask is accessing elements from only one operand vector (like in PR33758 test already present).

Committed on behalf of @jbhateja (Jatin Bhateja)

Differential Revision: https://reviews.llvm.org/D36271

llvm-svn: 309963
2017-08-03 17:04:59 +00:00
Simon Pilgrim
0dfd4fa8d3 [X86][AVX512] Tidied up v64i8 vector shuffle tests with triple
llvm-svn: 309961
2017-08-03 16:56:52 +00:00
Changpeng Fang
ef4dbb46da AMDGPU/SI: Don't fix a PHI under uniform branch in SIFixSGPRCopies only when sources and destination are all sgprs
Summary:
  If a PHI has at lease one VGPR operand, we have to fix the PHI
in SIFixSGPRCopies.

Reviewer:
  Matt

Differential Revision:
  http://reviews.llvm.org/D34727

llvm-svn: 309959
2017-08-03 16:37:02 +00:00
Nirav Dave
3fc1c2365c [DAG] Allow merging of stores of vector loads
Remove restriction disallowing merging of stores vector loads into
larger store of larger vector load.

Reviewers: RKSimon, efriedma, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36158

llvm-svn: 309951
2017-08-03 15:51:20 +00:00
Nico Weber
bfde70b097 Revert r309923, it caused PR34045.
llvm-svn: 309950
2017-08-03 15:41:26 +00:00
Simon Dardis
51296593a8 [SelectionDAG] Resolve PR33978.
rL306209 taught SelectionDAG how to add the dereferenceable flag when
expanding memcpy and memmove. The fix however contained a nit where
the offset + size was constructed as an APInt of PointerSize rather
than PointerSizeInBits.

This lead to isDereferenceableAndAlignedPointer() get truncated values or
values which would be sign extended within that function leading to
incorrect results.

Thanks to Alex Crichton for reporting the issue!

This resolves PR33978.

Reviewers: inouehrs

Differential Revision: https://reviews.llvm.org/D36236

llvm-svn: 309930
2017-08-03 09:38:46 +00:00
Diana Picus
930e6ec8f3 [ARM] GlobalISel: Select simple G_GLOBAL_VALUE instructions
Add support in the instruction selector for G_GLOBAL_VALUE for ELF and
MachO for the static relocation model. We don't handle Windows yet
because that's Thumb-only, and we don't handle Thumb in general at the
moment.

Support for PIC, ROPI, RWPI and TLS will be added in subsequent commits.

Differential Revision: https://reviews.llvm.org/D35883

llvm-svn: 309927
2017-08-03 09:14:59 +00:00
Dinar Temirbulatov
a0beedef1c [X86] SET0 to use XMM registers where possible PR26018 PR32862
Differential Revision: https://reviews.llvm.org/D35965

llvm-svn: 309926
2017-08-03 08:50:18 +00:00
Roger Ferrer Ibanez
854980341b [ARM] Use ADDCARRY / SUBCARRY
This patch:

- makes nodes ISD::ADDCARRY and ISD::SUBCARRY legal for i32
- lowering is done by first converting the boolean value into the carry flag
  using (_, C) <- (ARMISD::ADDC R, -1) and converted back to an integer value
  using (R, _) <- (ARMISD::ADDE 0, 0, C). An ARMISD::ADDE between the two
  operations does the actual addition.
- for subtraction, given that ISD::SUBCARRY second result is actually a
  borrow, we need to invert the value of the second operand and result before
  and after using ARMISD::SUBE. We need to invert the carry result of
  ARMISD::SUBE to preserve the semantics.
- given that the generic combiner may lower ISD::ADDCARRY and
  ISD::SUBCARRY into ISD::UADDO and ISD::USUBO we need to update their lowering
  as well otherwise i64 operations now would require branches. This implies
  updating the corresponding test for unsigned.
- add new combiner to remove the redundant conversions from/to carry flags
  to/from boolean values (ARMISD::ADDC (ARMISD::ADDE 0, 0, C), -1) -> C

Differential Revision: https://reviews.llvm.org/D35192

llvm-svn: 309923
2017-08-03 07:45:10 +00:00
Rafael Espindola
79e238afee Delete Default and JITDefault code models
IMHO it is an antipattern to have a enum value that is Default.

At any given piece of code it is not clear if we have to handle
Default or if has already been mapped to a concrete value. In this
case in particular, only the target can do the mapping and it is nice
to make sure it is always done.

This deletes the two default enum values of CodeModel and uses an
explicit Optional<CodeModel> when it is possible that it is
unspecified.

llvm-svn: 309911
2017-08-03 02:16:21 +00:00
Tom Stellard
3337d74399 AMDGPU/GlobalISel: Mark 32-bit G_FMUL as legal
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D36218

llvm-svn: 309898
2017-08-02 22:56:30 +00:00
Stefan Pintilie
873889ca16 [Power9] Exploit vector absolute difference instructions on Power 9
Power 9 has instructions to do absolute difference (VABSDUB, VABSDUH, VABSDUW)
for byte, halfword and word. We should take advantage of these.

Differential Revision: https://reviews.llvm.org/D34684

llvm-svn: 309876
2017-08-02 20:07:21 +00:00
Evandro Menezes
3840db527b [AArch64] Add Exynos M2 feature test (NFC)
Test fusion of AES operations.

llvm-svn: 309855
2017-08-02 18:55:34 +00:00
Evandro Menezes
89a67b8353 [AArch64] Improve the test of conditional branch fusion
Separate the checking of the fused pairings with B.cc and CBcc.

llvm-svn: 309825
2017-08-02 15:34:06 +00:00
Diana Picus
d5a00b0ff6 [MIR] Print target-specific constant pools
This should enable us to test the generation of target-specific constant
pools, e.g. for ARM:

constants:
 - id:              0
   value:           'g(GOT_PREL)-(LPC0+8-.)'
   alignment:       4
   isTargetSpecific: true

I intend to use this to test PIC support in GlobalISel for ARM.

This is difficult to test outside of that context, since the existing
MIR tests usually rely on parser support as well, and that seems a bit
trickier to add. We could try to add a unit test, but the setup for that
seems rather convoluted and overkill.

We do test however that the parser reports a nice error when
encountering a target-specific constant pool.

Differential Revision: https://reviews.llvm.org/D36092

llvm-svn: 309806
2017-08-02 11:09:30 +00:00
Matt Arsenault
8e8f8f43b0 AMDGPU: Fix clobbering CSR VGPRs when spilling SGPR to it
llvm-svn: 309783
2017-08-02 01:52:45 +00:00
Matt Arsenault
1d6317c3ad AMDGPU: Fix emitting encoded calls
This was failing on out of bounds access to the extra operands
on the s_swappc_b64 beyond those in the instruction definition.

This was working, but somehow regressed within the past few weeks,
although I don't see any obvious commit.

llvm-svn: 309782
2017-08-02 01:42:04 +00:00
Matt Arsenault
6ed7b9bfc0 AMDGPU: Analyze callee resource usage in AsmPrinter
llvm-svn: 309781
2017-08-02 01:31:28 +00:00
Matt Arsenault
d1867c0345 AMDGPU: Don't place arguments in emergency stack slot
When finding the fixed offsets for function arguments,
this needs to skip over the 4 bytes reserved for the
emergency stack slot.

llvm-svn: 309776
2017-08-02 00:59:51 +00:00
Matt Arsenault
acc5e82b0e DAG: Undo and->or combine with FrameIndexes
This pattern shows up when lowering byval copies on AMDGPU.

The byval object access is split into 4-byte chunks, adding a
constant offset to the FixedStack base. When some of the offsets
turn into ors, this prevents combining the constant offsets.

This makes it not apparent that the object is there when matching
addressing modes, so it ends up using a scratch wave offset
relative access and the lengthy frame index expansion for that.

llvm-svn: 309775
2017-08-02 00:43:42 +00:00
Matthias Braun
6b898beb8e X86: Do not use llc -march in tests.
`llc -march` is problematic because it only switches the target
architecture, but leaves the operating system unchanged. This
occasionally leads to indeterministic tests because the OS from
LLVM_DEFAULT_TARGET_TRIPLE is used.

However we can simply always use `llc -mtriple` instead. This changes
all the tests to do this to avoid people using -march when they copy and
paste parts of tests.

See also the discussion in https://reviews.llvm.org/D35287

llvm-svn: 309774
2017-08-02 00:28:10 +00:00
Stanislav Mekhanoshin
da0edef1bd [AMDGPU] Turn s_and_saveexec_b64 into s_and_b64 if result is unused
With SI_END_CF elimination for some nested control flow we can now
eliminate saved exec register completely by turning a saveexec version
of instruction into just a logical instruction.

Differential Revision: https://reviews.llvm.org/D36007

llvm-svn: 309766
2017-08-01 23:44:35 +00:00
Stanislav Mekhanoshin
37e7f959c0 [AMDGPU] Collapse adjacent SI_END_CF
Add a pass to remove redundant S_OR_B64 instructions enabling lanes in
the exec. If two SI_END_CF (lowered as S_OR_B64) come together without any
vector instructions between them we can only keep outer SI_END_CF, given
that CFG is structured and exec bits of the outer end statement are always
not less than exec bit of the inner one.

This needs to be done before the RA to eliminate saved exec bits registers
but after register coalescer to have no vector registers copies in between
of different end cf statements.

Differential Revision: https://reviews.llvm.org/D35967

llvm-svn: 309762
2017-08-01 23:14:32 +00:00
Matthias Braun
7c91336efe ARM: Do not use llc -march in tests.
`llc -march` is problematic because it only switches the target
architecture, but leaves the operating system unchanged. This
occasionally leads to indeterministic tests because the OS from
LLVM_DEFAULT_TARGET_TRIPLE is used.

However we can simply always use `llc -mtriple` instead. This changes
all the tests to do this to avoid people using -march when they copy and
paste parts of tests.

See also the discussion in https://reviews.llvm.org/D35287

llvm-svn: 309755
2017-08-01 22:20:49 +00:00
Matthias Braun
e2d2ce9ff1 PowerPC: Do not use llc -march in tests.
`llc -march` is problematic because it only switches the target
architecture, but leaves the operating system unchanged. This
occasionally leads to indeterministic tests because the OS from
LLVM_DEFAULT_TARGET_TRIPLE is used.

However we can simply always use `llc -mtriple` instead. This changes
all the tests to do this to avoid people using -march when they copy and
paste parts of tests.

This patch:
- Removes -march if the .ll file already has a matching `target triple`
  directive or -mtriple argument.
- In all other cases changes -march=ppc32/-march=ppc64 to
  -mtriple=ppc32--/-mtriple=ppc64--

See also the discussion in https://reviews.llvm.org/D35287

llvm-svn: 309754
2017-08-01 22:20:41 +00:00
Adrian Prantl
032d2381bf Remove PrologEpilogInserter's usage of DBG_VALUE's offset field
In the last half-dozen commits to LLVM I removed code that became dead
after removing the offset parameter from llvm.dbg.value gradually
proceeding from IR towards the backend. Before I can move on to
DwarfDebug and friends there is one last side-called offset I need to
remove:  This patch modifies PrologEpilogInserter's use of the
DBG_VALUE's offset argument to use a DIExpression instead. Because the
PrologEpilogInserter runs at the Machine level I had to play a little
trick with a named llvm.dbg.mir node to get the DIExpressions to print
in MIR dumps (which print the llvm::Module followed by the
MachineFunction dump).

I also had to add rudimentary DwarfExpression support to CodeView and
as a side-effect also fixed a bug (CodeViewDebug::collectVariableInfo
was supposed to give up on variables with complex DIExpressions, but
would fail to do so for fragments, which are also modeled as
DIExpressions).

With this last holdover removed we will have only one canonical way of
representing offsets to debug locations which will simplify the code
in DwarfDebug (and future versions of CodeViewDebug once it starts
handling more complex expressions) and make it easier to reason about.

This patch is NFC-ish: All test case changes are for assembler
comments and the binary output does not change.

rdar://problem/33580047
Differential Revision: https://reviews.llvm.org/D36125

llvm-svn: 309751
2017-08-01 21:45:24 +00:00
Martin Storsjo
eacf4e408b [AArch64] Rewrite stack frame handling for win64 vararg functions
The previous attempt, which made do with a single offset in
computeCalleeSaveRegisterPairs, wasn't quite enough. The previous
attempt only worked as long as CombineSPBump == true (since the
offset would be adjusted later in fixupCalleeSaveRestoreStackOffset).

Instead include the size for the fixed stack area used for win64
varargs in calculations in emitPrologue/emitEpilogue. The stack
consists of mainly three parts;
- AFI->getLocalStackSize()
- AFI->getCalleeSavedStackSize()
- FixedObject

Most of the places in the code which previously used the CSStackSize
now use PrologueSaveSize instead, which is the sum of the latter
two, while some cases which need exactly the middle one use
AFI->getCalleeSavedStackSize() explicitly instead of a local variable.

In addition to moving the offsetting into emitPrologue/emitEpilogue
(which fixes functions with CombineSPBump == false), also set the
frame pointer to point to the right location, where the frame pointer
and link register actually are stored. In addition to the prologue/epilogue,
this also requires changes to resolveFrameIndexReference.

Add tests for a function that keeps a frame pointer and another one
that uses a VLA.

Differential Revision: https://reviews.llvm.org/D35919

llvm-svn: 309744
2017-08-01 21:13:54 +00:00
Matt Arsenault
206f826348 AMDGPU: Fix handling of div_scale with undef inputs
The src0 register must match src1 or src2, but if these
were undefined they could end up using different implicit_defed
virtual registers. Force these to use one undef vreg or pick the
defined other register.

Also fixes producing invalid nodes without the right number of
inputs when src2 is undef.

llvm-svn: 309743
2017-08-01 20:49:41 +00:00
Matt Arsenault
a5fcb83bfa AMDGPU: Add test for r308774
llvm-svn: 309733
2017-08-01 19:54:58 +00:00
Matt Arsenault
b62a4eb524 AMDGPU: Initial implementation of calls
Includes a hack to fix the type selected for
the GlobalAddress of the function, which will be
fixed by changing the default datalayout to use
generic pointers for 0.

llvm-svn: 309732
2017-08-01 19:54:18 +00:00
Simon Pilgrim
f0a5d09b8d [X86][SSE3] Add scheduler tests for MONITOR/MWAIT
llvm-svn: 309718
2017-08-01 18:16:44 +00:00
Nirav Dave
27a6605bdc Revert "[DAG] Extend visitSCALAR_TO_VECTOR optimization to truncated vector."
This reverts commit r309680 which appears to be raising an assertion
in the test-suite.

llvm-svn: 309717
2017-08-01 18:09:25 +00:00
Simon Pilgrim
486072d3d6 [X86][SSE] Added missing vector logic intrinsic schedules
Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431).

Merged SSE_VEC_BIT_ITINS_P + SSE_BIT_ITINS_P as we were interchanging between them.

llvm-svn: 309715
2017-08-01 17:51:20 +00:00
Sanjay Patel
4dbdd470a8 [CGP] use narrower types in memcmp expansion when possible
This only affects very small memcmp on x86 for now, but it
will become more important if we allow vector-sized load and
compares.

llvm-svn: 309711
2017-08-01 17:24:54 +00:00
Craig Topper
2a5bba7325 [X86] Use BEXTR/BEXTRI for 64-bit 'and' with a large mask
Summary: The 64-bit 'and' with immediate instruction only supports a 32-bit immediate. So for larger constants we have to load the constant into a register first. If the immediate happens to be a mask we can use the BEXTRI instruction to perform the masking. We already do something similar using the BZHI instruction from the BMI2 instruction set.

Reviewers: RKSimon, spatel

Reviewed By: RKSimon

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D36129

llvm-svn: 309706
2017-08-01 17:18:14 +00:00
Simon Pilgrim
3f24ff6130 [X86][SSE] Added missing PACKSS/PACKUS intrinsic schedules
Improves atom scheduler test coverage (to make it easier to upgrade them for PR32431).

Checked on Agner that these actually match the UNPACK schedules, but better to include a separate class

llvm-svn: 309701
2017-08-01 16:47:48 +00:00
Craig Topper
e2cc589283 [X86] Split bmi.ll into a bmi test and a bmi2 test.
This moves all the bmi2 specific intrinsics to a separate test file and adds a bmi1 only command line to the existing bmi test.

This will allow us to see the missed opportunity to use bextr to handle 64-bit 'and' with a large mask. This will be improved in an upcoming patch.

llvm-svn: 309700
2017-08-01 16:45:11 +00:00
Simon Pilgrim
810677eba2 [X86][SSSE3] Added missing PHADDS/PHSUBS/PSIGN intrinsic schedules
llvm-svn: 309699
2017-08-01 16:18:25 +00:00
Manoj Gupta
7ebbe655ed [X86] Fix a crash in FEntryInserter Pass.
Summary:
FEntryInserter pass unconditionally derefs the first Instruction
in the first Basic Block. The pass crashes when the first
BasicBlock is empty. Fix the crash by not dereferencing the basic
Block iterator. This fixes an issue observed when building Linux kernel
4.4 with clang.

Fixes PR33971.

Reviewers: hfinkel, niravd, dblaikie

Reviewed By: niravd

Subscribers: davide, llvm-commits

Differential Revision: https://reviews.llvm.org/D35979

llvm-svn: 309694
2017-08-01 15:39:12 +00:00
Craig Topper
2462a713ae [AVX-512] Don't use unmasked VMOVDQU8/16 for 8-bit or 16-bit element stores even when BWI instructions are supported. Always use VMOVDQA32/VMOVDQU32.
We were already using the 32 bit element opcode if BWI isn't enabled, but there's no reason to change opcode if we have BWI. We will still use the 8/16 opcodes for masked stores though.

This allows us to use the aligned opcode when we can which makes our test output more consistent between different modes. It also reduces the number of isel patterns we need.

This is a slight inconsistency with loads which default to 64 bit element opcodes. I'll probably rectify that in a future patch.

Differential Revision: https://reviews.llvm.org/D35978

llvm-svn: 309693
2017-08-01 15:31:24 +00:00
Simon Pilgrim
78b305f8d5 [X86][SSSE3] Fix typos in pabsw/pmulhrsw tests for load folding scheduling.
llvm-svn: 309692
2017-08-01 15:31:24 +00:00
Simon Pilgrim
8484698321 [X86] Added missing cpu to fix generic scheduling model tests
llvm-svn: 309691
2017-08-01 15:14:35 +00:00
Nirav Dave
b5cb48c6ae [DAG] Extend visitSCALAR_TO_VECTOR optimization to truncated vector.
Summary:
Allow SCALAR_TO_VECTOR of EXTRACT_VECTOR_ELT to reduce to
EXTRACT_SUBVECTOR of vector shuffle when output is smaller. Marginally
improves vector shuffle computations.

Reviewers: efriedma, RKSimon, spatel

Subscribers: javed.absar, llvm-commits

Differential Revision: https://reviews.llvm.org/D35566

llvm-svn: 309680
2017-08-01 13:45:35 +00:00
Strahinja Petrovic
a2b4748bdc [Mips] Fix for BBIT octeon instruction
This patch enables control flow optimization for
variations of BBIT instruction. In this case
optimization removes unnecessary branch after
BBIT instruction.

Differential Revision: https://reviews.llvm.org/D35359

llvm-svn: 309679
2017-08-01 13:42:45 +00:00
Krzysztof Parzyszek
91ff5c6d47 [Hexagon] Convert HVX vector constants of i1 to i8
Certain operations require vector of i1 values. However, for Hexagon
architecture compatibility, they need to be represented as vector of i8.

Patch by Suyog Sarda.

llvm-svn: 309677
2017-08-01 13:12:53 +00:00
Simon Pilgrim
4e0b41450d [X86] Regenerate big structure return test and check on x86_64 as well.
llvm-svn: 309676
2017-08-01 13:12:15 +00:00
Tom Stellard
9d8337d857 AMDGPU/GlobalISel: Add support for amdgpu_vs calling convention
Reviewers: arsenm

Reviewed By: arsenm

Subscribers: kzhuravl, wdng, nhaehnle, yaxunl, rovka, kristof.beyls, igorb, dstuttard, tpr, llvm-commits, t-tye

Differential Revision: https://reviews.llvm.org/D35916

llvm-svn: 309675
2017-08-01 12:38:33 +00:00
Andrew V. Tischenko
d56595184b Support itineraries in TargetSubtargetInfo::getSchedInfoStr - Now if the given instr does not have sched model then we try to calculate the latecy/throughput with help of itineraries.
Differential Revision https://reviews.llvm.org/D35997

llvm-svn: 309666
2017-08-01 09:15:43 +00:00