Commit Graph

15100 Commits

Justin Lebar
b5ca00a58d [NVPTX] Use different, convergent MIs for convergent calls.
Summary:
Calls sometimes need to be convergent.  This is already handled at the
LLVM IR level, but it also needs to be handled at the MI level.

Ideally we'd propagate convergence from instructions, down through the
selection DAG, and into MIs.  But this is Hard, and would affect
optimizations in the SDNs -- right now only SDNs with two operands have
any flags at all.

Instead, here's a much simpler hack: Add new opcodes for NVPTX for
convergent calls, and generate these when lowering convergent LLVM
calls.
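
A rough sketch of the approach described above (standalone C++, not the actual
NVPTX lowering code; the opcode names and the convergence flag are illustrative
stand-ins):

  // Hypothetical stand-ins for the ordinary and the new convergent call opcodes.
  enum Opcode { CALL, CONVERGENT_CALL };

  // When lowering a call, pick the convergent variant if the IR call was
  // marked convergent, so MI-level passes see the constraint directly on
  // the opcode instead of losing it during selection.
  Opcode selectCallOpcode(bool isConvergentCall) {
    return isConvergentCall ? CONVERGENT_CALL : CALL;
  }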

Reviewers: jholewinski

Subscribers: jholewinski, chandlerc, joker.eph, jhen, tra, llvm-commits

Differential Revision: http://reviews.llvm.org/D17423

llvm-svn: 262373
2016-03-01 19:24:03 +00:00
David Majnemer
791b88b6da [X86] Elide references to _chkstk for dynamic allocas
The _chkstk function is called by the compiler to probe the stack in an
order consistent with Windows' expectations.  However, it is possible to
elide the call to _chkstk and manually adjust the stack pointer if we
can prove that the allocation is of fixed size and smaller than the probe
size.
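
A minimal sketch of that decision (illustrative C++ only, not the LLVM code;
the 4 KiB probe size is an assumption, chosen as the usual page size):

  #include <cstdint>

  constexpr uint64_t kProbeSize = 4096; // assumed page/probe size

  // A fixed-size allocation below the probe size cannot skip a guard page,
  // so the stack pointer can be adjusted directly instead of calling _chkstk.
  bool canElideChkstk(bool sizeIsConstant, uint64_t size) {
    return sizeIsConstant && size < kProbeSize;
  }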

This shrinks chrome.dll, chrome_child.dll and chrome.exe by a
cumulative ~133 KB.

Differential Revision: http://reviews.llvm.org/D17679

llvm-svn: 262370
2016-03-01 19:20:23 +00:00
Matt Arsenault
03dac8d8e4 DAGCombiner: Turn extract of bitcasted integer into truncate
This reduces the number of bitcast nodes and generally cleans up the
DAG when bitcasting between integers and vectors everywhere.
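
A small self-contained demonstration of why the fold is sound on a little-endian
target (plain C++ standing in for the DAG nodes; the memcpy plays the role of
the bitcast):

  #include <cassert>
  #include <cstdint>
  #include <cstring>

  int main() {
    uint64_t wide = 0x1122334455667788ULL;

    // "bitcast i64 to v2i32" followed by "extract element 0".
    uint32_t lanes[2];
    std::memcpy(lanes, &wide, sizeof(lanes));
    uint32_t viaExtract = lanes[0];

    // The combined form: a plain truncate of the integer.
    uint32_t viaTrunc = static_cast<uint32_t>(wide);

    // On little-endian targets element 0 is the low half, so the two agree.
    assert(viaExtract == viaTrunc);
  }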

llvm-svn: 262358
2016-03-01 18:01:37 +00:00
Changpeng Fang
24f035af32 AMDGPU/SI: Implement DS_PERMUTE/DS_BPERMUTE Instruction Definitions and Intrinsics
Summary:
This patch implements DS_PERMUTE/DS_BPERMUTE instruction definitions and
intrinsics, which are new since VI.

Reviewers: tstellarAMD, arsenm

Subscribers: llvm-commits, arsenm

Differential Revision: http://reviews.llvm.org/D17614

llvm-svn: 262356
2016-03-01 17:51:23 +00:00
Michael Zuckerman
433b241570 [LLVM][AVX512] PSRL{DI|QI} Change imm8 to int
Differential Revision: http://reviews.llvm.org/D17713

llvm-svn: 262353
2016-03-01 17:46:32 +00:00
Hans Wennborg
e64cf9dddb [X86] Check that attribute parameters match for tail calls (PR26590)
In the code below on 32-bit targets, x would previously get forwarded to g()
without sign-extension to 32 bits as required by the parameter attribute.

  void g(signed short);
  void f(unsigned short x) {
    g(x);
  }

llvm-svn: 262352
2016-03-01 17:45:23 +00:00
Michael Zuckerman
7878888690 [AVX512][PSRAQ][PSRAD] Change imm8 to int.
Differential Revision: http://reviews.llvm.org/D17692

llvm-svn: 262320
2016-03-01 11:36:23 +00:00
Amjad Aboud
719325fe11 Disallow generating vzeroupper before the return instruction (iret) in interrupt handler functions.
This resolves https://llvm.org/bugs/show_bug.cgi?id=26412

Differential Revision: http://reviews.llvm.org/D17542

llvm-svn: 262319
2016-03-01 11:32:03 +00:00
Vasileios Kalintiris
3a8f7f9e31 [mips] Promote the result of SETCC nodes to GPR width.
Summary:
This patch modifies the existing comparison, branch, conditional-move
and select patterns, and adds new ones where needed. Also, the updated
SLT{u,i,iu} set of instructions generates a GPR-width result.

The majority of the code changes in the Mips back-end fix the wrong
assumption that SETCC nodes always produce an i32 value.
The changes in the common code path account for the fact that in 64-bit
MIPS targets, i1 is promoted to i32 instead of i64.

Reviewers: dsanders

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D10970

llvm-svn: 262316
2016-03-01 10:08:01 +00:00
Matt Arsenault
59b8b77405 AMDGPU: Set HasExtractBitInsn
This currently does not have control over the bitwidth, and
optimizations to reduce the integer to 32 bits where possible
are still missing.

But in most situations we do want the sinking to occur.
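
For reference, this is the kind of source pattern the sinking is about
(ordinary C++, not from the patch; the offset and width are arbitrary): a shift
feeding a mask, i.e. a bitfield extract, which AMDGPU can do with a single
BFE-style instruction.

  #include <cstdint>

  // (x >> offset) & mask is a bitfield extract; keeping the shift next to
  // the mask makes it easy for the backend to form one extract instruction.
  uint32_t extractField(uint32_t x) {
    return (x >> 7) & 0x1F; // extract 5 bits starting at bit 7
  }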

llvm-svn: 262296
2016-03-01 04:58:17 +00:00
David Majnemer
cb305dea1c [WinEH] Allocate the registration node before the catch objects
The CatchObjOffset is relative to the end of the EH registration node
for 32-bit x86 WinEH targets.  A special sentinel value, 0, is used to
indicate that no catch object should be initialized.

This means that a catch object allocated immediately before the
registration node would be assigned a CatchObjOffset of 0, leading the
runtime to believe that a catch object should not be initialized.

To handle this, allocate the registration node prior to any other frame
object.  This will ensure that catch objects will not be allocated
before the registration node.

This fixes PR26757.

Differential Revision: http://reviews.llvm.org/D17689

llvm-svn: 262294
2016-03-01 04:30:16 +00:00
Geoff Berry
f5ba61d18c [AArch64] Fix isLegalAddImmediate() to return true for valid negative values.
Reviewers: t.p.northover, jmolloy

Subscribers: mcrosier, aemerson, llvm-commits, rengolin

Differential Revision: http://reviews.llvm.org/D17463

llvm-svn: 262248
2016-02-29 19:53:22 +00:00
David Majnemer
e60ee3b8ce [WinEH] Make setjmp work correctly with EH
32-bit X86 EH on Windows utilizes a stack of registration nodes
allocated and deallocated on entry/exit.  A registration node contains a
bunch of EH personality specific information like which try-state we are
currently in.

Because a setjmp target allows control flow from arbitrary program
points, there is no way to ensure that the try-state we are in is
correctly updated once we transfer control.

MSVC compatible compilers, like MSVC and ICC, utilize runtime helpers to
reinitialize the try-state when a longjmp occurs.  This is implemented
by adding additional arguments to _setjmp3: the desired try-state and
a helper routine to update the try-state.

Differential Revision: http://reviews.llvm.org/D17721

llvm-svn: 262241
2016-02-29 19:16:03 +00:00
Nemanja Ivanovic
1a5706ca1b Fix for PR26180
Corresponds to Phabricator review:
http://reviews.llvm.org/D16592

This fix includes both an update to how we handle the "generic" CPU on LE
systems and Anton's fix for the Fast Isel issue.

llvm-svn: 262233
2016-02-29 16:42:27 +00:00
Vasileios Kalintiris
29620aca3e [mips] Do not use SLL for ANY_EXTEND nodes as the high bits are undefined.
Reviewers: dsanders

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D15420

llvm-svn: 262230
2016-02-29 15:58:12 +00:00
Daniel Sanders
611eb82953 [mips] Make isel select the correct DEXT variant up front.
Summary:
Previously, it would always select DEXT and substitute any invalid matches
for DEXTU/DEXTM during MipsMCCodeEmitter::encodeInstruction(). This works
but causes problems when adding range-checked immediates to IAS.

Now isel selects the correct variant up front.
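
A sketch of the selection rule (the field ranges are paraphrased from the
MIPS64 manual and should be double-checked; this is not the actual isel code):

  #include <cassert>
  #include <string>

  // pos is the starting bit, size the number of bits to extract.
  std::string selectDextVariant(unsigned pos, unsigned size) {
    assert(size >= 1 && pos + size <= 64 && "invalid extract");
    if (pos < 32 && size <= 32)
      return "DEXT";   // pos 0..31, size 1..32
    if (pos < 32)
      return "DEXTM";  // pos 0..31, size 33..64
    return "DEXTU";    // pos 32..63, size 1..32
  }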

Reviewers: vkalintiris

Subscribers: dsanders, llvm-commits

Differential Revision: http://reviews.llvm.org/D16810

llvm-svn: 262229
2016-02-29 15:26:54 +00:00
JF Bastien
3a0814ac1a WebAssembly: fix test
Operand order seems to have changed; the new one is nicer.

llvm-svn: 262180
2016-02-28 15:44:54 +00:00
Michael Zuckerman
96836fc81c [AVX512][PSLLW ][PSLLV] Change imm8 to int
Differential Revision: http://reviews.llvm.org/D17684

llvm-svn: 262176
2016-02-28 07:32:10 +00:00
Matt Arsenault
3a61985b2f AMDGPU: More bits of frame index are known to be zero
The maximum private allocation for the whole GPU is 4G,
so the maximum possible index for a single workitem is the
maximum size divided by the smallest granularity for a dispatch.

This increases the number of known zero high bits, which
enables more offset folding. The maximum private size per
workitem with this is 128M but may be smaller still.
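
The arithmetic behind those numbers, written out (the 32-workitem dispatch
granularity is inferred from the figures above rather than stated, so treat it
as an assumption of this sketch):

  #include <cstdio>

  int main() {
    constexpr unsigned long long maxPrivateBytes = 4ULL << 30;  // 4G for the whole GPU
    constexpr unsigned long long minDispatchWorkitems = 32;     // assumed granularity
    constexpr unsigned long long maxPerWorkitem =
        maxPrivateBytes / minDispatchWorkitems;                 // 128M = 2^27

    // A per-workitem offset therefore fits in 27 bits, which is what makes
    // the high bits of a frame index known to be zero.
    std::printf("max per-workitem private size: %lluM\n", maxPerWorkitem >> 20);
  }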

llvm-svn: 262153
2016-02-27 20:26:57 +00:00
Matt Arsenault
982224cfb8 DAGCombiner: Don't unnecessarily swap operands in ReassociateOps
In the case where op = add, y = base_ptr, and x = offset, this
transform:

(op y, (op x, c1)) -> (op (op x, y), c1)

breaks the canonical form of add by putting the base pointer in the
second operand and the offset in the first.

This fix is important for the R600 target, because for some address
spaces the base pointer and the offset are stored in separate register
classes. The old pattern caused the ISel code for matching addressing
modes to put the base pointer and offset in the wrong register classes,
which required non-trivial code transformations to fix.

llvm-svn: 262148
2016-02-27 19:57:45 +00:00
Simon Pilgrim
83e76327e8 [X86][AVX] vpermilvar.pd mask element indices only use bit1
llvm-svn: 262134
2016-02-27 12:51:46 +00:00
Simon Pilgrim
a9a7bf68ee [X86][AVX] Added AVX1 target shuffle combine tests
llvm-svn: 262132
2016-02-27 12:33:08 +00:00
Matt Arsenault
360d244d5b DAGCombiner: Relax sqrt NaN folding check
This is OK for +0 since compares to +/-0 give the same result.

llvm-svn: 262125
2016-02-27 09:38:05 +00:00
Matt Arsenault
274d34e725 AMDGPU: Add s_sleep intrinsic
llvm-svn: 262120
2016-02-27 08:53:52 +00:00
Matt Arsenault
61738cbcb6 AMDGPU: Implement readcyclecounter
This matches the behavior of the HSAIL clock instruction.
s_memrealtime is used if the subtarget supports it, and falls
back to s_memtime if not.

Also introduces new intrinsics for each of s_memtime / s_memrealtime.

llvm-svn: 262119
2016-02-27 08:53:46 +00:00
Cong Hou
e0eb8bfe37 Fix a bug in isVectorReductionOp() in SelectionDAGBuilder.cpp that may cause assertion failure on AArch64.
llvm-svn: 262091
2016-02-26 23:25:30 +00:00
Ahmed Bougacha
0c95decaaa [X86] Move an encoding test from CodeGen to MC. NFC.
llvm-svn: 262089
2016-02-26 23:00:03 +00:00
Ahmed Bougacha
ccf38fd0e2 [X86] Delete old redundant test. NFC.
llvm-svn: 262088
2016-02-26 23:00:00 +00:00
Kit Barton
915c5ecee1 [PPC] Legalize FNEG on PPC when possible
Currently we always expand ISD::FNEG. For the v4f32 and v2f64 vector types,
VSX has native support for this opcode.
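
For context, the kind of code that benefits (ordinary C++, not from the patch):
a straightforward vector negate, which with VSX can map to a native
vector-negate instruction (e.g. xvnegdp / xvnegsp) per vector instead of the
expanded form.

  // With native FNEG support for v2f64/v4f32, the vectorized body of this
  // loop can negate a whole vector in one instruction.
  void negate(double *x, int n) {
    for (int i = 0; i < n; ++i)
      x[i] = -x[i];
  }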

Phabricator: http://reviews.llvm.org/D17647
llvm-svn: 262079
2016-02-26 21:59:44 +00:00
Paul Robinson
1d412f6457 Reapply r262054 with triple fix.
llvm-svn: 262069
2016-02-26 21:18:34 +00:00
Paul Robinson
d68c435a5d Revert r262054 on one file that fails sometimes.
llvm-svn: 262060
2016-02-26 20:41:07 +00:00
Paul Robinson
51fa0a87c3 Fix tests that used CHECK-NEXT-NOT and CHECK-DAG-NOT.
FileCheck actually doesn't support combo suffixes.

Differential Revision: http://reviews.llvm.org/D17588

llvm-svn: 262054
2016-02-26 19:40:34 +00:00
Nirav Dave
2993854bb4 Fix Sparc 32-bit lowering to rebundle v2i32 values.
Summary: Fix LowerCall to rebundle v2i32 values after lowering and add testcase

Reviewers: jyknight

Subscribers: llvm-commits, jyknight

Differential Revision: http://reviews.llvm.org/D17615

llvm-svn: 262048
2016-02-26 18:55:22 +00:00
Sanjay Patel
155193c3aa [x86, AVX] fold 'isPositive' 256-bit vector integer operations (PR26701)
This extends the fold introduced with:
http://reviews.llvm.org/rL262036

llvm-svn: 262047
2016-02-26 18:42:50 +00:00
Sanjay Patel
334685b486 [x86, AVX] add 256-bit tests
llvm-svn: 262044
2016-02-26 18:07:58 +00:00
Sanjay Patel
4402a32b32 [x86, SSE] fold 'isPositive' vector integer operations (PR26701)
This is one of the cases shown in:
https://llvm.org/bugs/show_bug.cgi?id=26701

Shift and negate is what InstCombine appears to prefer, so I've started with that pattern.
Note that the 'pcmpeq' instructions are only generating the all-ones value (-1) used by the
actual 'pcmpgt' comparison in each case (side note: why isn't there an alias mnemonic for that?).
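
A self-contained way to see the equivalence being matched (plain SSE2
intrinsics, not code from the patch): the shift-and-negate form that
InstCombine prefers and the direct "greater than -1" compare produce the
same mask.

  #include <cassert>
  #include <cstdint>
  #include <cstring>
  #include <emmintrin.h>

  int main() {
    __m128i x = _mm_set_epi32(-7, 0, 42, INT32_MIN);

    // InstCombine's form: ~(x >> 31), all-ones wherever the sign bit is clear.
    __m128i viaShift = _mm_xor_si128(_mm_srai_epi32(x, 31), _mm_set1_epi32(-1));

    // The comparison form: x > -1; the all-ones constant is what a
    // pcmpeq-against-itself materializes in the generated code.
    __m128i viaCmp = _mm_cmpgt_epi32(x, _mm_set1_epi32(-1));

    int32_t a[4], b[4];
    std::memcpy(a, &viaShift, sizeof(a));
    std::memcpy(b, &viaCmp, sizeof(b));
    for (int i = 0; i < 4; ++i)
      assert(a[i] == b[i]);
  }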

Differential Revision: http://reviews.llvm.org/D17630

llvm-svn: 262036
2016-02-26 16:56:03 +00:00
Reid Kleckner
70c9bc71d4 [WinEH] Fix funclet return block clobber mask placement
MBB slot index intervals are half open, not closed. getMBBEndIndex()
returns the slot index of the start of the next block in layout order.
Placing a register mask there is incorrect if the successor of the
funclet return is not laid out after the return. Clang generates IR for
catch bodies before generating the following normal code, so we never
noticed this issue until the D frontend authors filed a bug about it.

Instead, we can put the clobber mask on the last instruction of the
funclet return block. We still aren't using a register mask operand on
the CATCHRET instruction because it would cause PEI to spill all CSRs,
including XMM regs, in the prologue.

Fixes PR26679.

llvm-svn: 262035
2016-02-26 16:53:19 +00:00
Nikolay Haustov
2f684f1347 [AMDGPU] Assembler: Basic support for MIMG
Add parsing and printing of image operands. This matches the legacy sp3 assembler.
Change the image instruction order to put the data/image/sampler operands at the
beginning; this is needed because optional operands in MC are always last.
Update SITargetLowering for the new order.
Add basic MC test.
Update CodeGen tests.

Review: http://reviews.llvm.org/D17574
llvm-svn: 261995
2016-02-26 09:51:05 +00:00
Simon Pilgrim
cf5352db84 [X86][F16C] Added native IR half/float conversion tests.
Placeholder tests until we start improving native vector support.

llvm-svn: 261989
2016-02-26 08:52:29 +00:00
Matthias Braun
9dcd65f478 MachineCopyPropagation: Catch copies of the form A<-B;A<-B
Differential Revision: http://reviews.llvm.org/D17475

llvm-svn: 261966
2016-02-26 03:18:55 +00:00
Matthias Braun
e39ff70685 MachineCopyPropagation: Keep scanning through instructions with regmasks
This also simplifies the code by removing the overly conservative
NoInterveningSideEffect() function. This function checked:
- That the two copies belong to the same block: We only process one
  block at a time and clear our maps in between, so it is impossible to
  find a copy from a different block.
- There is no terminator between the two copy instructions: This is not
  allowed anyway (the MachineVerifier would complain).
- Does not have instructions with hasUnmodeledSideEffects() or isCall()
  set: Even for those instructions we must have all clobbers/defs of
  registers explicit as operands. If the register is explicitly
  clobbered we would never come to the point of checking for
  NoInterveningSideEffect() anyway.

(I also checked this with a temporary build of the test-suite with all
 potentially failing conditions in NoInterveningSideEffect() turned into
 asserts)

Differential Revision: http://reviews.llvm.org/D17474

llvm-svn: 261965
2016-02-26 03:18:50 +00:00
Sanjay Patel
7ed9361896 [x86, SSE] add tests to show missing pcmp folds
llvm-svn: 261948
2016-02-26 01:14:27 +00:00
David Majnemer
08dd52dc75 [WinEH] Don't remove unannotated inline-asm calls
Inline-asm calls aren't annotated with funclet bundle operands because
they don't throw and cannot be inlined through.  We shouldn't require
them to bear a funclet bundle operand.

llvm-svn: 261942
2016-02-26 00:04:25 +00:00
Simon Pilgrim
e4178ae510 [X86][SSE3] Added combine support for MOVDDUP/MOVSHDUP/MOVSLDUP target shuffles
This is possible now that PerformShuffleCombine can handle unary shuffles.

llvm-svn: 261843
2016-02-25 09:12:12 +00:00
Elena Demikhovsky
e5bbca6ae2 Optimized loading (zextload) of i1 value from memory.
This patch is a partial revert of https://llvm.org/svn/llvm-project/llvm/trunk@237793.
The extra "and" causes a performance degradation.

We assume that i1 is stored in zero-extended form, and the store operation is
responsible for zeroing the upper bits.
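
A plain C++ illustration of that assumption (not from the patch): when every
store of a flag writes a canonical zero-extended 0/1, the load side needs no
extra masking "and".

  #include <cstdint>

  // Store side: responsible for writing an already zero-extended 0 or 1.
  void storeFlag(uint8_t *p, bool v) { *p = v ? 1 : 0; }

  // Load side: under that assumption, a masking "and" after the load is
  // redundant, which is what this patch removes.
  bool loadFlag(const uint8_t *p) { return *p != 0; }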

Differential Revision: http://reviews.llvm.org/D17541

llvm-svn: 261828
2016-02-25 07:05:12 +00:00
Junmo Park
161dc1c605 [CodeGenPrepare] Remove load-based heuristic
Summary:
Both the hardware and LLVM have changed since 2012.
Now, the load-based heuristic doesn't show big differences anymore on OoO cores.

There are no notable regressions or improvements on spec2000/2006 (Cortex-A57, Core i5).

Reviewers: spatel, zansari
    
Differential Revision: http://reviews.llvm.org/D16836

llvm-svn: 261809
2016-02-25 00:23:27 +00:00
Cong Hou
ce16649d49 Move test/CodeGen/Generic/pr26652.ll to test/CodeGen/X86/pr26652.ll and test it only on X86.
llvm-svn: 261807
2016-02-25 00:12:18 +00:00
Cong Hou
4ce0280a41 Detect vector reduction operations just before instruction selection.
(This is the second attempt to commit this patch, after fixing pr26652 & pr26653).

This patch detects vector reductions before instruction selection. Vector
reductions are vectorized reduction operations, and for such operations we have
freedom to reorganize the elements of the result as long as their reduction
stays unchanged. This will enable some reduction pattern recognition during
instruction combine such as SAD/dot-product on X86. A flag is added to
SDNodeFlags to mark those vector reduction nodes to be checked during instruction
combine.

To detect those vector reductions, we search def-use chains starting from the
given instruction, and check if all uses fall into two categories:

1. Reduction with another vector.
2. Reduction on all elements.

Case 2 is detected by recognizing the pattern that the loop vectorizer
generates to reduce all elements in the vector outside of the loop, which
consists of several ShuffleVector instructions and one ExtractElement instruction.
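
For reference, the sort of source loop whose vectorized form ends in that
shuffle-plus-extract reduction tail (ordinary C++, not code from the patch):

  #include <cstdint>

  // After vectorization the partial sums live in a vector accumulator;
  // reducing that accumulator to a single value outside the loop is what
  // produces the ShuffleVector + ExtractElement pattern described above.
  int32_t sumArray(const int32_t *a, int n) {
    int32_t sum = 0;
    for (int i = 0; i < n; ++i)
      sum += a[i];
    return sum;
  }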


Differential revision: http://reviews.llvm.org/D15250

llvm-svn: 261804
2016-02-24 23:40:36 +00:00
Matthias Braun
aca625a4fe MachineInstr: Respect register aliases in clearRegisterKills()
This fixes bugs in the copy-elimination code in LLVM. It slightly changes the
semantics of clearRegisterKills(). This is appropriate because:
- Users in lib/CodeGen/MachineCopyPropagation.cpp,
  lib/Target/AArch64/AArch64RedundantCopyElimination.cpp and
  lib/Target/SystemZ/SystemZElimCompare.cpp are incorrect without it
  (see included testcase).
- All other users in llvm are unaffected (they pass TRI==nullptr)
- (Kill flags are optional anyway so removing too many shouldn't hurt.)

Differential Revision: http://reviews.llvm.org/D17554

llvm-svn: 261763
2016-02-24 19:21:48 +00:00
Simon Pilgrim
cd25a2bef0 [X86][SSSE3] Added target shuffle combine tests for SSE3/SSSE3 specific shuffles.
Allows us to test SSSE3 PSHUFB intrinsic.

llvm-svn: 261753
2016-02-24 17:08:59 +00:00