Commit Graph

214 Commits

Author SHA1 Message Date
Alexey Bataev
e951e5eb7b [SLP] Add a new test for tree vectorization starting from insertelement
instruction.

llvm-svn: 288148
2016-11-29 15:37:52 +00:00
Alexey Bataev
4fa063ebc9 [SLPVectorizer] Improved support of partial tree vectorization.
Currently SLP vectorizer tries to vectorize a binary operation and dies
immediately after unsuccessful the first unsuccessfull attempt. Patch
tries to improve the situation, trying to vectorize all binary
operations of all children nodes in the binop tree.

Differential Revision: https://reviews.llvm.org/D25517

llvm-svn: 288115
2016-11-29 08:21:14 +00:00
Mohammad Shahid
2f5cb60b07 [SLP] Add new and update existing lit testfor providing more context to incoming patch for vectorization of jumbled load
Change-Id: Ifb9091bb0f84c1937c2c8bd2fc345734f250d2f9
llvm-svn: 287992
2016-11-27 03:35:31 +00:00
Simon Pilgrim
841d7ca463 [X86][AVX512] Add support for v2i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287882
2016-11-24 14:46:55 +00:00
Alexey Bataev
2eaacda53e [SLP] Add more tests for SLP Vectorizer.
llvm-svn: 287801
2016-11-23 20:10:32 +00:00
Simon Pilgrim
4e9b9cbee9 [X86][AVX512] Add support for v4i64 fptosi/fptoui/sitofp/uitofp on AVX512DQ-only targets
Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances

llvm-svn: 287762
2016-11-23 14:01:18 +00:00
Simon Pilgrim
03cd8f887c [CostModel][X86] Add missing AVX512DQ v8i64 fptosi/sitofp costs
llvm-svn: 287760
2016-11-23 13:42:09 +00:00
Craig Topper
07f1c15995 [AVX-512] Support FCOPYSIGN for v16f32 and v8f64
Summary:
This extends FCOPYSIGN support to 512-bit vectors.

I've also added tests to show what the 128-bit and 256-bit cases look like with broadcast loads.

Reviewers: delena, zvi, RKSimon, spatel

Subscribers: llvm-commits

Differential Revision: https://reviews.llvm.org/D26791

llvm-svn: 287298
2016-11-18 02:25:34 +00:00
Vyacheslav Klochkov
b3dc774a99 Fixed the lost FastMathFlags for CALL operations in SLPVectorizer.
Reviewer: Michael Zolotukhin.
Differential Revision: https://reviews.llvm.org/D26575

llvm-svn: 287064
2016-11-16 00:55:50 +00:00
Vyacheslav Klochkov
f1a12fe0f5 Fixed the lost FastMathFlags for FCmp operations in SLPVectorizer.
Reviewer: Michael Zolotukhin.
Differential Revision: https://reviews.llvm.org/D26543

llvm-svn: 286626
2016-11-11 19:55:29 +00:00
Simon Pilgrim
d02c55204b [VectorLegalizer] Expansion of CTLZ using CTPOP when possible
This patch avoids scalarization of CTLZ by instead expanding to use CTPOP (ref: "Hacker's Delight") when the necessary operations are available.

This also adds the necessary cost models for X86 SSE2 targets (the main beneficiary) to ensure vectorization only happens when its useful.

Differential Revision: https://reviews.llvm.org/D25910

llvm-svn: 286233
2016-11-08 14:10:28 +00:00
Alexey Bataev
46c0278e7d [SLP] Fix for PR30626: Compiler crash inside SLP Vectorizer.
After successfull horizontal reduction vectorization attempt for PHI node
vectorizer tries to update root binary op by combining vectorized tree
and the ReductionPHI node. But during vectorization this ReductionPHI
can be vectorized itself and replaced by the `undef` value, while the
instruction itself is marked for deletion. This 'marked for deletion'
PHI node then can be used in new binary operation, causing "Use still
stuck around after Def is destroyed" crash upon PHI node deletion.

Also the test is fixed to make it perform actual testing.

Differential Revision: https://reviews.llvm.org/D25671

llvm-svn: 285286
2016-10-27 12:02:28 +00:00
Simon Pilgrim
4ddc92b6cd [X86][SSE] Add lowering to cvttpd2dq/cvttps2dq for sitofp v2f64/2f32 to 2i32
As discussed on PR28461 we currently miss the chance to lower "fptosi <2 x double> %arg to <2 x i32>" to cvttpd2dq due to its use of illegal types.

This patch adds support for fptosi to 2i32 from both 2f64 and 2f32.

It also recognises that cvttpd2dq zeroes the upper 64-bits of the xmm result (similar to D23797) - we still don't do this for the cvttpd2dq/cvttps2dq intrinsics - this can be done in a future patch.

Differential Revision: https://reviews.llvm.org/D23808

llvm-svn: 284459
2016-10-18 07:42:15 +00:00
Simon Pilgrim
cfef627b1f [SLPVectorizer][X86] Add 512-bit sitofp/uitofp tests
llvm-svn: 283756
2016-10-10 14:28:06 +00:00
Simon Pilgrim
2c0733c678 [SLPVectorizer][X86] Add avx512 sitofp/uitofp tests
llvm-svn: 283751
2016-10-10 14:14:31 +00:00
Simon Pilgrim
6cadb5610e [SLPVectorizer][X86] Fixed alignments of scalar loads in sitofp/uitofp tests
Fixed copy+paste vector alignment to correct for per-element scalar loads

Increased to 512-bit data sizes in preparation of avx512 tests

llvm-svn: 283748
2016-10-10 14:10:41 +00:00
Alexey Bataev
6ad5da7c81 [SLPVectorizer] Fix for PR25748: reduction vectorization after loop
unrolling.

The next code is not vectorized by the SLPVectorizer:
```
 int test(unsigned int *p) {
  int sum = 0;
  for (int i = 0; i < 8; i++)
    sum += p[i];
  return sum;
 }
```
During optimization this loop is fully unrolled and SLPVectorizer is
unable to vectorize it. Patch tries to fix this problem.

Differential Revision: https://reviews.llvm.org/D24796

llvm-svn: 283535
2016-10-07 09:39:22 +00:00
Alexey Bataev
7e217c2402 [SLPVectorizer] Add a test with non-vectorizable IR.
llvm-svn: 283225
2016-10-04 15:07:23 +00:00
Sanjay Patel
d27a21874b [x86, SSE/AVX] allow 128/256-bit lowering for copysign vector intrinsics (PR30433)
This should fix:
https://llvm.org/bugs/show_bug.cgi?id=30433

There are a couple of open questions about the codegen:
1. Should we let scalar ops be scalars and avoid vector constant loads/splats?
2. Should we have a pass to combine constants such as the inverted pair that we have here?

Differential Revision: https://reviews.llvm.org/D25165
 

llvm-svn: 283119
2016-10-03 16:38:27 +00:00
Simon Pilgrim
04e249b128 [SLPVectorizer][X86] Added fptosi/fptoui tests
llvm-svn: 283048
2016-10-01 19:35:59 +00:00
Simon Pilgrim
567c4fbdae [SLPVectorizer][X86] Added fcopysign tests
llvm-svn: 283046
2016-10-01 17:00:26 +00:00
Simon Pilgrim
cceeb2a4fa [SLPVectorizer][X86] Added fabs tests
llvm-svn: 283045
2016-10-01 16:54:01 +00:00
Simon Pilgrim
34032cba14 Rename tests
llvm-svn: 281863
2016-09-18 20:25:41 +00:00
Simon Pilgrim
5d5ca9c0cb [X86][SSE] Add initial costs for vector CTTZ/CTLZ
llvm-svn: 277716
2016-08-04 10:51:41 +00:00
Simon Pilgrim
9e201eac32 [SLPVectorizer][X86] Added vXi8/vXi16 sitofp/uitofp tests
Dropped useless 2i32-2f32 test

llvm-svn: 277281
2016-07-30 21:01:34 +00:00
Simon Pilgrim
f5134a2867 [SLPVectorizer][X86] Added SITOFP/UITOFP vectorization tests
llvm-svn: 277275
2016-07-30 18:43:30 +00:00
Michael Kuperstein
38e7298093 [SLPVectorizer] Vectorize reverse-order loads in horizontal reductions
When vectorizing a tree rooted at a store bundle, we currently try to sort the
stores before building the tree, so that the stores can be vectorized. For other
trees, the order of the root bundle - which determines the order of all other
bundles - is arbitrary. That is bad, since if a leaf bundle of consecutive loads
happens to appear in the wrong order, we will not vectorize it.

This is partially mitigated when the root is a binary operator, by trying to
build a "reversed" tree when that's considered profitable. This patch extends the
workaround we have for binops to trees rooted in a horizontal reduction.

This fixes PR28474.

Differential Revision: https://reviews.llvm.org/D22554

llvm-svn: 276477
2016-07-22 21:28:48 +00:00
Simon Pilgrim
1b4f511aaa [X86][SSE] Add cost model values for CTPOP of vectors
This patch adds costs for the vectorized implementations of CTPOP, the default values were seriously underestimating the cost of these and was encouraging vectorization on targets where serialized use of POPCNT would be much better.

Differential Revision: https://reviews.llvm.org/D22456

llvm-svn: 276104
2016-07-20 10:41:28 +00:00
Simon Pilgrim
1b2ab113fb [SLPVectorizer][X86] Added sqrt vectorization tests
llvm-svn: 275788
2016-07-18 13:20:54 +00:00
Simon Pilgrim
4ca42e232d [SLPVectorizer][X86] Added fma vectorization tests
llvm-svn: 274889
2016-07-08 17:19:13 +00:00
Elena Demikhovsky
971fbfda1e Vector GEP test: renamed + some comments
Differential revision: http://reviews.llvm.org/D21957

llvm-svn: 274611
2016-07-06 08:11:23 +00:00
Elena Demikhovsky
6f2ec8104a Fixed crash of SLP Vectorizer on KNL
The bug is connected to vector GEPs.
https://llvm.org/bugs/show_bug.cgi?id=28313

llvm-svn: 273919
2016-06-27 20:07:00 +00:00
Simon Pilgrim
bc35f9f702 [SLPVectorizer][X86] Added ceil/floor/nearbyint/rint/trunc vectorization tests
llvm-svn: 273420
2016-06-22 14:07:46 +00:00
Simon Pilgrim
356e823b51 [X86][SSE] Add cost model for BSWAP of vectors
The BSWAP of vector types is quite efficiently implemented using vector shuffles on SSE/AVX targets, we should reflect the typical cost of this to encourage vectorization.

Differential Revision: http://reviews.llvm.org/D21521

llvm-svn: 273217
2016-06-20 23:08:21 +00:00
Sean Silva
e0a9e66040 [PM] Port SLPVectorizer to the new PM
This uses the "runImpl" approach to share code with the old PM.

Porting to the new PM meant abandoning the anonymous namespace enclosing
most of SLPVectorizer.cpp which is a bit of a bummer (but not a big deal
compared to having to pull the pass class into a header which the new PM
requires since it calls the constructor directly).

llvm-svn: 272766
2016-06-15 08:43:40 +00:00
Simon Pilgrim
3fc09f7be6 [CostModel][X86][SSE] Updated costs for vector BITREVERSE ops on SSSE3+ targets
To account for the fast PSHUFB implementation now available

llvm-svn: 272484
2016-06-11 19:23:02 +00:00
Michael Zolotukhin
987ab631fa [SLPVectorizer] Handle GEP with differing constant index types
Summary:
This fixes PR27617.

Bug description: The SLPVectorizer asserts on encountering GEPs with different index types, such as i8 and i64.

The patch includes a simple relaxation of the assert to allow constants being of different types, along with a regression test that will provoke the unrelaxed assert.

Reviewers: nadav, mzolotukhin

Subscribers: JesperAntonsson, llvm-commits, mzolotukhin

Differential Revision: http://reviews.llvm.org/D20685

Patch by Jesper Antonsson!

llvm-svn: 272206
2016-06-08 21:55:16 +00:00
Simon Pilgrim
ba319ded5e [Analysis] Enabled BITREVERSE as a vectorizable intrinsic
Allows XOP to vectorize BITREVERSE - other targets will follow as their costmodels improve.

llvm-svn: 271803
2016-06-04 20:21:07 +00:00
Simon Pilgrim
45964c3742 [SLPVectorizer][X86] Regenerated SEXT/ZEXT cast vectorization tests
Added 256-bit vector test as well

llvm-svn: 268811
2016-05-06 22:22:18 +00:00
Simon Pilgrim
2def0a878a [SLPVectorizer][X86] Added BSWAP/BITREVERSE vectorization tests
llvm-svn: 268803
2016-05-06 21:41:55 +00:00
Simon Pilgrim
a2220ea456 [SLPVectorizer][X86] Added CTPOP/CTLZ/CTTZ vectorization tests
llvm-svn: 268800
2016-05-06 21:33:01 +00:00
David Majnemer
13d5526392 [SLPVectorizer] Add operand bundles to vectorized functions
SLPVectorizing a call site should result in further propagation of its
bundles.

llvm-svn: 268004
2016-04-29 07:09:51 +00:00
Arch D. Robison
0e61034018 [SLPVectorizer] Extend SLP Vectorizer to deal with aggregates.
The refactoring portion part was done as r267748.

http://reviews.llvm.org/D14185

llvm-svn: 267899
2016-04-28 16:11:45 +00:00
Adrian Prantl
75819aedf6 [PR27284] Reverse the ownership between DICompileUnit and DISubprogram.
Currently each Function points to a DISubprogram and DISubprogram has a
scope field. For member functions the scope is a DICompositeType. DIScopes
point to the DICompileUnit to facilitate type uniquing.

Distinct DISubprograms (with isDefinition: true) are not part of the type
hierarchy and cannot be uniqued. This change removes the subprograms
list from DICompileUnit and instead adds a pointer to the owning compile
unit to distinct DISubprograms. This would make it easy for ThinLTO to
strip unneeded DISubprograms and their transitively referenced debug info.

Motivation
----------

Materializing DISubprograms is currently the most expensive operation when
doing a ThinLTO build of clang.

We want the DISubprogram to be stored in a separate Bitcode block (or the
same block as the function body) so we can avoid having to expensively
deserialize all DISubprograms together with the global metadata. If a
function has been inlined into another subprogram we need to store a
reference the block containing the inlined subprogram.

Attached to https://llvm.org/bugs/show_bug.cgi?id=27284 is a python script
that updates LLVM IR testcases to the new format.

http://reviews.llvm.org/D19034
<rdar://problem/25256815>

llvm-svn: 266446
2016-04-15 15:57:41 +00:00
David Majnemer
12fd50410d [SLPVectorizer] Vectorizing the libm sqrt to llvm's sqrt intrinsic requires nnan
To quote the langref "Unlike sqrt in libm, however, llvm.sqrt has
undefined behavior for negative numbers other than -0.0 (which allows
for better optimization, because there is no need to worry about errno
being set). llvm.sqrt(-0.0) is defined to return -0.0 like IEEE sqrt."

This means that it's unsafe to replace sqrt with llvm.sqrt unless the
call is annotated with nnan.

Thanks to Hal Finkel for pointing this out!

llvm-svn: 265521
2016-04-06 07:04:53 +00:00
David Majnemer
25d03dbcde [SLPVectorizer] Vectorize libcalls of sqrt
We didn't realize that we could transform the libcall into a vectorized
intrinsic.

llvm-svn: 265493
2016-04-06 00:14:59 +00:00
David Majnemer
6f1f85f0e1 [SLPVectorizer] Don't insert an extractelement before a catchswitch
A catchswitch cannot be preceded by another instruction in the same
basic block (other than a PHI node).

Instead, insert the extract element right after the materialization of
the vectorized value.  This isn't optimal but is a reasonable compromise
given the constraints of WinEH.

This fixes PR27163.

llvm-svn: 265157
2016-04-01 17:28:15 +00:00
Adrian Prantl
b8089516a5 testcase gardening: update the emissionKind enum to the new syntax. (NFC)
llvm-svn: 265081
2016-04-01 00:16:49 +00:00
Adrian Prantl
b939a25707 Move the DebugEmissionKind enum from DIBuilder into DICompileUnit.
This mostly cosmetic patch moves the DebugEmissionKind enum from DIBuilder
into DICompileUnit. DIBuilder is not the right place for this enum to live
in — a metadata consumer should not have to include DIBuilder.h.
I also added a Verifier check that checks that the emission kind of a
DICompileUnit is actually legal.

http://reviews.llvm.org/D18612
<rdar://problem/25427165>

llvm-svn: 265077
2016-03-31 23:56:58 +00:00
Paul Robinson
51fa0a87c3 Fix tests that used CHECK-NEXT-NOT and CHECK-DAG-NOT.
FileCheck actually doesn't support combo suffixes.

Differential Revision: http://reviews.llvm.org/D17588

llvm-svn: 262054
2016-02-26 19:40:34 +00:00