Commit Graph

3350 Commits

Author SHA1 Message Date
Nadav Rotem
e7785686a5 Fix a bug in the code that checks if we can vectorize loops while using dynamic
memory bound checks.  Before the fix we were able to vectorize this loop from
the Livermore Loops benchmark:

for ( k=1 ; k<n ; k++ )
  x[k] = x[k-1] + y[k];

llvm-svn: 170811
2012-12-21 00:07:35 +00:00
Nadav Rotem
2ababf68d7 LoopVectorize: Fix a bug in the scalarization of instructions.
Before if-conversion we could check if a value is loop invariant
if it was declared inside the basic block. Now that loops have
multiple blocks this check is incorrect.

This fixes External/SPEC/CINT95/099_go/099_go

llvm-svn: 170756
2012-12-20 20:24:40 +00:00
James Molloy
4f6fb953a7 Add a new attribute, 'noduplicate'. If a function contains a noduplicate call, the call cannot be duplicated - Jump threading, loop unrolling, loop unswitching, and loop rotation are inhibited if they would duplicate the call.
Similarly inlining of the function is inhibited, if that would duplicate the call (in particular inlining is still allowed when there is only one callsite and the function has internal linkage).

llvm-svn: 170704
2012-12-20 16:04:27 +00:00
Paul Redmond
5917f4c715 Transform (x&C)>V into (x&C)!=0 where possible
When the least bit of C is greater than V, (x&C) must be greater than V
if it is not zero, so the comparison can be simplified.

Although this was suggested in Target/X86/README.txt, it benefits any
architecture with a directly testable form of AND.

Patch by Kevin Schoedel

llvm-svn: 170576
2012-12-19 19:47:13 +00:00
Benjamin Kramer
ae0bb61053 Make TargetLowering::getTypeConversion more resilient against odd illegal MVTs.
- An MVT can become an EVT when being split (e.g. v2i8 -> v1i8, the latter doesn't exist)
- Return the scalar value when an MVT is scalarized (v1i64 -> i64)

Fixes PR14639ff.

llvm-svn: 170546
2012-12-19 14:34:28 +00:00
Shuxin Yang
37a1efe1c6 rdar://12801297
InstCombine for unsafe floating-point add/sub.

llvm-svn: 170471
2012-12-18 23:10:12 +00:00
Benjamin Kramer
f0e5d2f032 LoopVectorize: Emit reductions as log2(vectorsize) shuffles + vector ops instead of scalar operations.
For example on x86 with SSE4.2 a <8 x i8> add reduction becomes
	movdqa	%xmm0, %xmm1
	movhlps	%xmm1, %xmm1            ## xmm1 = xmm1[1,1]
	paddw	%xmm0, %xmm1
	pshufd	$1, %xmm1, %xmm0        ## xmm0 = xmm1[1,0,0,0]
	paddw	%xmm1, %xmm0
	phaddw	%xmm0, %xmm0
	pextrb	$0, %xmm0, %edx

instead of
	pextrb	$2, %xmm0, %esi
	pextrb	$0, %xmm0, %edx
	addb	%sil, %dl
	pextrb	$4, %xmm0, %esi
	addb	%dl, %sil
	pextrb	$6, %xmm0, %edx
	addb	%sil, %dl
	pextrb	$8, %xmm0, %esi
	addb	%dl, %sil
	pextrb	$10, %xmm0, %edi
	pextrb	$14, %xmm0, %edx
	addb	%sil, %dil
	pextrb	$12, %xmm0, %esi
	addb	%dil, %sil
	addb	%sil, %dl

llvm-svn: 170439
2012-12-18 18:40:20 +00:00
Nadav Rotem
cb23342876 Rename the test so that we can add additional vectors-of-pointers tests
into the same file in the future.

llvm-svn: 170414
2012-12-18 05:50:54 +00:00
Nadav Rotem
a5024fc3e1 SROA: Replace calls to getScalarSizeInBits to DataLayout's API because
getScalarSizeInBits could not handle vectors of pointers.

llvm-svn: 170412
2012-12-18 05:23:31 +00:00
Chandler Carruth
e3f4119b06 Fix another SROA crasher, PR14601.
This was a silly oversight, we weren't pruning allocas which were used
by variable-length memory intrinsics from the set that could be widened
and promoted as integers. Fix that.

llvm-svn: 170353
2012-12-17 18:48:07 +00:00
Chandler Carruth
21eb4e96c2 Teach the rewriting of memcpy calls to support subvector copies.
This also cleans up a bit of the memcpy call rewriting by sinking some
irrelevant code further down and making the call-emitting code a bit
more concrete.

Previously, memcpy of a subvector would actually miscompile (!!!) the
copy into a single vector element copy. I have no idea how this ever
worked. =/ This is the memcpy half of PR14478 which we probably weren't
noticing previously because it didn't actually assert.

The rewrite relies on the newly refactored insert- and extractVector
functions to do the heavy lifting, and those are the same as used for
loads and stores which makes the test coverage a bit more meaningful
here.

llvm-svn: 170338
2012-12-17 14:51:24 +00:00
Chandler Carruth
cacda256a1 Fix a secondary bug I introduced while fixing the first part of PR14478.
The first half of fixing this bug was actually in r170328, but was
entirely coincidental. It did however get me to realize the nature of
the bug, and adapt the test case to test more interesting behavior. In
turn, that uncovered the rest of the bug which I've fixed here.

This should fix two new asserts that showed up in the vectorize nightly
tester.

llvm-svn: 170333
2012-12-17 14:03:01 +00:00
Chandler Carruth
ccca504f3a Fix the first part of PR14478: memset now works.
PR14478 highlights a serious problem in SROA that simply wasn't being
exercised due to a lack of vector input code mixed with C-library
function calls. Part of SROA was written carefully to handle subvector
accesses via memset and memcpy, but the rewriter never grew support for
this. Fixing it required refactoring the subvector access code in other
parts of SROA so it could be shared, and then fixing the splat formation
logic and using subvector insertion (this patch).

The PR isn't quite fixed yet, as memcpy is still broken in the same way.
I'm starting on that series of patches now.

Hopefully this will be enough to bring the bullet benchmark back to life
with the bb-vectorizer enabled, but that may require fixing memcpy as
well.

llvm-svn: 170301
2012-12-17 04:07:37 +00:00
Chandler Carruth
c50394fcfa Add a corollary test for PR14572. We got this code path correct already.
llvm-svn: 170271
2012-12-15 09:31:54 +00:00
Chandler Carruth
067edd342f Relax an overly aggressive assert to fix PR14572.
The alloca width is based on the alloc size, not the type size.

llvm-svn: 170270
2012-12-15 09:26:06 +00:00
Michael Ilseman
e2754dc887 Add back FoldOpIntoPhi optimizations with fix. Included test cases to help catch these errors and to test the presence of the optimization itself
llvm-svn: 170248
2012-12-14 22:08:26 +00:00
Nadav Rotem
aa3e2a907e Fix a crash in ValueTracking on vectors of pointers.
llvm-svn: 170240
2012-12-14 20:43:49 +00:00
Shuxin Yang
f8e9a5a061 rdar://12753946
Implement rule : "x * (select cond 1.0, 0.0) -> select cond x, 0.0"

llvm-svn: 170226
2012-12-14 18:46:06 +00:00
NAKAMURA Takumi
38d2b2442f Revert r170020, "Simplify negated bit test", for now.
This assumes (1 << n) is always not zero. Consider n is greater than word size.
Although I know it is undefined, this transforms undefined behavior hidden.

This led clang unexpected behavior with some failures. I will investigate to fix undefined shl in clang.

llvm-svn: 170128
2012-12-13 14:28:16 +00:00
Quentin Colombet
c0dba2035a Take into account minimize size attribute in the inliner.
Better controls the inlining of functions when the caller function has MinSize attribute.
Basically, when the caller function has this attribute, we do not "force" the inlining
of callee functions carrying the InlineHint attribute (i.e., functions defined with
inline keyword)

llvm-svn: 170065
2012-12-13 01:05:25 +00:00
Nadav Rotem
36510f7194 Teach the cost model about the optimization in r169904: Truncation of induction variables costs the same as scalar trunc.
llvm-svn: 170051
2012-12-13 00:21:03 +00:00
Jakub Staszak
a3619d31d8 unHECKify test fixed by Jacob in r159003.
llvm-svn: 170023
2012-12-12 20:58:42 +00:00
David Majnemer
5226aa94ce Simplify negated bit test
llvm-svn: 170020
2012-12-12 20:48:54 +00:00
Jakub Staszak
0a74fc8d6c unHECKify test. It was fixed by Chris in 2009.
llvm-svn: 170017
2012-12-12 20:43:00 +00:00
Jakub Staszak
aee9cca331 Fix typo in test-case.
llvm-svn: 170015
2012-12-12 20:29:06 +00:00
Jakub Staszak
67bf76ebbc Fix typo.
llvm-svn: 170006
2012-12-12 19:47:04 +00:00
Nadav Rotem
d0bb22bba3 LoopVectorizer: Use the "optsize" attribute to decide if we are allowed to increase the function size.
llvm-svn: 170004
2012-12-12 19:29:45 +00:00
Shuxin Yang
81b3678564 - Fix a problematic way in creating all-the-1 APInt.
- Propagate "exact" bit of [l|a]shr instruction.

llvm-svn: 169942
2012-12-12 00:29:03 +00:00
Michael Ilseman
bb6f691b01 Added a slew of SimplifyInstruction floating-point optimizations, many of which take advantage of fast-math flags. Test cases included.
fsub X, +0 ==> X
  fsub X, -0 ==> X, when we know X is not -0
  fsub +/-0.0, (fsub -0.0, X) ==> X
  fsub nsz +/-0.0, (fsub +/-0.0, X) ==> X
  fsub nnan ninf X, X ==> 0.0
  fadd nsz X, 0 ==> X
  fadd [nnan ninf] X, (fsub [nnan ninf] 0, X) ==> 0
    where nnan and ninf have to occur at least once somewhere in this expression
  fmul X, 1.0 ==> X

llvm-svn: 169940
2012-12-12 00:27:46 +00:00
Nadav Rotem
f707bf4ca3 PR14574. Fix a bug in the code that calculates the mask the converted PHIs in if-conversion.
llvm-svn: 169916
2012-12-11 21:30:14 +00:00
Nadav Rotem
e266efb70b Loop Vectorize: optimize the vectorization of trunc(induction_var). The truncation is now done on scalars.
llvm-svn: 169904
2012-12-11 18:58:10 +00:00
Nadav Rotem
dbb3328194 Fix PR14565. Don't if-convert loops that have switch statements in them.
llvm-svn: 169813
2012-12-11 04:55:10 +00:00
Nadav Rotem
7b5b55c195 Add support for reverse induction variables. For example:
while (i--)
 sum+=A[i];

llvm-svn: 169752
2012-12-10 19:25:06 +00:00
Chandler Carruth
e45f4658a3 Fix PR14548: SROA was crashing on a mixture of i1 and i8 loads and stores.
When SROA was evaluating a mixture of i1 and i8 loads and stores, in
just a particular case, it would tickle a latent bug where we compared
bits to bytes rather than bits to bits. As a consequence of the latent
bug, we would allow integers through which were not byte-size multiples,
a situation the later rewriting code was never intended to handle.

In release builds this could trigger all manner of oddities, but the
reported issue in PR14548 was forming invalid bitcast instructions.

The only downside of this fix is that it makes it more clear that SROA
in its current form is not capable of handling mixed i1 and i8 loads and
stores. Sometimes with the previous code this would work by luck, but
usually it would crash, so I'm not terribly worried. I'll watch the LNT
numbers just to be sure.

llvm-svn: 169719
2012-12-10 00:54:45 +00:00
Paul Redmond
2adb13c100 LoopVectorize: support vectorizing intrinsic calls
- added function to VectorTargetTransformInfo to query cost of intrinsics
- vectorize trivially vectorizable intrinsic calls such as sin, cos, log, etc.

Reviewed by: Nadav

llvm-svn: 169711
2012-12-09 20:42:17 +00:00
Shuxin Yang
95de7c37e2 - Re-enable population count loop idiom recognization
- fix a bug which cause sigfault.
- add two testing cases which was causing crash

llvm-svn: 169687
2012-12-09 03:12:46 +00:00
Chandler Carruth
91e47532fe Revert the patches adding a popcount loop idiom recognition pass.
There are still bugs in this pass, as well as other issues that are
being worked on, but the bugs are crashers that occur pretty easily in
the wild. Test cases have been sent to the original commit's review
thread.

This reverts the commits:
  r169671: Fix a logic error.
  r169604: Move the popcnt tests to an X86 subdirectory.
  r168931: Initial commit adding the pass.

llvm-svn: 169683
2012-12-08 22:18:29 +00:00
David Tweed
cfeb8fc49a The test unconditionally assumes a particular cpu has a backend build in the target.
Buildbots for some hosts may choose to build only their own backend in order to
maximise testing-turnaround time. Move the test into a prefixed directory so
lit's standard "backend specific" suppression can be done.

llvm-svn: 169604
2012-12-07 15:57:45 +00:00
Chandler Carruth
80d3e56c73 Add support to ValueTracking for determining that a pointer is non-null
by virtue of inbounds GEPs that preclude a null pointer.

This is a very common pattern in the code generated by std::vector and
other standard library routines which use allocators that test for null
pervasively. This is one step closer to teaching Clang+LLVM to be able
to produce an empty function for:

  void f() {
    std::vector<int> v;
    v.push_back(1);
    v.push_back(2);
    v.push_back(3);
    v.push_back(4);
  }

Which is related to getting them to completely fold SmallVector
push_back sequences into constants when inlining and other optimizations
make that a possibility.

llvm-svn: 169573
2012-12-07 02:08:58 +00:00
Dmitri Gribenko
1c704355cf Fix typos in CHECK lines.
Patch by Alexander Zinenko.

llvm-svn: 169547
2012-12-06 21:24:47 +00:00
Shuxin Yang
ada92f5018 fix a typo
llvm-svn: 169345
2012-12-05 00:33:16 +00:00
Nadav Rotem
93fa5ef957 Fix a bug in vectorization of if-converted reduction variables. If the
reduction variable is not used outside the loop then we ran into an
endless loop. This change checks if we found the original PHI.

llvm-svn: 169324
2012-12-04 22:40:22 +00:00
Shuxin Yang
73285933c9 For rdar://12329730, last piece.
This change attempts to simplify (X^Y) -> X or Y in the user's context if we know that
only bits from X or Y are demanded.

  A minimized case is provided bellow. This change will simplify "t>>16" into "var1 >>16".

  =============================================================
  unsigned foo (unsigned val1, unsigned val2) {
    unsigned t = val1 ^ 1234;
    return (t >> 16) | t; // NOTE: t is used more than once.
  }
  =============================================================

  Note that if the "t" were used only once, the expression would be finally optimized as well.
However, with with this change, the optimization will take place earlier.

  Reviewed by Nadav, Thanks a lot!

llvm-svn: 169317
2012-12-04 22:15:32 +00:00
Nadav Rotem
a10b311aec Add support for reduction variables when IF-conversion is enabled.
llvm-svn: 169288
2012-12-04 18:17:33 +00:00
Nadav Rotem
628c2dba60 Add the last part that is needed for vectorization of if-converted code.
Added the code that actually performs the if-conversion during vectorization.

We can now vectorize this code:

for (int i=0; i<n; ++i) {
  unsigned k = 0;

  if (a[i] > b[i])   <------ IF inside the loop.
    k = k * 5 + 3;

  a[i] = k;          <---- K is a phi node that becomes vector-select.
}

llvm-svn: 169217
2012-12-04 06:15:11 +00:00
Shuxin Yang
86c0e232b7 rdar://12329730 (2nd part, revised)
The type of shirt-right (logical or arithemetic) should remain unchanged 
when transforming  "X << C1 >> C2" into "X << (C1-C2)"

llvm-svn: 169209
2012-12-04 03:28:32 +00:00
Shuxin Yang
63e999edbf rdar://12329730 (2nd part)
This change tries to simmplify E1 = " X >> C1 << C2" into :
  - E2 = "X << (C2 - C1)" if C2 > C1, or
  - E2 = "X >> (C1 - C2)" if C1 > C2, or
  - E2 = X if C1 == C2.

 Reviewed by Nadav. Thanks!

llvm-svn: 169182
2012-12-04 00:04:54 +00:00
Benjamin Kramer
47534c7440 SROA: Avoid struct and array types early to avoid creating an overly large integer type.
Fixes PR14465.

Differential Revision: http://llvm-reviews.chandlerc.com/D148

llvm-svn: 169084
2012-12-01 11:53:32 +00:00
Zhou Sheng
8e6d64a71d Revert previous check in r168581, r169079 as they are still in code review status.
llvm-svn: 169083
2012-12-01 10:54:28 +00:00
Zhou Sheng
13fb1ca44a The patch is to improve the memory footprint of pass GlobalOpt.
Also check in a case to repeat the issue, on which 'opt -globalopt' consumes 1.6GB memory.
The big memory footprint cause is that current GlobalOpt one by one hoists and stores the leaf element constant into the global array, in each iteration, it recreates the global array initializer constant and leave the old initializer alone. This may result in many obsolete constants left.
For example:  we have global array @rom = global [16 x i32] zeroinitializer
After the first element value is hoisted and installed:   @rom = global [16 x i32] [ 1, 0, 0, ... ]
After the second element value is installed:  @rom = global [16 x 32] [ 1, 2, 0, 0, ... ]        // here the previous initializer is obsolete
...
When the transform is done, we have 15 obsolete initializers left useless.

llvm-svn: 169079
2012-12-01 04:38:53 +00:00