Commit Graph

16209 Commits

Author SHA1 Message Date
Craig Topper
13cf7cac07 [AVX512] Remove maksed pshufd, pshuflw, and phufhw intrinsics and autoupgrade them to selects and shufflevector.
llvm-svn: 272527
2016-06-13 02:36:48 +00:00
Sanjay Patel
977530a8c9 [x86, SSE] change patterns for CMPP to float types to allow matching with SSE1 (PR28044)
This patch is intended to solve:
https://llvm.org/bugs/show_bug.cgi?id=28044

By changing the definition of X86ISD::CMPP to use float types, we allow it to be created 
and pass legalization for an SSE1-only target where v4i32 is not legal.

The motivational trail for this change includes:
https://llvm.org/bugs/show_bug.cgi?id=28001

and eventually makes this trigger:
http://reviews.llvm.org/D21190

Ie, after this step, we should be free to have Clang generate FP compare IR instead of x86
intrinsics for SSE C packed compare intrinsics. (We can auto-upgrade and remove the LLVM 
sse.cmp intrinsics as a follow-up step.) Once we're generating vector IR instead of x86
intrinsics, a big pile of generic optimizations can trigger.

Differential Revision: http://reviews.llvm.org/D21235

llvm-svn: 272511
2016-06-12 15:03:25 +00:00
Craig Topper
1067986c5b [X86] Remove sse2 pshufd/pshuflw/pshufhw intrinsics and upgrade them to shufflevector.
llvm-svn: 272510
2016-06-12 14:11:32 +00:00
Simon Pilgrim
9d8bed1796 [X86][BMI] Added fast-isel tests for BMI1 intrinsics
A lot of the codegen is pretty awful for these as they are mostly implemented as generic bit twiddling ops 

llvm-svn: 272508
2016-06-12 09:56:05 +00:00
Craig Topper
b7713e413b [X86] Move tests for llvm.x86.avx.vpermil.* intrinsics to a -upgrade test since they are autoupgraded to shufflevector.
llvm-svn: 272494
2016-06-12 01:41:06 +00:00
Simon Pilgrim
2b7c02a04f [X86] Updated test checks script to generalise LCPI symbol refs
The script now replace '.LCPI888_8' style asm symbols with the {{\.LCPI.*}} re pattern - this helps stop hardcoded symbols in 32-bit x86 tests changing with every edit of the file

Refreshed some tests to demonstrate the new check

llvm-svn: 272488
2016-06-11 20:39:21 +00:00
Simon Pilgrim
5b9bade8dd [X86][SSSE3] Added PSHUFB LUT implementation of BITREVERSE
PSHUFB can speed up BITREVERSE of byte vectors by performing LUT on the low/high nibbles separately and ORing the results. Wider integer vector types are already BSWAP'd beforehand so also make use of this approach.

llvm-svn: 272477
2016-06-11 15:44:13 +00:00
Craig Topper
46f49fb407 [AVX512] Re-generate v8i64 shuffle test now that we use pshufd for some cases.
llvm-svn: 272474
2016-06-11 13:57:08 +00:00
Craig Topper
504fba5c8a [AVX512] Lower v8i64 and v16i32 to pshufd when possible.
llvm-svn: 272473
2016-06-11 13:43:21 +00:00
Simon Pilgrim
6800a45790 [X86][SSE] Added PSLLDQ/PSRLDQ as a target shuffle type
Ensure that PALIGNR/PSLLDQ/PSRLDQ are byte vectors so that they can be correctly decoded for target shuffle combining

llvm-svn: 272471
2016-06-11 13:38:28 +00:00
Simon Pilgrim
8dd73e3ffa [X86][AVX2] Added PSLLDQ/PSRLDQ shuffle combining tests
llvm-svn: 272469
2016-06-11 13:18:21 +00:00
Craig Topper
40abd1cc61 [AVX512] Add support for lowering v32i16 shuffles with repeated lanes. This allows us to create 512-bit PSHUFLW/PSHUFHW.
llvm-svn: 272450
2016-06-11 03:27:42 +00:00
Quentin Colombet
f2a1909bb5 [IRTranslator] Support the translation of or.
Now or instructions get translated into G_OR.

llvm-svn: 272433
2016-06-10 20:50:35 +00:00
Sanjay Patel
b114fd65fc [x86] enable bitcasted fabs/fneg transforms
The vector cases don't change because we already have folds in X86ISelLowering
to look through and remove bitcasts.

llvm-svn: 272427
2016-06-10 20:33:50 +00:00
Zhan Jun Liau
ab42cbce98 [SystemZ] Support Compare and Traps
Support and generate Compare and Traps like CRT, CIT, etc.

Support Trap as legal DAG opcodes and generate "j .+2" for them by default.
Add support for Conditional Traps and use the If Converter to convert them into
the corresponding compare and trap opcodes.

Differential Revision: http://reviews.llvm.org/D21155

llvm-svn: 272419
2016-06-10 19:58:10 +00:00
Tom Stellard
f3af841462 AMDGPU/SI: Don't use fixup_si_rodata for scratch rsrc relocations
Summary:
We need to set the fixup type to FK_Data_4 for the
SCRATCH_RSRC_DWORD[01] symbols, since these require absolute
relocations, and fixup_si_rodata is for relative relocations.

Reviewers: arsenm, kzhuravl

Subscribers: arsenm, kzhuravl, llvm-commits

Differential Revision: http://reviews.llvm.org/D21153

llvm-svn: 272417
2016-06-10 19:26:38 +00:00
Mehdi Amini
cbd68ecf04 Move CodeGen test from Generic to X86 specific directory
llvm-svn: 272416
2016-06-10 19:14:01 +00:00
Mehdi Amini
1d396832d3 Interprocedural Register Allocation (IPRA): add a Transformation Pass
Adds a MachineFunctionPass that scans the body to find calls, and
update the register mask with the one saved by the
RegUsageInfoCollector analysis in PhysicalRegisterUsageInfo.

Patch by Vivek Pandya <vivekvpandya@gmail.com>

Differential Revision: http://reviews.llvm.org/D21180

llvm-svn: 272414
2016-06-10 18:37:21 +00:00
Sanjay Patel
d558bdadd2 [x86] add test for PR28044
llvm-svn: 272411
2016-06-10 18:05:55 +00:00
Mehdi Amini
bbacddfe92 Interprocedural Register Allocation (IPRA) Analysis
Add an option to enable the analysis of MachineFunction register
usage to extract the list of clobbered registers.

When enabled, the CodeGen order is changed to be bottom up on the Call
Graph.

The analysis is split in two parts, RegUsageInfoCollector is the
MachineFunction Pass that runs post-RA and collect the list of
clobbered registers to produce a register mask.

An immutable pass, RegisterUsageInfo, stores the RegMask produced by
RegUsageInfoCollector, and keep them available. A future tranformation
pass will use this information to update every call-sites after
instruction selection.

Patch by Vivek Pandya <vivekvpandya@gmail.com>

Differential Revision: http://reviews.llvm.org/D20769

llvm-svn: 272403
2016-06-10 16:19:46 +00:00
Sanjay Patel
27f06ae7a5 [x86] fix test attributes and autogenerate checks
llvm-svn: 272398
2016-06-10 15:30:52 +00:00
Sanjay Patel
cccccd9ab5 [x86] add missing tests for fcmp ueq/one
Somehow, the codegen logic for these sequences has gone completely untested
until now (note the 2 compare instructions generated per test).

There's also an *Intel* AVX optimization opportunity exposed in these cases
and the existing tests. Intel's (but not AMD's) AVX spec shows that extra FP
predicates were added, so a single comparison should always be sufficient,
and operand commutation should never be necessary.

llvm-svn: 272397
2016-06-10 15:17:54 +00:00
Sanjay Patel
330a359fb3 [x86] regenerate checks
llvm-svn: 272396
2016-06-10 14:48:50 +00:00
Simon Pilgrim
2fa2690bca [X86][SSE] Added target shuffle combine tests for byte shift/rotates (PSLLDQ/PSRLDQ/PALIGNR)
llvm-svn: 272392
2016-06-10 13:03:22 +00:00
Simon Pilgrim
34263ad995 [X86][AVX512] Added VPSLLDQ/VPSRLDQ memory fold tests
Memory operand is new for AVX512 (SSE/AVX2 didn't support it).

Also dropped the 'mask' from the tests (VPSLLDQ/VPSRLDQ don't support masked operations).

Regenerated VPALIGNR test now that the shuffle comments work

llvm-svn: 272383
2016-06-10 09:56:20 +00:00
Craig Topper
200d237e57 [AVX512] Add shuffle comment printing for masked VPERMPD/VPERMQ.
llvm-svn: 272371
2016-06-10 05:12:40 +00:00
Craig Topper
89c1761474 [AVX512] Fix shuffle comment printing to handle the masked versions of some shuffles. Previously we were printing the mask operands as the register names.
llvm-svn: 272367
2016-06-10 04:48:05 +00:00
Quentin Colombet
3198649199 [LiveRangeEdit] Add a test case for r272314.
The test case is not great espicially because it is still cumbersome to
run the regalloc pass with run-pass. (We miss a bunch of initiliazier to
be properly implemented.)

Related to llvm.org/PR27983

llvm-svn: 272360
2016-06-10 01:57:48 +00:00
Quentin Colombet
129458a7ed [llc] Add support for several run-pass options.
Previously we could run only one machine pass with the run-pass option.
With that patch, we can now specify several passes with several run-pass
options (or just one option with a list of comma separated passes) and
llc will build the related pipeline.
This is great to test the interaction of two passes that are not
necessarily next to each other in the pipeline, or play with pass
ordering.
Now, we should be at parity with opt for the flexibility of running
passes.

Note: I also moved the run pass option from CommandFlags.h to llc.cpp
because, really, this is needed only there!

llvm-svn: 272356
2016-06-10 00:52:10 +00:00
Matt Arsenault
58ddad5bd6 AMDGPU: v_cndmask_b32 does not def vcc
Fixes verifier errors after SIShrinkInstructions.

llvm-svn: 272351
2016-06-10 00:18:41 +00:00
Tom Stellard
26a2ab7477 AMDGPU/SI: Make sure to emit TargetConstant nodes when matching ds_*permute
Summary:
This fixes a bug with ds_*permute instructions where if it was passed a
constant address, then the offset operand would get assigned a register
operand instead of an immediate.

Reviewers: scchan, arsenm

Subscribers: arsenm, llvm-commits

Differential Revision: http://reviews.llvm.org/D19994

llvm-svn: 272349
2016-06-10 00:01:04 +00:00
Matt Arsenault
7757c59e48 AMDGPU: Fix flat atomics
The flat atomics could already be selected, but only
when using flat instructions for global memory. Add
patterns for flat addresses.

llvm-svn: 272345
2016-06-09 23:42:54 +00:00
Matt Arsenault
887018179a AMDGPU: Fix i64 global cmpxchg
This was using extract_subreg sub0 to extract the low register
of the result instead of sub0_sub1, producing an invalid copy.

There doesn't seem to be a way to use the compound subreg indices
in tablegen since those are generated, so manually select it.

llvm-svn: 272344
2016-06-09 23:42:48 +00:00
Matt Arsenault
25363d37fc AMDGPU: Fix missing and broken check lines in atomic tests
llvm-svn: 272343
2016-06-09 23:42:44 +00:00
Eric Christopher
1dbb23e162 Add aliases for mfvrsave/mtvrsave.
Update a test as we're now going to emit it for easier reading of
generated assembly as well.

llvm-svn: 272339
2016-06-09 23:27:48 +00:00
Simon Pilgrim
643734c565 [X86][AVX512] Added avx512 VPSLLDQ/VPSRLDQ instruction comments
llvm-svn: 272319
2016-06-09 22:03:15 +00:00
Simon Pilgrim
f718682eb9 [X86][AVX512] Dropped avx512 VPSLLDQ/VPSRLDQ intrinsics
Auto-upgrade to generic shuffles like sse/avx2 implementations now that we can lower to VPSLLDQ/VPSRLDQ 

llvm-svn: 272308
2016-06-09 21:09:03 +00:00
Simon Pilgrim
47c76e201a [X86][AVX512] Fixed issue with v16i32 shuffles lowering to VPALIGNR
llvm-svn: 272307
2016-06-09 20:53:12 +00:00
Simon Pilgrim
0ab9d3026a [X86][AVX512] Added support for lowering 512-bit vector shuffles to bit/byte shifts
512-bit VPSLLDQ/VPSRLDQ can only be used for avx512bw targets so lowerVectorShuffleAsShift had to be adjusted to include the subtarget

llvm-svn: 272300
2016-06-09 20:13:58 +00:00
Justin Lebar
ed2c282d4b [NVPTX] Add intrinsics for shfl instructions.
Summary:
Currently clang emits these instructions via inline (volatile) asm in
the CUDA headers.  Switching to intrinsics will let the optimizer reason
across calls to these intrinsics.

Reviewers: tra

Subscribers: llvm-commits, jholewinski

Differential Revision: http://reviews.llvm.org/D21160

llvm-svn: 272298
2016-06-09 20:04:08 +00:00
Wei Ding
ed0f97fad2 AMDGPU/SI: Fix 32-bit fdiv lowering
We were using the fast fdiv lowering for all division, implementation of
IEEE754 fdiv is added.

http://reviews.llvm.org/D20557

llvm-svn: 272292
2016-06-09 19:17:15 +00:00
Davide Italiano
1a7e32cc48 Also fix a typo. Need more coffee today.
llvm-svn: 272278
2016-06-09 17:06:01 +00:00
Davide Italiano
f326b30a15 Improve r272262, check that __stack_chk_guard is used.
Thanks to Rafael for the suggestion.

llvm-svn: 272277
2016-06-09 17:04:38 +00:00
Jan Vesely
2da0cba5fb SelectionDAG: Implement expansion of {S,U}MIN/MAX in integer legalization
Fixes {u,}long_{min,max,clamp} opencl piglit regressions on EG.

Reviewers: arsenm
Differential Revision: http://reviews.llvm.org/D17898

llvm-svn: 272272
2016-06-09 16:04:00 +00:00
Haicheng Wu
5b458cc1f6 Reapply "[MBP] Reduce code size by running tail merging in MBP.""
This reapplies commit r271930, r271915, r271923.  They hit a bug in
Thumb which is fixed in r272258 now.

The original message:

The code layout that TailMerging (inside BranchFolding) works on is not the
final layout optimized based on the branch probability. Generally, after
BlockPlacement, many new merging opportunities emerge.

This patch calls Tail Merging after MBP and calls MBP again if Tail Merging
merges anything.

llvm-svn: 272267
2016-06-09 15:24:29 +00:00
Ulrich Weigand
79564611d9 [SystemZ] Enable long displacement constraints for inline ASM operands
This enables use of the 'S' constraint for inline ASM operands on
SystemZ, which allows for a memory reference with a signed 20-bit
immediate displacement. This patch includes corresponding documentation
and test case updates.

I've changed the 'T' constraint to match the new behavior for 'S', as
'T' also uses a long displacement (though index constraints are still
not implemented). I also changed 'm' to match the behavior for 'S' as
this will allow for a wider range of displacements for 'm', though
correct me if that's not the right decision.

Author: colpell
Differential Revision: http://reviews.llvm.org/D21097

llvm-svn: 272266
2016-06-09 15:19:16 +00:00
Davide Italiano
24f1f62dca Move stackguard test to X86/ directory as it's not generic.
llvm-svn: 272264
2016-06-09 15:16:58 +00:00
Davide Italiano
bd4243c519 [CodeGen] Change getSDagStackGuard to get an internal sym.
Fixes a crash in the backend during an LTO build of rtld(1) in
FreeBSD.

llvm-svn: 272262
2016-06-09 14:23:38 +00:00
Igor Breger
f635367e2b [AVX512] Remove masked_move/blendm intrinsic from back-end.
This is complement patch to D21060.

Differential Revision: http://reviews.llvm.org/D21174

llvm-svn: 272257
2016-06-09 11:46:55 +00:00
Zlatko Buljan
cd242c1655 [mips][microMIPS] Add CodeGen support for SEL.*, SELEQZ, SELNEZ, SELEQZ.*, SELNEZ.* and CMP.condn.fmt instructions
Differential Revision: http://reviews.llvm.org/D20862

llvm-svn: 272256
2016-06-09 11:15:53 +00:00