Currently shuffles may only be combined if they are of the same type, despite the fact that bitcasts are often introduced in between shuffle nodes (e.g. x86 shuffle type widening).
This patch allows a single input shuffle to peek through bitcasts and if the input is another shuffle will merge them, shuffling using the smallest sized type, and re-applying the bitcasts at the inputs and output instead.
Dropped old ShuffleToZext test - this patch removes the use of the zext and vector-zext.ll covers these anyhow.
Differential Revision: http://reviews.llvm.org/D7939
llvm-svn: 231380
Added lowering for ISD::CONCAT_VECTORS and ISD::INSERT_SUBVECTOR for i1 vectors,
it is needed to pass all masked_memop.ll tests for SKX.
llvm-svn: 231371
Also it extracts getCopyFromRegs helper function in SelectionDAGBuilder as we need to be able to customize type of the register exported from basic block during lowering of the gc.result.
llvm-svn: 231366
When trying to convert a BUILD_VECTOR into a shuffle, we try to split a single source vector that is twice as wide as the destination vector.
We can not do this when we also need the zero vector to create a blend.
This fixes PR22774.
Differential Revision: http://reviews.llvm.org/D8040
llvm-svn: 231219
Accidentally committed a few more of these cleanup changes than
intended. Still breaking these out & tidying them up.
This reverts commit r231135.
llvm-svn: 231136
There doesn't seem to be any need to assert that iterator assignment is
between iterators over the same node - if you want to reuse an iterator
variable to iterate another node, that's perfectly acceptable. Just
don't mix comparisons between iterators into disjoint sequences, as
usual.
llvm-svn: 231135
We were missing a check for the following fold in DAGCombiner:
// fold (fmul (fmul x, c1), c2) -> (fmul x, (fmul c1, c2))
If 'x' is also a constant, then we shouldn't do anything. Otherwise, we could end up swapping the operands back and forth forever.
This should fix:
http://llvm.org/bugs/show_bug.cgi?id=22698
Differential Revision: http://reviews.llvm.org/D7917
llvm-svn: 230884
Level 1 should abort for all instructions but call/terminators/args.
Instead it was aborting only if the level was > 2
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 230861
Summary:
Currently fast-isel-abort will only abort for regular instructions,
and just warn for function calls, terminators, function arguments.
There is already fast-isel-abort-args but nothing for calls and
terminators.
This change turns the fast-isel-abort options into an integer option,
so that multiple levels of strictness can be defined.
This will help no being surprised when the "abort" option indeed does
not abort, and enables the possibility to write test that verifies
that no intrinsics are forgotten by fast-isel.
Reviewers: resistor, echristo
Subscribers: jfb, llvm-commits
Differential Revision: http://reviews.llvm.org/D7941
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 230775
a lookup, pass that in rather than use a naked call to getSubtargetImpl.
This involved passing down and around either a TargetMachine or
TargetRegisterInfo. Update all callers/definitions around the targets
and SelectionDAG.
llvm-svn: 230699
have the debugger step through each one individually. Turn off the
combine for adjacent stores at -O0 so we get this behavior.
Possibly, DAGCombine shouldn't run at all at -O0, but that's for
another day; see PR22346.
Differential Revision: http://reviews.llvm.org/D7181
llvm-svn: 230659
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date: Mon Feb 23 23:04:28 2015 +0000
Fix based on post-commit comment on D7816 & rL230177 - BUILD_VECTOR operand truncation was using the the BV's output scalar type instead of the input type.
and
Author: Simon Pilgrim <llvm-dev@redking.me.uk>
Date: Sun Feb 22 18:17:28 2015 +0000
[DagCombiner] Generalized BuildVector Vector Concatenation
The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.
This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.
This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.
Differential Revision: http://reviews.llvm.org/D7816
as the root cause of PR22678 which is causing an assertion inside the DAG combiner.
I'll follow up to the main thread as well.
llvm-svn: 230358
The logic is almost there already, with our special homogeneous aggregate
handling. Tweaking it like this allows front-ends to emit AAPCS compliant code
without ever having to count registers or add discarded padding arguments.
Only arrays of i32 and i64 are needed to model AAPCS rules, but I decided to
apply the logic to all integer arrays for more consistency.
llvm-svn: 230348
For almost all node types, if the target requested custom lowering, and
LowerOperation returned its input, we'd treat the original node as legal. This
did not work, however, for many loads and stores, because they follow
slightly different code paths, and we did not account for the possibility of
LowerOperation returning its input at those call sites.
I think that we now handle this consistently everywhere. At the call sites in
LegalizeDAG, we used to assert in this case, so there's no functional change
for any existing code there. For the call sites in LegalizeVectorOps, this
really only affects whether or not we set Changed = true, but I think makes the
semantics clearer.
No test case here, but it will be covered by an upcoming PowerPC commit adding
QPX support.
llvm-svn: 230332
This patch teaches the backend how to expand a double-half conversion into
a double-float conversion immediately followed by a float-half conversion.
We do this only under fast-math, and if float-half conversions are legal
for the target.
Added test CodeGen/X86/fastmath-float-half-conversion.ll
Differential Revision: http://reviews.llvm.org/D7832
llvm-svn: 230276
The CONCAT_VECTORS combiner pass can transform the concat of two BUILD_VECTOR nodes into a single BUILD_VECTOR node.
This patch generalises this to support any number of BUILD_VECTOR nodes, and also permits UNDEF nodes to be included as well.
This was noticed as AVX vec128 -> vec256 canonicalization sometimes creates a CONCAT_VECTOR with a real vec128 lower and an vec128 UNDEF upper.
Differential Revision: http://reviews.llvm.org/D7816
llvm-svn: 230177
DAGCombine will rewrite an BUILD_VECTOR where all non-undef inputs some from
[US]INT_TO_FP, as a BUILD_VECTOR of integers with the conversion applied as a
vector operation. We check operation legality of the conversion, but fail to
check legality of the integer vector type itself. Because targets don't
normally override operation legality defaults for illegal types, we need to
check this also.
This came up in the context of the QPX vector entensions for PowerPC (which can
have legal floating-point vector types without corresponding legal integer
vector types). No in-tree test case for this yes, but one can be added once
the QPX support has been committed.
llvm-svn: 230176
When expanding a truncating store or extending load using vector extracts or
inserts and scalar stores and loads, we were giving each of these scalar stores
or loads the same alignment as the original vector operation. While this will
often be right (most vector operations, especially those produced by
autovectorization, have the alignment of the underlying scalar type), the
vector operation could certainly have a larger alignment.
No test case (yet); noticed by inspection.
llvm-svn: 230175
Synthesizing a call directly using the MI layer would confuse the frame
lowering code. This is problematic as frame lowering is highly
sensitive the particularities of calls, etc.
llvm-svn: 230129
This allows sharing of FMA forming combines to work
with instructions that have the same semantics as a separate
multiply and add.
This is expand by default, and only formed post legalization
so it shouldn't have much impact on targets that do not want it.
llvm-svn: 230070
First, don't combine bit masking into vector shuffles (even ones the
target can handle) once operation legalization has taken place. Custom
legalization of vector shuffles may exist for these patterns (making the
predicate return true) but that custom legalization may in some cases
produce the exact bit math this matches. We only really want to handle
this prior to operation legalization.
However, the x86 backend, in a fit of awesome, relied on this. What it
would do is mark VSELECTs as expand, which would turn them into
arithmetic, which this would then match back into vector shuffles, which
we would then lower properly. Amazing.
Instead, the second change is to teach the x86 backend to directly form
vector shuffles from VSELECT nodes with constant conditions, and to mark
all of the vector types we support lowering blends as shuffles as custom
VSELECT lowering. We still mark the forms which actually support
variable blends as *legal* so that the custom lowering is bypassed, and
the legal lowering can even be used by the vector shuffle legalization
(yes, i know, this is confusing. but that's how the patterns are
written).
This makes the VSELECT lowering much more sensible, and in fact should
fix a bunch of bugs with it. However, as you'll see in the test cases,
right now what it does is point out the *hilarious* deficiency of the
new vector shuffle lowering when it comes to blends. Fortunately, my
very next patch fixes that. I can't submit it yet, because that patch,
somewhat obviously, forms the exact and/or pattern that the DAG combine
is matching here! Without this patch, teaching the vector shuffle
lowering to produce the right code infloops in the DAG combiner. With
this patch alone, we produce terrible code but at least lower through
the right paths. With both patches, all the regressions here should be
fixed, and a bunch of the improvements (like using 2 shufps with no
memory loads instead of 2 andps with memory loads and an orps) will
stay. Win!
There is one other change worth noting here. We had hilariously wrong
vectorization cost estimates for vselect because we fell through to the
code path that assumed all "expand" vector operations are scalarized.
However, the "expand" lowering of VSELECT is vector bit math, most
definitely not scalarized. So now we go back to the correct if horribly
naive cost of "1" for "not scalarized". If anyone wants to add actual
modeling of shuffle costs, that would be cool, but this seems an
improvement on its own. Note the removal of 16 and 32 "costs" for doing
a blend. Even in SSE2 we can blend in fewer than 16 instructions. ;] Of
course, we don't right now because of OMG bad code, but I'm going to fix
that. Next patch. I promise.
llvm-svn: 229835
1) We should not try to simplify if the sext has multiple uses
2) There is no need to simplify is the source value is already sign-extended.
Patch by Gil Rapaport <gil.rapaport@intel.com>
Differential Revision: http://reviews.llvm.org/D6949
llvm-svn: 229659
This is a follow-on patch to:
http://reviews.llvm.org/D7093
That patch canonicalized constant splats as build_vectors,
and this patch removes the constant check so we can canonicalize
all splats as build_vectors.
This fixes the 2nd test case in PR22283:
http://llvm.org/bugs/show_bug.cgi?id=22283
The unfortunate code duplication between SelectionDAG and DAGCombiner
is discussed in the earlier patch review. At least this patch is just
removing code...
This improves an existing x86 AVX test and changes codegen in an ARM test.
Differential Revision: http://reviews.llvm.org/D7389
llvm-svn: 229511
This adds a safe interface to the machine independent InputArg struct
for accessing the index of the original (IR-level) argument. When a
non-native return type is lowered, we generate the hidden
machine-level sret argument on-the-fly. Before this fix, we were
representing this argument as OrigArgIndex == 0, which is an outright
lie. In particular this crashed in the AArch64 backend where we
actually try to access the type of the original argument.
Now we use a sentinel value for machine arguments that have no
original argument index. AArch64, ARM, Mips, and PPC now check for this
case before accessing the original argument.
Fixes <rdar://19792160> Null pointer assertion in AArch64TargetLowering
llvm-svn: 229413
directly into blends of the splats.
These patterns show up even very late in the vector shuffle lowering
where we don't have any chance for DAG combining to kick in, and
blending is a tremendously simpler operation to model. By coercing the
shuffle into a blend we can much more easily match and lower shuffles of
splats.
Immediately with this change there are significantly more blends being
matched in the x86 vector shuffle lowering.
llvm-svn: 229308
test.
This was just a matter of the DAG combine for vector shuffles being too
aggressive. This is a bit of a grey area, but I think generally if we
can re-use intermediate shuffles, we should. Certainly, given the test
cases I have available, this seems like the right call.
llvm-svn: 229285