In AVX 256bit vectors are valid vectors and therefore the Type Legalizer doesn't
split the VSELECT and SETCC nodes. AVX only supports MIN/MAX on 128bit vectors
and this fix enables vector splitting for this special case in the X86 DAG
Combiner.
This fix is related to PR16695, PR17002, and <rdar://problem/14594431>.
llvm-svn: 191131
The Type Legalizer recognizes that VSELECT needs to be split, because the type
is to wide for the given target. The same does not always apply to SETCC,
because less space is required to encode the result of a comparison. As a result
VSELECT is split and SETCC is unrolled into scalar comparisons.
This commit fixes the issue by checking for VSELECT-SETCC patterns in the DAG
Combiner. If a matching pattern is found, then the result mask of SETCC is
promoted to the expected vector mask for the given target. This mask has usually
te same size as the VSELECT return type (except for Intel KNL). Now the type
legalizer will split both VSELECT and SETCC.
This allows the following X86 DAG Combine code to sucessfully detect the MIN/MAX
pattern. This fixes PR16695, PR17002, and <rdar://problem/14594431>.
llvm-svn: 191130
C-like languages promote types like unsigned short to unsigned int before
performing an arithmetic operation. Currently the rotate matcher in the
DAGCombiner does not consider this situation.
This commit extends the DAGCombiner in the way that the pattern
(or (shl ([az]ext x), (*ext y)), (srl ([az]ext x), (*ext (sub 32, y))))
is folded into
([az]ext (rotl x, y))
The matching is restricted to aext and zext because in this cases the upper
bits are either undefined or known. Test case is included.
This fixes PR16726.
llvm-svn: 191049
C-like languages promote types like unsigned short to unsigned int before
performing an arithmetic operation. Currently the rotate matcher in the
DAGCombiner does not consider this situation.
This commit extends the DAGCombiner in the way that the pattern
(or (shl ([az]ext x), (*ext y)), (srl ([az]ext x), (*ext (sub 32, y))))
is folded into
([az]ext (rotl x, y))
The matching is restricted to aext and zext because in this cases the upper
bits are either undefined or known. Test case is included.
This fixes PR16726.
llvm-svn: 191045
When selecting the DAG (add (WrapperRIP ...), (FrameIndex ...)), X86 code had
spotted the FrameIndex possibility and was working out whether it could fold
the WrapperRIP into this.
The test for forming a %rip version is notionally whether we already have a
base or index register (%rip precludes both), but we were forgetting to account
for the register that would be inserted later to access the frame.
rdar://problem/15024520
llvm-svn: 190995
1) make sure that the first two instructions of the sequence cannot
separate from each other. The linker requires that they be sequential.
If they get separated, it can still work but it will not work in all
cases because the first of the instructions mostly involves the hi part
of the pc relative offset and that part changes slowly. You would have
to be at the right boundary for this to matter.
2) make sure that this sequence begins on a longword boundary.
There appears to be a bug in binutils which makes some of these calculations
get messed up if the instruction sequence does not begin on a longword
boundary. This is being investigated with the appropriate binutils folks.
llvm-svn: 190966
For some reason I never got around to adding these at the same time as
the signed versions. No idea why.
I'm not sure whether this SystemZII::BranchC* stuff is useful, or whether
it should just be replaced with an "is normal" flag. I'll leave that
for later though.
There are some boundary conditions that can be tweaked, such as preferring
unsigned comparisons for equality with [128, 256), and "<= 255" over "< 256",
but again I'll leave those for a separate patch.
llvm-svn: 190930
Summary:
We indicate that the object files are safe by emitting a @feat.00
absolute address symbol. The address is presumably interpreted as a
bitfield of features that the compiler would like to enable. Bit 0 is
documented in the PE COFF spec to opt in to "registered SEH", which is
what /safeseh enables.
LLVM's object files are safe by default because LLVM doesn't know how to
produce SEH handlers.
Reviewers: Bigcheese
CC: llvm-commits
Differential Revision: http://llvm-reviews.chandlerc.com/D1691
llvm-svn: 190898
Large code model on PPC64 requires creating and referencing TOC entries when
using the addis/ld form of addressing. This was not being done in all cases.
The changes in this patch to PPCAsmPrinter::EmitInstruction() fix this. Two
test cases are also modified to reflect this requirement.
Fast-isel was not creating correct code for loading floating-point constants
using large code model. This also requires the addis/ld form of addressing.
Previously we were using the addis/lfd shortcut which is only applicable to
medium code model. One test case is modified to reflect this requirement.
llvm-svn: 190882
When a truncate node defines a legal vector type but uses an illegal
vector type, the legalization process was splitting the vector until
<1 x vector> type, but then it was failing to scalarize the node because
it did not know how to handle TRUNCATE.
<rdar://problem/14989896>
llvm-svn: 190830
- check that -mcpu=slm uses the call register indirect optimization
- check that -mcpu=slm runs the scheduler
- check that -mcpu=slm supports the movbe instruction
llvm-svn: 190814
The port originally had special patterns for extload, mapping them to the
same instructions as sextload. It seemed neater to have patterns that
match "an extension that is allowed to be signed" and "an extension that
is allowed to be unsigned".
This was originally meant to be a clean-up, but it does improve the handling
of promoted integers a little, as shown by args-06.ll.
llvm-svn: 190777
This is a re-commit of r190764, with an extra check to make sure that we're not
performing the transformation on illegal types (a small test case has been
added for this as well).
Original commit message:
The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.
llvm-svn: 190771
This is causing test-suite failures.
Original commit message:
The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.
llvm-svn: 190765
The PPC backend uses a target-specific DAG combine to turn unaligned Altivec
loads into a permutation-based sequence when possible. Unfortunately, the
target-specific DAG combine is not always called on all loads of interest
(sometimes the routines in DAGCombine call CombineTo such that the new node and
users are not added to the worklist); allowing the combine to trigger early
(before type legalization) mitigates this problem. Because the autovectorizers
only create legal vector types, I don't expect a lot of cases where this
optimization is enabled by type legalization in practice.
llvm-svn: 190764
DAGCombiner::isAlias can be called with SrcValue1 or SrcValue2 null, and we
can't use AA in this case (if we try, then the casting code in AA will assert).
llvm-svn: 190763
When a structure is passed by value, and that structure contains a vector
member, according to the PPC ABI, the structure will receive enhanced alignment
(so that the vector within the structure will always be aligned).
This should resolve PR16641.
llvm-svn: 190636
In fast-math mode sqrt(x) is calculated using the fast expansion of the
reciprocal of the reciprocal sqrt expansion. The reciprocal and reciprocal
sqrt expansions use the associated estimate instructions along with some Newton
iterations. Unfortunately, as a result, sqrt(0) was being calculated as NaN,
which is not correct. Now we explicitly return a result of zero if the input is
zero.
llvm-svn: 190624
Aggressive anti-dependency breaking is enabled by default for all PPC cores.
This provides a general speedup on the P7 and other platforms (among other
factors, the instruction group formation for the non-embedded PPC cores is done
during post-RA scheduling). In order to do this safely, the incompatibility
between uses of the MFOCRF instruction and anti-dependency breaking are
resolved by marking MFOCRF with hasExtraSrcRegAllocReq. As noted in the removed
FIXME, the problem was that MFOCRF's output is sensitive to the identify of the
source register, and always paired with a shift to undo this effect. Because
anti-dependency breaking is unaware of this hidden dependency of the shift
amount on the source register of the MFOCRF instruction, changing that register
must be inhibited.
Two test cases were adjusted: The SjLj test was made more insensitive to
register choices and scheduling; the saveCR test disabled anti-dependency
breaking because part of what it is testing is proper register reuse.
llvm-svn: 190587
For _XYZ, the type of VDATA is v4i32, because v3i32 doesn't exist.
The ADDR64 bit is not exposed. A simpler intrinsic that doesn't take
a resource descriptor might be nicer.
The maximum number of input SGPRs is bumped to 17.
Signed-off-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Tom Stellard <thomas.stellard@amd.com>
llvm-svn: 190575