There are no vldN/vstN f16 variants, even with +fullfp16.
We could use the i16 variants, but, in practice, even with +fullfp16,
the f16 sequence leading to the i16 shuffle usually gets scalarized.
We'd need to improve our support for f16 codegen before getting there.
Teach the cost model to consider f16 interleaved operations as
expensive. Otherwise, we are all but guaranteed to end up with
a large block of scalarized vector code.
llvm-svn: 294819
There are no vldN/vstN f16 variants, even with +fullfp16.
We could use the i16 variants, but, in practice, even with +fullfp16,
the f16 sequence leading to the i16 shuffle usually gets scalarized.
We'd need to improve our support for f16 codegen before getting there.
Reject f16 interleaved accesses. If we try to emit the f16 intrinsics,
we'll just end up with a selection failure.
llvm-svn: 294818
Summary:
powerpc64 big-endian is not supported, but I believe that most logic can
be shared, except for xray_powerpc64.cc.
Also add a function InvalidateInstructionCache to xray_util.h, which is
copied from llvm/Support/Memory.cpp. I'm not sure if I need to add a unittest,
and I don't know how.
Reviewers: dberris, echristo, iteratee, kbarton, hfinkel
Subscribers: mehdi_amini, nemanjai, mgorny, llvm-commits
Differential Revision: https://reviews.llvm.org/D29742
llvm-svn: 294781
Since r274013, we've been looking through bitcasts on broadcast inputs.
In the scalar-folding case (from a load, build_vector, or sc2vec),
the input type didn't matter, as we'd simply bitcast the resulting
scalar back.
However, when broadcasting a 128-bit-lane-aligned element, we create an
EXTRACT_SUBVECTOR. Use proper types, by creating an extract_subvector
of the original input type.
llvm-svn: 294774
In the encoding of system registers in the M-class MSR instruction the mask bits
should be 2 for registers that don't take a _<bits> qualifier (the instruction
is unpredictable otherwise), and should also be 2 if the register takes a
_<bits> qualifier but it's not present as no _<bits> is an alias for _nzcvq.
Differential Revision: https://reviews.llvm.org/D29828
llvm-svn: 294762
This makes sure we get the same redefinition rules regardless of who
is printing (asm parser, codegen) and to what (asm, obj).
This fixes an unintentional regression in r293936.
llvm-svn: 294752
In some cases we call getTargetConstantBitsFromNode for nodes that haven't been lowered from BUILD_VECTOR yet
Note: We're getting very close to being able to move most of the constant extraction code from getTargetShuffleMaskIndices into getTargetConstantBitsFromNode
llvm-svn: 294746
until we can get better TargetMachine::isCompatibleDataLayout to compare - otherwise
we can't code generate existing bitcode without a string equality data layout.
This reverts commit r294702.
llvm-svn: 294709
For other platforms we should find out what they need and likely
make the same change, however, a smaller additional change is easier
for platforms we know have it specified in the ABI. As part of this
rewrite some of the handling in the backends for data layout and update
a bunch of testcases.
Based on a patch by Simonas Kazlauskas!
llvm-svn: 294702
This change returns empty PSet list for M0 register. Otherwise its
PSet as defined by tablegen is SReg_32. This results in incorrect
register pressure calculation every time an instruction uses M0.
Such uses count as SReg_32 PSet and inadequately increase pressure
on SGPRs.
Differential Revision: https://reviews.llvm.org/D29798
llvm-svn: 294691
Gcc supports target armv7ve which is armv7-a with virtualization
extensions. This change adds support for this in llvm for gcc
compatibility.
Also remove redundant FeatureHWDiv, FeatureHWDivARM for a few models as
this is specified automatically by FeatureVirtualization.
Patch by Manoj Gupta.
Differential Revision: https://reviews.llvm.org/D29472
llvm-svn: 294661
This requires that we communicate to X86InstrInfo::optimizeCompareInstr
that the second operand is neither a register nor an immediate. The way we
do that is by setting CmpMask to zero.
Note that there were already instructions where the second operand was not a
register nor an immediate, namely X86::SUB*rm, so also set CmpMask to zero
for those instructions. This seems like a latent bug, but I was unable to
trigger it.
Differential Revision: https://reviews.llvm.org/D28621
llvm-svn: 294634
Summary:
Convert all obvious node_begin/node_end and child_begin/child_end
pairs to range based for.
Sending for review in case someone has a good idea how to make
graph_children able to be inferred. It looks like it would require
changing GraphTraits to be two argument or something. I presume
inference does not happen because it would have to check every
GraphTraits in the world to see if the noderef types matched.
Note: This change was 3-staged with clang as well, which uses
Dominators/etc from LLVM.
Reviewers: chandlerc, tstellarAMD, dblaikie, rsmith
Subscribers: arsenm, llvm-commits, nhaehnle
Differential Revision: https://reviews.llvm.org/D29767
llvm-svn: 294620
In combineOrCmpEqZeroToCtlzSrl, replace "getConstantOperand == 0" by "isNullConstant" to account for floating point constants.
Differential Revision: https://reviews.llvm.org/D29756
llvm-svn: 294588
LowerBuildVectorv16i8/LowerBuildVectorv8i16 insert values into a UNDEF vector if the build vector doesn't contain any zero elements, resulting in register dependencies with a previous use of the register.
This patch attempts to break the register dependency by either always zeroing the vector before hand or (if we're inserting to the 0'th element) by using VZEXT_MOVL(SCALAR_TO_VECTOR(i32 AEXT(Elt))) which lowers to (V)MOVD and performs a similar function. Additionally (V)MOVD is a shorter instruction than PINSRB/PINSRW. We already do something similar for SSE41 PINSRD.
On pre-SSE41 LowerBuildVectorv16i8 we go a little further and use VZEXT_MOVL(SCALAR_TO_VECTOR(i32 ZEXT(Elt))) if the build vector contains zeros to avoid the vector zeroing at the cost of a scalar zero extension, which can probably be brought over to the other cases in a future patch in some cases (load folding etc.)
Differential Revision: https://reviews.llvm.org/D29720
llvm-svn: 294581
We only implemented it for one of the 3 HLE instructions and that instruction is also under the RTM flag. Clang only implements the RTM flag from its command line.
llvm-svn: 294562
This patch does the following.
1. Adds an Intrinsic int_x86_clzero which works with __builtin_ia32_clzero
2. Identifies clzero feature using cpuid info. (Function:8000_0008, Checks if EBX[0]=1)
3. Adds the clzero feature under znver1 architecture.
4. The custom inserter is added in Lowering.
5. A testcase is added to check the intrinsic.
6. The clzero instruction is added to assembler test.
Patch by Ganesh Gopalasubramanian with a couple formatting tweaks, a disassembler test, and using update_llc_test.py from me.
Differential revision: https://reviews.llvm.org/D29385
llvm-svn: 294558
Functions that have a dynamic alloca require a base register which is defined to
be X19 on AArch64 and r6 on ARM. We have defined the swifterror register to be
the same register. Use a different callee save register for swifterror instead:
X21 on AArch64
R8 on ARM
rdar://30433803
llvm-svn: 294551
We mark X0 as preserved by a call that passes the returned parameter.
x0 = ...
fun(x0) // no implicit def of x0
This no longer is valid if we pass the parameter in a different register then
the returned value as is the case with a swiftself parameter (passed in x20).
x20 = ...
fun(x20) // there should be an implict def of x8
rdar://30425845
llvm-svn: 294527
AArch64 has specific instructions to multiply two numbers at double the width
and produce the high part of the result. These can be used to implement LLVM's
mul.with.overflow instructions fairly simply. Helps with C++ operator new[].
llvm-svn: 294519
Implement getRegPressureLimit and getRegPressureSetLimit callbacks in
SIRegisterInfo.
This makes standard converge scheduler to behave almost the same as
GCNScheduler, sometime slightly better sometimes a bit worse.
In gerenal that is also possible to switch GCNScheduler to use these
callbacks instead of getMaxWaves(), which also makes GCNScheduler
slightly better on some tests and slightly worse on another. A big
win is behavior with converge scheduler.
Note, these are used not only by scheduling, but in places like
MachineLICM.
Differential Revision: https://reviews.llvm.org/D29700
llvm-svn: 294518