We previously only allowed truncs as sinks, but now allow them as
sources too. We do this by checking that the result type is the
narrow type that we're trying to optimise for.
Differential Revision: https://reviews.llvm.org/D51978
llvm-svn: 342141
Part of FixConsts wrongly assumes either a 8- or 16-bit constant
which can result in the wrong constants being generated during
promotion.
Differential Revision: https://reviews.llvm.org/D52032
llvm-svn: 342140
In r319995, we fixed the line table format to version 2 on Darwin
because dsymutil didn't yet understand the new format which caused test
failures for the LLDB bots. This has been resolved in the meantime so
there's no reason to keep this limitation.
rdar://problem/35968332
llvm-svn: 342136
If an argument was passed on the stack, this
was using the default alignment.
I'm not sure there's an observable change from this. This
was observable due to bugs in expansion of unaligned
loads and stores, but since that is fixed I don't think
this matters much.
llvm-svn: 342133
The Technical Reference Manuals for these two CPUs state that branching
to an unaligned 32-bit instruction incurs an extra pipeline reload
penalty. That's bad.
This also enables the optimization at -Os since it costs on average one
byte per loop in return for 1 cycle per iteration, which is pretty good
going.
llvm-svn: 342127
Summary:
Previously we type legalized v2i32 div/rem by promoting to v2i64. But we don't support div/rem of vectors so op legalization would then scalarize it using i64 scalar ops since it doesn't know about the original promotion. 64-bit scalar divides on Intel hardware are known to be slow and in 32-bit mode they require a libcall.
This patch switches type legalization to do the scalarizing itself using i32.
It looks like the division by power of 2 optimization is still kicking in and leaving the code as a vector. The division by other constant optimization doesn't kick in pre type legalization since it ignores illegal types. And previously, after type legalization we scalarized the v2i64 since we don't have v2i64 MULHS/MULHU support.
Another option might be to widen v2i32 to v4i32 so we could do division by constant optimizations, but we'd have to be careful to only do that for constant divisors or we risk scalaring to 4 scalar divides.
Reviewers: RKSimon, spatel
Reviewed By: spatel
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51325
llvm-svn: 342114
Shufflevector instructions in LLVM IR that extract a subset of elements
of a longer input into a shorter vector can be done using VECTOR_SHUFFLEs.
This will avoid expanding them into constly extracts and inserts.
llvm-svn: 342091
Summary:
rL341389 broke code with tied register operands in inline assembly. For
example, `asm("" : "=r"(var) : "0"(var));`
The code above specifies the input operand to be in the same register
with the output operand, tying the two register. This patch makes this
kind of code work again.
Reviewers: dschuff
Subscribers: sbc100, jgravelle-google, eraman, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51991
llvm-svn: 342084
Summary:
Some FPMathOperators succeed and the retrieve FMF context when they never have it, we should omit these cases to keep from removing FMF context.
For instance when we visit some FPMathOperator mapped Instructions which never have FMF flags and a Node was associated which does have FMF flags, that Node today will have all its flags cleared via the intersect operation. With this change, we exclude associating Nodes that never have FPMathOperator status under FMF.
Reviewers: spatel, wristow, arsenm, hfinkel, aemerson
Reviewed By: spatel
Subscribers: llvm-commits, wdng
Differential Revision: https://reviews.llvm.org/D51145
llvm-svn: 342081
Move isa version determination into TargetParser.
Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).
llvm-svn: 342069
Summary:
Match the ordering semantics of non-vector comparisons. For
floating point comparisons that do not correspond to instructions, the
tests check that some vector comparison instruction was emitted but do
not care about the full implementation.
Reviewers: aheejin, dschuff
Subscribers: sbc100, jgravelle-google, sunfish, llvm-commits
Differential Revision: https://reviews.llvm.org/D51765
llvm-svn: 342064
There's no advantage to this instruction unless you need to avoid touching other flag bits. It's encoding is longer, it can't fold an immediate, it doesn't write all the flags.
I don't think gcc will generate this instruction either.
Fixes PR38852.
Differential Revision: https://reviews.llvm.org/D51754
llvm-svn: 342059
This patch adds codegen support for the saving/restoring
V8-V23 for functions specified with the aarch64_vector_pcs
calling convention attribute, as added in patch D51477.
Reviewers: t.p.northover, gberry, thegameg, rengolin, javed.absar, MatzeB
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D51479
llvm-svn: 342049
SMLAD and SMLALD instructions also come in the form of SMLADX and
SMLALDX which perform an exchange on their second operand. To support
this, more of the loads in the MAC candidates are compared for
sequential access and a boolean value has been added to BinOpChain.
AddMACCandiate has been refactored into a small pattern matching
state machine to reduce the amount of duplicated code, but also to
enable the matching to be more flexible. CreateParallelMACPairs now
iterates through all the candidates to find parallel ones.
Differential Revision: https://reviews.llvm.org/D51424
llvm-svn: 342033
Summary:
In GNUX23, is64BitMode returns true, but pointers are 32-bits. So we shouldn't copy pointer values into RSI/RDI since the widths don't match.
Fixes PR38865 despite what the title says. I think the llvm_unreachable in the copyPhysReg code tricked the optimizer and made the fatal error trigger.
Reviewers: rnk, efriedma, MatzeB, echristo
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D51893
llvm-svn: 342015
Since the outliner is a module pass, it doesn't get codegen size remarks like
the other codegen passes do. This adds size remarks *to* the outliner.
This is kind of a workaround, so it's peppered with FIXMEs; size remarks
really ought to not ever be handled by the pass itself. However, since the
outliner is the only "MachineModulePass", this works for now. Since the
entire purpose of the MachineOutliner is to produce code size savings, it
really ought to be included in codgen size remarks.
If we ever go ahead and make a MachineModulePass (say, something similar to
MachineFunctionPass), then all of this ought to be moved there.
llvm-svn: 342009
Summary: Initial support for nsw, nuw and exact flags in MI
Reviewers: spatel, hfinkel, wristow
Reviewed By: spatel
Subscribers: nlopes
Differential Revision: https://reviews.llvm.org/D51738
llvm-svn: 341996
into TargetParser.
Also switch away from target features to CPU string when
determining isa version. This fixes an issue when we
output wrong isa version in the object code when features
of a particular CPU are altered (i.e. gfx902 w/o xnack
used to result in gfx900).
Differential Revision: https://reviews.llvm.org/D51890
llvm-svn: 341982
In r337348, I changed lowering to prefer X86ISD::UNPCKL/UNPCKH opcodes over MOVLHPS/MOVHLPS for v2f64 {0,0} and {1,1} shuffles when we have SSE2. This enabled the removal of a bunch of weirdly bitcasted isel patterns in r337349. To avoid changing the tests I placed a gross hack in isel to still emit movhlps instructions for fake unary unpckh nodes. A similar hack was not needed for unpckl and movlhps because we do execution domain switching for those. But unpckh and movhlps have swapped operand order.
This patch removes the hack.
This is a code size increase since unpckhpd requires a 0x66 prefix and movhlps does not. But if that's a big concern we should be using movhlps for all unpckhpd opcodes and let commuteInstruction turnit into unpckhpd when its an advantage.
Differential Revision: https://reviews.llvm.org/D49499
llvm-svn: 341973
GNUX32 uses 32-bit pointers despite is64BitMode being true. So we should use EAX to return the value.
Fixes ones of the failures from PR38865.
Differential Revision: https://reviews.llvm.org/D51940
llvm-svn: 341972
An fp_to_sint node would be incorrectly lowered to a TruncIntFP node in
single-float mode. This would trigger an "Unexpected illegal type!"
assert.
Patch by Dan Ravensloft.
Differential revision: https://reviews.llvm.org/D51810
llvm-svn: 341952
Summary:
The undef and the infinite loop at the end cause this test to be translated
unpredictably. In particular, the checked-for `mpy` disappears under
certain legal optimizations (e.g. the one in D50222).
Since the use of these constructs is not relevant to the behavior tested,
according to the header comment, this change, suggested by @kparzysz,
eliminates them.
Was initially committed in r341046, but was reverted.
Patch by: hermord (Dmytro Shynkevych)!
Reviewers: kparzysz
Reviewed By: kparzysz
Subscribers: lebedev.ri, llvm-commits, kparzysz
Differential Revision: https://reviews.llvm.org/D50944
llvm-svn: 341943
Search from i64 reducing phis, as well as i32, to allow the
generation of smlald instructions.
Differential Revision: https://reviews.llvm.org/D51101
llvm-svn: 341941
MIPS ISAs start to support third operand for the `rdhwr` instruction
starting from Revision 6. But LLVM generates assembler code with
three-operands version of this instruction on any MIPS64 ISA. The third
operand is always zero, so in case of direct code generation we get
correct code.
This patch fixes the bug by adding an instruction alias. The same alias
already exists for 32-bit ISA.
Ideally, we also need to reject three-operands version of the `rdhwr`
instruction in an assembler code if ISA revision is less than 6. That is
a task for a separate patch.
This fixes PR38861 (https://bugs.llvm.org/show_bug.cgi?id=38861)
Differential revision: https://reviews.llvm.org/D51773
llvm-svn: 341919
MOVMSKPS and MOVMSKPD both take FP types, but likely the operations before it are on integer types with just a int->fp bitcast between them. If the bitcast isn't used by anything else and doesn't change the element width we can look through it to simplify the integer ops.
llvm-svn: 341915
These are test cases inspired by sequences like below for extracting the same bit from every vector element and checking for all zeros/ones.
define i1 @and256_x8(<8 x i32>) {
%a = trunc <8 x i32> %0 to <8 x i1>
%b = bitcast <8 x i1> %a to i8
%d = icmp eq i8 %b, -1
ret i1 %d
}
This is what the above looks like after InstCombine.
define i1 @and256_x8_opt(<8 x i32>) {
%2 = and <8 x i32> %0, <i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1, i32 1>
%a = icmp ne <8 x i32> %2, zeroinitializer
%b = bitcast <8 x i1> %a to i8
%d = icmp eq i8 %b, -1
ret i1 %d
}
llvm-svn: 341908
We should never abort on valid IR. The most reasonable
interpretation of an arbitrary address space pointer is
probably some kind of special subset of global memory.
llvm-svn: 341894
This already worked if only one register piece was used,
but didn't if a type was split into multiple, unequal
sized pieces.
Fixes not splitting 3i16/v3f16 into two registers for
AMDGPU.
This will also allow fixing the ABI for 16-bit vectors
in a future commit so that it's the same for all subtargets.
llvm-svn: 341801
Summary:
This fixes a bug where a large number of implicit def instructions can fill the GCNHazardRecognizer lookahead buffer causing required NOPs to not be inserted.
Reviewers: nhaehnle, arsenm
Reviewed By: arsenm
Subscribers: sheredom, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D51726
Change-Id: Ie75338f94de704ee5816b05afd0c922c6748a95b
llvm-svn: 341798
We have isel patterns for v4i32/v4f64 that artificially widen to v8i32/v8f64 so just use that.
If x86-experimental-vector-widening-legalization is enabled, we don't need any custom legalization and can just return. I've modified the test RUN lines to cover this case.
llvm-svn: 341765
This is the DAG equivalent of D51433.
If we know we're not using all vector lanes, use that knowledge to potentially simplify a vselect condition.
The reduction/horizontal tests show that we are eliminating AVX1 operations on the upper half of 256-bit
vectors because we don't need those anyway.
I'm not sure what the pr34592 test is showing. That's run with -O0; is SimplifyDemandedVectorElts supposed
to be running there?
Differential Revision: https://reviews.llvm.org/D51696
llvm-svn: 341762