Commit Graph

30931 Commits

Author SHA1 Message Date
Jon Roelofs
be8738324c [MachineVerifier] Diagnose invalid INSERT_SUBREGs
Differential revision: https://reviews.llvm.org/D105953
2021-07-20 17:32:29 -07:00
Jon Roelofs
a14b4e34a4 [GlobalISel] Tail call memcpy/memmove/memset even in the presence of copies
Differentail revision: https://reviews.llvm.org/D105382
2021-07-20 17:04:33 -07:00
Jon Roelofs
afaf92826e [GlobalISel] Mark memcpy/memmove/memset as thisreturn
https://clang.godbolt.org/z/9az64j8W6

rdar://77466123

Differential revision: https://reviews.llvm.org/D105370
2021-07-20 17:04:33 -07:00
Fangrui Song
3924877932 [IR] Rename comdat noduplicates to comdat nodeduplicate
In the textual format, `noduplicates` means no COMDAT/section group
deduplication is performed. Therefore, if both sets of sections are retained, and
they happen to define strong external symbols with the same names,
there will be a duplicate definition linker error.

In PE/COFF, the selection kind lowers to `IMAGE_COMDAT_SELECT_NODUPLICATES`.
The name describes the corollary instead of the immediate semantics.  The name
can cause confusion to other binary formats (ELF, wasm) which have implemented/
want to implement the "no deduplication" selection kind. Rename it to be clearer.

Reviewed By: rnk

Differential Revision: https://reviews.llvm.org/D106319
2021-07-20 12:47:10 -07:00
Stefan Pintilie
1a6dc92be7 [PowerPC] Inefficient register allocation of ACC registers results in many copies.
ACC registers are a combination of four consecutive vector registers.
If the vector registers are assigned first this often forces a number
of copies to appear just before the ACC register is created. If the ACC
register is assigned first then fewer copies are generated when the vector
registers are assigned.

This patch tries to force the register allocator to assign the ACC registers first
and then the UACC registers and then the vector pair registers. It does this
by changing the priority of the register classes.

This patch also adds hints to help the register allocator assign UACC registers from
known ACC registers and vector pair registers from known UACC registers.

Reviewed By: nemanjai

Differential Revision: https://reviews.llvm.org/D105854
2021-07-20 10:53:40 -05:00
Jeremy Morse
241f3e386c [DebugInfo][InstrRef] Fix a broken substitution method, add test coverage
This patch fixes a clearly-broken function that I absent-mindedly bodged
many months ago.

Over in D85749 I landed the substituteDebugValuesForInst, that creates
substitution records for all the def operands from one debug-labelled
instruction to the new one. Unfortunately it would crash if the two
instructions had different numbers of operands; I tried to fix this in
537f0fbe82 by adding a "max operand" parameter to the method, but then
didn't actually change the loop bound to take account of this. It passed
all the tests because.... well there wasn't any real test coverage of this
method.

This patch fixes up the loop to be bounded by the MaxOperand bound; and
adds test coverage for the x86-fixup-LEAs calls to this method, so that
it's actually tested.

Differential Revision: https://reviews.llvm.org/D105820
2021-07-20 11:45:13 +01:00
Matt Arsenault
c9ec807b11 CodeGen: Make MachineOptimizationRemarkEmitterPass a CFG analysis
This avoids rerunning it a few times.
2021-07-19 21:08:26 -04:00
Matt Arsenault
904dab55ab GlobalISel: Remove some mystery code that clears isReturned
I don't understand what this is going for, and haven't found an analog
in DAG code. No tests fail with this removed.
2021-07-19 20:21:05 -04:00
Amy Huang
fd972bb9fd Revert "[llvm][sve] Lowering for VLS truncating stores" because it
causes a seg fault (see https://reviews.llvm.org/D104471).

This reverts commit c305557acd.
2021-07-19 11:03:33 -07:00
Amara Emerson
03cdb5221d [GlobalISel] Fix load-or combine moving loads across potential aliasing stores.
Although this combine checks that there's no load folding barriers between
the loads that it's trying to merge, it was inserting the load at the
MIRBuilder's default insertion point, which is the G_OR use inst.

This was causing a miscompile in the test suite's
SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-bswap-2

Differential Revision: https://reviews.llvm.org/D106251
2021-07-19 10:23:23 -07:00
Craig Topper
50302feb1d [SelectionDAG][RISCV] Use isSExtCheaperThanZExt to control whether sext or zext is used for constant folding any_extend.
RISCV would prefer a sign extended constant since that works better
with our constant materialization. We have an existing TLI hook we
use to control sign extension of setcc operands in type legalization.
That hook happens to do the right check we need here, but might be
straying from its original purpose. With only RISCV defining this
hook in tree, I wasn't sure if it was worth adding another hook
with identical behavior.

This is an alternative to D105785 where I tried to handle this in
the RISCV backend by not creating ANY_EXTENDs in some places.

Reviewed By: frasercrmck

Differential Revision: https://reviews.llvm.org/D105918
2021-07-19 09:25:28 -07:00
Matt Arsenault
67d6132463 GlobalISel: Preserve memory types for implicit sret load/stores 2021-07-19 11:52:42 -04:00
Matt Arsenault
9236125ec8 GlobalISel: Preserve LLT when bitcasting loads and stores
This also avoids improperly legalizing some truncating vector stores.
2021-07-19 11:30:14 -04:00
Roman Lebedev
5b51bd1878 [TLI] prepareSREMEqFold(): use correct VT for the final VSELECT (PR51133)
We were using the wrong VT for this final VSELECT,
it should be in the final comparison VT,
not the source value's VT.

Fixes https://bugs.llvm.org/show_bug.cgi?id=51133
2021-07-19 16:44:00 +03:00
Eli Friedman
6601be4419 [X86] Remove incorrect use of known bits in shuffle simplification.
This reverts commit 2a419a0b99.

The result of a shufflevector must not propagate poison from any element
other than the one noted in the shuffle mask.

The regressions outside of fptoui-may-overflow.ll can probably be
recovered some other way; for example, using isGuaranteedNotToBePoison.

See discussion on https://reviews.llvm.org/D106053 for more background.

Differential Revision: https://reviews.llvm.org/D106222
2021-07-18 18:13:11 -07:00
Simon Pilgrim
fd7a54c709 [DAG] DAGCombiner::foldSelectOfBinops - propagate the common flags to the merged binop
As discussed on D106058 - we were failing to keep the common flags. This matches the behaviour in InstCombinerImpl::foldSelectOpOp.
2021-07-18 18:38:59 +01:00
Simon Pilgrim
5643be96bc [DAG] Enable foldSelectOfBinops on select(setcc(),binop(),binop()) calls 2021-07-18 18:38:59 +01:00
Simon Pilgrim
1a6a8443c2 [DAG] Move select(cc, binop(), binop()) folds into DAGCombiner::foldSelectOfBinops. NFCI.
I'm going to extend the functionality started in D106058 so move the folds into their own method to reduce the amount of code in DAGCombiner::visitSELECT
2021-07-18 14:54:41 +01:00
Amara Emerson
4c55cdb00a [GlobalISel] Fix known bits for G_BSWAP and B_BITREVERSE not doing anything.
llvm::KnownBits::byteSwap() and reverse() don't modify in-place, so
we weren't actually computing anything. This was causing a miscompile on an
arm64 stage2 bootstrap clang build.
2021-07-17 23:07:16 -07:00
Kazu Hirata
1993b73755 [Analaysis, CodeGen] Remove getHotSucc (NFC)
These functions seem to be unused for at least 5 years.
2021-07-17 07:31:36 -07:00
Amara Emerson
9637848f51 [GlobalISel] Fix non-pow-2 legalization of s56 stores.
s56 stores are broken down into s32 + s24 stores. During this step
both of those new stores use an anyextended s64 value, resulting in
truncating stores. With s56, the s24 requires another lower step to
make it legal, and we were crashing because we didn't expect non-pow-2
stores to also be truncating as well.

Differential Revision: https://reviews.llvm.org/D106183
2021-07-16 13:29:49 -07:00
Guozhi Wei
5609c8b607 [X86FixupLEAs] Try again to transform the sequence LEA/SUB to SUB/SUB
This patch transforms the sequence
    lea (reg1, reg2), reg3
    sub reg3, reg4
to two sub instructions
    sub reg1, reg4
    sub reg2, reg4

Similar optimization can also be applied to LEA/ADD sequence.

The modifications to TwoAddressInstructionPass is to ensure the operands of ADD
instruction has expected order (the dest register of LEA should be src register
of ADD).

Differential Revision: https://reviews.llvm.org/D104684
2021-07-16 10:16:03 -07:00
Jon Roelofs
6c40abb6fe Revert "[MachineVerifier] Diagnose invalid INSERT_SUBREGs"
This reverts commit dd57ba1a17.

It broke some tests: http://45.33.8.238/linux/51314/step_12.txt
2021-07-16 09:53:55 -07:00
Simon Pilgrim
95995673d1 [DAG] SelectionDAG::MaskedElementsAreZero - assert we're calling with a vector. NFCI.
Add an assertion that we've calling MaskedElementsAreZero with a vector op and that the DemandedElts arg is a matching width.

Makes the error a lot easier to grok when something else accidentally gets used.
2021-07-16 17:43:35 +01:00
Jon Roelofs
dd57ba1a17 [MachineVerifier] Diagnose invalid INSERT_SUBREGs
Differential revision: https://reviews.llvm.org/D105953
2021-07-16 09:43:12 -07:00
Matt Arsenault
5a0d940f2a GlobalISel: Preserve memory type for memset expansion 2021-07-16 11:41:32 -04:00
Matt Arsenault
f57f8f7ccc GlobalISel: Remove dead function 2021-07-16 08:59:25 -04:00
Jeremy Morse
231bf52119 [InstrRef][FastISel] Support emitting DBG_INSTR_REF from fast-isel
If you attach __attribute__((optnone)) to a function when using
optimisations, that function will use fast-isel instead of the usual
SelectionDAG method. This is a problem for instruction referencing,
because it means DBG_VALUEs of virtual registers will be created,
triggering some safety assertions in LiveDebugVariables. Those assertions
exist to detect exactly this scenario, where an unexpected piece of code is
generating virtual register references in instruction referencing mode.

Fix this by transforming the DBG_VALUEs created by fast-isel into
half-formed DBG_INSTR_REFs, after which they get patched up in
finalizeDebugInstrRefs. The test modified adds a fast-isel mode to the
instruction referencing isel test.

Differential Revision: https://reviews.llvm.org/D105694
2021-07-16 13:56:15 +01:00
Matt Arsenault
a2d7ace3e3 GlobalISel: Surface offsets parameter from ComputeValueVTs 2021-07-15 19:11:40 -04:00
Matt Arsenault
e91da668d0 GlobalISel: Track argument pointeriness with arg flags
Since we're still building on top of the MVT based infrastructure, we
need to track the pointer type/address space on the side so we can end
up with the correct pointer LLTs when interpreting CCValAssigns.
2021-07-15 19:11:40 -04:00
Amara Emerson
4e3dc6b8dd GlobalISel: Introduce GenericMachineInstr classes and derivatives for idiomatic LLVM RTTI.
This adds some level of type safety, allows helper functions to be added for
specific opcodes for free, and also allows us to succinctly check for class
membership with the usual dyn_cast/isa/cast functions.

To start off with, add variants for the different load/store operations with some
places using it.

Differential Revision: https://reviews.llvm.org/D105751
2021-07-15 15:21:57 -07:00
Jessica Paquette
5da0f9ab61 [GlobalISel] Fix infinite loop in reassociationCanBreakAddressingModePattern
It didn't update the opcode while walking through G_INTTOPTR/G_PTRTOINT.

Differential Revision: https://reviews.llvm.org/D106080
2021-07-15 10:09:07 -07:00
Simon Pilgrim
0aece73aba [DAG] Fold select(cond,binop(x,y),binop(x,z)) -> binop(x,select(cond,y,z))
Similar to the folds performed in InstCombinerImpl::foldSelectOpOp, this attempts to push a select further up to help merge a pair of binops.

I'm primarily interested in select(cond,add(x,y),add(x,z)) folds to help expose pointer math (see https://bugs.llvm.org/show_bug.cgi?id=51069 etc.) but I've tried to use the more generic isBinOp().

Differential Revision: https://reviews.llvm.org/D106058
2021-07-15 16:08:30 +01:00
Tim Northover
5d7632ee72 MachO: don't emit L... private symbols in do_not_dead_strip sections.
The linker can sometimes drop the do_not_dead_strip if it can't associate the
atom with a symbol (the other place to specify no dead-stripping in MachO
files).
2021-07-15 14:40:43 +01:00
Djordje Todorovic
fa2daaeff8 [2/2][RemoveRedundantDebugValues] Add a Pass that removes redundant DBG_VALUEs
This patch adds the forward scan for finding redundant DBG_VALUEs.

This analysis aims to remove redundant DBG_VALUEs by going forward
in the basic block by considering the first DBG_VALUE as a valid
until its first (location) operand is not clobbered/modified.
For example:

(1) DBG_VALUE $edi, !"var1", ...
(2) <block of code that does affect $edi>
(3) DBG_VALUE $edi, !"var1", ...
 ...
in this case, we can remove (3).

Differential Revision: https://reviews.llvm.org/D105280
2021-07-15 00:08:31 -07:00
Kai Luo
b9c3941cd6 [PowerPC] Generate inlined quadword lock free atomic operations via AtomicExpand
This patch uses AtomicExpandPass to implement quadword lock free atomic operations. It adopts the method introduced in https://reviews.llvm.org/D47882, which expand atomic operations post RA to avoid spilling that might prevent LL/SC progress.

Reviewed By: jsji

Differential Revision: https://reviews.llvm.org/D103614
2021-07-15 01:12:09 +00:00
Stanislav Mekhanoshin
76b7d3432e [AMDGPU] Add TII::isIgnorableUse() to allow VOP rematerialization
Any def of EXEC prevents rematerialization of any VOP instruction
because of the physreg use. Create a callback to check if the
physreg use can be ingored to allow rematerialization.

Differential Revision: https://reviews.llvm.org/D105836
2021-07-14 13:03:58 -07:00
Eli Friedman
1e30bf8621 [SelectionDAG] Add an overload of getStepVector that assumes step 1.
This is mostly a minor convenience, but the pattern seems frequent
enough to be worthwhile (and we'll probably add more uses in the
future).

Differential Revision: https://reviews.llvm.org/D105850
2021-07-14 11:37:01 -07:00
Matt Arsenault
47269da5d8 GlobalISel: Handle lowering non-power-of-2 extloads 2021-07-14 11:54:11 -04:00
Djordje Todorovic
df686842bc [RemoveRedundantDebugValues] Add a Pass that removes redundant DBG_VALUEs
This new MIR pass removes redundant DBG_VALUEs.

After the register allocator is done, more precisely, after
the Virtual Register Rewriter, we end up having duplicated
DBG_VALUEs, since some virtual registers are being rewritten
into the same physical register as some of existing DBG_VALUEs.
Each DBG_VALUE should indicate (at least before the LiveDebugValues)
variables assignment, but it is being clobbered for function
parameters during the SelectionDAG since it generates new DBG_VALUEs
after COPY instructions, even though the parameter has no assignment.
For example, if we had a DBG_VALUE $regX as an entry debug value
representing the parameter, and a COPY and after the COPY,
DBG_VALUE $virt_reg, and after the virtregrewrite the $virt_reg gets
rewritten into $regX, we'd end up having redundant DBG_VALUE.

This breaks the definition of the DBG_VALUE since some analysis passes
might be built on top of that premise..., and this patch tries to fix
the MIR with the respect to that.

This first patch performs bacward scan, by trying to detect a sequence of
consecutive DBG_VALUEs, and to remove all DBG_VALUEs describing one
variable but the last one:

For example:

(1) DBG_VALUE $edi, !"var1", ...
(2) DBG_VALUE $esi, !"var2", ...
(3) DBG_VALUE $edi, !"var1", ...
 ...

in this case, we can remove (1).

By combining the forward scan that will be introduced in the next patch
(from this stack), by inspecting the statistics, the RemoveRedundantDebugValues
removes 15032 instructions by using gdb-7.11 as a testbed.

Differential Revision: https://reviews.llvm.org/D105279
2021-07-14 04:29:42 -07:00
Ruiling Song
40e3df2a1b [RegisterCoalescer] Resolve conflict based on liveness of subregister
Currently we are resolving lane/subregister conflict by visiting
instructions sequentially in current block to see whether there is any
use of the tainted lanes. To save compile time, we are not doing further
check in successor blocks. This sounds reasonable without subgregister liveness.

But since we have added subregister liveness tracking capability to
register coalescer, we can easily determine whether we have subregister
liveness conflict by checking subranges. This would help coalescing more
COPYs for target that enables subregister liveness tracking.

Reviewed by: arsenm, qcolombet

Differential Revision: https://reviews.llvm.org/D104509
2021-07-14 14:43:22 +08:00
Hongtao Yu
74b99b5c2e [CSSPGO] Do not import pseudo probe desc in thinLTO
Previously we reliedy on pseudo probe descriptors to look up precomputed GUID during probe emission for inlined probes. Since we are moving to always using unique linkage names, GUID for functions can be computed in place from dwarf names. This eliminates the need of importing pseudo probe descs in thinlto, since those descs should be emitted by the original modules.

This significantly reduces thinlto memory footprint in some extreme case where the number of imported modules for a single module is massive.

Test Plan:

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D105248
2021-07-13 18:26:36 -07:00
Matt Arsenault
eebe841a47 RegAlloc: Allow targets to split register allocation
AMDGPU normally spills SGPRs to VGPRs. Previously, since all register
classes are handled at the same time, this was problematic. We don't
know ahead of time how many registers will be needed to be reserved to
handle the spilling. If no VGPRs were left for spilling, we would have
to try to spill to memory. If the spilled SGPRs were required for exec
mask manipulation, it is highly problematic because the lanes active
at the point of spill are not necessarily the same as at the restore
point.

Avoid this problem by fully allocating SGPRs in a separate regalloc
run from VGPRs. This way we know the exact number of VGPRs needed, and
can reserve them for a second run.  This fixes the most serious
issues, but it is still possible using inline asm to make all VGPRs
unavailable. Start erroring in the case where we ever would require
memory for an SGPR spill.

This is implemented by giving each regalloc pass a callback which
reports if a register class should be handled or not. A few passes
need some small changes to deal with leftover virtual registers.

In the AMDGPU implementation, a new pass is introduced to take the
place of PrologEpilogInserter for SGPR spills emitted during the first
run.

One disadvantage of this is currently StackSlotColoring is no longer
used for SGPR spills. It would need to be run again, which will
require more work.

Error if the standard -regalloc option is used. Introduce new separate
-sgpr-regalloc and -vgpr-regalloc flags, so the two runs can be
controlled individually. PBQB is not currently supported, so this also
prevents using the unhandled allocator.
2021-07-13 18:49:29 -04:00
Guillaume Chatelet
2c47b8847e Revert "[llvm] Add enum iteration to Sequence"
This reverts commit a006af5d6e.
2021-07-13 16:44:42 +00:00
Guillaume Chatelet
a006af5d6e [llvm] Add enum iteration to Sequence
This patch allows iterating typed enum via the ADT/Sequence utility.

Differential Revision: https://reviews.llvm.org/D103900
2021-07-13 16:22:19 +00:00
Matt Arsenault
222fde1eec GlobalISel: Use extension instead of merge with undef in common case
This fixes not respecting signext/zeroext in these cases. In the
anyext case, this avoids a larger merge with undef and should be a
better canonical form.

This should also handle this if a merge is needed, but I'm not aware
of a case where that can happen. In a future change this will also
allow AMDGPU to drop some custom code without introducing regressions.
2021-07-13 11:04:47 -04:00
Matt Arsenault
77a608d9de GlobalISel: Remove getIntrinsicID utility function
This is redundant with a method directly on MachineInstr
2021-07-13 11:04:10 -04:00
Qiu Chaofan
954a15d639 [SelectionDAG] Check use before combining into USUBSAT
Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D105789
2021-07-13 14:50:26 +08:00
Jessica Paquette
47d0780f45 [GlobalISel] Handle more types in narrowScalar for eq/ne G_ICMP
Generalize the existing eq/ne case using `extractParts`. The original code only
handled narrowings for types of width 2n->n. This generalization allows for any
type that can be broken down by `extractParts`.

General overview is:

- Loop over each narrow-sized part and do exactly what the 2-register case did.
- Loop over the leftover-sized parts and do the same thing
- Widen the leftover-sized XOR results to the desired narrow size
- OR that all together and then do the comparison against 0 (just like the old
  code)

This shows up a lot when building clang for AArch64 using GlobalISel, so it's
worth fixing. For the sake of simplicity, this doesn't handle the non-eq/ne
case yet.

Also remove the code in this case that notifies the observer; we're just going
to delete MI anyway so talking to the observer shouldn't be necessary.

Differential Revision: https://reviews.llvm.org/D105161
2021-07-12 22:18:50 -07:00
Arthur Eubanks
7987c46273 [OpaquePtr][ISel] Use ArgListEntry::IndirectType more 2021-07-12 21:14:35 -07:00