Commit Graph

31598 Commits

Author SHA1 Message Date
Mircea Trofin
fa99cb64ff [mlgo][regalloc] Add score calculation for training
Add the calculation of a score, which will be used during ML training. The
score qualifies the quality of a regalloc policy, and is independent of
what we train (currently, just eviction), or the regalloc algo itself.
We can then use scores to guide training (which happens offline), by
formulating a reward based on score variation - the goal being lowering
scores (currently, that reward is percentage reduction relative to
Greedy's heuristic)

Currently, we compute the score by factoring different instruction
counts (loads, stores, etc) with the machine basic block frequency,
regardless of the instructions' provenance - i.e. they could be due to
the regalloc policy or be introduced previously. This is different from
RAGreedy::reportStats, which accummulates the effects of the allocator
alone. We explored this alternative but found (at least currently) that
the more naive alternative introduced here produces better policies. We
do intend to consolidate the two, however, as we are actively
investigating improvements to our reward function, and will likely want
to re-explore scoring just the effects of the allocator.

In either case, we want to decouple score calculation from allocation
algorighm, as we currently evaluate it after a few more passes after
allocation (also, because score calculation should be reusable
regardless of allocation algorithm).

We intentionally accummulate counts independently because it facilitates
per-block reporting, which we found useful for debugging - for instance,
we can easily report the counts indepdently, and then cross-reference
with perf counter measurements.

Differential Revision: https://reviews.llvm.org/D115195
2021-12-07 09:00:27 -08:00
spupyrev
f573f6866e ext-tsp basic block layout
A new basic block ordering improving existing MachineBlockPlacement.

The algorithm tries to find a layout of nodes (basic blocks) of a given CFG
optimizing jump locality and thus processor I-cache utilization. This is
achieved via increasing the number of fall-through jumps and co-locating
frequently executed nodes together. The name follows the underlying
optimization problem, Extended-TSP, which is a generalization of classical
(maximum) Traveling Salesmen Problem.

The algorithm is a greedy heuristic that works with chains (ordered lists)
of basic blocks. Initially all chains are isolated basic blocks. On every
iteration, we pick a pair of chains whose merging yields the biggest increase
in the ExtTSP value, which models how i-cache "friendly" a specific chain is.
A pair of chains giving the maximum gain is merged into a new chain. The
procedure stops when there is only one chain left, or when merging does not
increase ExtTSP. In the latter case, the remaining chains are sorted by
density in decreasing order.

An important aspect is the way two chains are merged. Unlike earlier
algorithms (e.g., based on the approach of Pettis-Hansen), two
chains, X and Y, are first split into three, X1, X2, and Y. Then we
consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y,
X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score.
This improves the quality of the final result (the search space is larger)
while keeping the implementation sufficiently fast.

Differential Revision: https://reviews.llvm.org/D113424
2021-12-07 07:31:10 -08:00
Paulo Matos
2fd634a5e3 [WebAssembly] Implement table instruction intrinsics
This change implements intrinsics for table.grow, table.fill,
table.size, and table.copy.

Differential Revision: https://reviews.llvm.org/D113420
2021-12-07 13:25:59 +01:00
Fraser Cormack
40d51de5cb [SelectionDAG] Use UnknownSize for VP memory ops
In the style of D113888, this patch updates the various VP memory
operations (load, store, gather, scatter) to use UnknownSize. This is
for the same reason as for masked loads and stores: the number of
elements accessed is not generally known at compile time.

This is somewhat pessimistic in the sense that we may still find
un-canonicalized intrinsics featuring both an all-true mask and an EVL
equal to the vector size. Arguably those should be canonicalized before
the SelectionDAG, so those have been left for future work.

Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D115036
2021-12-07 10:51:02 +00:00
Fraser Cormack
3460cc2585 [VP] Propagate align parameter attr on VP load/store to ISel
This patch fixes a case where the 'align' parameter attribute on the
pointer operands to llvm.vp.load and llvm.vp.store was being dropped
during the conversion to the SelectionDAG. The default alignment
equal to the ABI type alignment of the vector type was kept. It also
updates the documentation to reflect the fact that the parameter
attribute is now properly supported.

Reviewed By: simoll

Differential Revision: https://reviews.llvm.org/D114422
2021-12-07 10:16:16 +00:00
Sameer Sahasrabuddhe
0fe61ecc2c CycleInfo: Introduce cycles as a generalization of loops
LLVM loops cannot represent irreducible structures in the CFG. This
change introduce the concept of cycles as a generalization of loops,
along with a CycleInfo analysis that discovers a nested
hierarchy of such cycles. This is based on Havlak (1997), Nesting of
Reducible and Irreducible Loops.

The cycle analysis is implemented as a generic template and then
instatiated for LLVM IR and Machine IR. The template relies on a new
GenericSSAContext template which must be specialized when used for
each IR.

This review is a restart of an older review request:
https://reviews.llvm.org/D83094

Original implementation by Nicolai Hähnle <nicolai.haehnle@amd.com>,
with recent refactoring by Sameer Sahasrabuddhe <sameer.sahasrabuddhe@amd.com>

Differential Revision: https://reviews.llvm.org/D112696
2021-12-07 12:02:34 +05:30
Mircea Trofin
2bd7384d3a [NFC][MachineInstr] Pass-by-value DebugLoc in CreateMachineInstr
DebugLoc is cheap to move, passing it by-val rather than const ref to
take advantage of the fact that it is consumed that way by the
MachineInstr ctor, which creates some optimization oportunities.

Differential Revision: https://reviews.llvm.org/D115208
2021-12-06 19:42:19 -08:00
Kai Luo
b206ee6906 [MachineVerifier] Make TiedOpsRewritten computable in MIRParser
This patch is to address post-commit comment https://reviews.llvm.org/D80538#anchor-inline-1091625, which make the constraint stronger based on what https://reviews.llvm.org/D80538 does, i.e., "TiedOpsRewritten is set iff leave-ssa and all tied operands share the same register".

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D114573
2021-12-07 02:25:15 +00:00
Mircea Trofin
615e374252 [NFC][MachineInstr] Rename some vars to conform to coding style 2021-12-06 17:19:11 -08:00
Nico Weber
3678326d28 Revert "ext-tsp basic block layout"
This reverts commit c68f71eb37.

Breaks tests on arm hosts, see comments on https://reviews.llvm.org/D113424
2021-12-06 19:08:20 -05:00
James Nagurne
cc3bb85580 [llvm][Hexagon] Generalize VLIWResourceModel, VLIWMachineScheduler, and ConvergingVLIWScheduler
The Pre-RA VLIWMachineScheduler used by Hexagon is a relatively generic
implementation that would make sense to use on other VLIW targets.

This commit lifts those classes into their own header/source file with the
root VLIWMachineScheduler. I chose this path rather than adding the
strategy et al. into MachineScheduler to avoid bloating the file with other
implementations.

Target-specific behaviors have been captured and replicated through
function overloads.

- Added an overloadable DFAPacketizer creation member function. This is
  mainly done for our downstream, which has the capability to override
  the DFAPacketizer with custom implementations. This is an upstreamable
  TODO on our end. Currently, it always returns the result of
  TargetInstrInfo::CreateTargetScheduleState
- Added an extra helper which returns the number of instructions in the
  current packet. This is used in our downstream, and may be useful
  elsewhere.
- Placed the priority heuristic values into the ConvergingVLIWscheduler
  class instead of defining them as local statics in the implementation
- Added a overridable helper in ConvergingVLIWScheduler so that targets
  can create their own VLIWResourceModel

Differential Revision: https://reviews.llvm.org/D113150
2021-12-06 16:23:48 -06:00
spupyrev
c68f71eb37 ext-tsp basic block layout
A new basic block ordering improving existing MachineBlockPlacement.

The algorithm tries to find a layout of nodes (basic blocks) of a given CFG
optimizing jump locality and thus processor I-cache utilization. This is
achieved via increasing the number of fall-through jumps and co-locating
frequently executed nodes together. The name follows the underlying
optimization problem, Extended-TSP, which is a generalization of classical
(maximum) Traveling Salesmen Problem.

The algorithm is a greedy heuristic that works with chains (ordered lists)
of basic blocks. Initially all chains are isolated basic blocks. On every
iteration, we pick a pair of chains whose merging yields the biggest increase
in the ExtTSP value, which models how i-cache "friendly" a specific chain is.
A pair of chains giving the maximum gain is merged into a new chain. The
procedure stops when there is only one chain left, or when merging does not
increase ExtTSP. In the latter case, the remaining chains are sorted by
density in decreasing order.

An important aspect is the way two chains are merged. Unlike earlier
algorithms (e.g., based on the approach of Pettis-Hansen), two
chains, X and Y, are first split into three, X1, X2, and Y. Then we
consider all possible ways of gluing the three chains (e.g., X1YX2, X1X2Y,
X2X1Y, X2YX1, YX1X2, YX2X1) and choose the one producing the largest score.
This improves the quality of the final result (the search space is larger)
while keeping the implementation sufficiently fast.

Differential Revision: https://reviews.llvm.org/D113424
2021-12-06 08:56:39 -08:00
Kazu Hirata
c4a8928b51 [CodeGen] Use range-based for loops (NFC) 2021-12-06 08:49:10 -08:00
Jack Andersen
f108c7f59d [GlobalISel] Allow DBG_VALUE to use undefined vregs before LiveDebugValues.
Expanding on D109750.

Since `DBG_VALUE` instructions have final register validity determined in
`LDVImpl::handleDebugValue`, there is no apparent reason to immediately prune
unused register operands as their defs are erased. Consequently, this renders
`MachineInstr::eraseFromParentAndMarkDBGValuesForRemoval` moot; gaining a
substantial performance improvement.

The only necessary changes involve making relevant passes consider invalid
DBG_VALUE vregs uses as valid.

Reviewed By: MatzeB

Differential Revision: https://reviews.llvm.org/D112852
2021-12-05 15:55:59 -05:00
Michael Liao
b6ccca217c Fix -Wunused-variable warning. NFC. 2021-12-05 13:40:35 -05:00
Kazu Hirata
1457e78352 [llvm] Use range-based for loops (NFC) 2021-12-05 08:33:02 -08:00
Kristina Bessonova
75b622a795 Reland [DwarfDebug] Support emitting function-local declaration for a lexical block
This is another attempt to make function-local declarations
(like static variables, structs/classes and other) be correctly
emitted within a lexical (bracketed) block.

Fixes https://bugs.llvm.org/show_bug.cgi?id=19238.

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D113741
2021-12-05 13:56:45 +02:00
Kristina Bessonova
0ac75e82ff Reland [DwarfDebug] Move emission of global vars, types and imports to endModule()
This patch proposes to move emission of global variables, types,
imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule().
Effectively, this changes nothing but the order of debug entities which
will be as follows:
* subprograms (including related context, local variables/labels,
  local imported entities; related types can be created as a part of
  the emission of local entities of an abstract subprogram);
* global variables (including related context and types);
* retained types and enums;
* non-local-scoped imported entities;
* basic types;
* other types left (as a part of local variables attributes emission).

Note that the order of emitted compile units may also be changed as now we emit
units that contain subprograms first and then all other non-empty units.

The motivation behind this change is the following:
(1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline,
    from this time IR can be significantly changed by target-specific passes.
    If it happens for debug metadata of global entities, those changes will not
    be reflected in the emitted DWARF.
(2) imported subprogram names should refer to an abstract subprogram if it exists,
    but it isn't known in DwarfDebug::beginModule() (it's possible to make some
    guesses based on location info, but it's not quite reliable);
(3) aforementioned entities if they are scoped within a bracketed block
    (subject of D113741) couldn't be emitted in DwarfDebug::beginModule()
    (they need parent emitted first). Another problem is if to try to gather
    some information about local entities and defer their emission
    (till subprogram's processing or DwarfDebug::endModule()) all the gathered
    details might be irrelevant / invalid by the time the entities are being
    emitted (because of (1)).

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D114705
2021-12-05 13:56:45 +02:00
David Green
57ff805a6d [DAG] Create fptoui.sat from clamped fptosi
As an extension to D111976, this converts clamp fptosi, clamped between
0 and (2^n)-1 to a fptoui.sat. This can greatly help on targets with
conversions that naturally saturate, such as Arm.

X86 disables the transform as some of the test cases increases in size.
A fptoui.sat necessitates a fp clamp without native support, so there is
little use in converting if the instruction is just going to be
expanded.

Differential Revision: https://reviews.llvm.org/D112428
2021-12-05 09:25:52 +00:00
Kazu Hirata
ca2f53897a [CodeGen] Use range-based for loops (NFC) 2021-12-04 08:48:05 -08:00
Kristina Bessonova
a961604819 Revert "[DwarfDebug] Support emitting function-local declaration for a lexical block"
This reverts commits
* ee691970a9 (D113741),
* 79d3132998 (D114705)

due to lldb and dexter test failures.
2021-12-04 18:06:57 +02:00
Kristina Bessonova
ee691970a9 [DwarfDebug] Support emitting function-local declaration for a lexical block
This is another attempt to make function-local declarations
(like static variables, structs/classes and other) be correctly
emitted within a lexical (bracketed) block.

Fixes https://bugs.llvm.org/show_bug.cgi?id=19238.

Differential Revision: https://reviews.llvm.org/D113741
2021-12-04 17:12:47 +02:00
Kristina Bessonova
79d3132998 [DwarfDebug] Move emission of global vars, types and imports to endModule()
This patch proposes to move emission of global variables, types,
imported entities, etc from DwarfDebug::beginModule() to DwarfDebug::endModule().
Effectively, this changes nothing but the order of debug entities which
will be as follows:
* subprograms (including related context, local variables/labels,
  local imported entities; related types can be created as a part of
  the emission of local entities of an abstract subprogram);
* global variables (including related context and types);
* retained types and enums;
* non-local-scoped imported entities;
* basic types;
* other types left (as a part of local variables attributes emission).

Note that the order of emitted compile units may also be changed as now we emit
units that contain subprograms first and then all other non-empty units.

The motivation behind this change is the following:
(1) DwarfDebug::beginModule() is run at the very beginning of backend's pipeline,
    from this time IR can be significantly changed by target-specific passes.
    If it happens for debug metadata of global entities, those changes will not
    be reflected in the emitted DWARF.
(2) imported subprogram names should refer to an abstract subprogram if it exists,
    but it isn't known in DwarfDebug::beginModule() (it's possible to make some
    guesses based on location info, but it's not quite reliable);
(3) aforementioned entities if they are scoped within a bracketed block
    (subject of D113741) couldn't be emitted in DwarfDebug::beginModule()
    (they need parent emitted first). Another problem is if to try to gather
    some information about local entities and defer their emission
    (till subprogram's processing or DwarfDebug::endModule()) all the gathered
    details might be irrelevant / invalid by the time the entities are being
    emitted (because of (1)).

Reviewed By: dblaikie

Differential Revision: https://reviews.llvm.org/D114705
2021-12-04 14:10:01 +02:00
Kazu Hirata
3aed282257 [CodeGen] Use range-based for loops (NFC) 2021-12-03 20:45:59 -08:00
Simon Pilgrim
ebf5271918 [DAG] PromoteIntRes_FunnelShift - rename shift Amount variable to Amt to prevent line overflow. NFC. 2021-12-03 17:24:45 +00:00
Stephen Tozer
98a021fcbf [DebugInfo] Attempt to preserve more information during tail duplication
Prior to this patch, tail duplication handled debug info poorly -
specifically, debug instructions would be dropped instead of being set
undef, potentially extending the lifetimes of prior debug values that
should be killed. The pass was also very aggressive with dropping debug
info, dropping debug info even when the SSA value it referred to was
still present. This patch attempts to handle debug info more carefully,
checking to see whether each affected debug value can still be live,
setting it undef if not.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D106875
2021-12-03 15:30:05 +00:00
David Green
255ad73424 [ARM] Make MVE v2i1 predicates legal
MVE can treat v16i1, v8i1, v4i1 and v2i1 as different views onto the
same 16bit VPR.P0 register, with v2i1 holding two 8 bit values for the
two halves. This was never treated as a legal type in llvm in the past
as there are not many 64bit instructions and no 64bit compares. There
are a few instructions that could use it though, notably a VSELECT (as
it can handle any size using the underlying v16i8 VPSEL), AND/OR/XOR for
similar reasons, some gathers/scatter and long multiplies and VCTP64
instructions.

This patch goes through and makes v2i1 a legal type, handling all the
cases that fall out of that. It also makes VSELECT legal for v2i64 as a
side benefit. A lot of the codegen changes as a result - usually in way
that is a little better or a little worse, but still expensive. Costs
can change a little too in the process, again in a way that expensive
things remain expensive. A lot of the tests that changed are mainly to
ensure correctness - the code can hopefully be improved in the future
where it comes up in practice.

The intrinsics currently remain using the v4i1 they previously did to
emulate a v2i1. This will be changed in a followup patch but this one
was already large enough.

Differential Revision: https://reviews.llvm.org/D114449
2021-12-03 14:05:41 +00:00
Jay Foad
d133a21b71 [SelectionDAG] Add newline to a debug message 2021-12-03 13:33:32 +00:00
Simon Pilgrim
6803d08c38 [DAG][PowerPC] Enable initial ISD::BITCAST SimplifyDemandedBits/SimplifyMultipleUseDemandedBits big-endian handling
This patch begins extending handling for peeking through bitcast nodes to big-endian targets as well as the existing little-endian case.

Differential Revision: https://reviews.llvm.org/D114676
2021-12-02 11:47:53 +00:00
Omer Aviram
617ad14060 [SelectionDAG] Add pattern to haveNoCommonBitsSet
Correctly identify the following pattern, which has no common bits: (X & ~M) op (Y & M).

Differential Revision: https://reviews.llvm.org/D113970
2021-12-01 12:04:04 -05:00
Simon Pilgrim
19d34f6e95 [X86] combinePMULH - recognise 'cheap' trunctions via PACKS/PACKUS as well as SEXT/ZEXT
combinePMULH currently only truncates vXi32/vXi64 multiplies to PMULHW/PMULUW if the source operands are SEXT/ZEXT instructions for a 'free' truncation.

But we can generalize this to any source operand with sufficient leading sign/zero bits that would allow PACKS/PACKUS to be used as a 'cheap' truncation.

This helps us avoid the wider multiplies, in exchange for truncation on both source operands instead of the result.

Differential Revision: https://reviews.llvm.org/D113371
2021-12-01 16:37:49 +00:00
Ties Stuij
f5f28d5b0c [ARM] Implement BTI placement pass for PACBTI-M
This patch implements a new MachineFunction in the ARM backend for
placing BTI instructions. It is similar to the existing AArch64
aarch64-branch-targets pass.

BTI instructions are inserted into basic blocks that:
- Have their address taken
- Are the entry block of a function, if the function has external
  linkage or has its address taken
- Are mentioned in jump tables
- Are exception/cleanup landing pads

Each BTI instructions is placed in the beginning of a BB after the
so-called meta instructions (e.g. exception handler labels).

Each outlining candidate and the outlined function need to be in agreement about
whether BTI placement is enabled or not. If branch target enforcement is
disabled for a function, the outliner should not covertly enable it by emitting
a call to an outlined function, which begins with BTI.

The cost mode of the outliner is adjusted to account for the extra BTI
instructions in the outlined function.

The ARM Constant Islands pass will maintain the count of the jump tables, which
reference a block. A `BTI` instruction is removed from a block only if the
reference count reaches zero.

PAC instructions in entry blocks are replaced with PACBTI instructions (tests
for this case will be added in a later patch because the compiler currently does
not generate PAC instructions).

The ARM Constant Island pass is adjusted to handle BTI
instructions correctly.

Functions with static linkage that don't have their address taken can
still be called indirectly by linker-generated veneers and thus their
entry points need be marked with BTI or PACBTI.

The changes are tested using "LLVM IR -> assembly" tests, jump tables
also have a MIR test. Unfortunately it is not possible add MIR tests
for exception handling and computed gotos because of MIR parser
limitations.

This patch is part of a series that adds support for the PACBTI-M extension of
the Armv8.1-M architecture, as detailed here:

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/armv8-1-m-pointer-authentication-and-branch-target-identification-extension

The PACBTI-M specification can be found in the Armv8-M Architecture Reference
Manual:

https://developer.arm.com/documentation/ddi0553/latest

The following people contributed to this patch:

- Mikhail Maltsev
- Momchil Velikov
- Ties Stuij

Reviewed By: ostannard

Differential Revision: https://reviews.llvm.org/D112426
2021-12-01 12:54:05 +00:00
Bradley Smith
0eb1efb92c [DAGCombiner] When combining REM ensure optimized div nodes are unique
The REM DAG combine uses the visitDivLike functions to try and get an
optimized DIV node to provide better codegen, however in some cases this
visitDivLike call ends up in the BuildSDIVPow2 target hook, which in
turn sometimes will return the same node passed in to indicate not to
change it. The REM DAG combine does not anticipate this and creates a
cycle in the DAG because of it.

Fix this by ensuring any such optimized div node returned is distinct
from the node being combined.

Differential Revision: https://reviews.llvm.org/D114716
2021-12-01 11:24:26 +00:00
Simon Pilgrim
9981dd142f [DAG] Apply clang-format to visitMSTORE + visitMLOAD. NFC.
Reduce diff in D114582
2021-12-01 11:23:47 +00:00
Qiu Chaofan
15826eb437 [Legalizer] Avoid expansion to BR_CC if illegal
Reviewed By: craig.topper

Differential Revision: https://reviews.llvm.org/D110616
2021-12-01 12:22:21 +08:00
Mircea Trofin
a503cb00d1 [NFC][regalloc] Factor accesses to ExtraRegInfo
We'll move ExtraRegInfo to the RegAllocEvictionAdvisor subsequently.
This change prepares for that by factoring all accesses.

RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html

Differential Revision: https://reviews.llvm.org/D114759
2021-11-30 15:10:49 -08:00
David Green
9e8a71caf0 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have a less strict semantics
than the fptosisat, with less values that need to produce defined
behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 15:29:14 +00:00
Hans Wennborg
a87782c34d Revert "[DAG] Create fptosi.sat from clamped fptosi"
It causes builds to fail with this assert:

llvm/include/llvm/ADT/APInt.h:990:
bool llvm::APInt::operator==(const llvm::APInt &) const:
Assertion `BitWidth == RHS.BitWidth && "Comparison requires equal bit widths"' failed.

See comment on the code review.

> This adds a fold in DAGCombine to create fptosi_sat from sequences for
> smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
> the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
> it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
> ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
> to be handled similarly.
>
> A shouldConvertFpToSat method was added to control when converting may
> be profitable. The original fptosi will have a less strict semantics
> than the fptosisat, with less values that need to produce defined
> behaviour.
>
> This especially helps on ARM/AArch64 where the vcvt instructions
> naturally saturate the result.
>
> Differential Revision: https://reviews.llvm.org/D111976

This reverts commit 52ff3b0093.
2021-11-30 15:36:56 +01:00
Jeremy Morse
3c04507088 [DebugInfo] Turn instruction referencing on by default for x86
This patch is designed to be reverted -- it activates a reasonably large
block of new-ish code, so some turbulence is likely.

Instruction referencing is best summarised, and it being on-by-default,
is discussed here:

    https://lists.llvm.org/pipermail/llvm-dev/2021-November/153653.html

Differential Revision: https://reviews.llvm.org/D114631
2021-11-30 13:44:07 +00:00
Jeremy Morse
651122fc4a [DebugInfo][InstrRef] Pre-land on-by-default-for-x86 changes
Over in D114631 and [0] there's a plan for turning instruction referencing
on by default for x86. This patch adds / removes all the relevant bits of
code, with the aim that the final patch is extremely small, for an easy
revert. It should just be a condition in CommandFlags.cpp and removing the
XFail on instr-ref-flag.ll.

[0] https://lists.llvm.org/pipermail/llvm-dev/2021-November/153653.html
2021-11-30 12:40:59 +00:00
Jeremy Morse
8dda516b83 [DebugInfo][InstrRef] Avoid dropping fragment info during PHI elimination
InstrRefBasedLDV used to crash on the added test -- the exit block is not
in scope for the variable being propagated, but is still considered because
it contains an assignment. The failure-mode was vlocJoin ignoring
assign-only blocks and not updating DIExpressions, but pickVPHILoc would
still find a variable location for it. That led to DBG_VALUEs created with
the wrong fragment information.

Fix this by removing a filter inherited from VarLocBasedLDV: vlocJoin will
now consider assign-only blocks and will update their expressions.

Differential Revision: https://reviews.llvm.org/D114727
2021-11-30 11:32:31 +00:00
David Green
52ff3b0093 [DAG] Create fptosi.sat from clamped fptosi
This adds a fold in DAGCombine to create fptosi_sat from sequences for
smin(smax(fptosi(x))) nodes, where the min/max saturate the output of
the fp convert to a specific bitwidth (say INT_MIN and INT_MAX). Because
it is dealing with smin(/smax) in DAG they may currently be ISD::SMIN,
ISD::SETCC/ISD::SELECT, ISD::VSELECT or ISD::SELECT_CC nodes which need
to be handled similarly.

A shouldConvertFpToSat method was added to control when converting may
be profitable. The original fptosi will have a less strict semantics
than the fptosisat, with less values that need to produce defined
behaviour.

This especially helps on ARM/AArch64 where the vcvt instructions
naturally saturate the result.

Differential Revision: https://reviews.llvm.org/D111976
2021-11-30 11:05:32 +00:00
Abinav Puthan Purayil
bc5dbb0bae [GlobalISel] Add matchers for constant splat.
This change exposes isBuildVectorConstantSplat() to the llvm namespace
and uses it to implement the constant splat versions of
m_SpecificICst().

CombinerHelper::matchOrShiftToFunnelShift() can now work with vector
types and CombinerHelper::matchMulOBy2()'s match for a constant splat is
simplified.

Differential Revision: https://reviews.llvm.org/D114625
2021-11-30 15:18:50 +05:30
Guozhi Wei
f1d8345a2a [TwoAddressInstructionPass] Create register mapping for registers with multiple uses in the current MBB
Currently we create register mappings for registers used only once in current
MBB. For registers with multiple uses, when all the uses are in the current MBB,
we can also create mappings for them similarly according to the last use.
For example

    %reg101 = ...
            = ... reg101
    %reg103 = ADD %reg101, %reg102

We can create mapping between %reg101 and %reg103.

Differential Revision: https://reviews.llvm.org/D113193
2021-11-29 19:01:59 -08:00
Mircea Trofin
e8b8304d76 [NFC][Regalloc] Split canEvictInterference into hint and general
There are 2 eviction queries. One is made by tryAssign, when it attempts to
free an interference occupying the hint of the candidate. The other is
during 'regular' interference resolution, where we scan over all
physical registers and try to see if we can evict live ranges in favor
of the candidate. We currently use the same logic in both cases, just
that the former never passes the cost to any subsequent query.
Technically, the 2 decisions could be implemented with different
policies.

This patch splits the 2.

RFC: https://lists.llvm.org/pipermail/llvm-dev/2021-November/153639.html

Differential Revision: https://reviews.llvm.org/D114019
2021-11-29 16:04:03 -08:00
Jeremy Morse
0eee844539 [DebugInfo][InstrRef] Terminate overlapping variable fragments
If we have a variable where its fragments are split into overlapping
segments:

    DBG_VALUE $ax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment_0, 16)
    ...
    DBG_VALUE $eax, $noreg, !123, !DIExpression(DW_OP_LLVM_fragment_0, 32)

we should only propagate the most recently assigned fragment out of a
block. LiveDebugValues only deals with live-in variable locations, as
overlaps within blocks is DbgEntityHistoryCalculators domain.

InstrRefBasedLDV has kept the accumulateFragmentMap method from
VarLocBasedLDV, we just need it to recognise DBG_INSTR_REFs. Once it's
produced a mapping of variable / fragments to the overlapped variable /
fragments, VLocTracker uses it to identify when a debug instruction needs
to terminate the other parts it overlaps with. The test is updated for
some standard "InstrRef picks different registers" variation, and the
order of some unrelated DBG_VALUEs changes.

Differential Revision: https://reviews.llvm.org/D114603
2021-11-29 23:37:20 +00:00
Jeremy Morse
a20987adf4 [DebugInfo][InstrRef] Add indirection from dbg.declare in SelectionDAG
Usually dbg.declares get translated into either entries in an MF
side-table, or a DBG_VALUE on entry to the function with IsIndirect set
(including in instruction referencing mode). Much rarer is a dbg.declare
attached to a non-argument value, such as in the test added in this patch
where there's a variable-length-array. Such dbg.declares become SDDbgValue
nodes with InIndirect=true.

As it happens, we weren't correctly emitting DBG_INSTR_REFs with the
additional indirection. This patch adds the extra indirection, encoded as
adding an additional DW_OP_deref to the expression.

Differential Revision: https://reviews.llvm.org/D114440
2021-11-29 22:24:19 +00:00
Jeremy Morse
9cf31b8d39 [DebugInfo][InstrRef] Preserve properties of restored variables
InstrRefBasedLDV observes when variable locations are clobbered, scans what
values are available in the machine, and re-issues a DBG_VALUE for the
variable if it can find another location. Unfortunately, I hadn't joined up
the Indirectness flag, so if it did this to an Indirect Value, the
indirectness would be dropped.

Fix this, and add a test that if we clobber a variable value (on the stack
in this case), then the recovered variable location keeps the Indirect
flag.

Differential Revision: https://reviews.llvm.org/D114378
2021-11-29 21:57:24 +00:00
Kazu Hirata
f240e528ce [llvm] Use range-based for loops (NFC) 2021-11-29 09:04:44 -08:00
Mirko Brkusanin
0dd570ff56 [AMDGPU][GlobalISel] Transform (fsub (fpext (fneg (fmul x, y))), z) -> (fneg (fma (fpext x), (fpext y), z))
Patch by: Mateja Marjanovic

Differential Revision: https://reviews.llvm.org/D98050
2021-11-29 16:27:22 +01:00