The DWARF 5 specification says that:
> The name index must contain an entry for each debugging information
entry that
> defines a named [...] label [...].
The verifier currently verifies this, but the AsmPrinter does not add
entries for TAG_labels in debug_names. This patch addresses the issue by
ensuring we add labels in the accelerator tables once we have a fully
completed DIE for the TAG_label entry.
We also respect the spec as follows:
> DW_TAG_label debugging information entries without an address
attribute
> (DW_AT_low_pc, DW_AT_high_pc, DW_AT_ranges, or DW_AT_entry_pc) are
excluded.
The effect of this on the size of accelerator tables is minimal, as
TAG_labels are usually created by C/C++ labels (see example in test),
which are typically paired with "goto" statements.
Most (x86) swiftasync functions tend to use both SelectionDAGISel and
FastISel lowering:
* FastISel argument lowering can only handle C calling convention.
* FastISel fails mid-BB in a number of ways, including in simple `ret
void` instructions under certain circumstances.
This dance of SelectionDAG (argument) -> FastISel (some instructions) ->
SelectionDAG(remaining instructions) is lossy; in particular, Argument
information lowering is cleared after that first SelectionDAG run.
Since swiftasync functions rely heavily on proper Argument lowering for
debug information, this patch disables the use of FastISel in such
functions.
Add subrange tracking and handling for LiveIntervals during PHI
elimination.
This requires extending MachineBasicBlock::SplitCriticalEdge to also
update subrange intervals.
After performing sink-and-fold over a COPY, the original instruction is
replaced with one that produces its output in the destination of the
copy. Its value is still available (in a hard register), so if there are
debug instructions which refer to the (now deleted) virtual register
they could be updated to refer to the hard register, in principle.
However, it's not clear how to do that, moreover in some cases the debug
instructions may need to be replicated proportionally to the number of
the COPY instructions replaced and in some extreme cases we can end up
with quadratic increase in the number of debug instructions, e.g:
int f(int);
void g(int x) {
int y = x + 1;
int t0 = y;
f(t0);
int t1 = y;
f(t1);
}
Similar to #70635, this expands the handling of integer to fp
conversions. The code is very similar to the float->integer conversions
with types handled oppositely. There are some extra unhandled cases
which require more handling for ASR operations.
This patch lowers `sdiv x, +/-2**k` to `add + select + shift` when the
short forward branch optimization is enabled. The latter inst seq
performs faster than the seq generated by target-independent
DAGCombiner. This algorithm is described in ***Hacker's Delight***.
This patch also removes duplicate logic in the X86 and AArch64 backend.
But we cannot do this for the PowerPC backend since it generates a
special instruction `addze`.
This patch plumbs the command line --experimental-debuginfo-iterators flag
in to the pass managers, so that modules can be converted to the new
format, passes run, then converted back to the old format. That allows
developers to test-out the new debuginfo representation across some part of
LLVM with no further work, and from the command line. It also installs
flag-catchers at the various points that bitcode and textual IR can egress
from a process, and temporarily convert the module to dbg.value format when
doing so.
No tests alas as it's designed to be transparent.
Differential Revision: https://reviews.llvm.org/D154372
When generating the assembly code for AIX/XCOFF, the .file pseudo-op
needs to be emitted first, before any csects are generated. Otherwise,
information such as the embedded command line will be associated with
part of the object file rather than the entire object file.
This makes use of the more accurate register number introduced in PR
#70222 to avoid CFI calculations for unsupported registers.
This has basically no impact right now, but results in a 0.2% compile-time
improvement at O0 when applied on top of #70958.
The reason is that the extra registers that PR adds push the `BitVector`
out of the `SmallVector` space, which results in an outsized impact.
(This does make me wonder whether `BitVector` should accept an `N`
template parameter to allow using a larger `SmallVector`...)
This patch enhances the optimization of memcmp calls when only two
outcomes
are needed and comparison fits into one block, for example:
bool result = memcmp(a, b, 6) > 0;
Previously, LLVM would generate unnecessary operations even when the
user of
memcmp was only interested in a binary outcome.
`MergePotentialElts::operator<` asserts that the two elements being
compared are not equal. However, sorting functions are allowed to invoke
the comparison function with equal arguments (though they usually don't
for efficiency reasons).
There is an existing special-case that disables the assert if
_GLIBCXX_DEBUG is used, which may invoke the comparator with equal args
to verify strict weak ordering. I believe libc++ also has strict weak
ordering checks under some options nowadays.
Recently, #71312 was reported, where a change to glibc's qsort_r
implementation can also result in comparison between equal elements.
From what I understood, this is an inefficiency that will be fixed on
the glibc side as well, but I think at this point we should just remove
this assertion.
Fixes https://github.com/llvm/llvm-project/issues/71312.
- Revert "[DAGCombiner] Transform `(icmp eq/ne (and X,C0),(shift X,C1))`
to use rotate or to getter constants." - causes a miscompile, see
112e49b381 (commitcomment-131943923)
- Revert "[X86] Fix gcc warning about mix of enumeral and non-enumeral
types. NFC", which fixes a compiler warning in the commit above
Track the live register state immediately before, instead of after,
MBBI. This makes it simple to track the state at the start or end of a
basic block without a separate (and poorly named) Tracking flag.
This changes the API of the backward(MachineBasicBlock::iterator I)
method, which now recedes to the state just before, instead of just
after, *I. Some clients are simplified by this change.
There is one small functional change shown in the lit tests where
multiple spilled registers all need to be reloaded before the same
instruction. The reloads will now be inserted in the opposite order.
This should not affect correctness.
This change fixes#71518, which compared the KnownMinValue of the
scalable virtual register with the FixedSize of the physical register in
the wrong direction. It turns out that we cannot include this check at all since
it will lead to a false failures. Test cases are added to show that
the false failures no longer occur after this fix.
…gSizeInBits
This patch changes getRegSizeInBits to return a TypeSize instead of an
unsigned in the case that a virtual register has a scalable LLT. In the
case that register is physical, a Fixed TypeSize is returned.
The MachineVerifier pass is updated to allow copies between fixed and
scalable operands as long as the Src size will fit into the Dest size.
This is a precommit which will be stacked on by a change to GISel to
generate COPYs with a scalable destination but a fixed size source.
This patch is stacked on https://github.com/llvm/llvm-project/pull/70893
for the ability to use scalable vector types in MIR tests.
For more recent sve capable CPUs it is beneficial to use the inc*
instruction
to increment a value by vscale (potentially shifted or multiplied) even
in
short loops.
This patch tells codegenprepare to sink appropriate vscale calls into
blocks where they are used so that isel can match them.
If a load instruction qualifies to be optimized by InterleavedAccess
Pass, but also has a dead binop instruction, this will lead to a crash.
Binop instruction will not be deleted, because normally it would be
deleted through its' users, but it has none. Later on deleting a load
instruction will fail because it still has uses.
The @llvm.amdgcn.cs.chain intrinsic is essentially a call. The call
parameters are bundled up into 2 intrinsic arguments, one for those that
should go in the SGPRs (the 3rd intrinsic argument), and one for those
that should go in the VGPRs (the 4th intrinsic argument). Both will
often be some kind of aggregate.
Both instruction selection frameworks have some internal representation
for intrinsics (G_INTRINSIC[_WITH_SIDE_EFFECTS] for GlobalISel,
ISD::INTRINSIC_[VOID|WITH_CHAIN] for DAGISel), but we can't use those
because aggregates are dissolved very early on during ISel and we'd lose
the inreg information. Therefore, this patch shortcircuits both the
IRTranslator and SelectionDAGBuilder to lower this intrinsic as a call
from the very start. It tries to use the existing infrastructure as much
as possible, by calling into the code for lowering tail calls.
This has already gone through a few rounds of review in Phab:
Differential Revision: https://reviews.llvm.org/D153761
This is similar to vector.reverse, but only reverses the first EVL
elements.
I extracted this code from our downstream. Some of it may have come from
https://repo.hca.bsc.es/gitlab/rferrer/llvm-epi/ originally.
This reverts commit 28ae42e662.
The assert in getIConstantVRegVal was not updated for this change.
The ValAndVReg->VReg == VReg check fails if any look through happens.
RISC-V was the only target using the lookthrough functionality, but I'm
not sure it was needed so I'm removing that too.
Now that we have more types handled for zext/sext and trunc, it is
possible to get more types working for the vector float to integer
conversions. This patch adds fp16, widening and narrowing vector support
to handle more types. The smaller types wil be expanded to the size of
the larger element type. A couple of case require more awkward truncates
to get working as they go from illegal to illegal types.
Fix a bug introduced in 98c90a1 (ISel: introduce vector ISD::LRINT,
ISD::LLRINT; custom RISCV lowering), where ISD::LRINT and ISD::LLRINT
used WidenVecRes_Unary to widen the vector result. This leads to
incorrect CodeGen for RISC-V fixed-vectors of length 3, and a crash in
SelectionDAG when we try to lower llvm.lrint.vxi32.vxf64 on i686. Fix
the bug by implementing a correct WidenVecRes_XRINT.
Fixes#71187.
This is necessary for RegAllocGreedy support for memory folding inline
asm that uses "rm" constraints.
Thanks to @qcolombet for the suggestion.
Link: https://github.com/llvm/llvm-project/issues/20571
This adds the nneg flag to SDNodeFlags and the node printing code.
SelectionDAGBuilder will add this flag to the node if the target doesn't
prefer sign extend.
A future RISC-V patch can remove the sign extend preference from
SelectionDAGBuilder.
I've also added the flag to the DAG combine that converts
ISD::SIGN_EXTEND to ISD::ZERO_EXTEND.
When using the inline asm constraint string "rm" (or "g"), we generally
would like the compiler to choose "r", but it is permitted to choose "m"
if there's register pressure. This is distinct from "r" in which the
register is not permitted to be spilled to the stack.
The decision of which to use must be made at some point. Currently, the
instruction selection frameworks (ISELs) make the choice, and the
register allocators had better be able to handle the result.
Steal a bit from Storage when using register operands to disambiguate
between the two cases. Add helpers/getters/setters, and print in MIR
when such a register is foldable.
The getter will later be used by the register allocation frameworks (and
asserted by the ISELs) while the setters will be used by the instruction
selection frameworks.
Link: https://github.com/llvm/llvm-project/issues/20571
RegBanks are allocated as global variables. The use of BitVector causes
a static global constructor to be used. The BitVector is initialized
from a table of bits that is created by tablegen. We can keep a pointer
to that data and use it as the bit vector instead.
This does require a little bit of manual indexing and reimplementation
of BitVector::count.