Enable specialization on literal constant arguments by default in
Function Specialization.
---------
Co-authored-by: Alexandros Lamprineas <alexandros.lamprineas@arm.com>
Both are based on MachineLICMBase, and the functionality there is
"switched" based on a PreRegAlloc flag. This commit is simply about
trusting the original value of that flag, defined by the `MachineLICM`
and `EarlyMachineLICM` classes.
The `PreRegAlloc` flag used to be overwritten it based on MRI.isSSA(),
which is un-reliable due to how it is inferred by the MIRParser. I see
that we can now define isSSA in MIR (thanks @gargaroff ), meaning the
fix isn’t really needed anymore, but redefining that flag still feels
wrong.
Note that I'm looking into upstreaming more changes to MachineLICM, see
[the discourse
thread](https://discourse.llvm.org/t/extending-post-regalloc-machinelicm/82725).
Store Swift mangled names in DW_AT_linkage_name. The Swift compiler
emits only the type mangled name in debug information, and LLDB uses
those mangled names as keys to look up size, alignment, fields, etc
from either reflection metadata or Swift modules.
Additionally, emit types linkage names for types into the accelerator
table if they exist and they're different from the display name.
Until now debug info was printing the symbols names as-is and that
resulted in invalid PTX when the symbols contained characters that are
invalid for PTX. E.g. `__PRETTY_FUNCTION.something`
Debug info is somewhat disconnected from the symbols themselves, so the
regular "NVPTXAssignValidGlobalNames" pass can't easily fix them.
As the "plan B" this patch catches printout of debug symbols and fixes
them, as needed. One gotcha is that the same code path is used to print
the names of debug info sections. Those section names do start with a
'.debug'. The dot in those names is nominally illegal in PTX, but the
debug section names with a dot are accepted as a special case. The
downside of this change is that if someone ever has a `.debug*` symbol
that needs to be referred to from the debug info, that label will be
passed through as-is, and will still produce broken PTX output. If/when
we run into a case where we need it to work, we could consider only
passing through specific debug section names, or add a mechanism
allowing us to tell section names apart from regular symbols.
Fixes#58491
LDV could reorder reinserted fragment and non-fragment debug values for
the same variable (compared to the input order), potentially resulting
in stale values being presented.
For example, before:
DBG_VALUE 1001, $noreg, !13, !DIExpression(DW_OP_LLVM_fragment, 0, 16)
DBG_VALUE 1002, $noreg, !13, !DIExpression(DW_OP_LLVM_fragment, 16, 16)
DBG_VALUE %0, $noreg, !13, !DIExpression()
After (without this patch):
DBG_VALUE %stack.0, 0, !13, !DIExpression()
DBG_VALUE 1002, $noreg, !13, !DIExpression(DW_OP_LLVM_fragment, 16, 16)
DBG_VALUE 1001, $noreg, !13, !DIExpression(DW_OP_LLVM_fragment, 0, 16)
It would also reorder DBG_VALUEs for different variables. Although that
does not matter for the debug information output, it resulted in some
noise in before/after pass diffs.
This should hopefully align so that instruction referencing and
DBG_VALUE emit debug instructions in the same order (see the
sdag-salvage-add.ll change).
Added .setMIFlag(MachineInstr::FrameSetup) to all BuildMI calls in
HexagonFrameLowering::insertAllocframe. This change ensures that the
test sugared-constants.ll passes upstream by correctly marking
instructions as part of the frame setup.
This patch handles virtual registers in the VarLocBasedImpl of the
LiveDebugVariables pass, allowing it to be used on architectures that
depend on virtual registers in debugging, like NVPTX. It enables the
pass for NVPTX.
Enable .debug_loc section for NVPTX backend.
This commit makes NVPTX omit DW_AT_low_pc (and DW_AT_high_pc) for
DW_TAG_compile_unit. This is because cuda-gdb uses the compile unit's
low_pc as a base address, and adds the addresses in the debug_loc
section to it. Removing low_pc is equivalent to setting that base
address to zero, so addition doesn't break the location ranges.
Additionally, this patch forces debug_loc label emission to emit single
labels with no subtraction or base. This would not be necessary if we
could emit `label1 - label2` expressions in PTX. The PTX documentation
at
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#debugging-directives-section
makes it seem like this is supported, but it doesn't actually work. I
believe when that documentation says that you can subtract “label
addresses between labels in the same dwarf section”, it doesn't merely
mean that the labels need to be in the same section as each other, but
in fact they need to be in the same section as the use. If support for
label subtraction is supported such that in the debug_loc section you
can subtract labels from the main code section, then we can remove the
workarounds added in this PR.
Also, since this now emits valid .debug_loc sections, it replaces the
empty .debug_loc to force existence of at least one debug section with
an empty .debug_macinfo section, which matches what nvcc does.
As of rev ea222be0d, LLVMs assembler will actually try to honour the
"fill value" part of p2align directives. X86 printed these as 0x90, which
isn't actually what it wanted: we want multi-byte nops for .text
padding. Compiling via a textual assembly file produces single-byte
nop padding since ea222be0d but the built-in assembler will produce
multi-byte nops. This divergent behaviour is undesirable.
To fix: don't set the byte padding field for x86, which allows the
assembler to pick multi-byte nops. Test that we get the same multi-byte
padding when compiled via textual assembly or directly to object file.
Added same-align-bytes-with-llasm-llobj.ll to that effect, updated
numerous other tests to not contain check-lines for the explicit padding.
Remove the following intrinsics which can be trivially replaced with an
`addrspacecast`
* llvm.nvvm.ptr.gen.to.global
* llvm.nvvm.ptr.gen.to.shared
* llvm.nvvm.ptr.gen.to.constant
* llvm.nvvm.ptr.gen.to.local
* llvm.nvvm.ptr.global.to.gen
* llvm.nvvm.ptr.shared.to.gen
* llvm.nvvm.ptr.constant.to.gen
* llvm.nvvm.ptr.local.to.gen
Also, cleanup the NVPTX lowering of `addrspacecast` making it more
concise.
This was reverted to avoid conflicts while reverting #107655. Re-landing
unchanged.
This is the final piece to enable register debugging for variables in
registers that have single locations that last throughout their
enclosing scope.
The next step after this for supporting register debugging for NVPTX is
to support the .debug_loc section.
Stacked on top of: https://github.com/llvm/llvm-project/pull/109495
This patch adds support for encoding PTX registers for DWARF, using the
encoding supported by nvcc and cuda-gcc.
There are some other features still needed for proper register debugging
that this patch does not address, such as DW_AT_address_class.
This PR is stacked on: https://github.com/llvm/llvm-project/pull/109494
When emitting debug info for code alignment, it was possible to emit a
.loc directive with a file number of zero, which is invalid for DWARF 4
and earlier. This happened because getCurrentDwarfLoc() returned a
zero-initialised value when there hadn't been a previous .loc directive
emitted.
---------
Co-authored-by: Paul T Robinson <paul.robinson@sony.com>
Remove the following intrinsics which can be trivially replaced with an
`addrspacecast`
* llvm.nvvm.ptr.gen.to.global
* llvm.nvvm.ptr.gen.to.shared
* llvm.nvvm.ptr.gen.to.constant
* llvm.nvvm.ptr.gen.to.local
* llvm.nvvm.ptr.global.to.gen
* llvm.nvvm.ptr.shared.to.gen
* llvm.nvvm.ptr.constant.to.gen
* llvm.nvvm.ptr.local.to.gen
Also, cleanup the NVPTX lowering of `addrspacecast` making it more
concise.
Same fix was provided for AIX in commit
704da919ba.
The issue is unsupported DWARF 5 section with the following assertion:
`Assertion failed: Section && "Cannot switch to a null section!", file:
llvm/lib/MC/MCStreamer.cpp, line: 1266 `
Reverted due to large .debug_line size regressions for some
configurations; work currently in place to improve the output of this
behaviour in PR #108251.
This patch also modifies two tests that were created or modified after
the original commit landed and are affected by the revert:
llvm/test/CodeGen/X86/pseudo_cmov_lower2.ll
llvm/test/DebugInfo/X86/empty-line-info.ll
This reverts commit 5fef40c2c4.
In degenerate but legal inputs, we can have functions that have no source
locations at all -- all the DebugLocs attached to instructions are empty.
LLVM didn't produce any source location for the function; with this patch
it will at least emit the function-scope source location. Demonstrated by
empty-line-info.ll
The XCOFF test modified has similar symptoms -- with this patch, the size
of the ".dwline" section grows a bit, thus shifting some of the file
internal offsets, which I've updated.
Inlining and zero-cost abstractions tend to produce volumes of debug
info with identical ranges. When built with full debugging information
(the equivalent of -g2) librustc_driver.so has 2.1 million entries in
.debug_ranges. But only 1.1 million of those entries are unique. While
in principle all duplicates could be eliminated with a hashtable,
checking to see if the new range is exactly identical to the previous
range and skipping a new addition if it is is sufficient to eliminate
99.99% of the duplicates. This reduces the size of librustc_driver.so's
.debug_ranges section by 35%, or the overall binary size a little more
than 1%.
When serialising to textual IR, there can be constant Values referred to
by DbgRecords that don't appear anywhere else, and have types hidden
even deeper in side them. Enumerate these when enumerating all types.
Test by Mikael Holmén.
One of the tests for the new fake use intrinsic are failing on darwin
buildbots due to relying on behaviour for their expected triple; this
commit adds explicit triples to the few remaining fake-use tests that
didn't have them.
Fixes commit 3d08ade (#86149).
Buildbot failures:
https://lab.llvm.org/buildbot/#/builders/23/builds/2505
This patch is part of a set of patches that add an `-fextend-lifetimes`
flag to clang, which extends the lifetimes of local variables and
parameters for improved debuggability. In addition to that flag, the
patch series adds a pragma to selectively disable `-fextend-lifetimes`,
and an `-fextend-this-ptr` flag which functions as `-fextend-lifetimes`
for this pointers only. All changes and tests in these patches were
written by Wolfgang Pieb (@wolfy1961), while Stephen Tozer (@SLTozer)
has handled review and merging. The extend lifetimes flag is intended to
eventually be set on by `-Og`, as discussed in the RFC
here:
https://discourse.llvm.org/t/rfc-redefine-og-o1-and-add-a-new-level-of-og/72850
This patch implements a new intrinsic instruction in LLVM,
`llvm.fake.use` in IR and `FAKE_USE` in MIR, that takes a single operand
and has no effect other than "using" its operand, to ensure that its
operand remains live until after the fake use. This patch does not emit
fake uses anywhere; the next patch in this sequence causes them to be
emitted from the clang frontend, such that for each variable (or this) a
fake.use operand is inserted at the end of that variable's scope, using
that variable's value. This patch covers everything post-frontend, which
is largely just the basic plumbing for a new intrinsic/instruction,
along with a few steps to preserve the fake uses through optimizations
(such as moving them ahead of a tail call or translating them through
SROA).
Co-authored-by: Stephen Tozer <stephen.tozer@sony.com>
Fixes the previous buildbot error by adding an explicit triple to the test,
ensuring that llc can produce a valid object file.
This reverts commit 926f0979af.
Fixes failure on the llvm-clang-aarch64-darwin buildbot:
https://lab.llvm.org/buildbot/#/builders/190/builds/4660/
The test mentioned does not rely on any unique property of X86, but does
rely on the layout of the basic blocks produced by llc, which varies
between targets. Although the test could be duplicated for other targets,
it seems unnecessary since the behaviour being tested is not
target-specific.
Fixes: https://github.com/llvm/llvm-project/issues/104695
This patch adds the is_stmt flag to line table entries for the first
instruction with a non-0 line location in each basic block, to ensure
that it will be used for stepping even if the last instruction in the
previous basic block had the same line number; this is important for
cases where the new BB is reachable from BBs other than the preceding
block.
SimplifyCFG sinking currently does not sink loads/stores of allocas,
because historically SROA was unable to handle the resulting IR. Since
then, SROA both learned to speculate loads/stores over selects and phis,
*and* SimplifyCFG sinking has been deferred to the end of the function
simplification pipeline, which means that SROA happens before it.
As such, I believe that this workaround should no longer be necessary.
Given how sensitive SimplifyCFG sinking seems to be, this patch takes a
very conservative step towards removing this, by allowing sinking if we
don't actually need to form a phi over the pointer argument.
This fixes https://github.com/llvm/llvm-project/issues/104567, where
sinking a store to an escaped alloca allows converting a switch into
arithmetic.
This patch fixes an issue in the GlobalOpt pass where deleting a global
variable fails to update the corresponding dbg.value and it references
an empty metadata entry. The SalvageDebugInfo() function has been
updated to handle dbg.value intrinsic when globals are deleted.
Since ce0c205813, we are doing that if a
single (LTO) compilation contains more than one compile unit, but the
same thing can happen if the non-lto and single-cu lto compilations,
typically when the CU ends up (nearly) empty. In my case, this happened
when LTO emptied two compile units.
Note that the source file name is already a part of the hash, so this
can only happen when a single file is compiled and linked twice into the
same application (most likely with different preprocessor defintiions).
While not exactly common, this pattern is used by some C code to
implement "templates".
The 2017 patch already hinted at the possibility of doing this
unconditionally, and this patch implements that. While the DWARF spec
hints at the option of using the type signature hashing algorithm for
the DWO_id purposes, AFAICT it does not actually require it, so I
believe this change is still conforming.
The relevant section of the spec is in Section 3.1.2 "Skeleton
Compilation Unit Entries" (in non-normative text):
```
The means of determining a compilation unit ID does not need to be
similar or related to the means of determining a type unit signature.
However, it should be suitable for detecting file version skew or other
kinds of mismatched files and for looking up a full split unit in a
DWARF package file (see Section 7.3.5 on page 190).
```
This patch makes LLVM emit line 0 source locations for nops emitted as
basic block padding.
---------
Co-authored-by: Orlando Cazalet-Hyams <orlando.hyams@sony.com>
Some of SIE's post-mortem analysis infrastructure currently makes use of
.debug_aranges, so we'd like to ensure the section's presence in
PlayStation binaries. The simplest way to do this is to force emission
when the debugger tuning is set to SCE (which is in turn typically
initialized from the target triple). This also simplifies the driver.
llvm/test/DebugInfo/debuglineinfo-path.ll has been marked as UNSUPPORTED
on PlayStation. When aranges are emitted, the DWARF in the test case is
such that relocations need to be applied to the aranges section in order
for symbolization to work. An alternative approach would be to implement
the application of relocations in DWARFDebugArangeSet. While experiments
show that this can be made to work with a modest patch, the test cases
would be rather contrived. Since I expect the only utility for such a
change would be to make this test case pass for PlayStation targets, and
few - if any - outside of PlayStation care about aranges, UNSUPPORTED
would seem to be a more practical option.
This was originally commited as 22eb290a96 (#99629) and later reverted
at 84658fb82b (#99711) due to test failures on SIE built bots. These
failures shouldn't recur due to 3b24e5d450 (#99897) and the
aforementioned change to debuglineinfo-path.ll.
SIE tracker: TOOLCHAIN-16951
Add `createLocalSymbol` to create a local, non-temporary symbol.
Different from `createRenamableSymbol`, the `Used` bit is ignored,
therefore multiple local symbols might share the same name.
Utilizing `createLocalSymbol` in AArch64 allows for efficient mapping
symbol creation with non-unique names, saving .strtab space.
The behavior matches GNU assembler.
Pull Request: https://github.com/llvm/llvm-project/pull/99836