These tests take less than a second each,
so they are fine to have as tests.
It will be useful to add similar ones
for `prepare-and-assemble-snippet` x repetition mode,
but there we have a problem of not being truly ISA set independent...
For some reason the global variable reduction was trying to delete
use instructions. This broke the verifier if the user was a terminator,
since the block now no longer has one. It doesn't make sense for this
reduction to delete the users, so just stop doing that.
While "skip measurements mode" is super useful for test coverage,
i've come to discover it's trade-offs. It still calls back-end
to actually codegen the target assembly, and that is what is taking
80%+ of the time regardless of whether or not we skip the measurements.
On the other hand, just being able to see that exegesis can come up
with a snippet to measure something, is already very useful,
and takes maybe a second for a all-opcode sweep.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D140702
By default, all benchmark results are analysed, but sometimes it may be useful
to only look at those that to not involve memory, or vice versa. This option
allows to either keep all benchmarks, or filter out (ignore) either all the
ones that do involve memory (involve instructions that may read or write to
memory), or the opposite, to only keep such benchmarks.
Personally, so far i have found the benchmarks that do involve memory
to have dubious results. But the ones that do not involve memory,
are generally actionable. So i would like to have a toggle to declutter results.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D140734
Main thing I was unsure about was to whether try to delete the now
dead landing blocks, or leave that for the unreachable block reduction.
Personality function is not reduced, but that should be a separate
reduction on the function.
Fixes#58815
We required the test and input arguments for --print-delta-passes
which is unhelpful. Also, start printing the help output if no
arguments were supplied.
It looks like there's more sophisticated ways to accomplish this with
the opt library, but it was less work to manually emit these errors.
The verifier should fail if constrained intrinsics are used in
functions with strictfp, but that patch hasn't been pushed yet.
Ideally we would be able to analyze the function body to see if any
constrained intrinsics were used, but we seem to be missing a utility
function to check for any constrained ops.
This change refactors the parte parsing logic to operate on StringRefs
of the part data rather than starting from an offset and splicing down.
It also improves some of the error reporting around part layout.
Specifically, this code now reports a distinct error if there isn't
enough data in the buffer to store the part size and it reports an
error if the parts overlap.
Reviewed By: bob80905
Differential Revision: https://reviews.llvm.org/D139681
The current reduction tries all or nothing elimination of named
metadata. I noticed in one case where one of the module flags was
necessary, but it left the rest. Reduce the individual operands of
named metadata nodes that are known to behave like lists. Be
conservative since some named metadata may have more specific verifier
requirements for the operands.
It seems the execute implementations have gone out of their way to
produce inconsistent error messages. The unix version explicitly
checks if the file exists before trying to execute. The windows
version checks if it's executable. I don't understand why they
wouldn't just try the execution and check the error code.
Fixes missing test coverage for the failed to execute case. However,
this test fails to verify the newline is printed. I can't figure out
how to get FileCheck to match the trailing newline.
getRawNumberOfSymbols() assumes that a symbol table exists, which isn't
always guaranteed, while getNumberOfSymbols() handles and tolerates objects
without a symbol table. When there is a symbol table, both methods return
the same value.
Also add a test to ensure we don't regress in this regard. The test
generates a basic COFF object with symbols and overrides the symbol table
pointer with zeros to craft the input required to verify llvm-objcopy works
as expected in this scenario.
The movprfx is a vector copy, so it doesn't access memory. Set the
value of hasSideEffects 0 to avoid return true for the hasUnmodeledSideEffects(),
which will block the machine scheduler which load/store instructions.
Reviewed By: paulwalker-arm
Differential Revision: https://reviews.llvm.org/D140680
Add file with Xtensa ELF relocations. Add Xtensa support to ELF.h,
ELFObject.h and ELFYAML.cpp. Add simple test of Xtensa ELF representation in YAML.
Differential Revision: https://reviews.llvm.org/D64827
This reverts e4b126cc2d and
e57ab8fe91.
This previously depended on where the target happened to construct (or
not) the MachineFunctionInfo during the initial MIR construction. Now
that the MachineFunctionInfo is consistently constructed at
MachineFunction construction time, this should always work.
This patch adds handling of debug_macinfo/debug_macro tables to the DWARFLinker.
It uses already existing code for reading tables from DWARFDebugMacro.h.
It adds new code writing tables into the DwarfStreamer::emitMacroTables.
Differential Revision: https://reviews.llvm.org/D140223
These are going to waste a lot of time and produce clutter when we're
bulk introducing crashes. Add a flag to disable this behavior in case
this matters to a reproducer.
To improve deduplication across CUs within a single object file,
2b747241a6 changed the way we track ODR canonical candidates. The
result is that some assumptions no longer hold. Because of the
aforementioned change, the following condition
assert(Ref > InputDIE.getOffset());
in DWARFLinker::DIECloner::cloneDieReferenceAttribute is no longer an
invariant. An example of a situation where this assertion no longer
holds is when we have decided to replace a backward reference in the
current CU with a forward reference in a subsequent CU. The ODR
canonical DIE hasn't been emitted yet, but the references DIE has an
offset smaller than the current DIE. The assertion is only true if the
referenced DIE was the ODR canonical DIE, which is no longer guaranteed
to be part of the same CU.
Big thanks to Alexey for putting a test together.
Differential revision: https://reviews.llvm.org/D138176
LOCK + CMPXCHG8/CMPXCHG16 variants still need overriding as they are not completely correct - already much better though
Based off llvm-exegesis captures, confirmed with Agner + uops.info
This change is rather more invasive than intended. The main intention
here is to make CommandLine.cpp not rely on llvm/Support/Host.h. Right
now, this reliance is only in 3 superficial places:
- Choosing how to expand response files (in two places)
- Printing the default triple and current CPU in `--version` output.
The built in version system has a method for adding "extra version
printers", commonly used by several tools (such as llc) to report the
registered targets in the built version of LLVM. It was reasonably easy
to move the logic for printing the default triple and current CPU into
a similar function, and register it with any relevant binaries.
The incompatible change here is that now, even if
LLVM_VERSION_PRINTER_SHOW_HOST_TARGET_INFO is defined, most binaries
will no longer print out the default target triple and cpu when provided
with `--version`, for instance llvm-as and llvm-dis. This breakage is
intended, but the changes in this patch keep printing the default target
and detected in `llc` and `opt` as these were remarked as important
binaries in the LLVM install.
The change to expanding response files may also be controversial, but I
believe that these macros should correspond exactly to the host triple
introspection used before.
Differential Revision: https://reviews.llvm.org/D137837
The set/reset/complement RMW variants use +1uop compared to the BT read-only instructions
Based off llvm-exegesis captures, confirmed with Agner + uops.info
We can't serialize if for latency measurements, and when measuring uops,
it crashes so hard even `CrashRecoveryContext` doesn't stop it.
Looks like *this* was the last crasher, now `--opcode-index=-1`
succeeds for all three benchmark modes here.
At least with Duplication repetitor.
This was a regression from 17e202424c.
Previously we'd gracefully handle missing measurements,
but that handling got accidentally lost during the code move,
and we'd assert.
What we want to do, is to discard all measurements (from all repetitors
in a given config) if any of them failed, but do append the snippet,
and do emit the empty measurement.
This patch makes ADRP target label address calculations the same as
label address calculations (see AArch64InstPrinter::printAdrpLabel()).
Otherwise the target label looks misleading as ADRP's immediate offset is,
actually, not an offset to this PC, but an offset to the current PC's
page address in pages.
See for example, `llvm-objdump/ELF/AArch64/pcrel-address.yaml`.
Before this patch the target label `<_start+0x80>` represents the
address `0x200100 + 0x80` while `0x220000` is expected.
Note that with this patch llvm-objdump output matches GNU objdump.
Reviewed By: simon_tatham
Differential Revision: https://reviews.llvm.org/D139407
I'm absolutely flabbergasted by this.
I was absolutely sure this worked.
But apparently not.
When outputting to the file, we'd write a single `InstructionBenchmark`
to it, and then close the file. And for the next `InstructionBenchmark`,
we'd reopen the file, truncating it in process,
and thus the first `InstructionBenchmark` would be gone...
This change introduces a missing frame inferrer aiming at fixing missing frames. It current only handles missing frames due to the compiler tail call elimination (TCE) but could also be extended to supporting other scenarios like frame pointer omission. When a tail called function is sampled, the caller frame will be missing from the call chain because the caller frame is reused for the callee frame. While TCE is beneficial to both perf and reducing stack overflow, a workaround being made in this change aims to find back the missing frames as much as possible.
The idea behind this work is to build a dynamic call graph that consists of only tail call edges constructed from LBR samples and DFS-search for a unique path for a given source frame and target frame on the graph. The unique path will be used to fill in the missing frames between the source and target. Note that only a unique path counts. Multiple paths are treated unreachable since we don't want to overcount for any particular possible path.
A switch --infer-missing-frame is introduced and defaults to be on.
Some testing results:
- 0.4% perf win according to three internal benchmarks.
- About 2/3 of the missing tail call frames can be recovered, according to an internal benchmark.
- 10% more profile generation time.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D139367
This used %zu to print a uint64_t type. z is for size_t so on 32 bit
we tried to treat it as a 32 bit number.
Use PRIu64 instead to print as 64 bit everywhere.
a1b4e13cff updated this to use
the target= syntax.
However the triple for our Arm bots is usually like:
armv8l-unknown-linux-gnueabihf
With "eabihf" on the end. I assume before we just checked for
"linux-gnu" being in the triple at all but now it is a proper
regex match.
Add .* on the end to account for the ABI tag on the end.