Commit Graph

5315 Commits

Author SHA1 Message Date
Simon Pilgrim
6ba0b9f68a [X86][SLM] Fix PBLENDVB uops and throughput
SLM PBLENDVB is just as bad as BLENDVPD/PS - so model it as such, fixing the rr vs rm uops diff as well. The Intel AoM appears to have a copy+paste typo with PBLENDW, it doesn't match Agner or InstLatX64.

Noticed while investigating some of the weird discrepancies reported by the D103695 helper script (SLM had much better vector shift throughputs than it should).
2021-09-03 11:31:29 +01:00
gbreynoo
e28cd75a50 [OptTable] Reapply Improve error message output for grouped short options
This reapplies 71d7fed3bc which was
reverted by 3e2bd82f02. This change
includes the fix for breaking the sanitizer bots.

As seen in https://bugs.llvm.org/show_bug.cgi?id=48880 the current
implementation for parsing grouped short options can return unclear
error messages. This change fixes the example given in the ticket in
which a flag is incorrectly given an argument. Also when parsing a
group we now keep reading past the first incorrect option and output
errors for all incorrect options in the group.

Differential Revision: https://reviews.llvm.org/D108770
2021-09-03 11:13:52 +01:00
Hongtao Yu
7ca8030030 [CSSPGO] Enable loading MD5 CS profile.
Adding the compiler support of MD5 CS profile based on pervious context split work D107299. A MD5 CS profile is about 40% smaller than the string-based extbinary profile. As a result, the compilation is 15% faster.

There are a few conversion from real names to md5 names that have been made on the sample loader and context tracker side to get it work.

Reviewed By: wenlei, wmi

Differential Revision: https://reviews.llvm.org/D108342
2021-09-01 09:19:47 -07:00
Kevin Athey
3e2bd82f02 Revert "[OptTable] Improve error message output for grouped short options"
This reverts commit 71d7fed3bc.

Reason: broke sanitizer bots
more info: https://reviews.llvm.org/D108770
2021-08-31 14:06:11 -07:00
wlei
964053d56f [llvm-profgen] Support LBR only perf script
This change aims at supporting LBR only sample perf script which is used for regular(Non-CS) profile generation.  A LBR perf script includes a batch of LBR sample which starts with a frame pointer and a group of 32 LBR entries is followed. The FROM/TO LBR pair and the range between two consecutive entries (the former entry's TO and the latter entry's FROM) will be used to infer function profile info.

An example of LBR perf script(created by `perf script -F ip,brstack -i perf.data`)
```
           40062f 0x40062f/0x4005b0/P/-/-/9  0x400645/0x4005ff/P/-/-/1  0x400637/0x400645/P/-/-/1 ...
           4005d7 0x4005d7/0x4005e5/P/-/-/8  0x40062f/0x4005b0/P/-/-/6  0x400645/0x4005ff/P/-/-/1 ...
           ...
```

For implementation:
 - Extended a new child class `LBRPerfReader` for the sample parsing, reused all the functionalities in `extractLBRStack` except for an extension to parsing leading instruction pointer.
 - `HybridSample` is reused(just leave the call stack empty) and the parsed samples is still aggregated in `AggregatedSamples`. After that, range samples, branch sample, address samples are computed and recorded.
 - Reused `ContextSampleCounterMap` to store the raw profile, since it's no need to aggregation by context, here it just registered one sample counter with a fake context key.
 - Unified to use `show-raw-profile` instead of `show-unwinder-output` to dump the intermediate raw profile, see the comments of the format of the raw profile. For CS profile, it remains to output the unwinder output.

Profile generation part will come soon.

Differential Revision: https://reviews.llvm.org/D108153
2021-08-31 13:28:17 -07:00
gbreynoo
71d7fed3bc [OptTable] Improve error message output for grouped short options
As seen in https://bugs.llvm.org/show_bug.cgi?id=48880 the current
implementation for parsing grouped short options can return unclear
error messages. This change fixes the example given in the ticket in
which a flag is incorrectly given an argument. Also when parsing a
group we now keep reading past the first incorrect option and output
errors for all incorrect options in the group.

Differential Revision: https://reviews.llvm.org/D108770
2021-08-31 16:41:08 +01:00
Simon Pilgrim
7ec7272b80 [MCA][X86] Add basic coverage for icelake arch
Copy the skylake-avx512 tests for icelake-server coverage.

Add icelake/rocketlake/tigerlake test coverage to the relevent generic tests as well.
2021-08-31 12:20:09 +01:00
Hongtao Yu
b9db70369b [CSSPGO] Split context string to deduplicate function name used in the context.
Currently context strings contain a lot of duplicated function names and that significantly increase the profile size. This change split the context into a series of {name, offset, discriminator} tuples so function names used in the context can be replaced by the index into the name table and that significantly reduce the size consumed by context.

A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings which is  time- and memory- consuming. Instead a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previous prevalent profile map which was implemented as a `StringRef` is now engineered as an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to using an `ArrayRef` to represent a full context for CS profile. For non-CS profile, it falls back to use `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects that are stored in producer buffers. For compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings can be generated only in those cases of debugging and printing.

When it comes to profile format, nothing has changed to the text format, though internally CS context is implemented as a vector. Extbinary format is only changed for CS profile, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, which each element as an offset points to `SecNameTable`. All occurrences of contexts elsewhere are redirected to using the offset of `SecCSNameTable`.

Testing
This is no-diff change in terms of code quality and profile content (for text profile).

For our internal large service (aka ads), the profile generation is cut to half, with a 20x smaller string-based extbinary format generated.

The compile time of ads is dropped by 25%.

Differential Revision: https://reviews.llvm.org/D107299
2021-08-30 20:09:29 -07:00
Keith Smiley
b5da3120b8 [llvm-cov][NFC] Add test for coverage-prefix-map remappings
This test covers acts as a regression test for these fixes:

c75a0a1e9d
dd388ba3e0

Differential Revision: https://reviews.llvm.org/D108805
2021-08-30 17:19:57 -07:00
Haowei Wu
31e61c58b0 [ifs] Add option to hide undefined symbols
This change add an option to llvm-ifs to hide undefined symbols from
its output.

Differential Revision: https://reviews.llvm.org/D108428
2021-08-27 11:15:56 -07:00
Roman Lebedev
d4d459e747 [X86] AMD Zen 3: MULX w/ mem operand has the same throughput as with reg op
Exegesis is faulty and sometimes when measuring throughput^-1
produces snippets that have loop-carried dependencies,
which must be what caused me to incorrectly measure it originally.

After looking much more carefully, the inverse throughput should match
that of the MULX w/ reg op.

As per llvm-exegesis measurements.
2021-08-27 13:27:05 +03:00
Roman Lebedev
0f04936a2d [X86] AMD Zen 3: MULX produces low part of the result in 3cy, +1cy for high part
As per llvm-exegesis measurements.
2021-08-27 13:27:05 +03:00
Roman Lebedev
db2c6cd99c [NFC][X86][MCA] AMD Zen 3: improve MULX test coverage
Latency for MULX isn't right
2021-08-27 13:27:05 +03:00
Andrea Di Biagio
4a5b191703 [X86][MCA] Address the latest issues with MULX reported in PR51495.
It turns out that SchedWrite WriteIMulH was always assigned to the low half of
the result of a MULX (rather than to the high half).

To avoid confusion, this patch swaps the two MULX writes in the tablegen
definition of MULX32/64.  That way, write names better describe what they
actually refer to; this also avoids further complications if in future we decide
to reuse the same MulH writes to also model other scalar integer multiply
instructions.  I also had to swap the latency values for the two MULX writes to
make sure that the change is effectively an NFC. In fact, none of the existing
x86 tests were affected by this small refactoring.

This patch also fixes a bug in MCA: a wrong latency value was propagated for
instructions that perform multiple writes to a same register.  This last issue
was found by Roman while testing MULX on targets that define a different latency
for the Low/High part of the result.

Differential Revision: https://reviews.llvm.org/D108727
2021-08-26 12:08:20 +01:00
David Green
6ffc6951a3 [AArch64] Remove unpredictable from narrowing instructions.
Like other similar instructions the xtn2 family do not have side
effects, and explicitly marking them as such can help improve scheduling
freedom.
2021-08-26 09:43:44 +01:00
David Green
9474b03d41 [AArch64] Add a Cortex-A55 NEON scheduler test case. 2021-08-26 09:43:44 +01:00
Esme-Yi
b21ed75e10 [llvm-readobj][XCOFF] Add support for --needed-libs option.
Summary: This patch is trying to add support for llvm-readobj
--needed-libs option under XCOFF.
For XCOFF, the needed libraries can be found from the Import
File ID Name Table of the Loader Section.
Currently, I am using binary inputs in the test since yaml2obj
does not yet support for writing the Loader Section and the
import file table.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D106643
2021-08-26 07:17:06 +00:00
Fangrui Song
4a66a11286 [LLVMgold.so][test] Make comdat-nodeduplicate.ll work with binutils<2.27 2021-08-25 16:59:06 -07:00
Andrea Di Biagio
6181427bb9 [X86][MCA] Add more tests for MULX (PR51495).
llvm-mca still reports a wrong latency for the case where
the two destination registers of MULX are the same.
2021-08-25 21:28:21 +01:00
Alfonso Sánchez-Beato
cdd407286a [llvm-objcopy] [COFF] Consider section flags when adding section
The --set-section-flags option was being ignored when adding a new
section. Take it into account if present.

Fixes https://llvm.org/PR51244

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D106942
2021-08-25 23:11:41 +03:00
Rong Xu
24201b6437 [SampleFDO] Set ProfileIsFS bit properly from the internal option
We have "-profile-isfs" internal option for text, binary, and
compactbinary format (mostly for debug and test purpose). We
need to set the related flag in FunctionSamples so that ProfileIsFS
is written to the header in extbinary format.

Differential Revision: https://reviews.llvm.org/D108707
2021-08-25 09:07:34 -07:00
Wenlei He
a6f15e9a49 [CSSPGO] Use probe inline tree to track zero size fully optimized context for pre-inliner
This is a follow up diff for BinarySizeContextTracker to track zero size for fully optimized inlinee. When an inlinee is fully optimized away, we won't be able to get its size through symbolizing instructions, hence we will treat the corresponding context size as unknown. However by traversing the inlined probe forest, we know what're original inlinees regardless of optimization. If a context show up in inlined probes, but not during symbolization, we know that it's fully optimized away hence its size is zero instead of unknown. It should provide more accurate size cost estimation for pre-inliner to make better inline decisions in llvm-profgen.

Differential Revision: https://reviews.llvm.org/D108350
2021-08-25 09:01:11 -07:00
Andrea Di Biagio
5f848b311f [X86][SchedModel] Fix latency the Hi register write of MULX (PR51495).
Before this patch, WriteIMulH reported a latency value which is correct for the
RR variant of MULX, but not for the RM variant.

This patch fixes the issue by introducing a new WriteIMulHLd, which is meant to
be used only by the RM variant of MULX.

Differential Revision: https://reviews.llvm.org/D108701
2021-08-25 16:12:09 +01:00
Vyacheslav Zakharin
2e192ab1f4 [CodeExtractor] Preserve topological order for the return blocks.
Differential Revision: https://reviews.llvm.org/D108673
2021-08-25 08:09:01 -07:00
Andrea Di Biagio
fe13b81ed9 [X86][NFC] Pre-commit llvm-mca tests for PR51495.
WriteIMulH reports an incorrect latency for RM variants of MULX.
2021-08-25 14:17:17 +01:00
Fangrui Song
9b96b0865d llvm-xray {convert,extract}: Add --demangle
No demangling may be a better default in the future.
Add `--demangle` for migration convenience.

Reviewed By: Enna1

Differential Revision: https://reviews.llvm.org/D108100
2021-08-24 13:35:19 -07:00
Patrick Holland
e4ebfb5786 [MCA] Adding an AMDGPUCustomBehaviour implementation.
This implementation allows mca to model the desired behaviour of the s_waitcnt
instruction. This patch also adds the RetireOOO flag to the AMDGPU instructions
within the scheduling model. This flag is only used by mca and allows
instructions to finish out-of-order which helps mca's simulations more closely
model the actual device.

Differential Revision: https://reviews.llvm.org/D104730
2021-08-24 13:33:58 -07:00
Arthur Eubanks
d2e103644b [llvm-reduce] Remove various module data
This removes the data layout, target triple, source filename, and module
identifier when possible.

Reviewed By: swamulism

Differential Revision: https://reviews.llvm.org/D108568
2021-08-24 09:45:31 -07:00
David Green
50f4ae58eb [AArch64] Correct store ReadAdrBase operand
It appears that the Read operand for stores was being placed on the
first operand (the stored value) not the address base. This adds a
ReadST for the stored value operand, allowing the ReadAdrBase to
correctly act upon the address.

Differential Revision: https://reviews.llvm.org/D108287
2021-08-23 21:07:55 +01:00
David Green
955c9437fd [AArch64] Add Scheduling tests for Load/Store ReadAdv operands. 2021-08-23 21:07:55 +01:00
Alexey Lapshin
07d44cc0b1 [DWARF][Verifier] Do not add child DieRangeInfo with empty address range to the parent.
verifyDieRanges function checks for the intersected address ranges.
It adds child DieRangeInfo into parent DieRangeInfo to check
whether children have overlapping address ranges. It is safe to not add
DieRangeInfo with empty address range into parent's children list.
This decreases the number of children which should be navigated and as a result
decreases execution time(parents having a lot of children with empty ranges
spend much time navigating them). For this command: "llvm-dwarfdump --verify clang-repl"
execution time decreased from 220 sec till 75 sec.

Differential Revision: https://reviews.llvm.org/D107554
2021-08-22 19:39:21 +03:00
Christian Fetzer
9116211d18 [Coverage][llvm-cov] Correctly export branch coverage in LCOV format
Commit 9f2967bcfe introduced support for
branch coverage including export to the LCOV format.

This commit corrects the LCOV field name for branches from BFH to BRH.
The mistake seems to have slipped in as typo because the correct field
name BRH is used in the comment section at the beginning of the file.

Differential Revision: https://reviews.llvm.org/D108358
2021-08-20 13:44:25 -05:00
Andrea Di Biagio
35d4292a73 [X86][SchedModels] Fix missing ReadAdvance for MULX and ADCX/ADOX (PR51494)
Before this patch, instructions MULX32rm and MULX64rm were missing a ReadAdvance
for the implicit read of register EDX/RDX.  This patch fixes the issue, and it
also introduces a new SchedWrite for the two variants of MULX. The general idea
behind this last change is to eventually decrease the number of InstRW in the
scheduling models.

This patch also adds a ReadAdvance for the implicit read of EFLAGS in ADCX/ADOX.

Differential Revision: https://reviews.llvm.org/D108372
2021-08-20 17:39:51 +01:00
Maryam Benimmar
2cdfd0b259 [AIX][XCOFF] 64-bit relocation reading support
Support XCOFFDumper relocation reading support
This patch is part of D103696 partition

Reviewed By: daltenty, Helflym

Differential Revision: https://reviews.llvm.org/D104646
2021-08-19 21:56:57 -04:00
Andrzej Warzynski
dcc6b7b1d5 [OptTable] Refine how printHelp treats empty help texts
Currently, `printHelp` behaves differently for options that:
  * do not define `HelpText` (such options _are not printed_), and
  * define its `HelpText` as `HelpText<"">` (such options _are printed_).
In practice, both approaches lead to no help text and `printHelp` should
treat them consistently. This patch addresses that by making
`printHelpt` check the length of the help text to be printed.

All affected tests have been updated accordingly. The option definitions
for llvm-cvtres have been updated with a short description or "Not
  implemented" for options that are ignored by the tool.

Differential Revision: https://reviews.llvm.org/D107557
2021-08-19 09:30:15 +00:00
Wenlei He
eca03d2768 [CSSPGO] Track and use context-sensitive post-optimization function size to drive global pre-inliner in llvm-profgen
This change enables llvm-profgen to use accurate context-sensitive post-optimization function byte size as a cost proxy to drive global preinline decisions.

To do this, BinarySizeContextTracker is introduced to track function byte size under different inline context during disassembling. In preinliner, we can not query context byte size under switch `context-cost-for-preinliner`. The tracker uses a reverse trie to keep size of functions under different context (callee as parent, caller as child), and it can give best/longest possible matching context size for given input context.

The new size cost is off by default. There're a few TODOs that needs to addressed: 1) avoid dangling string from `Offset2LocStackMap`, which will be addressed in split context work; 2) using inlinee's entry probe to make sure we have correct zero size for inlinee that's completely optimized away after inlining. Some tuning is also needed.

Differential Revision: https://reviews.llvm.org/D108180
2021-08-18 22:50:57 -07:00
Andrea Di Biagio
2d53e54f0e [X86][NFC] Pre-commit tests for PR51494 2021-08-18 19:55:21 +01:00
Maryam Benimmar
7151a8aada [PowerPC][AIX] llvm-readobj: Convert some errors to warnings.
Report warnings rather than errors, so that llvm-readobj doesn't bail
out on malformed inputs.

Differential Revision: https://reviews.llvm.org/D106783
2021-08-18 11:04:08 -04:00
Xu Mingjie
168ee72718 [NFC][llvm-xray] add a llvm-xray convert option no-demangle
When option `--symbolize` is true, llvm-xray convert will demangle function
name on default. This patch adds a llvm-xray convert option `no-demangle` to
determine whether to demangle function name when symbolizing function ids from
the input log.

Reviewed By: MaskRay, smeenai

Differential Revision: https://reviews.llvm.org/D108019
2021-08-18 12:22:04 +08:00
Jozef Lawrynowicz
108ba4f4a4 [llvm-readobj] Refactor ELFDumper::printAttributes()
The current implementation of printAttributes makes it fiddly to extend
attribute support for new targets.

By refactoring the code so all target specific variables are
initialized in a switch/case statement, it becomes simpler to extend
attribute support for new targets.

Reviewed By: jhenderson, MaskRay

Differential Revision: https://reviews.llvm.org/D107968
2021-08-17 13:28:31 -07:00
Tozer
6d5e31baaa Fix 2: [MCParser] Correctly handle CRLF line ends when consuming line comments
Fixes an issue with revision 5c6f748c and ad40cb88.

Adds an mcpu argument to the test command, preventing an invalid default
CPU from being used on some platforms.
2021-08-17 17:13:21 +01:00
Fangrui Song
c56b4cfd4b [llvm-objdump] -T: print symbol versions
Similar to D94907 (llvm-nm -D).

The output will match GNU objdump 2.37.
Older versions don't use ` (version)` for undefined symbols.

Reviewed By: jhenderson

Differential Revision: https://reviews.llvm.org/D108097
2021-08-17 09:10:50 -07:00
Tozer
ad40cb8821 Fix: [MCParser] Correctly handle CRLF line ends when consuming line comments
Fixes an issue with revision 5c6f748c.

Move the test added in the above commit into the X86 folder, ensuring
that it is only run on targets where its triple is valid.
2021-08-17 16:16:19 +01:00
Tozer
5c6f748cbc [MCParser] Correctly handle CRLF line ends when consuming line comments
Fixes issue: https://bugs.llvm.org/show_bug.cgi?id=47983

The AsmLexer currently has an issue with lexing line comments in files
with CRLF line endings, in which it reads the carriage return as being
part of the line comment. This causes an error for certain valid comment
layouts; this patch fixes this by excluding the carriage return from the
line comment.

Differential Revision: https://reviews.llvm.org/D90234
2021-08-17 15:52:51 +01:00
Fangrui Song
54e76cb17a [split-file] Default to --no-leading-lines
It turns out that the --leading-lines may be a bad default.
[[#@LINE+-num]] is rarely used.
2021-08-16 19:23:11 -07:00
Hongtao Yu
f27fee623d [SamplePGO][NFC] Dump function profiles in order
Sample profiles are stored in a string map which is basically an unordered map. Printing out profiles by simply walking the string map doesn't enforce an order. I'm sorting the map in the decreasing order of total samples to enable a more stable dump, which is good for comparing two dumps.

Reviewed By: wenlei, wlei

Differential Revision: https://reviews.llvm.org/D108147
2021-08-16 17:22:30 -07:00
Fangrui Song
935a6d4024 [test] Change llvm-xray options to use the preferred double-dash forms and change -f= to -f 2021-08-15 21:19:04 -07:00
David Blaikie
44d0a99a12 Add missing triple for test 2021-08-15 12:32:12 -07:00
David Blaikie
62a4c2c10e DWARFVerifier: Check section-relative references at the end of the section
This ensures that debug_types references aren't looked for in
debug_info section.

Behavior is still going to be questionable in an unlinked object file -
since cross-cu references could refer to symbols in another .debug_info
(or, in theory, .debug_types) chunk - but if a producer only uses
ref_addr to refer to things within the same .debug_info chunk in an
object file (eg: whole program optimization/LTO - producing two CUs into
a single .debug_info section in an object file - the ref_addrs there
could be resolved relative to that .debug_info chunk, not needing to
consider comdat  (DWARFv5 type units or other creatures) chunks of
.debug_info, etc)
2021-08-15 11:40:24 -07:00
David Blaikie
2af4db7d5c Migrate DWARFVerifier tests to lit-based yaml instead of gtest with embedded yaml
Improves maintainability (edit/modify the tests without recompiling) and
error messages (previously the failure would be a gtest failure
mentioning nothing of the input or desired text) and the option to
improve tests with more checks.

(maybe these tests shouldn't all be in separate files - we could
probably have DWARF yaml that contains multiple errors while still being
fairly maintainable - the various invalid offsets (ref_addr, rnglists,
ranges, etc) could probably be all in one test, but for the simple sake
of the migration I just did the mechanical thing here)
2021-08-13 19:09:41 -07:00