The motivating use case is to support import the function declaration
across modules to construct call graph edges for indirect calls [1]
when importing the function definition costs too much compile time
(e.g., the function is too large has no `noinline` attribute).
1. Currently, when the compiled IR module doesn't have a function
definition but its postlink combined summary contains the function
summary or a global alias summary with this function as aliasee, the
function definition will be imported from source module by IRMover. The
implementation is in FunctionImporter::importFunctions [2]
2. In order for FunctionImporter to import a declaration of a function,
both function summary and alias summary need to carry the def / decl
state. Specifically, all existing summary fields doesn't differ across
import modules, but the def / decl state of is decided by
`<ImportModule, Function>`.
This change encodes the def/decl state in `GlobalValueSummary::GVFlags`.
In the subsequent changes
1. The indexing step `computeImportForModule` [3]
will compute the set of definitions and the set of declarations for each
module, and passing on the information to bitcode writer.
2. Bitcode writer will look up the def/decl state and sets the state
when it writes out the flag value. This is demonstrated in
https://github.com/llvm/llvm-project/pull/87600
3. Function importer will read the def/decl state when reading the
combined summary to figure out two sets of global values, and IRMover
will be updated to import the declaration (aka linkGlobalValuePrototype [4])
into the destination module.
- The next change is https://github.com/llvm/llvm-project/pull/87600
[1] mentioned in rfc https://discourse.llvm.org/t/rfc-for-better-call-graph-sort-build-a-more-complete-call-graph-by-adding-more-indirect-call-edges/74029#support-cross-module-function-declaration-import-5
[2] 3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L1608-L1764)
[3] 3b337242ee/llvm/lib/Transforms/IPO/FunctionImport.cpp (L856)
[4] 3b337242ee/llvm/lib/Linker/IRMover.cpp (L605)
This improves a pattern that occurs in 531.deepsjeng_r. Reducing the
dynamic instruction count by 0.5%.
This may be possible to improve in SelectionDAG, but given the special
cases around shXadd formation, it's not obvious it can be done in a
robust way without adding multiple special cases.
I've used a GEP with 2 indices because that mostly closely resembles the
motivating case. Most of the test cases are the simplest GEP case. One
test has a logical right shift on an index which is closer to the
deepsjeng code. This requires special handling in isel to reverse a
DAGCombiner canonicalization that turns a pair of shifts into (srl (and
X, C1), C2).
Improvements include
* Enable `ListeningSocket::accept` to timeout after a specified amount
of time or block indefinitely
* Enable `ListeningSocket::createUnix` to handle instances where the
target socket address already exists and differentiate between
situations where the existing file does and does not already have a
bound socket
* Doxygen comments
Functionality added for the module build daemon
---------
Co-authored-by: Michael Spencer <bigcheesegs@gmail.com>
As noted when #82404 was pushed (canonicalizing `sitofp` -> `uitofp`),
different signedness on fp casts can have dramatic performance
implications on different backends.
So, it makes to create a reliable means for the backend to pick its
cast signedness if either are correct.
Further, this allows us to start canonicalizing `sitofp`- > `uitofp`
which may easy middle end analysis.
Closes#86141
Saves several MB of memory in larger applications after linking finishes
by clearing DenseMap storage that is empty. This does not attempt to
shrink partially full materialization infos. The assumption is that
adding more after linking finishes is rare.
rdar://126145336
This removes the old legacy PM "intervals" analysis pass (aka
IntervalPartition). It also removes the associated Interval and
IntervalIterator help classes.
Reasons for removal:
1) The pass is not used by llvm-project (not even being tested by
any regression tests).
2) Pass has not been ported to new pass manager, which at least
indicates that it isn't used by the middle-end.
3) ASan reports heap-use-after-free on
++I; // After the first one...
even if false is passed to intervals_begin. Not sure if that is
a false positive, but it makes the code a bit less trustworthy.
`getDebugNamesBucketAndHashCount` lures users to provide an array to
compute the bucket count using an O(n log n) sort. This is inefficient
as hash table based uniquifying is faster.
The performance issue matters less for Clang as the number of names is
relatively small. For `ld.lld --debug-names`, I plan to compute the
unique hash count as a side product of parallel entry pool computation,
and I just need a function to suggest a bucket count.
Update the remaining uses of `std::begin`/`end` functions to
`adl_beging`/`end`. This is to make the behavior all the utility
functions consistent, rather than trying to fix a specific usecase.
This tries to add some costs for the shuffle in a ST3/ST4 instruction,
which are represented in LLVM IR as store(interleaving shuffle). In
order to detect the store, it needs to add a CxtI context instruction to
check the users of the shuffle. LD3 and LD4 are added, LD2 should be a
zip1 shuffle, which will be added in another patch.
It should help fix some of the regressions from #87510.
rldimi is 64-bit instruction, due to backward compatibility, it needs to
be expanded into series of rotate and masking in 32-bit environment. In
the future, we may improve bit permutation selector and remove such
direct codegen.
-fsanitize=function emits a signature and function hash before a
function. Similar to 7f6e2c9, these can be sheared off when
`.subsections_via_symbols` is used.
This change uses the same technique 7f6e2c9 introduced for prefixes:
emitting a symbol for the metadata, then marking the actual function
entry as an .alt_entry symbol.
Currently only machine function passes support `insertPass`, but it
seems to be enough, all targets tune their pipelines when adding machine
function passes.
This attempts to standardize and extend some of the insert vector
element lowering. Most notably:
- More types are handled by splitting illegal vectors.
- The index type for G_INSERT_VECTOR_ELT is canonicalized to
TLI.getVectorIdxTy(), similar to extact_vector_element.
- Some of the existing patterns now have the index type specified to
make sure they can apply to GISel too.
- The C++ selection code has been removed, relying on tablegen patterns.
- G_INSERT_VECTOR_ELT with small GPR input elements are pre-selected to
use a i32 type, allowing the existing patterns to apply.
- Variable index inserts are lowered in post-legalizer lowering,
expanding into a stack store and reload.
https://bugs.llvm.org/show_bug.cgi?id=51735https://github.com/llvm/llvm-project/issues/51077
In the given test case:
```
4 ...
5 void bar() {
6 int End = 777;
7 int Index = 27;
8 char Var = 1;
9 for (; Index < End; ++Index)
10 ;
11 nop(Index);
12 }
13 ...
```
Missing local variable `Index` after loop `Induction Variable Elimination`. When adding a breakpoint at line `11`, LLDB does not have information on the variable. But it has info on `Var` and `End`.
Add an ExecutionSession state verifier, enabled under EXPENSIVE_CHECKS, that can
be used to identify inconsistent session state to assist in tracking down bugs.
This initial version was motivated by investigation of the EDU-update bug that
was fixed in a671ceec33.
rdar://125376708
Lift the requirement that rbegin/rend must be member functions. Also
allow the rbegin/rend to be found through Argument Dependent Lookup
(ADL) and add `adl_rbegin`/`adl_rend` to STLExtras.
It's useful to have some significant build options visible in the
version when investigating problems with a specific compiler artifact.
This makes it easy to see if assertions, expensive checks, sanitizers,
etc. are enabled when checking a compiler version.
Example config line output:
Build configuration: +unoptimized, +assertions, +asan, +ubsan
This patch adds a new flag: `--preserve-input-debuginfo-format`
This flag instructs the tool to not convert the debug info format
(intrinsics/records) of input IR, but to instead determine the format of
the input IR and overwrite the other format-determining flags so that we
process and output the file in the same format that we received it in.
This flag is turned off by llvm-link, llvm-lto, and llvm-lto2, and
should be turned off by any other tool that expects to parse multiple IR
modules and have their debug info formats match.
The motivation for this flag is to allow tools to not convert the debug
info format - verify-uselistorder and llvm-reduce, and any downstream
tools that seek to test or mutate IR as-is, without applying extraneous
modifications to the input. This is a necessary step to using debug
records by default in all (other) LLVM tools.
This reverts the revert commit 589c7abb03.
This patch includes a fix for any-of reductions and epilogue
vectorization. Extra test coverage for the issue that caused the revert
has been added in 399ff08e29.
--------------------------------
Original commit message:
Update AnyOf reduction code generation to only keep track of the AnyOf
property in a boolean vector in the loop, only selecting either the new
or start value in the middle block.
The patch incorporates feedback from https://reviews.llvm.org/D153697.
This fixes the #62565, as now there aren't multiple uses of the
start/new values.
Fixes https://github.com/llvm/llvm-project/issues/62565
PR: https://github.com/llvm/llvm-project/pull/78304
And use it to print the correct default OpenMP version for flang and
flang -fc1.
This change adds an optional `HelpTextsForVariants` to options. This
allows you to change the help text that gets shown in documentation and
`--help` based on the program its being generated for.
As `OptTable` needs to be constexpr compatible, I have used a std::array
of help text variants. Each entry is:
(list of visibilities) - > help text string
So for the OpenMP version we have (flang, fc1) -> "OpenMP version for
flang is...".
So you can have multiple visibilities use the same string. The number of
entries is currently set to 1, and the number of visibilities per entry
is 2, because that's the maximum we need for now. The code is written so
we can increase these numbers later, and the unused elements will be initialised.
I have not applied this to group descriptions just because I don't know
of one that needs changing. It could easily be enabled for those too if
needed. There are minor changes to them just to get it all to compile.
This approach of storing many help strings per option in the 1 driver
library seemed preferable to making a whole new library for Flang (even
if that would mostly be including stuff from Clang).
Start of #83882
- `Builtins.td` - add the `hlsl` `all` elementwise builtin.
- `CGBuiltin.cpp` - Show a use case for CGHLSLUtils via an `all`
intrinsic codegen.
- `CGHLSLRuntime.cpp` - move `thread_id` to use CGHLSLUtils.
- `CGHLSLRuntime.h` - Create a macro to help pick the right intrinsic
for the backend.
- `hlsl_intrinsics.h` - Add the `all` api.
- `SemaChecking.cpp` - Add `all` builtin type checking
- `IntrinsicsDirectX.td` - Add the `all` `dx` intrinsic
- `IntrinsicsSPIRV.td` - Add the `all` `spv` intrinsic
Work still needed:
- `SPIRVInstructionSelector.cpp` - Add an implementation of `OpAll` for
`spv_all` intrinsic
This patch introduces generating VP intrinsics in the Loop Vectorizer.
Currently the Loop Vectorizer supports vector predication in a very
limited capacity via tail-folding and masked load/store/gather/scatter
intrinsics. However, this does not let architectures with active vector
length predication support take advantage of their capabilities.
Architectures with general masked predication support also can only take
advantage of predication on memory operations. By having a way for the
Loop Vectorizer to generate Vector Predication intrinsics, which (will)
provide a target-independent way to model predicated vector
instructions. These architectures can make better use of their
predication capabilities.
Our first approach (implemented in this patch) builds on top of the
existing tail-folding mechanism in the LV (just adds a new tail-folding
mode using EVL), but instead of generating masked intrinsics for memory
operations it generates VP intrinsics for loads/stores instructions. The
patch adds a new VPlanTransforms to replace the wide header predicate
compare with EVL and updates codegen for load/stores to use VP
store/load with EVL.
Other important part of this approach is how the Explicit Vector Length
is computed. (VP intrinsics define this vector length parameter as
Explicit Vector Length (EVL)). We use an experimental intrinsic
`get_vector_length`, that can be lowered to architecture specific
instruction(s) to compute EVL.
Also, added a new recipe to emit instructions for computing EVL. Using
VPlan in this way will eventually help build and compare VPlans
corresponding to different strategies and alternatives.
Differential Revision: https://reviews.llvm.org/D99750
commit d89914f30b
Author: Kazu Hirata <kazu@google.com>
Date: Wed Apr 3 21:48:38 2024 -0700
changed RecordWriterTrait to a template class with IndexedVersion as a
template parameter. This patch changes the class back to a
non-template one while retaining the ability to serialize multiple
versions.
The reason I changed RecordWriterTrait to a template class was
because, even if RecordWriterTrait had IndexedVersion as a member
variable, RecordWriterTrait::EmitKeyDataLength, being a static
function, would not have access to the variable.
Since OnDiskChainedHashTableGenerator calls EmitKeyDataLength as:
const std::pair<offset_type, offset_type> &Len =
InfoObj.EmitKeyDataLength(Out, I->Key, I->Data);
we can make EmitKeyDataLength a member function, but we have one
problem. InstrProfWriter::writeImpl calls:
void insert(typename Info::key_type_ref Key,
typename Info::data_type_ref Data) {
Info InfoObj;
insert(Key, Data, InfoObj);
}
which default-constructs RecordWriterTrait without a specific version
number. This patch fixes the problem by adjusting
InstrProfWriter::writeImpl to call the other form of insert instead:
void insert(typename Info::key_type_ref Key,
typename Info::data_type_ref Data, Info &InfoObj)
To prevent an accidental invocation of the default constructor of
RecordWriterTrait, this patch deletes the default constructor.
--compress-sections is similar to --compress-debug-sections but applies
to arbitrary sections.
* `--compress-sections <section>=none`: decompress sections
* `--compress-sections <section>=[zlib|zstd]`: compress sections with zlib/zstd
Like `--remove-section`, the pattern is by default a glob, but a regex
when --regex is specified.
For `--remove-section` like options, `!` prevents matches and is not
dependent on ordering (see `ELF/wildcard-syntax.test`). Since
`--compress-sections a=zlib --compress-sections a=none` naturally allows
overriding, having an order-independent `!` would be confusing.
Therefore, `!` is disallowed.
Sections within a segment are effectively immutable. Report an error for
an attempt to (de)compress them. `SHF_ALLOC` sections in a relocatable
file can be compressed, but linkers usually reject them.
Link: https://discourse.llvm.org/t/rfc-compress-arbitrary-sections-with-ld-lld-compress-sections/71674
Pull Request: https://github.com/llvm/llvm-project/pull/85036
All callers have been changed to use the new simpler overload with an
implicit modulus of 2^BitWidth. The old form was never used or tested
with non-power-of-two modulus anyway.
The current APInt::multiplicativeInverse takes a modulus which can be
any value, but all in-tree callers use a power of two. Moreover, most
callers want to use two to the power of the width of an existing APInt,
which is awkward because 2^N is not representable as an N-bit APInt.
Add a new overload of multiplicativeInverse which implicitly uses
2^BitWidth as the modulus.
Fixes issue noted at: https://github.com/llvm/llvm-project/pull/86274
When loading bitcode lazily, we may request debug intrinsics be upgraded
to debug records during the module parsing phase; later on we perform
this upgrade when materializing the module functions. If we change the
module's debug info format between parsing and materializing however,
then the requested upgrade is no longer correct and leads to an
assertion. This patch fixes the issue by adding an extra check in the
autoupgrader to see if the upgrade is no longer suitable, and either
exit-out or fall back to the correct intrinsic->intrinsic upgrade if one
is required.
The class `ScopedDbgInfoFormatSetter` was added as a convenient way to
temporarily change the debug info format of a function or module, as
part of IR printing; since this process is repeated in a number of other
places, this patch uses the format-setter class in those places as well.
Reland #85231 after fixing build failure
https://lab.llvm.org/buildbot/#/builders/186/builds/15631.
Use `PRIx64` for format output of `uint64_t` as hex.
Original PR description below.
This adds support for `GNU_PROPERTY_AARCH64_FEATURE_PAUTH` feature (as
defined in https://github.com/ARM-software/abi-aa/pull/240) handling in
llvm-readobj and llvm-readelf. The following constants for supported
platforms are also introduced:
- `AARCH64_PAUTH_PLATFORM_INVALID = 0x0`
- `AARCH64_PAUTH_PLATFORM_BAREMETAL = 0x1`
- `AARCH64_PAUTH_PLATFORM_LLVM_LINUX = 0x10000002`
For the llvm_linux platform, output of the tools contains descriptions
of PAuth features which are enabled/disabled depending on the version
value. Version value bits correspond to the following `LangOptions`
defined in #85232:
- bit 0: `PointerAuthIntrinsics`;
- bit 1: `PointerAuthCalls`;
- bit 2: `PointerAuthReturns`;
- bit 3: `PointerAuthAuthTraps`;
- bit 4: `PointerAuthVTPtrAddressDiscrimination`;
- bit 5: `PointerAuthVTPtrTypeDiscrimination`;
- bit 6: `PointerAuthInitFini`.
Support for `.note.AARCH64-PAUTH-ABI-tag` is dropped since it's deleted
from the spec in ARM-software/abi-aa#250.