The naming has come up as a source of confusion in several recent reviews. onlyWritesMemory is consist with onlyReadsMemory which we use for the corresponding readonly case as well.
There are a number of places that specially handle loads from a
uniform value where all the bits are the same (zero, one, undef,
poison), because we a) don't care about the load offset in that
case b) it bypasses casts that might not be legal generally but
do work with uniform values.
We had multiple implementations of this, with a different set of
supported values each time. This replaces two usages with a more
complete helper. Other usages will be replaced separately, because
they have larger impact.
This is part of D115924.
This builds on the code from D114963, and extends it to handle calls both direct and indirect. With the revised code structure (from series of previously landed NFCs), this is pretty straight forward.
One thing to note is that we can not infer writeonly for arguments which might be captured. If the pointer can be read back by the caller, and then read through, we have no way to track that. This is the same restriction we have for readonly, except that we get no mileage out of the "callee can be readonly" exception since a writeonly param on a readonly function is either a) readnone or b) UB. This means we can't actually infer much unless nocapture has already been inferred.
Differential Revision: https://reviews.llvm.org/D115003
This class is solely used as a lightweight and clean way to build a set of
attributes to be removed from an AttrBuilder. Previously AttrBuilder was used
both for building and removing, which introduced odd situation like creation of
Attribute with dummy value because the only relevant part was the attribute
kind.
Differential Revision: https://reviews.llvm.org/D116110
Global ctor evaluation currently models memory as a map from Constant*
to Constant*. For this to be correct, it is required that there is
only a single Constant* referencing a given memory location. The
Evaluator tries to ensure this by imposing certain limitations that
could result in ambiguities (by limiting types, casts and GEP formats),
but ultimately still fails, as can be seen in PR51879. The approach
is fundamentally fragile and will get more so with opaque pointers.
My original thought was to instead store memory for each global as an
offset => value representation. However, we also need to make sure
that we can actually rematerialize the modified global initializer
into a Constant in the end, which may not be possible if we allow
arbitrary writes.
What this patch does instead is to represent globals as a MutableValue,
which is either a Constant* or a MutableAggregate*. The mutable
aggregate exists to allow efficient mutation of individual aggregate
elements, as mutating an element on a Constant would require interning
a new constant. When a write to the Constant* is made, it is converted
into a MutableAggregate* as needed.
I believe this should make the evaluator more robust, compatible
with opaque pointers, and a bit simpler as well.
Fixes https://github.com/llvm/llvm-project/issues/51221.
Differential Revision: https://reviews.llvm.org/D115530
This reverts commit fd4808887e.
This patch causes gcc to issue a lot of warnings like:
warning: base class ‘class llvm::MCParsedAsmOperand’ should be
explicitly initialized in the copy constructor [-Wextra]
AAPointerInfo, and thereby other places, can look already through
internal global and stack memory. This patch enables them to look
through heap memory returned by functions with a `noalias` return.
In the future we can look through `noalias` arguments as well but that
will require AAIsDead to learn that such memory can be inspected by the
caller later on. We also need teach AAPointerInfo about dominance to
actually deal with memory that might not be `null` or `undef`
initialized. D106397 is a first step in that direction already.
Reviewed By: kuter
Differential Revision: https://reviews.llvm.org/D109170
Similar to loads, we want to be aggressive when it comes to store
simplification. Not everything in LLVM handles dead stores well when
address space casts are involved, we can simply ask the Attributor to do
it for us though.
Reviewed By: tianshilei1992
Differential Revision: https://reviews.llvm.org/D109998
One of the unused ident_t fields now holds the size of the string
(=const char *) field so we have an easier time dealing with those
in the future.
Differential Revision: https://reviews.llvm.org/D113126
While we skipped uses in stores if we can find all copies of the value
when the memory is loaded, we did not correlate the use in the store
with the use in the load. So far this lead to less precise results in the
offset calculations which prevented deductions. With the new
EquivalentUseCB callback argument the user of checkForAllUses can be
informed of the correlation and act on it appropriately.
Differential Revision: https://reviews.llvm.org/D109662
This patch uses the return alignment attribute now present in the
`__kmpc_alloc_shared` runtime call to set the alignment of the shared
memory global created to replace it.
Depends on D115971
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D116319
This patch changes the HeapToStack optimization to attach the return alignment
attribute information to the created alloca instruction. This would cause
problems when replacing the heap allocation with an alloca did not respect the
alignment of the original heap allocation, which would typically be aligned on
an 8 or 16 byte boundary. Malloc calls now contain alignment attributes,
so we can use that information here.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D115888
Instead of using the ArgumentPromotion implementation, we now walk
call sites using checkForAllCallSites() and directly call
areTypesABICompatible() using the replacement types. I believe
that resolves the TODO in the code.
Differential Revision: https://reviews.llvm.org/D116033
Arguments to an indirect call is by definition outside the SCC, but there's no reason we can't use locally defined facts on the call site. This also has the nice effect of further simplifying the code.
Differential Revision: https://reviews.llvm.org/D116118
The areFunctionArgsABICompatible() hook currently accepts a list of
pointer arguments, though what we're actually interested in is the
ABI compatibility after these pointer arguments have been converted
into value arguments.
This means that a) the current API is incompatible with opaque
pointers (because it requires inspection of pointee types) and
b) it can only be used in the specific context of ArgPromotion.
I would like to reuse the API when inspecting calls during inlining.
This patch converts it into an areTypesABICompatible() hook, which
accepts a list of types. This makes the method more generally usable,
and compatible with opaque pointers from an API perspective (the
actual usage in ArgPromotion/Attributor is still incompatible,
I'll follow up on that in separate patches).
Differential Revision: https://reviews.llvm.org/D116031
In regular LTO, analyze IR and discard unreachable functions when finding virtual call targets.
Differential Revision: https://reviews.llvm.org/D116056
This change allows us to infer access attributes (readnone, readonly) on arguments passed to vararg functions. Since there isn't a formal argument corresponding to the parameter, they'll never be considered part of the speculative SCC, but they can still benefit from attributes on the call site or the callee function.
The main motivation here is just to simplify the code, and remove some special casing. Previously, an indirect vararg call could return more precise results than an direct vararg call which is just weird.
Differential Revision: https://reviews.llvm.org/D115964
This fixes a bug where we would infer readnone/readonly for a function which passed a value to a function which could capture it. With the value captured in memory, the function could reload the value from memory after the call, and write to it. Inferring the argument as readnone or readonly is unsound.
@jdoerfert apparently noticed this about two years ago, and tests were checked in with 76467c4, but the issue appears to have never gotten fixed.
Since this seems like this issue should break everything, let me explain why the case is actually fairly narrow. The main inference loop over the argument SCCs only analyzes nocapture arguments. As such, we can only hit this when construction the partial SCCs. Due to that restriction, we can only hit this when we have either a) a function declaration with a manually annotated argument, or b) an immediately self recursive call.
It's also worth highlighting that we do have cases we can infer readonly/readnone on a capturing argument validly. The easiest example is a function which simply returns its argument without ever accessing it.
Differential Revision: https://reviews.llvm.org/D115961
Rename option MaxConstantsThreshold to MaxClonesThreshold. Not only is this
more descriptive, this is also in preparation of introducing another threshold
to analyse more than just 1 constant argument as we currently do, and to better
distinguish these options/thresholds.
This fixes the assertion failure reported at
https://reviews.llvm.org/D114889#3198921 with a straightforward
check, until the cleaner fix in D115924 can be reapplied.
With Control-Flow Integrity (CFI), the LowerTypeTests pass replaces
function references with CFI jump table references, which is a problem
for low-level code that needs the address of the actual function body.
For example, in the Linux kernel, the code that sets up interrupt
handlers needs to take the address of the interrupt handler function
instead of the CFI jump table, as the jump table may not even be mapped
into memory when an interrupt is triggered.
This change adds the no_cfi constant type, which wraps function
references in a value that LowerTypeTestsModule::replaceCfiUses does not
replace.
Link: https://github.com/ClangBuiltLinux/linux/issues/1353
Reviewed By: nickdesaulniers, pcc
Differential Revision: https://reviews.llvm.org/D108478
This reverts commit 9fd4f80e33.
This breaks SingleSource/Regression/C/gcc-c-torture/execute/pr19687.c
in test-suite. Either the test is incorrect, or clang is generating
incorrect union initialization code. I've submitted
https://reviews.llvm.org/D115994 to fix the test, assuming my
interpretation is correct. Reverting this in the meantime as it
may take some time to resolve.
Instead of having the speculative path be the untaken path in the branch, explicitly have it return. This does require tail duplicating one call, but the resulting code is shorter and easier to understand. Also rewrite the condition using appropriate accessors.
We were being wildly inconsistent about what memory access was implied by an indirect function call. Depending on the call site attributes, you could get anything from a read, to unknown, to none at all. (The last was a miscompile.)
We were also always traversing the uses of a readonly indirect call. This is entirely unneeded as the indirect call does not capture. The callee might capture itself internally, but that has no implications for this caller. (See the nice explanation in the CaptureTracking comments if that case is confusing.)
Note that elsewhere in the same file, we were correctly computing the nocapture attribute for indirect calls. The changed case only resulted in conservatism when computing memory attributes if say the return value was written to.
Differential Revision: https://reviews.llvm.org/D115916
There are a number of places that specially handle loads from a
uniform value where all the bits are the same (zero, one, undef,
poison), because we a) don't care about the load offset in that
case and b) it bypasses casts that might not be legal generally
but do work with uniform values.
We had multiple implementations of this, with a different set of
supported values each time, as well as incomplete type checks in
some cases. In particular, this fixes the assertion reported in
https://reviews.llvm.org/D114889#3198921, as well as a similar
assertion that could be triggered via constant folding.
Differential Revision: https://reviews.llvm.org/D115924
This is a follow up of D115458 and truncates the worklist of actual arguments
that can be specialised to 'MaxConstantsThreshold' candidates if
MaxConstantsThreshold was exceeded. Thus, this changes the behaviour of option
-func-specialization-max-constants. Before it didn't specialise at all when
this threshold was exceeded, but now it specialises up to MaxConstantsThreshold
candidates from the sorted worklist.
Differential Revision: https://reviews.llvm.org/D115509
This mostly is the same code that is refactored to decouple the cost and
benefit analysis. The biggest change is top-level function specializeFunctions
that now drives the transformation more like this:
specializeFunctions() {
Cost = getSpecializationCost(F);
calculateGains(F, Cost);
specializeFunction(F);
}
while this is just a restructuring, it helps the functional change in
calculateGains. I.e., we now sort the candidates based on the expected
specialisation gain, which we didn't do before. For this, a book keeping struct
ArgInfo was introduced. If we have a list of N candidates, but we only want
specialise less than N as set by option -func-specialization-max-constants, we
sort the list and discard the candidates that give the least benefit.
Given a formal argument, this change results in selecting the best actual
argument(s). This is NFC'ish in that this shouldn't change the current output
(hence no test change here), but in follow ups starting with D115509, it
should and I want to go one step further and compare all functions and all
arguments, which will mostly build on top of this refactoring and change.
Differential Revision: https://reviews.llvm.org/D115458
Modules that are not compiled with pseudo probe enabled can still be compiled with a sample profile input, such as in LTO postlink where other modules are probed. Since the profile is unrelated to the current modules, we should warn instead of error out the compilation.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D115642
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing the indexing the full CS contexts.
For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there're many contexts, since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned.
In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will less likely optimize on derived contexts that are not precomputed.
A CS-nested profile will look exactly the same with regular nested profile except that each nested profile can come with an attributes. With pseudo probes, a nested profile shown as below can also have a CFG checksum.
```
main:1968679:12
2: 24
3: 28 _Z5funcAi:18
3.1: 28 _Z5funcBi:30
3: _Z5funcAi:1467398
0: 10
1: 10 _Z8funcLeafi:11
3: 24
1: _Z8funcLeafi:1467299
0: 6
1: 6
3: 287884
4: 287864 _Z3fibi:315608
15: 23
!CFGChecksum: 138828622701
!Attributes: 2
!CFGChecksum: 281479271677951
!Attributes: 2
```
Specific work included in this change:
- A recursive profile converter to convert CS flat profile to nested profile.
- Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile.
- Unifiy sample loader inliner path for CS and preinlined nested profile.
- Changes in the sample loader to support probe-based nested profile.
I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keep an on-par performance. This is with -duplicate-contexts-into-base=1.
Test Plan:
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D115205
Inline assembly refererences to static functions with ThinLTO+CFI were
fixed in D104058 by creating aliases for promoted functions. Creating
the aliases unconditionally resulted in an unexpected size increase in
a Chrome helper binary:
https://bugs.chromium.org/p/chromium/issues/detail?id=1261715
This is caused by the compiler being unable to drop unused code now
referenced by the alias in module-level inline assembly. This change
adds a .set_conditional assembly extension, which emits an assignment
only if the target symbol is also emitted, avoiding phantom references
to functions that could have otherwise been dropped.
This is an alternative to the solution proposed in D112761.
Reviewed By: pcc, nickdesaulniers, MaskRay
Differential Revision: https://reviews.llvm.org/D113613