We want to use profile inference (**profi**) in BOLT for stale profile matching.
To this end, I am making a few changes modifying the interface of the algorithm.
This is the first change for existing usages of profi (e.g., CSSPGO):
- introducing an object holding the algorithmic parameters;
- some renaming of existing options;
- dropped unused option, SampleProfileInferEntryCount, as we don't plan to change its default value;
- no changes in the output / tests.
Reviewed By: hoy
Differential Revision: https://reviews.llvm.org/D134756
Adds a whitespace in a debug message before printing out a
value in the SSAUpdaterBulk.
Without this, debugging can end up a bit cumbersome.
Differential Revision: https://reviews.llvm.org/D141262
Prior to this patch, variadic DIExpressions (i.e. ones that contain
DW_OP_LLVM_arg) could only be created by salvaging debug values to create
stack value expressions, resulting in a DBG_VALUE_LIST being created. As of
the previous patch in this patch stack, DBG_INSTR_REF's syntax has been
changed to match DBG_VALUE_LIST in preparation for supporting variadic
expressions. This patch adds some minor changes needed to allow variadic
expressions that aren't stack values to exist, and allows variadic expressions
that are trivially reduceable to non-variadic expressions to be handled
similarly to non-variadic expressions.
Reviewed by: jmorse
Differential Revision: https://reviews.llvm.org/D133926
Non-throwing inline asm infers the nounwind attribute in
instcombine. Thus, it can be handled in the same manner as
non-throwing target functions are generally. Further special casing is
unnecessary complexity.
This is a recurring pattern: We want to find the nearest common
dominator (instruction) for two instructions, but currently only
provide an API for the nearest common dominator of two basic blocks.
Add an overload that accepts and return instructions.
When fetching allocation sizes, we almost always want to have the
size in bytes, but we were only providing an InBits API. Also add
the corresponding byte-based conjugate to save some *8 and /8
juggling everywhere.
NFC-ish. There is a functional change but the outputs are semantically
identical. Where we might've before replaced one operand with undef (which
means "this is a kill location marker") the use of `setKillLocation` will
replace all location operands with `undef` (which also means "this is a kill
location marker").
Related to https://discourse.llvm.org/t/auto-undef-debug-uses-of-a-deleted-value
Reviewed By: StephenTozer
Differential Revision: https://reviews.llvm.org/D140904
Use deduction guides instead of helper functions.
The only non-automatic changes have been:
1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).
Per reviewers' comment, some useless makeArrayRef have been removed in the process.
This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.
Differential Revision: https://reviews.llvm.org/D140955
Fixes https://github.com/llvm/llvm-project/issues/58565
The previous implementation visits operands in pre-order, but this does
not guarantee an instruction is visited before its uses. This can cause
instructions to be copied in the incorrect order. For example:
```
a = ...
b = add a, 1
c = add a, b
d = add b, a
```
Pre-order visits does not guarantee the order in which `a` and `b` are
visited. LoopUnrollAndJam may incorrectly insert `b` before `a`.
This patch implements post-order visits. By visiting dependencies first,
we guarantee that an instruction's dependencies are visited first.
Differential Revision: https://reviews.llvm.org/D140255
Switch simplification could sometimes fail to notice when an
intermediate case removal caused the switch condition to become
constant. This would cause the switch to be simplified into a
conditional branch rather than a direct branch.
Most of the time this didn't matter, except that occasionally
downstream parts of SimplifyCFG expect tautological branches to
already have been eliminated. The missed handling in switch
simplification would cause an assertion failure in the downstream
code.
Triggering the assertion failure is fairly sensitive to the exact
order of various simplifications.
Fixes https://github.com/llvm/llvm-project/issues/59768
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D140831
It isn't correct to always remove memprof metadata MIBs from the
original allocation call after inlining.
Let's say we have the following partial call graph:
C D
\ /
v v
B E
| /
v v
A
where A contains an allocation call. If both contexts including B have
the same allocation behavior, the context in the memprof metadata on the
allocation will be pruned, and we will have 2 MIBs with contexts:
A,B and A,E.
Previously, if we inlined A into B we propagate the matching MIBs onto
the inlined allocation call in B' (A,B in this case), and remove it from
the original out of line allocation in A. This is correct if we have a
single round of bottom up inlining.
However, in the compiler we can have multiple invocations of the inliner
pass (e.g. LTO). We may also inline non-bottom up with an alternative
inliner such as the ModuleInliner. In that case, we could end up first
inlining B into C, without having inlined A into B. The call graph then
looks like:
D
|
v
C' B E
\ | /
v v v
A
If we subsequently (perhaps on a later invocation of bottom up inlining)
inline A into B, the previous handling would propagate the memprof MIB
context A,B up into the inlined allocation in B', and remove it from the
original allocation in A. The propagation into B' is fine, however, by
removing it from A's allocation, we no longer reflect the context coming
from C'.
To fix this, simply prevent the removal of MIB from the original
allocation callsites.
Note that the memprof_inline.ll test has some changes to existing
checking to replace "noncold" with "notcold" in the metadata. The
corresponding CHECK was accidentally commented out in the old version
and thus this mistake was not previously detected.
Differential Revision: https://reviews.llvm.org/D140764
The function name was misleading - the expectation set both by the name
and by other members of Function (like isDeclaration or isIntrinsic)
would be that the function somehow would "be" "debug info for
profiling". But that's not the case - the property indicates (as the
comment over the declaration also explains) whether debug info should be
emitted (for profiling).
riscv64 wants callees to sign extend signed and unsigned int returns.
The caller can use this to avoid a sign extend if the result is
used by a comparison since riscv64 only has 64-bit compares.
InstCombine/SimplifyLibCalls aggressively turn memcmps that are only
used by an icmp eq 0 into bcmp, but we lose the signext attribute that
would have been present on the memcmp. This causes an unneeded sext.w
in the generated assembly.
This looks even sillier if bcmp is implemented alias to memcmp. In
that case, not only did we not get any savings by using bcmp, we added
an instruction.
This probably applies to other functions, this just happens to be
the one I noticed so far.
See also the discussion here https://discourse.llvm.org/t/can-we-preserve-signext-return-attribute-when-converting-memcmp-to-bcmp/67126
Reviewed By: efriedma
Differential Revision: https://reviews.llvm.org/D139901
Target-extension types represent types that need to be preserved through
optimization, but otherwise are not introspectable by target-independent
optimizations. This patch doesn't add any uses of these types by an existing
backend, it only provides basic infrastructure such that these types would work
correctly.
Reviewed By: nikic, barannikov88
Differential Revision: https://reviews.llvm.org/D135202
One of these two changes is exposing (or causing) some more miscompiles.
A reproducer is in progress, so reverting until resolved.
This reverts commit 428f36401b.
This facilitates replacing llvm::Any with std::any.
- Deprecate any_isa in favor of using any_cast(Any*) and checking for
nullptr because C++17 has no any_isa.
- Remove the assert from any_cast(Any*), so it returns nullptr if the
type is not correct. This aligns it with std::any_cast(any*).
Use any_cast(Any*) throughout LLVM instead of checks with any_isa.
This is the first part outlined in
https://discourse.llvm.org/t/rfc-switching-from-llvm-any-to-std-any/67176
Differential Revision: https://reviews.llvm.org/D139973
This is a fairly large changeset, but it can be broken into a few
pieces:
- `llvm/Support/*TargetParser*` are all moved from the LLVM Support
component into a new LLVM Component called "TargetParser". This
potentially enables using tablegen to maintain this information, as
is shown in https://reviews.llvm.org/D137517. This cannot currently
be done, as llvm-tblgen relies on LLVM's Support component.
- This also moves two files from Support which use and depend on
information in the TargetParser:
- `llvm/Support/Host.{h,cpp}` which contains functions for inspecting
the current Host machine for info about it, primarily to support
getting the host triple, but also for `-mcpu=native` support in e.g.
Clang. This is fairly tightly intertwined with the information in
`X86TargetParser.h`, so keeping them in the same component makes
sense.
- `llvm/ADT/Triple.h` and `llvm/Support/Triple.cpp`, which contains
the target triple parser and representation. This is very intertwined
with the Arm target parser, because the arm architecture version
appears in canonical triples on arm platforms.
- I moved the relevant unittests to their own directory.
And so, we end up with a single component that has all the information
about the following, which to me seems like a unified component:
- Triples that LLVM Knows about
- Architecture names and CPUs that LLVM knows about
- CPU detection logic for LLVM
Given this, I have also moved `RISCVISAInfo.h` into this component, as
it seems to me to be part of that same set of functionality.
If you get link errors in your components after this patch, you likely
need to add TargetParser into LLVM_LINK_COMPONENTS in CMake.
Differential Revision: https://reviews.llvm.org/D137838
The value map of last peeled iteration is computed within peelLoop API.
This patch exposes it for callers of peelLoop.
While this is not currently used by upstream passes, we have a usecase
downstream which benefits from this API update. Future users of peelLoop
can also use the ValueMap if needed.
Similar value maps are exposed by other loop utilities such as loop
cloning.
Differential Revision: https://reviews.llvm.org/D138228
This reverts commit 37b8f09a4b,
and returns commit 1bd0b82e50.
The miscompile was in InstCombine, and it has been addressed.
This tries to approach the problem noted by @arsenm:
terrible codegen for `__builtin_fpclassify()`:
https://godbolt.org/z/388zqdE37
Just because the PHI in the common successor happens to have different
incoming values for these two blocks, doesn't mean we have to give up.
It's quite easy to deal with this, we just need to produce a select:
https://alive2.llvm.org/ce/z/000srb
Now, the cost model for this transform is rather overly strict,
so this will basically never fire. We tally all (over all preds)
the selects needed to the NumBonusInsts
Differential Revision: https://reviews.llvm.org/D139275
value() has undesired exception checking semantics and calls
__throw_bad_optional_access in libc++. Moreover, the API is unavailable without
_LIBCPP_NO_EXCEPTIONS on older Mach-O platforms (see
_LIBCPP_AVAILABILITY_BAD_OPTIONAL_ACCESS).
Use a consistent type for the operands() methods of different SCEV
types. Also make the API consistent by only providing operands(),
rather than also providin op_begin() and op_end() for some of them.
When a dbg.label is moved into a new function, its corresponding scope
should be preserved, with the exception of the subprogram at the end of
the scope chain, which should now be the subprogram of the destination
function. See D139671.
Differential Revision: https://reviews.llvm.org/D139849
When a dbg.value is moved into a new function, the corresponding
variable should have its entire scope chain reparented with the new
function. The current implementation drops the scope chain and replaces
it with a subprogram for the new function.
Differential Revision: https://reviews.llvm.org/D139671
This tries to approach the problem noted by @arsenm:
terrible codegen for `__builtin_fpclassify()`:
https://godbolt.org/z/388zqdE37
Just because the PHI in the common successor happens to have different
incoming values for these two blocks, doesn't mean we have to give up.
It's quite easy to deal with this, we just need to produce a select:
https://alive2.llvm.org/ce/z/000srb
Now, the cost model for this transform is rather overly strict,
so this will basically never fire. We tally all (over all preds)
the selects needed to the NumBonusInsts
Differential Revision: https://reviews.llvm.org/D139275
When a dbg.value instruction for a variable V is extracted into a new
function, the scope of the underlying variable should be set to the new
function iff V was in the scope of the old function (i.e. it hadn't been
inlined). Prior to this patch, the code extractor would always update
the scope of V.
Differential Revision: https://reviews.llvm.org/D139669