32519 Commits

Author SHA1 Message Date
spupyrev
61eb12e1f4 [BOLT] introducing profi params
We want to use profile inference (**profi**) in BOLT for stale profile matching.
To this end, I am making a few changes modifying the interface of the algorithm.
This is the first change for existing usages of profi (e.g., CSSPGO):
- introducing an object holding the algorithmic parameters;
- some renaming of existing options;
- dropping the unused option SampleProfileInferEntryCount, as we don't plan to change its default value;
- no changes in the output / tests.

Reviewed By: hoy

Differential Revision: https://reviews.llvm.org/D134756
2023-01-09 12:03:28 -08:00
Alexey Bataev
755282ec1e [SLP][NFC]Move getExtractIndex function for future changes, NFC. 2023-01-09 09:53:01 -08:00
Sanjay Patel
0eedc9e567 [InstCombine] bitrev (zext i1 X) --> select X, SMinC, 0
https://alive2.llvm.org/ce/z/ZXCtgi

This breaks the infinite combine loop for issue #59897,
but we may still need more changes to avoid those loops.
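
The fold can be sanity-checked outside LLVM. The sketch below is a plain Python model, not InstCombine code; `bitreverse`, `BITS`, and `SMIN_BITS` are names assumed for illustration, and the 8-bit width is arbitrary.

```python
BITS = 8  # illustrative width; not taken from the patch

def bitreverse(value: int, bits: int = BITS) -> int:
    """Reverse the low `bits` bits of `value`."""
    result = 0
    for i in range(bits):
        if value & (1 << i):
            result |= 1 << (bits - 1 - i)
    return result

# Bit pattern of the signed minimum constant: only the top bit is set.
SMIN_BITS = 1 << (BITS - 1)

for x in (0, 1):
    zext = x                        # zext i1 X yields 0 or 1
    folded = SMIN_BITS if x else 0  # select X, SMinC, 0
    assert bitreverse(zext) == folded
```

Reversing a value whose only possible set bit is bit 0 moves that bit to the top position, which is why the result is either the signed minimum or zero.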
2023-01-09 12:27:37 -05:00
Sanjay Patel
2dcbd740ee [InstCombine] reduce smul.ov with i1 types to 'and'
https://alive2.llvm.org/ce/z/5tLkW6

There's still a miscompile bug as shown in issue #59876 / D141214.
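
As a rough illustration of why a 1-bit signed multiply collapses to 'and', the following Python model enumerates all operand pairs. It is a sketch under the assumption that an i1 bit of 1 denotes the signed value -1; `smul_with_overflow_i1` is a hypothetical helper, and the model says nothing about the remaining miscompile.

```python
def smul_with_overflow_i1(x_bit: int, y_bit: int):
    """Model a signed multiply-with-overflow on 1-bit operands.

    Returns (wrapped result bit, overflow bit).
    """
    # As a signed i1, bit 0 is the value 0 and bit 1 is the value -1.
    x, y = -x_bit, -y_bit
    product = x * y
    result_bit = product & 1                       # truncate to one bit
    overflow_bit = int(not (-1 <= product <= 0))   # outside the i1 range?
    return result_bit, overflow_bit

for x_bit in (0, 1):
    for y_bit in (0, 1):
        result_bit, overflow_bit = smul_with_overflow_i1(x_bit, y_bit)
        assert result_bit == (x_bit & y_bit)
        assert overflow_bit == (x_bit & y_bit)
```

The only overflowing case is (-1) * (-1) = 1, which is not representable in a signed i1, so in this model both the wrapped product and the overflow bit equal `X & Y`.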
2023-01-09 10:27:15 -05:00
Nikita Popov
59f91ddf90 [InstCombine] Preserve alignment in atomicrmw -> store fold
Preserve the alignment of the original atomicrmw, rather than using
the ABI alignment.

The same problem exists for loads, but that code is being removed
in D141277 anyway.
2023-01-09 15:37:24 +01:00
Jamie Hill-Daniel
6b9317f52a [InstCombine] Fold zero check followed by decrement to usub.sat
Fold (a == 0) ? 0 : a - 1 into usub.sat(a, 1).
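
A minimal exhaustive check of the equivalence, written as a Python model rather than LLVM IR. The 8-bit width and the helper names `usub_sat` and `zero_check_then_decrement` are assumptions made for the example.

```python
BITS = 8  # illustrative width; not taken from the patch
MAX = (1 << BITS) - 1

def usub_sat(a: int, b: int) -> int:
    """Unsigned saturating subtract: clamps at zero instead of wrapping."""
    return a - b if a >= b else 0

def zero_check_then_decrement(a: int) -> int:
    """The source pattern: (a == 0) ? 0 : a - 1."""
    return 0 if a == 0 else a - 1

# Every unsigned value of the chosen width gives the same result.
assert all(zero_check_then_decrement(a) == usub_sat(a, 1)
           for a in range(MAX + 1))
```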

Differential Revision: https://reviews.llvm.org/D140798
2023-01-09 14:22:25 +01:00
Noah Goldstein
6d839621da [InstCombine] Canonicalize (A & B_Pow2) eq/ne B_Pow2 patterns
1. A & B_Pow2 != B_Pow2 -> A & B_Pow2 == 0
   https://alive2.llvm.org/ce/z/KVUej4

2. A & B_Pow2 == B_Pow2 -> A & B_Pow2 != 0
   https://alive2.llvm.org/ce/z/PVv9FR

This allows the patterns to more easily be analyzed elsewhere.
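
Both rewrites rely on `A & B_Pow2` having only two possible values, 0 or `B_Pow2`. A small Python sketch of that argument follows; `canonical_forms_agree` and the 6-bit range are assumed for illustration, and the alive2 links above remain the actual proofs.

```python
def canonical_forms_agree(a: int, b: int) -> bool:
    """Check both rewrites for one pair of operands."""
    masked = a & b
    ne_ok = (masked != b) == (masked == 0)  # pattern 1
    eq_ok = (masked == b) == (masked != 0)  # pattern 2
    return ne_ok and eq_ok

BITS = 6  # illustrative width
powers_of_two = [1 << shift for shift in range(BITS)]

# Holds for every value of A when B is a power of two.
assert all(canonical_forms_agree(a, b)
           for a in range(1 << BITS)
           for b in powers_of_two)

# Fails once B has more than one set bit, e.g. A = 1, B = 3.
assert not canonical_forms_agree(1, 3)
```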

Differential Revision: https://reviews.llvm.org/D141090
2023-01-09 12:48:28 +01:00
Ben Mudd
1f11d1bd12 [DebugInfo] Fix jump threading failing to update cloned dbg.values
This patch fixes dbg.values duplicated by the JumpThreading pass that
pointed at the variable in the original block instead of at their local
(cloned) value.
The fix changes JumpThreadingPass::cloneInstructions so that it handles
metadata as well as normal cloned values.

Reviewed By: jmorse, StephenTozer

Differential Revision: https://reviews.llvm.org/D140006
2023-01-09 11:42:33 +00:00
Noah Goldstein
e6375ca6dc [InstCombine] Fix potentially buggy code in ((%x & C) == 0) --> %x u< (-C) transform
While demanded bits constant shrinking appears to prevent this in
practice right now, it is possible in principle for C2 to have
set bits that are known not-needed (zeroable). See: D140858

`+` will overflow here, `|` will get the right logic.

Differential Revision: https://reviews.llvm.org/D141089
2023-01-09 11:44:11 +01:00
Thomas Symalla
6c1cf201be [NFC] Missing whitespace in SSAUpdaterBulk debug output.
Adds a whitespace in a debug message before printing out a
value in the SSAUpdaterBulk.
Without this, debugging can end up a bit cumbersome.

Differential Revision: https://reviews.llvm.org/D141262
2023-01-09 10:15:25 +01:00
Max Kazantsev
957952dbf2 [JumpThreading] Preserve profile metadata during select unfolding
Jump threading can replace select and unconditional branch with
conditional branch, but when doing so loses profile information.

This destructive transform can eventually lead to a performance
degradation due to folding of branches in
shouldFoldCondBranchesToCommonDestination as branch probabilities
are no longer known.

Patch by Roman Paukner!

Differential Revision: https://reviews.llvm.org/D138132
Reviewed By: mkazantsev
2023-01-09 16:14:58 +07:00
Max Kazantsev
ba7af0bf69 [NFC] Add missing 'static' notion in createReplacement 2023-01-09 14:13:05 +07:00
chenglin.bi
33794cffcf [InstCombine] Fold logic-and/logic-or by distributive laws part2
Follow-up to https://reviews.llvm.org/D139408; supports `and/or+select` patterns:
X && Z || Y && Z --> (X || Y) && Z
https://alive2.llvm.org/ce/z/EMCkBG
https://alive2.llvm.org/ce/z/Q-YRvr
https://alive2.llvm.org/ce/z/SFkVQc
https://alive2.llvm.org/ce/z/S9MCuJ
https://alive2.llvm.org/ce/z/KZ7zzz

(X || Z) && (Y || Z) --> (X && Y) || Z
https://alive2.llvm.org/ce/z/Ggpa8-
https://alive2.llvm.org/ce/z/nhQRLY
https://alive2.llvm.org/ce/z/zpmEnq
https://alive2.llvm.org/ce/z/7omsrf
https://alive2.llvm.org/ce/z/CWBzBp
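
For plain Boolean values the two laws can be confirmed with a truth table. This Python sketch only covers that two-valued case; it does not model poison or the difference between bitwise and select-based logic ops, which is what the alive2 proofs above address. `laws_hold` is a name assumed for the example.

```python
from itertools import product

def laws_hold(x: bool, y: bool, z: bool) -> bool:
    """Check both distributive rewrites for one assignment."""
    first = ((x and z) or (y and z)) == ((x or y) and z)
    second = ((x or z) and (y or z)) == ((x and y) or z)
    return first and second

# All eight assignments of X, Y, Z satisfy both laws.
assert all(laws_hold(x, y, z)
           for x, y, z in product([False, True], repeat=3))
```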

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D139630
2023-01-09 10:21:17 +08:00
Shilei Tian
acd22b2751 [AAUnderlyingObjects] Introduce an AA for getting underlying objects of a pointer
This patch introduces a new AA `AAUnderlyingObjects`. It is essentially a wrapper
AA around the function `AA::getAssumedUnderlyingObjects`, but it can also query
recursively when the underlying object is reached through an indirect access, such
as a phi node or a select instruction.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D141164
2023-01-08 16:45:50 -05:00
Sanjay Patel
21d3871b7c [InstCombine] fold not-shift of signbit to icmp+zext, part 2
Follow-up to:
6c39a3aae1

That converted a pattern with ashr directly to icmp+zext, and
this updates the pattern that we used to convert to.

This canonicalizes to icmp for better analysis in the minimum case
and shortens patterns where the source type is not the same as dest type:
https://alive2.llvm.org/ce/z/tpXJ64
https://alive2.llvm.org/ce/z/dQ405O

This requires an adjustment to an icmp transform to avoid infinite looping.
2023-01-08 12:04:09 -05:00
Benjamin Kramer
b6942a2880 [NFC] Hide implementation details in anonymous namespaces 2023-01-08 17:37:02 +01:00
Florian Hahn
78914e8c32 [VPlan] Keep entries in worklist in sinkScalarOperands.
Not removing the entries ensures that duplicates are avoided,
reducing the number of iterations.
2023-01-08 15:52:57 +00:00
luxufan
eda8e999dd [InstCombine] Combine (zext a) mul (zext b) to llvm.umul.with.overflow only if mul has NUW flag
Fixes: https://github.com/llvm/llvm-project/issues/59836

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D141031
2023-01-08 14:41:59 +08:00
Alexey Bataev
996ad44b97 [SLP][NFC]Fix compile build by declaring ArrayRef, NFC.
Fix the build failure reported in https://lab.llvm.org/buildbot#builders/243/builds/218
2023-01-06 17:01:48 -08:00
Alexey Bataev
cc17e93178 [SLP][NFC]Remove unused variables, NFC. 2023-01-06 16:55:54 -08:00
Alexey Bataev
7439e1b2de [SLP]Fix incorrect reordering of clustered scalars.
The new mask represents the order, not the mask itself. It must first be
treated as an order and converted to a mask; only after that can the
gathered scalars be reordered to build the correct clustered order.
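
The distinction between an order and a mask can be sketched as follows. This is a generic Python illustration, not the SLP vectorizer's code: it assumes `order[p]` names the source element that ends up at position `p`, while `mask[i]` names the position element `i` moves to, so one is the inverse permutation of the other. The helper names are hypothetical.

```python
def inverse_permutation(order):
    """Convert an order into the mask that realizes it."""
    mask = [0] * len(order)
    for position, index in enumerate(order):
        mask[index] = position
    return mask

def apply_mask(scalars, mask):
    """Move element i of `scalars` to position mask[i]."""
    out = [None] * len(scalars)
    for i, target in enumerate(mask):
        out[target] = scalars[i]
    return out

scalars = ["a", "b", "c"]
order = [2, 0, 1]

# Converting first gives the intended layout.
correct = apply_mask(scalars, inverse_permutation(order))
assert correct == [scalars[i] for i in order]

# Using the order directly as a mask gives a different layout.
assert apply_mask(scalars, order) != correct
```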

Differential Revision: https://reviews.llvm.org/D141161
2023-01-06 16:04:09 -08:00
Stephen Tozer
c383f4d655 [DebugInfo] Allow non-stack_value variadic expressions and use in DBG_INSTR_REF
Prior to this patch, variadic DIExpressions (i.e. ones that contain
DW_OP_LLVM_arg) could only be created by salvaging debug values to create
stack value expressions, resulting in a DBG_VALUE_LIST being created. As of
the previous patch in this patch stack, DBG_INSTR_REF's syntax has been
changed to match DBG_VALUE_LIST in preparation for supporting variadic
expressions. This patch adds some minor changes needed to allow variadic
expressions that aren't stack values to exist, and allows variadic expressions
that are trivially reducible to non-variadic expressions to be handled
similarly to non-variadic expressions.

Reviewed by: jmorse

Differential Revision: https://reviews.llvm.org/D133926
2023-01-06 19:31:10 +00:00
James Y Knight
1ae36b1387 Remove special cases for invoke of non-throwing inline-asm.
Non-throwing inline asm infers the nounwind attribute in
instcombine. Thus, it can be handled in the same manner as
non-throwing target functions are generally. Further special casing is
unnecessary complexity.
2023-01-06 13:53:10 -05:00
Alexey Bataev
9b5f62685a [SLP]Fix cost of the broadcast buildvector/gather.
Need to include the cost of the initial insertelement in the cost of the
broadcasts. Also, need to adjust the cost of the gather/buildvector if
the element is inserted into a poison/undef vector.

Differential Revision: https://reviews.llvm.org/D140498
2023-01-06 09:25:05 -08:00
Nikita Popov
c60149b49e Revert "[Dominator] Add findNearestCommonDominator() for Instructions (NFC)"
This reverts commit 7f0de9573f.

This is missing handling for !isReachableFromEntry() blocks, which
may be relevant for some callers. Revert for now.
2023-01-06 17:36:01 +01:00
Nikita Popov
7f0de9573f [Dominator] Add findNearestCommonDominator() for Instructions (NFC)
This is a recurring pattern: We want to find the nearest common
dominator (instruction) for two instructions, but currently only
provide an API for the nearest common dominator of two basic blocks.

Add an overload that accepts and returns instructions.
2023-01-06 17:06:25 +01:00
David Green
161bfa5f53 [LoopFlattening] Check for extra uses on Mul
Similar to D138404, we were not guarding against extra uses of the Mul.
In most cases other checks would catch the issue due to unsupported
instructions in the outer loop, but certain non-canonical loop forms
could still get through.

Fixes #59339

Differential Revision: https://reviews.llvm.org/D141114
2023-01-06 15:32:38 +00:00
Guillaume Chatelet
87b6b347fc Revert D141134 "[NFC] Only expose getXXXSize functions in TypeSize"
The patch should be discussed further.

This reverts commit dd56e1c92b.
2023-01-06 15:27:50 +00:00
Guillaume Chatelet
dd56e1c92b [NFC] Only expose getXXXSize functions in TypeSize
Currently 'TypeSize' exposes two functions that serve the same purpose:
 - getFixedSize / getFixedValue
 - getKnownMinSize / getKnownMinValue

source : bf82070ea4/llvm/include/llvm/Support/TypeSize.h (L337-L338)

This patch proposes removing one of the two and sticking to a single function in the code base.

Differential Revision: https://reviews.llvm.org/D141134
2023-01-06 15:24:52 +00:00
Nikita Popov
07bf39df80 [MemCpyOpt] Extract processStoreOfLoad() method (NFC) 2023-01-06 16:11:10 +01:00
Nikita Popov
a6a526ec54 [IR] Add AllocaInst::getAllocationSize() (NFC)
When fetching allocation sizes, we almost always want to have the
size in bytes, but we were only providing an InBits API. Also add
the corresponding byte-based conjugate to save some *8 and /8
juggling everywhere.
2023-01-06 15:36:16 +01:00
Florian Hahn
68469a80cb [LV] Disable runtime unrolling for vectorized loops.
This patch adds metadata to disable runtime unrolling to the vectorized
loop. If runtime unrolling/interleaving is considered profitable, LV
will interleave the loop directly. There should be no need to perform
runtime unrolling at a later stage.

Note that we already add metadata to disable runtime unrolling to the
scalar loop after vectorization.

The additional unrolling unnecessarily increases code size and compile
time. In addition to that we have several bug reports of unnecessary
runtime unrolling for vectorized loops, e.g. PR40961

Compile-time improvements:

  NewPM-O3: -1.04%
  NewPM-ReleaseThinLTO: -0.59%
  NewPM-ReleaseLTO-g: -0.97%

https://llvm-compile-time-tracker.com/compare.php?from=ce1be13a868d0f8afa367975558c1a6175cce33a&to=78bc2e67f22e9e10e61cdb6cdac4bb857d95eb1b&stat=instructions:u

Fixes #40306.

Reviewed By: lebedev.ri, nikic

Differential Revision: https://reviews.llvm.org/D115261
2023-01-06 10:56:17 +00:00
OCHyams
775af51209 [DebugInfo] Prefer setKillLocation rather than replacing operands with undef
NFC-ish. There is a functional change but the outputs are semantically
identical. Where we might've before replaced one operand with undef (which
means "this is a kill location marker") the use of `setKillLocation` will
replace all location operands with `undef` (which also means "this is a kill
location marker").

Related to https://discourse.llvm.org/t/auto-undef-debug-uses-of-a-deleted-value

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D140904
2023-01-06 10:11:14 +00:00
OCHyams
042107494d [DebugInfo][NFC] Rename is/setUndef to is/setKillLocation
These names better reflect the semantics and also the implementation, since
it's not just "undef" operands that are sentinels used to signal that the debug
intrinsic terminates dominating location definitions.

Related to https://discourse.llvm.org/t/auto-undef-debug-uses-of-a-deleted-value

Reviewed By: StephenTozer

Differential Revision: https://reviews.llvm.org/D140903
2023-01-06 09:15:02 +00:00
Chuanqi Xu
65e3398869 [NFC] [Coroutines] Move collectFrameAlloca to decrease the times to iterate the function
Previously, collectFrameAllocas iterated over every instruction in the
Function, and we then iterated over the function again later. This is
redundant.
2023-01-06 16:38:07 +08:00
Peter Rong
1db51d8eb2 [Transform] Rewrite LowerSwitch using APInt
This rewrite fixes https://github.com/llvm/llvm-project/issues/59316.

Previously LowerSwitch used int64_t, which crashes on case branches using integers with more than 64 bits.
Using APInt fixes this problem. This patch also includes a test.
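
To illustrate the width problem only (this is not LowerSwitch code), the Python sketch below checks whether a case value fits the int64_t range. `fits_in_int64` and the sample value are assumptions made for the example.

```python
INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1

def fits_in_int64(value: int) -> bool:
    """True if `value` is representable as a signed 64-bit integer."""
    return INT64_MIN <= value <= INT64_MAX

# Hypothetical case value for a switch on a 128-bit integer.
wide_case = 1 << 100

assert fits_in_int64(INT64_MAX)
assert not fits_in_int64(wide_case)
```

An arbitrary-precision representation such as APInt carries its own bit width, so case values like this one need no narrowing.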

Reviewed By: RKSimon

Differential Revision: https://reviews.llvm.org/D140747
2023-01-05 14:30:42 -08:00
Valery N Dmitriev
6d677c0b3d [SLP] Unify GEP cost modeling for load, store and GEP nodes.
Make a separate routine for GEPs cost calculation and make
the approach uniform across load, store and GEP tree nodes.
An additional issue fixed: GEP cost savings were applied twice
for ScatterVectorize nodes (aka gather load), making them look
unrealistically profitable for vectorization.

Differential Revision: https://reviews.llvm.org/D140789
2023-01-05 10:11:36 -08:00
serge-sans-paille
38818b60c5 Move from llvm::makeArrayRef to ArrayRef deduction guides - llvm/ part
Use deduction guides instead of helper functions.

The only non-automatic changes have been:

1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There were a few similar situations across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).

Per reviewers' comment, some useless makeArrayRef have been removed in the process.

This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.

Differential Revision: https://reviews.llvm.org/D140955
2023-01-05 14:11:08 +01:00
David Green
586fd86b0a [LoopVectorizer] Fix inloop reductions mask placement
The validation of vplans could fail if an inloop reduction was created
with a block-in mask that did not dominate the reduction. This makes
sure that the insert point is set when creating the mask, to ensure it
dominates the reduction.

Differential Revision: https://reviews.llvm.org/D141003
2023-01-05 11:37:37 +00:00
Dawid Jurczak
7e6c7562cb [NFC][Coroutines] Build DominatorTree only once before collecting frame allocas (PR58650)
Assuming that collecting frame allocas doesn't modify CFG we can safely
move DominatorTree construction outside loop and avoid expensive computations.

Differential Revision: https://reviews.llvm.org/D140818
2023-01-05 10:32:28 +01:00
Joshua Cao
629d880dc5 [LoopUnrollAndJam] Visit phi operand dependencies in post-order
Fixes https://github.com/llvm/llvm-project/issues/58565

The previous implementation visits operands in pre-order, but this does
not guarantee an instruction is visited before its uses. This can cause
instructions to be copied in the incorrect order. For example:

```
a = ...
b = add a, 1
c = add a, b
d = add b, a
```

A pre-order visit does not guarantee the order in which `a` and `b` are
visited. LoopUnrollAndJam may incorrectly insert `b` before `a`.

This patch implements post-order visits. By visiting operands first,
we guarantee that an instruction's dependencies are visited before the
instruction itself.
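
The ordering property can be seen on the four-instruction example with a small Python sketch. It models only the traversal, not LoopUnrollAndJam itself, and assumes an acyclic operand graph; `operands`, `post_order`, and `pre_order` are names invented for the illustration.

```python
# Operand lists for the example: b uses a, c uses a and b, d uses b and a.
operands = {"a": [], "b": ["a"], "c": ["a", "b"], "d": ["b", "a"]}

def post_order(roots, operands):
    """Emit each instruction only after all of its operands."""
    visited, order = set(), []
    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for operand in operands[node]:
            visit(operand)
        order.append(node)
    for root in roots:
        visit(root)
    return order

def pre_order(roots, operands):
    """Emit each instruction before descending into its operands."""
    visited, order = set(), []
    def visit(node):
        if node in visited:
            return
        visited.add(node)
        order.append(node)
        for operand in operands[node]:
            visit(operand)
    for root in roots:
        visit(root)
    return order

post = post_order(["c", "d"], operands)
pre = pre_order(["d"], operands)

# Post-order: every operand precedes its user.
assert all(post.index(op) < post.index(user)
           for user in post for op in operands[user])
# Pre-order from d reaches b before a.
assert pre.index("b") < pre.index("a")
```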

Differential Revision: https://reviews.llvm.org/D140255
2023-01-05 00:05:49 -08:00
Akira Hatanaka
665e47777d [ObjC][ARC] Fix non-deterministic behavior in ProvenanceAnalysis
If the second value passed to relatedSelect is a select, check whether
neither arm of the select is related to the first value.
2023-01-04 21:29:42 -08:00
chenglin.bi
87b2c760d0 [Instcombine] fold logic ops to select
(C & X) | ~(C | Y) -> C ? X : ~Y

https://alive2.llvm.org/ce/z/4yLh_i
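
For single-bit operands the identity can be checked by enumeration. The Python sketch below models only that 1-bit case and ignores poison semantics, which the alive2 proof above covers; `not1`, `logic_form`, and `select_form` are hypothetical helper names.

```python
from itertools import product

def not1(bit: int) -> int:
    """Bitwise NOT restricted to a single bit."""
    return bit ^ 1

def logic_form(c: int, x: int, y: int) -> int:
    """(C & X) | ~(C | Y)"""
    return (c & x) | not1(c | y)

def select_form(c: int, x: int, y: int) -> int:
    """C ? X : ~Y"""
    return x if c else not1(y)

# All eight combinations of C, X, Y agree.
assert all(logic_form(c, x, y) == select_form(c, x, y)
           for c, x, y in product((0, 1), repeat=3))
```

When C is 1 the second term vanishes and the result is X; when C is 0 the first term vanishes and the result is ~Y.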

Reviewed By: spatel

Differential Revision: https://reviews.llvm.org/D139080
2023-01-05 12:04:35 +08:00
Joshua Cao
50be285944 [LoopUnrollAndJam] Forget scalar evolution dispositions. Do not explicitly forget subloop.
Fixes https://github.com/llvm/llvm-project/issues/58454

Scalar evolution dispositions need to be forgotten to pass verification.

We do not need to forget the subloop since it is automatically forgotten
when forgetting the parent loop.

Differential Revision: https://reviews.llvm.org/D140953
2023-01-04 19:35:50 -08:00
Owen Anderson
733740b189 Fix a phase-ordering problem in SimplifyCFG.
Switch simplification could sometimes fail to notice when an
intermediate case removal caused the switch condition to become
constant. This would cause the switch to be simplified into a
conditional branch rather than a direct branch.

Most of the time this didn't matter, except that occasionally
downstream parts of SimplifyCFG expect tautological branches to
already have been eliminated. The missed handling in switch
simplification would cause an assertion failure in the downstream
code.

Triggering the assertion failure is fairly sensitive to the exact
order of various simplifications.

Fixes https://github.com/llvm/llvm-project/issues/59768

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D140831
2023-01-04 16:47:13 -07:00
Augie Fackler
0676156f81 Revert "[VPlan] Also consider operands of sink candidates in same block."
This reverts commit aa2414729e.

Previously-valid IR from a tensorflow test case (as shown on the
Diffusion revision for aa2414729e) started
hanging in the loop-vectorize pass. Reverting to keep everyone working.
2023-01-04 16:17:13 -05:00
Alexey Bataev
a1b18946f9 [SLP]Fix incorrect shuffle results because of missing shuffle mask
analysis.

Missed the analysis of the shuffle mask when trying to analyze the
operands of the shuffle instruction during peeking through shuffle
instructions.
2023-01-04 13:10:40 -08:00
Fangrui Song
73c9f167ff [LowerTypeTests] Add ENDBR to .cfi.jumptable for x86 Indirect Branch Tracking
Similar to D81251 for AArch64 BTI. This fixes `./a.out test` for

```
void foo(void) {}
void bar(void) {}
static void (*fptr)(void);
int main(int argc, char **argv) {
  if (argv[1]) fptr = foo;
  else fptr = bar;
  fptr();
}
```

`clang -flto=thin -fvisibility=hidden -fsanitize=cfi-icall -fcf-protection=branch -fuse-ld=lld a.cc`

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D140655
2023-01-04 12:28:07 -08:00
Sanjay Patel
c43a7874a3 [InstCombine] don't let 'exact' inhibit demanded bits folds for udiv
We shouldn't penalize instructions that have extra flags.

Drop the poison-generating flags if needed instead of bailing out.
This makes canonicalization/optimization more uniform.

There is a chance that dropping flags will cause some
other transform to not fire, but we added a preliminary
patch to avoid that with:
f0faea5714

See D140665 for more details.
2023-01-04 13:13:02 -05:00
Matt Arsenault
192c0e5a7a IROutliner: Fix assert with non-0 alloca addrspace
The arguments are passed as stored to new allocas so the address space
needs to match.
2023-01-04 11:30:50 -05:00