Commit Graph

7098 Commits

Author SHA1 Message Date
Youngsuk Kim
eed067e9fb [llvm] Remove no-op ptr-to-ptr bitcasts (NFC)
Opaque ptr cleanup effort (NFC).
2023-11-13 14:33:41 -06:00
Youngsuk Kim
876236023c [llvm] Remove no-op ptr-to-ptr bitcasts (NFC) (#72133)
Opaque ptr cleanup effort (NFC).
2023-11-13 13:05:27 -05:00
Valery Pykhtin
f054947c0d [SimplifyCFG] Prevent merging cbranch to cbranch if the branch probability from the first to second is too low. (#69375)
AMDGPU target has faced the situation which can be illustrated with the
following testcase:

define void @dont_merge_cbranches(i32 %V) {
  %divergent_cond = icmp ne i32 %V, 0
  %uniform_cond = call i1 @uniform_result(i1 %divergent_cond)
  br i1 %uniform_cond, label %bb2, label %exit, !prof !0
bb2:
  br i1 %divergent_cond, label %bb3, label %exit
bb3:
  call void @bar( )
  br label %exit
exit:
  ret void
}
!0 = !{!"branch_weights", i32 1, i32 100000}

SimplifyCFG merges the branches on %uniform_cond and %divergent_cond, which is undesirable because the first branch to bb2 is taken extremely rarely and the second branch is expensive. The merged branch becomes as expensive as the second.

This patch prevents such merging if control is unlikely to reach the second branch.
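
As a rough illustration of the kind of check described above, here is a minimal sketch in plain C++. It is not the code from the patch: `takenProbability`, `shouldMergeBranches`, and the 1% default threshold are hypothetical names and values chosen for this example only.

```cpp
#include <cassert>
#include <cstdint>

// Probability that the first (true) successor is taken, given
// branch_weights metadata of the form {TrueWeight, FalseWeight}.
double takenProbability(std::uint64_t TrueWeight, std::uint64_t FalseWeight) {
  assert(TrueWeight + FalseWeight > 0 && "weights must not both be zero");
  return static_cast<double>(TrueWeight) /
         static_cast<double>(TrueWeight + FalseWeight);
}

// Hypothetical policy: only merge the two conditional branches when the
// second one is reached often enough. The threshold is an assumption.
bool shouldMergeBranches(std::uint64_t TrueWeight, std::uint64_t FalseWeight,
                         double MinProbability = 0.01) {
  return takenProbability(TrueWeight, FalseWeight) >= MinProbability;
}
```

With the weights from the testcase (1 and 100000), `shouldMergeBranches(1, 100000)` is false, so under this sketch the two branches would stay separate.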
2023-11-13 15:37:55 +01:00
Kazu Hirata
22b0f7ba6e [Transforms] Include llvm/ADT/SmallSet.h (NFC)
This patch adds #include "llvm/ADT/SmallSet.h" to a couple of files
that are relying on transitive includes of SmallSet.h.  It in turn
unblocks the removal of unnecessary includes of llvm/ADT/SmallSet.h in
several other files.
2023-11-11 12:25:39 -08:00
Nikita Popov
192e7d3d52 [IRBuilder] Add IsNonNeg param to CreateZExt() (NFC)
2023-11-10 12:00:34 +01:00
Chuanqi Xu
b7b5907b56 [Coroutines] Introduce [[clang::coro_only_destroy_when_complete]] (#71014)
Close https://github.com/llvm/llvm-project/issues/56980.

This patch introduces a lightweight optimization attribute for
coroutines that are guaranteed to be destroyed only after they have
reached the final suspend point.

The rationale behind the patch is simple. See the example:

```C++
A foo() {
  dtor d;
  co_await something();
  dtor d1;
  co_await something();
  dtor d2;
  co_return 43;
}
```

Generally the generated .destroy function may be:

```C++
void foo.destroy(foo.Frame *frame) {
  switch(frame->suspend_index()) {
    case 1:
      frame->d.~dtor();
      break;
    case 2:
      frame->d.~dtor();
      frame->d1.~dtor();
      break;
    case 3:
      frame->d.~dtor();
      frame->d1.~dtor();
      frame->d2.~dtor();
      break;
    default: // coroutine completed or hasn't started
      break;
  }

  frame->promise.~promise_type();
  delete frame;
}
```

This is because the compiler needs to be ready for every valid state in
which the coroutine may be destroyed.

However, from the user's perspective, we may know that certain
coroutine types are only ever destroyed after they have reached the
final suspend point, and we need a way to tell the compiler about this.
That is what this patch provides. Once the compiler knows that the
coroutines can only be destroyed after completion, it can optimize the
above example to:

```C++
void foo.destroy(foo.Frame *frame) {
  frame->promise.~promise_type();
  delete frame;
}
```

I spent a lot of time experimenting with this downstream, and the
numbers are really good. In a real-world coroutine-heavy workload, the
size of the build dir (including .o files) shrinks by 14%, and the size
of the final libraries (excluding the .o files) shrinks by 8% in Debug
mode and 1% in Release mode.
2023-11-09 14:42:07 +08:00
Allen
7ec86f4d68 [SimplifyCFG] Fix the compile crash for invalid upper bound value (#71351)
Fix the crash introduced by the last reland, PR #70542.

Note:
For '%add = add nuw i32 %x, 1', we can only infer that the LowerBound is 1,
but the UpperBound is wrapped to 0 in computeConstantRange,
so we can't assume the UpperBound is a valid bound when its value is 0.
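
The wrap-around in the note can be reproduced with plain 32-bit unsigned arithmetic. The sketch below is only an illustration: `Range32`, `rangeOfAddNuwOne`, and `hasUsableUpperBound` are hypothetical names, and the half-open `[Lower, Upper)` layout is modeled by hand rather than taken from LLVM's API.

```cpp
#include <cstdint>
#include <limits>

// Half-open range [Lower, Upper) over 32-bit values; Upper may wrap.
struct Range32 {
  std::uint32_t Lower;
  std::uint32_t Upper;
};

// Range of x + 1 when the add cannot wrap (nuw) and x is unconstrained.
Range32 rangeOfAddNuwOne() {
  const std::uint32_t Max = std::numeric_limits<std::uint32_t>::max();
  // Smallest result is 0 + 1. Largest result is Max, so the exclusive
  // upper bound is Max + 1, which wraps to 0 in 32 bits.
  return Range32{1u, static_cast<std::uint32_t>(Max + 1u)};
}

// Assumed check mirroring the note: an upper bound of 0 is the wrapped
// case and must not be used as a real limit.
bool hasUsableUpperBound(const Range32 &R) { return R.Upper != 0; }
```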

Fix https://github.com/llvm/llvm-project/issues/71329.
Reviewed By: zmodem, nikic
2023-11-09 12:33:24 +08:00
Jeremy Morse
f1b0a54451 Reapply 7d77bbef4a, adding new debug-info classes
This reverts commit 957efa4ce4.

Original commit message below -- in this follow up, I've shifted
un-necessary inclusions of DebugProgramInstruction.h into being forward
declarations (fixes clang-compile time I hope), and a memory leak in the
DebugInfoTest.cpp IR unittests.

I also tracked a compile-time regression in D154080, more explanation
there, but the result of which is hiding some of the changes behind the
EXPERIMENTAL_DEBUGINFO_ITERATORS compile-time flag. This is tested by the
"new-debug-iterators" buildbot.

[DebugInfo][RemoveDIs] Add prototype storage classes for "new" debug-info

This patch adds a variety of classes needed to record variable location
debug-info without using the existing intrinsic approach, see the rationale
at [0].

The two added files and corresponding unit tests are the majority of the
plumbing required for this, but at this point isn't accessible from the
rest of LLVM as we need to stage it into the repo gently. An overview is
that classes are added for recording variable information attached to Real
(TM) instructions, in the form of DPValues and DPMarker objects. The
metadata-uses of DPValues are plumbed into the metadata hierarchy, and a
field added to class Instruction, which are all stimulated in the unit
tests. The next few patches in this series add utilities to convert to/from
this new debug-info format and add instruction/block utilities to have
debug-info automatically updated in the background when various operations
occur.

This patch was reviewed in Phab in D153990 and D154080, I've squashed them
together into this commit as there are dependencies between the two
patches, and there's little profit in landing them separately.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
2023-11-08 16:42:35 +00:00
Markos Horro
9d2903c8e5 [IndVars] Add check of loop invariant for trunc instructions (#71072)
The same idea as in 34d380e1f6, but considering
truncation instructions.
Improvement for #59633.
2023-11-08 11:16:23 +00:00
Vladislav Dzhidzhoev
6beddd668a Revert "[DebugMetadata][DwarfDebug] Support function-local types in lexical block scopes (4/7)"
This caused assert:
llvm/llvm/lib/CodeGen/AsmPrinter/DwarfFile.cpp:110:
void llvm::DwarfFile::addScopeVariable(LexicalScope *, DbgVariable *):
Assertion `Ret.second' failed.

See comments https://reviews.llvm.org/D144006#4656350.

This reverts commit 3b449bd46a.
2023-11-08 00:29:24 +01:00
Paulo Matos
7b9d73c2f9 [NFC] Remove Type::getInt8PtrTy (#71029)
Replace this with PointerType::getUnqual().
Followup to the opaque pointer transition. Fixes an in-code TODO item.
2023-11-07 17:26:26 +01:00
Philip Reames
551c280cfd [indvars] Always fallback to truncation if AddRec widening fails (#70967)
The current code structure results in cases where, if a) we can't clone
the IV user (because it's not in our whitelist) or b) we can't prove the
SCEV expressions are identical, we'd sometimes leave both the original
unwidened IV and the partially widened IV in the code. Instead, just
truncate the wide IV to the use, the same as we'd do if we couldn't
find an addrec to start with.

Noticed this while playing with changing how we produce addrecs. The
current structure results in a very tight interlock between SCEV's
internal capabilities and the indvars code.
2023-11-07 07:49:39 -08:00
Hans Wennborg
05ed92127c Revert "Reland [SimplifyCFG] Delete the unnecessary range check for small mask operation (#70542)"
This caused https://github.com/llvm/llvm-project/issues/71329

> Fix the compile crash when the default result has no result for
> https://github.com/llvm/llvm-project/pull/65835
>
> Fixes https://github.com/llvm/llvm-project/issues/65120
> Reviewed By: zmodem, nikic

This reverts commit 7c4180a36a.
2023-11-07 10:53:22 +01:00
Simon Pilgrim
3ca4fe80d4 [Transforms] Use StringRef::starts_with/ends_with instead of startswith/endswith. NFC.
startswith/endswith wrap starts_with/ends_with and will eventually go away (to more closely match string_view)
2023-11-06 16:50:18 +00:00
Nikita Popov
be3cef0b2a [LibCallsShrinkWrap] Avoid use of ConstantExpr::getFPExtend() (NFC)
Use the constant folding API instead.
2023-11-06 15:38:42 +01:00
Nikita Popov
a682a9cfd0 Revert "Port Swift's merge function pass to llvm: merging functions that differ in constants (#68235)"
This reverts commit 19b5495b65.

PR landed without approval, with severe quality issues.
2023-11-03 21:15:46 +01:00
Philip Reames
5adf6ab7ff Revert "[IndVars] Generate zext nneg when locally obvious"
This reverts commit a6c8e27b3a.  It appears likely to have caused https://lab.llvm.org/buildbot/#/builders/57/builds/30988.
2023-11-03 11:19:14 -07:00
Manman Ren
19b5495b65 Port Swift's merge function pass to llvm: merging functions that differ in constants (#68235)
See RFC for details:
https://discourse.llvm.org/t/rfc-for-moving-swift-s-merge-function-pass-to-llvm/73778

We will need to refactor extension to FunctionComparator/FunctionHash to
StructuralHash. This patch adds a new pass ported from Swift; we will
need to discuss how to migrate Swift’s pass over after this lands in
llvm.

This PR was created to get some early review on the patch.

---------

Co-authored-by: Manman Ren <mren@meta.com>
2023-11-03 11:13:58 -07:00
Philip Reames
7c93452e17 [indvars] Restructure getExtendedOperandRecurrence [nfc]
As suggested during review of https://github.com/llvm/llvm-project/pull/70990.
2023-11-03 10:50:57 -07:00
Philip Reames
1ffea97ffd [indvars] Support known positive extends in getExtendedOperandRecurrence (#70990)
IndVars has the existing notion of a narrow definition which is known to
be positive, and thus both sign and zero extension kinds are actually the
same operations. There's existing logic for forming a SCEV based on the
extension kind and the no-wrap flags. This change extends that logic to
form the opposite extension kind for a positive def if doing so is
allowed by the flags. Note that we already do something analogous for
the getWideRecurrence case as well.
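
The premise that sign and zero extension coincide for a non-negative narrow value can be checked with ordinary integer casts. This is a small standalone sketch, not IndVars code; the function names are made up for the example.

```cpp
#include <cstdint>

// Sign-extend a 32-bit value to 64 bits and return the raw bit pattern.
std::uint64_t signExtend(std::int32_t V) {
  return static_cast<std::uint64_t>(static_cast<std::int64_t>(V));
}

// Zero-extend the same 32 bits to 64 bits.
std::uint64_t zeroExtend(std::int32_t V) {
  return static_cast<std::uint64_t>(static_cast<std::uint32_t>(V));
}

// True when both extension kinds produce the same 64-bit result.
bool extensionsAgree(std::int32_t V) {
  return signExtend(V) == zeroExtend(V);
}
```

`extensionsAgree` holds for every non-negative input and fails for negative ones, which is why the extension kind can be swapped only when the definition is known to be non-negative.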
2023-11-03 10:21:30 -07:00
Philip Reames
a6c8e27b3a [IndVars] Generate zext nneg when locally obvious
zext nneg was recently added to the IR in #67982.  This patch teaches
SimplifyIndVars to prefer zext nneg over *both* sext and plain zext,
when a local SCEV query indicates the source is non-negative.

The choice to prefer zext nneg over sext looks slightly aggressive
here, but probably isn't so much in practice.  For cases where we'd
"remember" the range fact, instcombine would convert the sext into
a zext nneg anyways.  The only cases where this produces a different
result overall are when SCEV knows a non-local fact, and it doesn't
get materialized into the IR.  Those are exactly the cases where
using zext nneg are most useful.  We do run the risk of e.g. a
missing combine - since we haven't updated most of them yet - but
that seems like a manageable risk.

Note that there are much deeper algorithmic changes we could make
to this code to exploit zext nneg, but this seemed like a reasonable
and low risk starting point.
2023-11-03 09:20:59 -07:00
Allen
7c4180a36a Reland [SimplifyCFG] Delete the unnecessary range check for small mask operation (#70542)
Fix the compile crash when the default result has no result for
https://github.com/llvm/llvm-project/pull/65835

Fixes https://github.com/llvm/llvm-project/issues/65120
Reviewed By: zmodem, nikic
2023-11-03 09:12:29 +08:00
spupyrev
cebc837937 [CodeLayout] Pre-process execution counts before layout (#70501)
BOLT fails to process binaries in non-LBR mode, as some blocks are
marked as having a zero execution count. This patch adjusts code layout
to process such blocks without tripping assertions. This is NFC for all
other use cases.
2023-11-02 12:08:33 -07:00
Jeremy Morse
957efa4ce4 Revert "[DebugInfo][RemoveDIs] Add prototype storage classes for "new" debug-info"
And some intervening fixups. There are two remaining problems:
 * A memory leak via https://lab.llvm.org/buildbot/#/builders/236/builds/7120/steps/10/logs/stdio
 * A performance slowdown with -g where I'm not completely sure what the cause is

These might be fairly straightforward to fix, but it's the end of the day
here, so I figure I'll clear the buildbots til tomorrow.

This reverts commit 7d77bbef4a.
This reverts commit 9026f35afe.
This reverts commit d97b2b389a.
2023-11-02 17:41:36 +00:00
Vladislav Dzhidzhoev
3b449bd46a [DebugMetadata][DwarfDebug] Support function-local types in lexical block scopes (4/7)
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544

Similar to imported declarations, the patch tracks function-local types in
DISubprogram's 'retainedNodes' field. DwarfDebug is adjusted in accordance with
the aforementioned metadata change and gains support for function-local
types scoped within a lexical block.

The patch assumes that DICompileUnit's 'enums' field no longer tracks local
types, and DwarfDebug would assert if any locally-scoped types get placed there.

Reviewed By: jmmartinez
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Differential Revision: https://reviews.llvm.org/D144006
2023-11-02 17:44:52 +01:00
Jeremy Morse
7d77bbef4a [DebugInfo][RemoveDIs] Add prototype storage classes for "new" debug-info
This patch adds a variety of classes needed to record variable location
debug-info without using the existing intrinsic approach, see the rationale
at [0].

The two added files and corresponding unit tests are the majority of the
plumbing required for this, but at this point isn't accessible from the
rest of LLVM as we need to stage it into the repo gently. An overview is
that classes are added for recording variable information attached to Real
(TM) instructions, in the form of DPValues and DPMarker objects. The
metadata-uses of DPValues are plumbed into the metadata hierarchy, and a
field added to class Instruction, which are all stimulated in the unit
tests. The next few patches in this series add utilities to convert to/from
this new debug-info format and add instruction/block utilities to have
debug-info automatically updated in the background when various operations
occur.

This patch was reviewed in Phab in D153990 and D154080, I've squashed them
together into this commit as there are dependencies between the two
patches, and there's little profit in landing them separately.

[0] https://discourse.llvm.org/t/rfc-instruction-api-changes-needed-to-eliminate-debug-intrinsics-from-ir/68939
2023-11-02 12:44:53 +00:00
David Sherwood
07f0e75b53 [LoopVectorize] Fix bug with code to hoist runtime checks (#70937)
There was a silly mistake in the expandBounds function that was using
the wrong type when calling expandCodeFor and always assuming the stride
is 64 bits. I've added the following test to defend this fix:

Transforms/LoopVectorize/ARM/mve-hoist-runtime-checks.ll
2023-11-02 10:02:50 +00:00
Jacob Lambert
2b898afdef [llvm] Add comment and assert for CloneModule edge case (#67734)
CloneModule is not currently designed to handle un-materialized Modules,
for example one created via a lazy initializer like
getLazyBitcodeModule(). In this case we get a somewhat cryptic
segmentation fault without a clear path forward.

In this patch, we add a comment to inform CloneModule users of this
shortcoming, and an assert to test for empty function bodies before the
segmentation fault is triggered.
2023-11-01 10:05:11 -07:00
Nikita Popov
b87110e298 [SimplifyCFG] Avoid use of ConstantExpr::getIntegerCast() (NFC)
We're working on a ConstantInt here, so constant folding will
always succeed. Just avoid using the ConstantExpr API.
2023-11-01 11:55:11 +01:00
Nikita Popov
6b8ed78719 [IR] Add writable attribute
This adds a writable attribute, which in conjunction with
dereferenceable(N) states that a spurious store of N bytes is
introduced on function entry. This implies that this many bytes
are writable without trapping or introducing data races. See
https://llvm.org/docs/Atomics.html#optimization-outside-atomic for
why the second point is important.

This attribute can be added to sret arguments. I believe Rust will
also be able to use it for by-value (moved) arguments. Rust likely
won't be able to use it for &mut arguments (tree borrows does not
appear to allow spurious stores).

In this patch the new attribute is only used by LICM scalar promotion.
However, the actual motivation for this is to fix a correctness issue
in call slot optimization, which needs this attribute to avoid
optimization regressions.

Followup to the discussion on D157499.

Differential Revision: https://reviews.llvm.org/D158081
2023-11-01 10:46:31 +01:00
Philip Reames
a78f5c0649 [IndVars] Use IRBuilder in eliminateTrunc [nfc-ish] (#70836)
Mostly a cleanup so that we don't need to manually emit instructions,
and can eagerly constant fold where relevant.
2023-10-31 14:37:57 -07:00
Aleksandr Popov
483e92468e [NFC] Extract LoopConstrainer from IRCE to reuse it outside the pass (#70508)
Co-authored-by: Aleksandr Popov <apopov@azul.com>
2023-10-31 18:16:59 +01:00
Philip Reames
f8742b8d6a [SCEV] Teach SCEVExpander to use zext nneg when possible (#70815)
zext nneg was recently added to the IR in #67982. Teaching SCEVExpander
to emit nneg when possible is valuable since SCEV may have proved
non-trivial facts about loop bounds which would otherwise be lost when
materializing the value.
2023-10-31 09:33:07 -07:00
Aleksandr Popov
e8d5db206c [LoopPeeling] Fix weights updating of peeled off branches (#70094)
In https://reviews.llvm.org/D64235 a new algorithm has been introduced
for updating the branch weights of latch blocks and their copies.

It increases the probability of going to the exit block with each
successive peeled iteration, calculating the weights as (F - I * E, E), where:
- F is the weight of the edge from latch to header.
- E is the weight of the edge from latch to exit.
- I is the index of the peeled iteration.

E.g: Let's say the latch branch weights are (100,300) and the estimated
trip count is 4. If we peel off all 4 iterations the weights of the
copied branches will be:
0: (100,300)
1: (100,200)
2: (100,100)
3: (100,1)

https://godbolt.org/z/93KnoEsT6

So we make the original loop almost unreachable from the 3rd peeled copy
according to the profile data. But that's only true if the profiling
data is accurate.
An underestimated trip count can lead to performance issues with the
register allocator, which may decide to spill intervals inside the loop
assuming it's unreachable.

Since we don't know how accurate the profiling data is, it seems better
to set neutral 1/1 weights on the last peeled latch branch. After this
change, the weights in the example above will look like this:
0: (100,300)
1: (100,200)
2: (100,100)
3: (100,100)
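
The two listings above can be reproduced with a few lines of arithmetic. The following is a sketch under stated assumptions, not the LoopPeeling implementation: `peeledLatchWeights` and its parameters are hypothetical, the pair order simply follows the listings (constant weight first, shrinking weight second), and the clamp to 1 is inferred from the "3: (100,1)" entry.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using Weights = std::pair<std::uint64_t, std::uint64_t>;

// Returns one weight pair per peeled iteration, in the order used by the
// listings above: (constant weight, shrinking weight).
std::vector<Weights> peeledLatchWeights(std::uint64_t Constant,
                                        std::uint64_t Shrinking,
                                        unsigned PeelCount,
                                        bool NeutralLast) {
  std::vector<Weights> Result;
  for (unsigned I = 0; I < PeelCount; ++I) {
    const std::uint64_t Drop = static_cast<std::uint64_t>(I) * Constant;
    // Assumption: clamp at 1 instead of reaching 0, matching "3: (100,1)".
    std::uint64_t W = Shrinking > Drop ? Shrinking - Drop : 1;
    if (NeutralLast && I + 1 == PeelCount)
      W = Constant; // neutral 1/1 weights on the last peeled latch branch
    Result.emplace_back(Constant, W);
  }
  return Result;
}
```

Calling `peeledLatchWeights(100, 300, 4, false)` yields the first listing, and passing `true` for `NeutralLast` yields the second.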

Co-authored-by: Aleksandr Popov <apopov@azul.com>
2023-10-31 14:02:42 +01:00
Craig Topper
b1c59b516c [SCCP] Infer nneg on zext when forming from non-negative sext. (#70730)
Builds on #67982 which recently introduced the nneg flag on a zext
instruction.
2023-10-30 15:07:22 -07:00
XChy
fc6bdb8549 [SimplifyCFG] Reland transform for redirecting phis between unmergeable BB and SuccBB (#68473)
Reland #67275 with #68953 resolved.
2023-10-28 17:10:20 +08:00
Fangrui Song
8e247b8f47 Replace TypeSize::{getFixed,getScalable} with canonical TypeSize::{Fixed,Scalable}. NFC
2023-10-27 00:30:41 -07:00
spupyrev
f61179f812 [CodeLayout] Changed option names cds to cdsort (#69668)
Rename cds -> cdsort for consistency. This is NFC unless somebody uses
the older names.
2023-10-26 18:10:30 -07:00
Allen
851338b126 Revert "[SimplifyCFG] Delete the unnecessary range check for small mask operation" (#70324)
This reverts commit 5e07481d42.
2023-10-26 20:39:24 +08:00
zhongyunde 00443407
5e07481d42 [SimplifyCFG] Delete the unnecessary range check for small mask operation
When the small mask value is less than 64, we can eliminate the check
for the upper limit of the range by enlarging the lookup table to the maximum
index value. (The final table size then grows to the next power of 2.)
```
bool f(unsigned x) {
    switch (x % 8) {
        case 0: return 1;
        case 1: return 0;
        case 2: return 0;
        case 3: return 1;
        case 4: return 1;
        case 5: return 0;
        case 6: return 1;

        // This would remove the range check: case 7: return 0;
    }
    return 0;
}
```
Use WouldFitInRegister instead of fitsInLegalInteger to support
more result types besides bool.
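
To make the idea concrete, here is a hand-written C++ sketch of the switch above next to an equivalent bitmask lookup. It only illustrates why no range check is needed once the table covers all 8 possible indices; it is not the IR that SimplifyCFG emits, and `fSwitch`/`fTable` are names invented for this example.

```cpp
// The original switch: index 7 is not listed and falls through to 0.
bool fSwitch(unsigned x) {
  switch (x % 8) {
  case 0: return 1;
  case 1: return 0;
  case 2: return 0;
  case 3: return 1;
  case 4: return 1;
  case 5: return 0;
  case 6: return 1;
  }
  return 0;
}

// Table form: bit i holds the result for index i. Bits 0, 3, 4 and 6 are
// set (0b01011001 == 0x59); bit 7 is the default value 0. Because x % 8
// is always in [0, 7], every index hits the table and no bounds check is
// required.
bool fTable(unsigned x) {
  constexpr unsigned Table = 0x59;
  return (Table >> (x % 8)) & 1u;
}
```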

Fixes https://github.com/llvm/llvm-project/issues/65120
Reviewed By: zmodem, nikic, RKSimon
2023-10-26 19:01:22 +08:00
Youngsuk Kim
4c60c0cb4e [LowerMemIntrinsics] Remove no-op ptr-to-ptr bitcasts (NFC)
Remove ptr-to-ptr bitcasts, which are unnecessary with opaque pointers
enabled.

Opaque pointer clean-up effort. NFC.
2023-10-25 16:23:58 -05:00
Alina Sbirlea
d0584e248d [CodeLayout] Update to resolve Wdangling warning.
Change cc2fbc648d introduced a -Wdangling
warning; use temporaries to resolve it.

llvm/lib/Transforms/Utils/CodeLayout.cpp:764:27: error: temporary whose address is used as value of local variable '[minDensity, maxDensity]' will be destroyed at the end of the full-expression [-Werror,-Wdangling]
  764 |               std::minmax(ChainPred->density(), ChainSucc->density());

llvm/lib/Transforms/Utils/CodeLayout.cpp:764:49: error: temporary whose address is used as value of local variable '[minDensity, maxDensity]' will be destroyed at the end of the full-expression [-Werror,-Wdangling]
  764 |               std::minmax(ChainPred->density(), ChainSucc->density());
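
The warning arises because `std::minmax(const T&, const T&)` returns a pair of references, which dangle when the arguments are temporaries. One common way to avoid this, sketched below, is to store the values in named locals first. The `Chain` struct and `densityRange` function are simplified stand-ins invented for this example, not the code in CodeLayout.cpp.

```cpp
#include <algorithm>
#include <utility>

// Simplified stand-in for a chain of blocks; density() returns by value.
struct Chain {
  double Count;
  double Size;
  double density() const { return Count / Size; }
};

// Problematic pattern (do not use): the pair of references returned by
// std::minmax would refer to temporaries destroyed at the end of the
// full-expression.
//   auto [MinDensity, MaxDensity] = std::minmax(A.density(), B.density());

// Safer pattern: name the values first, then copy the results out.
std::pair<double, double> densityRange(const Chain &A, const Chain &B) {
  const double DensityA = A.density();
  const double DensityB = B.density();
  // The references returned here point at the named locals above and are
  // copied into the returned pair before those locals go away.
  return std::minmax(DensityA, DensityB);
}
```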
2023-10-25 11:31:48 -07:00
spupyrev
cc2fbc648d [CodeLayout] Faster basic block reordering, ext-tsp (#68617)
Aggressive inlining might produce huge functions with >10K of basic
blocks. Since BFI treats _all_ blocks and jumps as "hot" having
non-negative (but perhaps small) weight, the current implementation can
be slow, taking minutes to produce a layout. This change introduces a
few modifications that significantly (up to 50x on some instances)
speed up the computation. Some notable changes:
- reduced the maximum chain size to 512 (from the prior 4096);
- introduced MaxMergeDensityRatio param to avoid merging chains with
very different densities;
- dropped a couple of params that seem unnecessary.

Looking at some "offline" metrics (e.g., the number of created
fall-throughs), there shouldn't be problems; in fact, I do see some
metrics go up. But it might be hard/impossible to measure a perf
difference for such small changes. I did test the performance of a
clang-14 binary and did not record any perf or i-cache-related differences.

My 6 benchmarks, with ext-tsp runtime (the lower the better) and
"tsp-score" (the higher the better).
**Before**:

- benchmark 1:
  num functions: 13,047
  reordering running time is 2.4 seconds
  score: 125503458 (128.3102%)
- benchmark 2:
  num functions: 16,438
  reordering running time is 3.4 seconds
  score: 12613997277 (129.7495%)
- benchmark 3:
  num functions: 12,359
  reordering running time is 1.9 seconds
  score: 1315881613 (105.8991%)
- benchmark 4:
  num functions: 96,588
  reordering running time is 7.3 seconds
  score: 89513906284 (100.3413%)
- benchmark 5:
  num functions: 1
  reordering running time is 372 seconds
  score: 21292505965077 (99.9979%)
- benchmark 6:
  num functions:  71,155
  reordering running time is 314 seconds
  score: 29795381626270671437824 (102.7519%)

**After**:
- benchmark 1:
  reordering running time is 2.2 seconds
  score: 125510418 (128.3130%)

- benchmark 2:
  reordering running time is 2.6 seconds
  score: 12614502162 (129.7525%)

- benchmark 3:
  reordering running time is 1.6 seconds
  score: 1315938168 (105.9024%)

- benchmark 4:
  reordering running time is 4.9 seconds
  score: 89518095837 (100.3454%)

- benchmark 5:
  reordering running time is 4.8 seconds
  score: 21292295939119 (99.9971%)

- benchmark 6:
  reordering running time is 104 seconds
  score: 29796710925310302879744 (102.7565%)
2023-10-25 07:52:26 -07:00
Kazu Hirata
f9306f6de3 [ADT] Rename llvm::erase_value to llvm::erase (NFC) (#70156)
C++20 comes with std::erase to erase a value from std::vector.  This
patch renames llvm::erase_value to llvm::erase for consistency with
C++20.

We could make llvm::erase more similar to std::erase by having it
return the number of elements removed, but I'm not doing that for now
because nobody seems to care about that in our code base.

Since there are only 50 occurrences of erase_value in our code base,
this patch replaces all of them with llvm::erase and deprecates
llvm::erase_value.
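
For readers unfamiliar with this kind of helper, the sketch below shows the classic erase-remove idiom wrapped in a function that, like the helper described above, does not return the number of removed elements. `eraseSketch` is a hypothetical stand-in written for this example and is not LLVM's implementation.

```cpp
#include <algorithm>
#include <vector>

// Removes every element equal to V from the container C.
// Unlike C++20 std::erase, this sketch returns nothing.
template <typename Container, typename Value>
void eraseSketch(Container &C, const Value &V) {
  // std::remove shifts the kept elements forward and returns the new
  // logical end; erase then drops the leftover tail.
  C.erase(std::remove(C.begin(), C.end(), V), C.end());
}
```

For example, calling `eraseSketch(Vec, 2)` on a `std::vector<int>` holding 1, 2, 3, 2 leaves 1, 3.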
2023-10-24 23:03:13 -07:00
Ruiling, Song
ac24238002 [LowerSwitch] Don't let pass manager handle the dependency (#68662)
Some passes have the limitation that they only support simple terminators:
branch/unreachable/return. Right now, they ask the pass manager to add the
LowerSwitch pass to eliminate `switch`. Let's manage this kind of pass
dependency ourselves. Also add assertions to the related passes.
2023-10-25 09:24:36 +08:00
Benjamin Kramer
eb67b34740 [IPSCCP] Don't crash on ptrtoint
2023-10-24 14:14:39 +02:00
Carlos Alberto Enciso
f3b20cb16a [IPSCCP] Variable not visible at Og. (#66745)
https://bugs.llvm.org/show_bug.cgi?id=51559
https://github.com/llvm/llvm-project/issues/50901

IPSCCP pass removes the global variable and does not create a constant
expression for the initializer value.
2023-10-24 06:22:18 +01:00
Sam Clegg
e01c7d54b4 [LowerGlobalDtors] Skip __cxa_atexit call completely when arg0 is unused (#68758)
In emscripten we have a build mode (the default actually) where the
runtime never exits and therefore `__cxa_atexit` is a dummy/stub
function that does nothing. In this case we would like to be able to
completely DCE any otherwise-unused global dtor functions.

Fixes: https://github.com/emscripten-core/emscripten/issues/19993
2023-10-23 10:08:08 -07:00
Fangrui Song
a24418375a [CodeLayout] cache-directed sort: limit max chain size (#69039)
When linking an executable with a slightly larger executable,
ld.lld --call-graph-profile-sort=cdsort can be very slow (see #68638).
```
   4.6%  20.7Mi    .text.hot
   3.5%  15.9Mi    .text
   3.4%  15.2Mi    .text.unknown
```

Add cl option `cdsort-max-chain-size`, which is similar to
`ext-tsp-max-chain-size`, and set it to 128, to improve performance.

In `ld.lld @response.txt --threads=4 --call-graph-profile-sort=cdsort
--time-trace`
builds, the "Total Sort sections" time is measured as follows:

* -mllvm -cdsort-max-chain-size=64: 1.321813
* -mllvm -cdsort-max-chain-size=128: 2.030425
* -mllvm -cdsort-max-chain-size=256: 2.927684
* -mllvm -cdsort-max-chain-size=512: 5.493106
* unlimited: 9 minutes

The remaining part takes 6.8s.
2023-10-22 16:50:03 -07:00
Kazu Hirata
9c5a5a421d [llvm] Stop including llvm/ADT/iterator_range.h (NFC)
Identified with misc-include-cleaner.
2023-10-22 15:41:18 -07:00