Commit Graph

5842 Commits

Author SHA1 Message Date
Christian Ulmann
2544d91956 llvm-extract: Replace IFuncs with declarations
This commit ensures that llvm-extract does not copy all IFuncs into the
resulting modules. Before this change, ifuncs were not modified which
could cause the emission unexpected IR files.

Reviewed By: darthscsi

Differential Revision: https://reviews.llvm.org/D152148
2023-06-06 07:18:33 +00:00
Johannes Doerfert
cb17c48fdd [Attributor] Identify and remove no-op fences
The logic and implementation follows the removal of no-op barriers. If
the fence is not making updates visible, either to the world or the
current thread, it is not needed. Said differently, the fences we remove
do not establish synchronization (happens-before) edges.
This allows us to eliminate some of the regression caused by:
  https://reviews.llvm.org/D145290
2023-06-05 17:14:00 -07:00
Johannes Doerfert
8f4fadd1b4 [OpenMP] Use "kernel" attribute consistently 2023-06-05 16:33:53 -07:00
Johannes Doerfert
dbbe9b3776 [Attributor] Create AAMustProgress for the mustprogress attribute
Derive the mustprogress attribute based on the willreturn attribute
or the fact that all callers are mustprogress.

Differential Revision: https://reviews.llvm.org/D94740
2023-06-05 16:33:52 -07:00
Alexandros Lamprineas
b1f41685a6 [IPSCCP] Decouple queries for function analysis results.
The SCCPSolver is using a structure (AnalysisResultsForFn) where it keeps
pointers to various analyses needed by the IPSCCP pass. These analyses are
requested all at the same time, which can become problematic in some cases.
For example one could be retrieved via getCachedAnalysis() prior to the
actual execution of the analysis. In more detail:

The IPSCCP pass uses a DomTreeUpdater to preserve the PostDominatorTree
in case the PostDominatorTreeAnalysis had run before IPSCCP. Starting with
commit 1b1232047e the IPSCCP pass may use BlockFrequencyAnalysis for
some functions in the module. As a result, the PostDominatorTreeAnalysis
may not run until the BlockFrequencyAnalysis has run, since the latter
analysis depends on the former. Currently, we setup the DomTreeUpdater
using getCachedAnalysis to retrieve a PostDominatorTree. This happens
before BlockFrequencyAnalysis has run, therefore the cached analysis can
become invalid by the time we use it.

Differential Revision: https://reviews.llvm.org/D151666
2023-06-01 16:38:04 +01:00
Nikita Popov
96a14f388b Revert "[FuncSpec] Replace LoopInfo with BlockFrequencyInfo"
As reported on https://reviews.llvm.org/D150375#4367861 and
following, this change causes PDT invalidation issues. Revert
it and dependent commits.

This reverts commit 0524534d52.
This reverts commit ced90d1ff6.
This reverts commit 9f992cc935.
This reverts commit 1b1232047e.
2023-05-30 14:49:03 +02:00
Kazu Hirata
65f40cb455 [ModuleInliner] Remove an inapplicable comment
The module inliner has its own logic in deciding the order in which
call sites are inlined, so the comment is inapplicable.
2023-05-27 22:26:37 -07:00
Arthur Eubanks
aceaea6784 [Inliner] Mark inlinings stopped with inlining history as noinline
The inline history makes sure that we don't keep inlining due to mutual devirtualization. But this gets forgotten between inliner invocations.

So mark the inlined calls as noinline so we respect previous inline history decisions.

This overlaps with D121084, but they're not redundant since we may not inline completely through a child SCC, but we still want a cost multiplier when that happens.

See discussions in D145516.

Reviewed By: jmorse

Differential Revision: https://reviews.llvm.org/D150989
2023-05-25 09:55:53 -07:00
Alexandros Lamprineas
0524534d52 [FuncSpec] Enable specialization of literal constants.
To do so we have to tweak the cost model such that specialization
does not trigger excessively.

Differential Revision: https://reviews.llvm.org/D150649
2023-05-25 09:55:46 +01:00
Alexandros Lamprineas
ced90d1ff6 [FuncSpec] Improve the accuracy of the cost model.
Instead of blindly traversing the use-def chain of constant arguments,
compute known constants along the way. Stop as soon as a user cannot
be replaced by a constant. Keep it light-weight by handling some basic
instruction types.

Differential Revision: https://reviews.llvm.org/D150464
2023-05-24 11:40:12 +01:00
Fangrui Song
9f992cc935 [SCCP] Fix -Wunused-lambda-capture 2023-05-22 10:29:04 -07:00
Alexandros Lamprineas
1b1232047e [FuncSpec] Replace LoopInfo with BlockFrequencyInfo.
Using AvgLoopIters on any loop is too imprecise making the cost model
favor users inside loop nests regardless of the actual tripcount.

Differential Revision: https://reviews.llvm.org/D150375
2023-05-22 17:49:52 +01:00
Johannes Doerfert
d037b237de [Attributor] Teach AAMemoryLocation about constant GPU memory
AS(4), when targeting GPUs, is constant. Accesses to constant memory are
(historically) not treated as "memory accesses", hence we should deduce
`memory(none)` for those.
2023-05-18 13:27:43 -07:00
Matt Arsenault
6923a67db8 GlobalOpt: Improve addrspacecast handling
Handle addrspacecast when looking at uses.
2023-05-16 16:32:30 +01:00
Mircea Trofin
901183b596 [IPO] Opt-in local clones for thinlto imports
ThinLTO imports (which appear as `available_externally`) that survive
inlining get deleted. With today's inliner that's reasonable, because
the way the function would be inlined into in other modules would be the
same - because of the bottom-up traversal assumption, and the fact that
the inliner doesn't take into account surrounding context [*]. The
ModuleInliner invalidates the first assumption, and the ML inliner the
second.

This patch adds a way to opt-in a module to keep its variant of an
imported function, even if it survived past inlining.

[*] Almost. Deferred inlining is an exception which can lead to
(empirically) infrequent discrepancies.

Differential Revision: https://reviews.llvm.org/D150148
2023-05-11 15:56:43 -07:00
Hongtao Yu
b7d9322b49 [FS-AFDO] Load pseudo probe profile on MIR
This change enables loading pseudo-probe based profile on MIR. Different from the IR profile loader, callsites are excluded from MIR profile loading since they are not assinged a FS discriminator. Using zero as the discriminator is not accurate and would undo the distribution work done by the IR loader based on pseudo probe distribution factor. We reply on block probes only for FS profile loading.

Some refactoring is done to the IR profile loader so that `getProbeWeight` can be shared by both loaders.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D148584
2023-05-10 11:29:37 -07:00
Hongtao Yu
9272d0f079 [PseudoProbe] Clean up dwarf discriminator and avoid duplicating factor.
A pseudo probe is created with dwarf line information shared with its nearest instruction. If the instruction comes with a dwarf discriminator, it will be shared with the probe as well. This can confuse the later FS-AFDO discriminator assignment pass. To fix this, I'm cleaning up the discriminator fields for probes when they are inserted.

I also notice another possibility to change the discriminator field of pseudo probes in the pipeline before the FS discriminator assignment pass. That is the loop unroller, which assigns duplication factor to instruction being vectorized. I'm disabling that for pseudo probe intrinsics specifically, also for callsites with probes.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D148569
2023-05-10 11:26:23 -07:00
Vedant Paranjape
438e1cff7b [PartialInlining] Fix incorrect costing when IR has unreachable BBs
Partial Inlining identifies basic blocks that can be outlined into a
function. It is possible that an unreachable basic block is marked for
outlining. During costing of the outlined region, such unreachable basic
blocks are included as well. However, the CodeExtractor eliminates such
unreachable basic blocks and emits outlined function without them.

Thus, during costing of the outlined function, it is possible that the
cost of the outlined function comes out to be lesser than the cost of
outlined region, which triggers an assert.

Assertion `OutlinedFunctionCost >= Cloner.OutlinedRegionCost && "Outlined
function cost should be no less than the outlined region"' failed.

This patch adds code to eliminate unreachable blocks from the function
body before passing it on to be inlined. It also adds a test that checks
for behaviour of costing in case of unreachable basic blocks.

Discussion: https://discourse.llvm.org/t/incorrect-costing-in-partialinliner-if-ir-has-unreachable-basic-blocks/70163

Reviewed By: fhahn

Differential Revision: https://reviews.llvm.org/D149130
2023-05-09 11:58:05 +00:00
Alexandros Lamprineas
929a8c9f72 [FuncSpec][NFC] Rename cryptic variable to better describe it.
UM -> UniqueSpecs

Brought up on the review of D145379. Committing it seperately for now
since the Cost model improvements need rethink.
2023-05-09 11:55:40 +01:00
Alexandros Lamprineas
93ac2dbefc [FuncSpec][NFC] Add an alias for InstructionCost.
Split from D145379. I'll rethink the Cost model improvements and
commit separately.
2023-05-09 11:37:48 +01:00
Kan Wu
b8d2f7177c [MemProf] Add hot allocation type
Add "Hot" AllocationType (in addition to existing cold, notcold).

Use lifetime access density as metric to identify hot allocations.
Treat hot as notcold for MemProfContextDisambiguation for now
before the disambiguation for "hot" is done.

Reviewed By: tejohnson

Differential Revision: https://reviews.llvm.org/D149932
2023-05-08 10:34:53 -07:00
Teresa Johnson
1768898680 [MemProf] Control availability of hot/cold operator new from LTO link
Adds an LTO option to indicate that whether we are linking with an
allocator that supports hot/cold operator new interfaces. If not,
at the start of the LTO backends any existing memprof hot/cold
attributes are removed from the IR, and we also remove memprof metadata
so that post-LTO inlining doesn't add any new attributes.

This is done via setting a new flag in the module summary index. It is
important to communicate via the index to the LTO backends so that
distributed ThinLTO handles this correctly, as they are invoked by
separate clang processes and the combined index is how we communicate
information from the LTO link. Specifically, for distributed ThinLTO the
LTO related processes look like:
```
   # Thin link:
   $ lld --thinlto-index-only obj1.o ... objN.o -llib ...
   # ThinLTO backends:
   $ clang -x ir obj1.o -fthinlto-index=obj1.o.thinlto.bc -c -O2
   ...
   $ clang -x ir objN.o -fthinlto-index=objN.o.thinlto.bc -c -O2
```

It is during the thin link (lld --thinlto-index-only) that we have
visibility into linker dependences and want to be able to pass the new
option via -Wl,-supports-hot-cold-new. This will be recorded in the
summary indexes created for the distributed backend processes
(*.thinlto.bc) and queried from there, so that we don't need to know
during those individual clang backends what allocation library was
linked. Since in-process ThinLTO and regular LTO also use a combined
index, for consistency we query the flag out of the index in all LTO
backends.

Additionally, when the LTO option is disabled, exit early from the
MemProfContextDisambiguation handling performed during LTO, as this is
unnecessary.

Depends on D149117 and D149192.

Differential Revision: https://reviews.llvm.org/D149215
2023-05-08 08:02:21 -07:00
NAKAMURA Takumi
a29a97d76b Fix a warning in D149117 [-Wunused-but-set-variable] 2023-05-06 11:30:04 +09:00
Teresa Johnson
a28261c711 [MemProf] Create single version of helper function (NFC)
Small clean up to keep a single version of getAllocTypeAttributeString
which was duplicated locally.
2023-05-05 18:31:35 -07:00
Teresa Johnson
cfad2d3a3d [MemProf] Context disambiguation cloning pass [patch 4/4]
Applies ThinLTO cloning decisions made during the thin link and
recorded in the summary index to the IR during the ThinLTO backend.

Depends on D141077.

Differential Revision: https://reviews.llvm.org/D149117
2023-05-05 16:26:32 -07:00
Shoaib Meenai
0e2b4b2dba Revert "[ArgumentPromotion] Bail if any callers are minsize"
This reverts commit 8b8466fd31.

This is causing size regressions with -Oz and FullLTO. Revert while I
come up with a repro.
2023-05-05 14:26:57 -07:00
Teresa Johnson
04f3c5a71e Restore again "[MemProf] Context disambiguation cloning pass [patch 3/4]"
This reverts commit f09807ca9d, restoring
bfe7205975 and follow on fix
e3e6bc6995, now that the nondeterminism
has been addressed by D149924.

Differential Revision: https://reviews.llvm.org/D141077
2023-05-05 13:27:33 -07:00
Teresa Johnson
1a3947df85 [MemProf] Use MapVector to avoid non-determinism
Multiple cases of instability in the cloning behavior occurred due to
iteration of maps indexed by pointers. Fix by changing the maps to
MapVector. This necessitated adding DenseMapInfo specializations for the
structure types used in the keys.

These were found while trying to commit patch 3 of the cloning
(bfe7205975), but the second one turned
out to be in code committed in patch 2, but just exposed by a new test
added with patch 3. Specifically, the iteration in identifyClones().

Added the portion of the new test cases from patch 3 that only relied on
the already committed changes and exposed the issue.

Differential Revision: https://reviews.llvm.org/D149924
2023-05-05 07:06:41 -07:00
Momchil Velikov
de24d08459 [FuncSpec] Fix inconsistent treatment of global variables
There are a few inaccuracies with how FuncSpec handles global
variables.

When specialisation on non-const global variables is disabled (the
default) the pass could nevertheless perform some specializations,
e.g. on a constant GEP expression, or on a SSA variable, for which the
Solver has determined it has the value of a global variable.

When specialisation on non-const global variables is enabled, the pass
would skip non-scalars, e.g. a global array, but this should be
completely inconsequential, a pointer is a pointer.

Reviewed By: SjoerdMeijer

Differential Revision: https://reviews.llvm.org/D149476

Change-Id: Ic73051b2f8602587306760bf2ec552e5860f8d39
2023-05-05 09:56:06 +01:00
Teresa Johnson
f09807ca9d Revert "Restore "[MemProf] Context disambiguation cloning pass [patch 3/4]""
This reverts commit bfe7205975, and follow
on fix e3e6bc6995, due to some remaining
instability exposed by the bot enabling expensive checks:
https://lab.llvm.org/buildbot/#/builders/42/builds/9842
2023-05-04 09:41:48 -07:00
Teresa Johnson
bfe7205975 Restore "[MemProf] Context disambiguation cloning pass [patch 3/4]"
This reverts commit 6fbf022908, restoring
commit bf6ff4fd4b with a fix for a bot
failure due to a previously unstable iteration order.

Differential Revision: https://reviews.llvm.org/D141077
2023-05-04 06:31:44 -07:00
Teresa Johnson
6fbf022908 Revert "[MemProf] Context disambiguation cloning pass [patch 3/4]"
This reverts commit bf6ff4fd4b.

There is a bot failure where we are getting the correct remarks output
but in a different order. I'll need to investigate to see where we are
having nondeterministic behavior.
2023-05-03 14:08:54 -07:00
Teresa Johnson
bf6ff4fd4b [MemProf] Context disambiguation cloning pass [patch 3/4]
Applies cloning decisions to the IR, cloning functions and updating
calls. For Regular LTO, the IR is updated directly during function
assignment, whereas for ThinLTO it is recorded in the summary index
(a subsequent patch will apply to the IR via the index during the
ThinLTO backend.

The function assignment and cloning proceeds greedily, and we create new
clones as needed when we find an incompatible assignment of function
clones to callsite clones (i.e. when different callers need to invoke
different combinations of callsite clones).

Depends on D140949.

Differential Revision: https://reviews.llvm.org/D141077
2023-05-03 13:34:00 -07:00
Arthur Eubanks
8b8466fd31 [ArgumentPromotion] Bail if any callers are minsize
Argument promotion mostly works on functions with more than one caller (otherwise the function would be inlined or is dead), so there's a good chance that performing this increases code size since we introduce loads at every call site. If any caller is marked minsize, bail.

We could compare the number of loads/stores removed from the function with the number of loads introduced in callers, but that's TODO.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D149768
2023-05-03 11:29:15 -07:00
Teresa Johnson
48f18ecd82 [ThinLTO] Loosen up variable importing correctness checks
After importing variables, we do some checking to ensure that variables
marked read or write only, which have been marked exported (e.g.
because a referencing function has been exported), are on at least one
module's imports list. This is because the read or write only variables
will be internalized, so we need a copy any any module that references
it.

This checking is overly conservative in the case of linkonce_odr or
other linkage types where there can already be a duplicate copy in
existence in the importing module, which therefore wouldn't need to
import it. Loosen up the checking for these linkage types.

Fixes https://github.com/llvm/llvm-project/issues/62468.

Differential Revision: https://reviews.llvm.org/D149630
2023-05-02 07:49:03 -07:00
Matt Arsenault
b52db60cbb GlobalOpt: Drop code to handle typed pointers
Fixes assert with pointers with different address spaces. We
could keep looking through addrspacecast, but it would require
checking for null handling of the access address space.

Fixes #62384
2023-04-29 09:48:21 -04:00
wlei
ba3cbc7aad fix a use-after-free failure 2023-04-28 15:55:51 -07:00
wlei
892daede72 [SamplePGO] Stale profile matching(part 2)
Part 2 of https://reviews.llvm.org/D147456
Use callee name on IR as an anchor to match the call target/inlinee name in the profile. The advantages of this in particular:
- Different from the traditional way of encoding hash signatures to every block that would affect binary/profile size and build speed, it doesn't require any additional information for this, all the data is already in the IR and profiles.
- Effective for current nested profile layout in which once a callsite is mismatched all the inlinee's profiles are dropped.
**The input of the algorithm:**
- IR locations: the anchor is the callee name of direct callsite.
- Profile locations: the anchor is the call target name for `BodySample`s or inlinee's profile name for `CallsiteSamples`.
The two lists are populated by parsing the IR and profile and both can be generalized as a sequence of locations with an optional anchor.
For example: say location `1.2(foo)` refers to a callsite at `1.2` with callee name `foo` and `1.3` refers to a non-directcall location `1.3`.
```
// The current build source code:
   int main() {
1.     ...
2.     foo();
3.     ...
4      ...
5.     ...
6.     bar();
7.     ...
   }
```
IR locations are populated and simplified as: `[1, 2(foo), 3, 5, 6(bar), 7]`.
```
; The "stale" profile:
main:350:1
 1: 1
 2: 3
 3: 100 foo:100
 4: 2
 7: 2
 8: 200 bar:200
 9: 30
```
Profile locations are populated and simplified as `[1, 2, 3(foo), 4, 7, 8(bar), 9]`
**Matching heuristic:**
- Match all the anchors in lexical order first.
- Match non-anchors evenly between two anchors: Split the non-anchor range, the first half is matched based on the start anchor, the second half is matched based on the end anchor.
So the example above is matched like:
```
   [1,    2(foo), 3,  5,  6(bar), 7]
    |     |       |   |     |     |
   [1, 2, 3(foo), 4,  7,  8(bar), 9]
```
3 -> 4 matching is based on anchor `foo`, 5 -> 7 matching is based on anchor `bar`.
The output mapping of matching is [2->3, 3->4, 5->7, 6->8, 7->9].

For the implementation, the anchors are saved in a map for fast look-up. The result mapping is saved into `IRToProfileLocationMap`(see https://reviews.llvm.org/D147456) and distributed to all FunctionSamples(`distributeIRToProfileLocationMap`)

**Clang-self build benchmark: **
Current build version: clang-10
The profiled version:  clang-9
Results compared to a refresh profile(collected profile on clang-10) and to be fair, we invalidated new functions' profiles(both refresh and stale profile use the same profile list).
1) Regression to using refresh profile with this off : -3.93%
2) Regression to using refresh profile with this on  : -1.1%
So this algorithm can recover ~72% of the regression.
**Internal(Meta) large-scale services.**
we saw one real instance of a 3 week stale profile., it delivered a ~1.8% win.

**Notes or future work:**
- Classic AutoFDO support: the current version only supports pseudo-probe, but I believe it's not hard to extend to classic line-number based AutoFDO since pseudo-probe and line-number are shared the LineLocation structure.
- The fuzzy matching is an open-ended area and there could be more heuristics to try out, but since the current version already recovers a reasonable percentage of regression(with some pseudo probe order change, it can recover close to 90%), I'm submitting the patch for review and we will try more heuristics in future.
- Profile call target name are only available when the call is hit by samples, the missing anchor might mislead the matching, this can be mitigated in llvm-profgen to generate the call target for the zero samples.
- This doesn't handle function name mismatch, we plan to solve it in future.

Reviewed By: hoy, wenlei

Differential Revision: https://reviews.llvm.org/D147545
2023-04-28 13:07:32 -07:00
wlei
a98d6a11ea [SamplePGO] Stale profile matching(part 1)
AutoFDO/CSSPGO often has to deal with stale profiles collected on binaries built from several revisions behind release. It’s likely to get incorrect profile annotations using the stale profile, which results in unstable or low performing binaries. Currently for source location based profile, once a code change causes a profile mismatch, all the locations afterward are mismatched, the affected samples or inlining info are lost. If we can provide a matching framework to reuse parts of the mismatched profile - aka incremental PGO, it will make PGO more stable, also increase the optimization coverage and boost the performance of binary.

This patch is the part 1 of stale profile matching, summary of the implementation:
 - Added a structure for the matching result:`LocToLocMap`, which is a location to location map meaning the location of current build is matched to the location of the previous build(to be used to query the “stale” profile).
 - In order to use the matching results for sample query, we need to pass them to all the location queries. For code cleanliness, we added a new pointer field(`IRToProfileLocationMap`) to `FunctionSamples`.
 - Added a wrapper(`mapIRLocToProfileLoc`) for the query to the location, the location from input IR will be remapped to the matched profile location.
 - Added a new switch `--salvage-stale-profile`.
 - Some refactoring for the staleness detection.

Test case is in part 2 with the matching algorithm.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D147456
2023-04-28 13:07:32 -07:00
Mircea Trofin
460ea85014 [nfc][thinlto] Handle global constant importing separately
This makes the logic for referenced globals reusable for import criteria
that don't use thresholds - in fact, we currently didn't consider any
thresholds when importing.

Differential Revision: https://reviews.llvm.org/D149298
2023-04-27 12:21:50 -07:00
DianQK
533b7c1f6c [GlobalOpt] Don't replace the aliasee if it has other references.
As long as aliasee has `@llvm.used` or `@llvm.compiler.used` references, we cannot do the related replace or delete operations. Even if it is a Local Linkage, we cannot infer if there is no other use for it, such as asm or other future added cases.

Reviewed By: nikic

Differential Revision: https://reviews.llvm.org/D145293
2023-04-27 09:53:47 +08:00
Kazu Hirata
1ca0cb717a [llvm] Replace None with std::nullopt in comments (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2023-04-25 23:53:32 -07:00
Arthur Eubanks
dcfdb963d4 [ThinLTOBitcodeWriter] Properly report when module is changed
Happens with split LTO units.

Detected with upcoming changes.
2023-04-25 15:24:01 -07:00
Arthur Eubanks
5ecdc8147b [SCCP] Properly report modifications when deleting globals
Detected by an upcoming change.
2023-04-25 15:08:11 -07:00
Nawal Copty
6e54a57c61 Preserve the address space for llvm.used and llvm.compiler.used global variables in GlobalOpt pass.
The llvm.used (or llvm.compiler.used) global variable is an array that contains a list of pointers to global variables and functions.

The GlobalOpt (Global Variable Optimizer) pass is not preserving the address space for llvm.used and llvm.compiler.used global variables.This patch updates the setUsedInitializer() function in GlobalOpt.cpp, so the address space is preserved.

Reviewed By: aeubanks

Differential Revision: https://reviews.llvm.org/D144518
2023-04-25 13:07:01 -07:00
Alexandros Lamprineas
54e5fb789c [FuncSpec] Track the return values of specializations.
To track the return values of specializations, we need to invalidate all
the lattice values across the use-def chain which originates from the
callsites, recompute and propagate.

Differential Revision: https://reviews.llvm.org/D146158
2023-04-24 13:18:49 +01:00
Teresa Johnson
969268831b [MemProf] Don't use constexpr via lambda capture due to MSVC issues (NFC)
Modifies a104e27030 slightly to switch a
constexpr used via a lambda capture to a const, due to issues with MSVC.
See https://reviews.llvm.org/D140949#inline-1438809 for context.
2023-04-23 15:03:49 -07:00
Shilei Tian
d4ecd1241c Revert "[OpenMP] Introduce kernel environment"
This reverts commit 35cfadfbe2.

It makes a couple of buildbots unhappy because of the following test failures:
- `Transforms/OpenMP/add_attributes.ll'`
- `mapping/declare_mapper_target_data.cpp` on AMDGPU
2023-04-22 20:56:35 -04:00
Shilei Tian
35cfadfbe2 [OpenMP] Introduce kernel environment
This patch introduces per kernel environment. Previously, flags such as execution mode are set through global variables with name like `__kernel_name_exec_mode`. They are accessible on the host by reading the corresponding global variable, but not from the device. Besides, some assumptions, such as no nested parallelism, are not per kernel basis, preventing us applying per kernel optimization in the device runtime.

This is a combination and refinement of patch series D116908, D116909, and D116910.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D142569
2023-04-22 20:46:38 -04:00
Teresa Johnson
a104e27030 Restore "[MemProf] Context disambiguation cloning pass [patch 2/3]"
This restores d0649a6ad8 (reverted in
commit 03bf59d275), with fixes for bot
failures. Confirmed that gcc, which reproduced both failures, now
builds it fine.

Differential Revision: https://reviews.llvm.org/D140949
2023-04-21 19:38:46 -07:00