Commit Graph

4980 Commits

Author SHA1 Message Date
Philip Reames
9ce30fe86f Extract utility function for checking initial value of allocation [NFC]
This is a reoccuring pattern, we can consolidate three copies into one.  The main motivation is to reduce usages of isMallocLike.
2022-01-06 18:02:14 -08:00
Philip Reames
7052670e96 Move getMallocAllocatedType and getMallocArraySize to GlobalOpt [NFC]
These are implementation details of the global-opt transform and not easily reuseable, so remove them from the analysis header.
2022-01-06 18:02:13 -08:00
Philip Reames
67a3331e4f Inline extractMallocCall to sole use and delete [NFC] 2022-01-06 18:02:13 -08:00
Philip Reames
c16fd6a376 Rename doesNotReadMemory to onlyWritesMemory globally [NFC]
The naming has come up as a source of confusion in several recent reviews.  onlyWritesMemory is consist with onlyReadsMemory which we use for the corresponding readonly case as well.
2022-01-05 08:52:55 -08:00
Nikita Popov
99c6b12b92 [ConstantFolding] Unify handling of load from uniform value
There are a number of places that specially handle loads from a
uniform value where all the bits are the same (zero, one, undef,
poison), because we a) don't care about the load offset in that
case b) it bypasses casts that might not be legal generally but
do work with uniform values.

We had multiple implementations of this, with a different set of
supported values each time. This replaces two usages with a more
complete helper. Other usages will be replaced separately, because
they have larger impact.

This is part of D115924.
2022-01-05 12:30:46 +01:00
Sjoerd Meijer
e550dfa4a6 Silence a few unused variable warnings. NFC. 2022-01-05 09:15:07 +00:00
Philip Reames
0b09313cd5 [funcattrs] Infer writeonly argument attribute [part 2]
This builds on the code from D114963, and extends it to handle calls both direct and indirect. With the revised code structure (from series of previously landed NFCs), this is pretty straight forward.

One thing to note is that we can not infer writeonly for arguments which might be captured. If the pointer can be read back by the caller, and then read through, we have no way to track that. This is the same restriction we have for readonly, except that we get no mileage out of the "callee can be readonly" exception since a writeonly param on a readonly function is either a) readnone or b) UB. This means we can't actually infer much unless nocapture has already been inferred.

Differential Revision: https://reviews.llvm.org/D115003
2022-01-04 09:07:54 -08:00
serge-sans-paille
9290ccc3c1 Introduce the AttributeMask class
This class is solely used as a lightweight and clean way to build a set of
attributes to be removed from an AttrBuilder. Previously AttrBuilder was used
both for building and removing, which introduced odd situation like creation of
Attribute with dummy value because the only relevant part was the attribute
kind.

Differential Revision: https://reviews.llvm.org/D116110
2022-01-04 15:37:46 +01:00
Nikita Popov
bbeaf2aac6 [GlobalOpt][Evaluator] Rewrite global ctor evaluation (fixes PR51879)
Global ctor evaluation currently models memory as a map from Constant*
to Constant*. For this to be correct, it is required that there is
only a single Constant* referencing a given memory location. The
Evaluator tries to ensure this by imposing certain limitations that
could result in ambiguities (by limiting types, casts and GEP formats),
but ultimately still fails, as can be seen in PR51879. The approach
is fundamentally fragile and will get more so with opaque pointers.

My original thought was to instead store memory for each global as an
offset => value representation. However, we also need to make sure
that we can actually rematerialize the modified global initializer
into a Constant in the end, which may not be possible if we allow
arbitrary writes.

What this patch does instead is to represent globals as a MutableValue,
which is either a Constant* or a MutableAggregate*. The mutable
aggregate exists to allow efficient mutation of individual aggregate
elements, as mutating an element on a Constant would require interning
a new constant. When a write to the Constant* is made, it is converted
into a MutableAggregate* as needed.

I believe this should make the evaluator more robust, compatible
with opaque pointers, and a bit simpler as well.

Fixes https://github.com/llvm/llvm-project/issues/51221.

Differential Revision: https://reviews.llvm.org/D115530
2022-01-04 09:30:54 +01:00
Kazu Hirata
e5947760c2 Revert "[llvm] Remove redundant member initialization (NFC)"
This reverts commit fd4808887e.

This patch causes gcc to issue a lot of warnings like:

  warning: base class ‘class llvm::MCParsedAsmOperand’ should be
  explicitly initialized in the copy constructor [-Wextra]
2022-01-03 11:28:47 -08:00
Kazu Hirata
fd4808887e [llvm] Remove redundant member initialization (NFC)
Identified with readability-redundant-member-init.
2022-01-01 16:18:18 -08:00
Fangrui Song
b69fe48ccf [IROutliner] Move global namespace cl::opt inside llvm:: 2021-12-30 01:12:55 -08:00
Johannes Doerfert
944aa0421c Reapply "[OpenMP][NFCI] Embed the source location string size in the ident_t"
This reverts commit 73ece231ee and
reapplies 7bfcdbcbf3 with mlir changes.
Also reverts commit 423ba12971 and
includes the unit test changes of
16da214004.
2021-12-29 01:10:38 -06:00
Mehdi Amini
73ece231ee Revert "[OpenMP][NFCI] Embed the source location string size in the ident_t"
This reverts commit 7bfcdbcbf3.
Broke MLIR build
2021-12-29 06:57:36 +00:00
Johannes Doerfert
5602c866c0 [Attributor] Look through allocated heap memory
AAPointerInfo, and thereby other places, can look already through
internal global and stack memory. This patch enables them to look
through heap memory returned by functions with a `noalias` return.

In the future we can look through `noalias` arguments as well but that
will require AAIsDead to learn that such memory can be inspected by the
caller later on. We also need teach AAPointerInfo about dominance to
actually deal with memory that might not be `null` or `undef`
initialized. D106397 is a first step in that direction already.

Reviewed By: kuter

Differential Revision: https://reviews.llvm.org/D109170
2021-12-29 00:21:36 -06:00
Johannes Doerfert
3e0c512ce6 [OpenMP] Simplify all stores in the device code
Similar to loads, we want to be aggressive when it comes to store
simplification. Not everything in LLVM handles dead stores well when
address space casts are involved, we can simply ask the Attributor to do
it for us though.

Reviewed By: tianshilei1992

Differential Revision: https://reviews.llvm.org/D109998
2021-12-29 00:19:38 -06:00
Johannes Doerfert
7bfcdbcbf3 [OpenMP][NFCI] Embed the source location string size in the ident_t
One of the unused ident_t fields now holds the size of the string
(=const char *) field so we have an easier time dealing with those
in the future.

Differential Revision: https://reviews.llvm.org/D113126
2021-12-28 23:53:29 -06:00
Johannes Doerfert
6e2fcf8513 [Attributor][FIX] Ensure store uses are correlated with reloads
While we skipped uses in stores if we can find all copies of the value
when the memory is loaded, we did not correlate the use in the store
with the use in the load. So far this lead to less precise results in the
offset calculations which prevented deductions. With the new
EquivalentUseCB callback argument the user of checkForAllUses can be
informed of the correlation and act on it appropriately.

Differential Revision: https://reviews.llvm.org/D109662
2021-12-28 23:53:29 -06:00
Johannes Doerfert
9f04a0ea43 [OpenMP][FIX] Make AAExecutionDomain deterministic 2021-12-28 23:53:29 -06:00
Johannes Doerfert
ba70f3a5d9 [OpenMP][FIX] Make heap2shared deterministic
Issue #52875 reported non-determinism, this is the first step to avoid
it. We iterate over MallocCalls so we should keep the order stable.
2021-12-28 23:53:28 -06:00
Johannes Doerfert
7de5da2a67 [OpenMP][NFC] Move address space enum into OMPConstants header 2021-12-28 23:53:28 -06:00
Joseph Huber
6e220296d7 [OpenMP] Use alignment information in HeapToShared
This patch uses the return alignment attribute now present in the
`__kmpc_alloc_shared` runtime call to set the alignment of the shared
memory global created to replace it.

Depends on D115971

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D116319
2021-12-27 16:58:27 -05:00
Joseph Huber
38fc89623b [Attributor][Fix] Add alignment return attribute to HeapToStack
This patch changes the HeapToStack optimization to attach the return alignment
attribute information to the created alloca instruction. This would cause
problems when replacing the heap allocation with an alloca did not respect the
alignment of the original heap allocation, which would typically be aligned on
an 8 or 16 byte boundary. Malloc calls now contain alignment attributes,
so we can use that information here.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D115888
2021-12-27 16:58:23 -05:00
Nikita Popov
69ffc3cee9 [Attributor] Directly call areTypesABICompatible() hook
Instead of using the ArgumentPromotion implementation, we now walk
call sites using checkForAllCallSites() and directly call
areTypesABICompatible() using the replacement types. I believe
that resolves the TODO in the code.

Differential Revision: https://reviews.llvm.org/D116033
2021-12-24 09:20:31 +01:00
Philip Reames
ee5d5e19f9 [funcattrs] Use callsite param attributes from indirect calls when inferring access attributes
Arguments to an indirect call is by definition outside the SCC, but there's no reason we can't use locally defined facts on the call site. This also has the nice effect of further simplifying the code.

Differential Revision: https://reviews.llvm.org/D116118
2021-12-22 18:21:59 -08:00
Nikita Popov
f5ac23b5ae [ArgPromotion][TTI] Pass types to ABI compatibility hook
The areFunctionArgsABICompatible() hook currently accepts a list of
pointer arguments, though what we're actually interested in is the
ABI compatibility after these pointer arguments have been converted
into value arguments.

This means that a) the current API is incompatible with opaque
pointers (because it requires inspection of pointee types) and
b) it can only be used in the specific context of ArgPromotion.
I would like to reuse the API when inspecting calls during inlining.

This patch converts it into an areTypesABICompatible() hook, which
accepts a list of types. This makes the method more generally usable,
and compatible with opaque pointers from an API perspective (the
actual usage in ArgPromotion/Attributor is still incompatible,
I'll follow up on that in separate patches).

Differential Revision: https://reviews.llvm.org/D116031
2021-12-22 09:37:51 +01:00
minglotus-6
9c49f8d705 [LTO][WPD] Ignore unreachable function by analyzing IR.
In regular LTO, analyze IR and discard unreachable functions when finding virtual call targets.

Differential Revision: https://reviews.llvm.org/D116056
2021-12-21 18:13:03 +00:00
Philip Reames
b7b308c50a [funcattrs] Infer access attributes for vararg arguments
This change allows us to infer access attributes (readnone, readonly) on arguments passed to vararg functions. Since there isn't a formal argument corresponding to the parameter, they'll never be considered part of the speculative SCC, but they can still benefit from attributes on the call site or the callee function.

The main motivation here is just to simplify the code, and remove some special casing. Previously, an indirect vararg call could return more precise results than an direct vararg call which is just weird.

Differential Revision: https://reviews.llvm.org/D115964
2021-12-21 09:34:14 -08:00
Philip Reames
1fee7195c9 [funcattrs] Fix incorrect readnone/readonly inference on captured arguments
This fixes a bug where we would infer readnone/readonly for a function which passed a value to a function which could capture it. With the value captured in memory, the function could reload the value from memory after the call, and write to it. Inferring the argument as readnone or readonly is unsound.

@jdoerfert apparently noticed this about two years ago, and tests were checked in with 76467c4, but the issue appears to have never gotten fixed.

Since this seems like this issue should break everything, let me explain why the case is actually fairly narrow. The main inference loop over the argument SCCs only analyzes nocapture arguments. As such, we can only hit this when construction the partial SCCs. Due to that restriction, we can only hit this when we have either a) a function declaration with a manually annotated argument, or b) an immediately self recursive call.

It's also worth highlighting that we do have cases we can infer readonly/readnone on a capturing argument validly. The easiest example is a function which simply returns its argument without ever accessing it.

Differential Revision: https://reviews.llvm.org/D115961
2021-12-21 09:34:14 -08:00
Sjoerd Meijer
9e3ae8d296 [FuncSpec] Rename internal option. NFC.
Rename option MaxConstantsThreshold to MaxClonesThreshold. Not only is this
more descriptive, this is also in preparation of introducing another threshold
to analyse more than just 1 constant argument as we currently do, and to better
distinguish these options/thresholds.
2021-12-21 11:02:01 +00:00
Nikita Popov
2926d6d335 [ConstantFold][GlobalOpt] Don't create x86_mmx null value
This fixes the assertion failure reported at
https://reviews.llvm.org/D114889#3198921 with a straightforward
check, until the cleaner fix in D115924 can be reapplied.
2021-12-21 09:11:41 +01:00
Sami Tolvanen
5dc8aaac39 [llvm][IR] Add no_cfi constant
With Control-Flow Integrity (CFI), the LowerTypeTests pass replaces
function references with CFI jump table references, which is a problem
for low-level code that needs the address of the actual function body.

For example, in the Linux kernel, the code that sets up interrupt
handlers needs to take the address of the interrupt handler function
instead of the CFI jump table, as the jump table may not even be mapped
into memory when an interrupt is triggered.

This change adds the no_cfi constant type, which wraps function
references in a value that LowerTypeTestsModule::replaceCfiUses does not
replace.

Link: https://github.com/ClangBuiltLinux/linux/issues/1353

Reviewed By: nickdesaulniers, pcc

Differential Revision: https://reviews.llvm.org/D108478
2021-12-20 12:55:32 -08:00
Nikita Popov
aeb36ae0f4 Revert "[ConstantFolding] Unify handling of load from uniform value"
This reverts commit 9fd4f80e33.

This breaks SingleSource/Regression/C/gcc-c-torture/execute/pr19687.c
in test-suite. Either the test is incorrect, or clang is generating
incorrect union initialization code. I've submitted
https://reviews.llvm.org/D115994 to fix the test, assuming my
interpretation is correct. Reverting this in the meantime as it
may take some time to resolve.
2021-12-18 20:46:52 +01:00
Kazu Hirata
fee57711fe Use DenseMap::lookup (NFC) 2021-12-17 18:19:25 -08:00
Kazu Hirata
2b7be47b22 [llvm] Strip redundant lambda (NFC) 2021-12-17 10:51:40 -08:00
Philip Reames
4c9e31a481 [funcattrs] Use early return to clarify code in determinePointerAccessAttrs [NFC]
Instead of having the speculative path be the untaken path in the branch, explicitly have it return.  This does require tail duplicating one call, but the resulting code is shorter and easier to understand.  Also rewrite the condition using appropriate accessors.
2021-12-17 10:00:36 -08:00
Philip Reames
54ee8bb73a [funcattrs] Use getDataOperandNo where appropriate [NFC]
We'd manually duplicated the same logic and assertions; we can use the utility instead.
2021-12-17 09:35:29 -08:00
Philip Reames
33cbaab141 [funcattrs] Consistently treat calling a function pointer as a non-capturing read
We were being wildly inconsistent about what memory access was implied by an indirect function call. Depending on the call site attributes, you could get anything from a read, to unknown, to none at all. (The last was a miscompile.)

We were also always traversing the uses of a readonly indirect call. This is entirely unneeded as the indirect call does not capture. The callee might capture itself internally, but that has no implications for this caller. (See the nice explanation in the CaptureTracking comments if that case is confusing.)

Note that elsewhere in the same file, we were correctly computing the nocapture attribute for indirect calls. The changed case only resulted in conservatism when computing memory attributes if say the return value was written to.

Differential Revision: https://reviews.llvm.org/D115916
2021-12-17 09:02:03 -08:00
Nikita Popov
9fd4f80e33 [ConstantFolding] Unify handling of load from uniform value
There are a number of places that specially handle loads from a
uniform value where all the bits are the same (zero, one, undef,
poison), because we a) don't care about the load offset in that
case and b) it bypasses casts that might not be legal generally
but do work with uniform values.

We had multiple implementations of this, with a different set of
supported values each time, as well as incomplete type checks in
some cases. In particular, this fixes the assertion reported in
https://reviews.llvm.org/D114889#3198921, as well as a similar
assertion that could be triggered via constant folding.

Differential Revision: https://reviews.llvm.org/D115924
2021-12-17 17:05:06 +01:00
Sjoerd Meijer
b7b61fe091 [FuncSpec] Create helper to update state. NFC.
This creates a helper function updateSpecializedFuncs and is a NFC just to make
the function that drives the transformation easier to read.
2021-12-17 12:14:33 +00:00
Sjoerd Meijer
78a392cf9f [FuncSpec] Respect MaxConstantsThreshold
This is a follow up of D115458 and truncates the worklist of actual arguments
that can be specialised to 'MaxConstantsThreshold' candidates if
MaxConstantsThreshold was exceeded. Thus, this changes the behaviour of option
-func-specialization-max-constants. Before it didn't specialise at all when
this threshold was exceeded, but now it specialises up to MaxConstantsThreshold
candidates from the sorted worklist.

Differential Revision: https://reviews.llvm.org/D115509
2021-12-17 09:25:45 +00:00
Sjoerd Meijer
89bcfd1632 Recommit "[FuncSpec] Decouple cost/benefit analysis, allowing sorting of candidates."
Replaced llvm:sort with llvm::stable_sort, this was failing on the bot with
expensive checks enabled.
2021-12-17 09:02:51 +00:00
Sjoerd Meijer
5b139a583d Revert "[FuncSpec] Decouple cost/benefit analysis, allowing sorting of candidates."
This reverts commit 20b03d6536.

This shows some failed tests on a bot with expensive checks enabled that I need
to look at.
2021-12-16 12:56:11 +00:00
Sjoerd Meijer
20b03d6536 [FuncSpec] Decouple cost/benefit analysis, allowing sorting of candidates.
This mostly is the same code that is refactored to decouple the cost and
benefit analysis. The biggest change is top-level function specializeFunctions
that now drives the transformation more like this:

  specializeFunctions() {
    Cost = getSpecializationCost(F);
    calculateGains(F, Cost);
    specializeFunction(F);
  }

while this is just a restructuring, it helps the functional change in
calculateGains. I.e., we now sort the candidates based on the expected
specialisation gain, which we didn't do before. For this, a book keeping struct
ArgInfo was introduced. If we have a list of N candidates, but we only want
specialise less than N as set by option -func-specialization-max-constants, we
sort the list and discard the candidates that give the least benefit.

Given a formal argument, this change results in selecting the best actual
argument(s). This is NFC'ish in that this shouldn't change the current output
(hence no test change here), but in follow ups starting with D115509, it
should and I want to go one step further and compare all functions and all
arguments, which will mostly build on top of this refactoring and change.

Differential Revision: https://reviews.llvm.org/D115458
2021-12-16 11:55:37 +00:00
minglotus-6
4ab5527c15 [ThinLTO] Ignore unreachable virtual functions in WPD in thin LTO.
Differential Revision: https://reviews.llvm.org/D115648
2021-12-16 02:24:20 +00:00
Hongtao Yu
d7b7b64914 [CSSPGO] Warn instead of error out for modules that are not probed.
Modules that are not compiled with pseudo probe enabled can still be compiled with a sample profile input, such as in LTO postlink where other modules are probed. Since the profile is unrelated to the current modules, we should warn instead of error out the compilation.

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D115642
2021-12-14 16:38:50 -08:00
Hongtao Yu
5740bb801a [CSSPGO] Use nested context-sensitive profile.
CSSPGO currently employs a flat profile format for context-sensitive profiles. Such a flat profile allows for precisely manipulating contexts that is either inlined or not inlined. This is a benefit over the nested profile format used by non-CS AutoFDO. A downside of this is the longer build time due to parsing the indexing the full CS contexts.

For a CS flat profile, though only the context profiles relevant to a module are loaded when that module is compiled, the cost to figure out what profiles are relevant is noticeably high when there're many contexts,  since the sample reader will need to scan all context strings anyway. On the contrary, a nested function profile has its related inline subcontexts isolated from other unrelated contexts. Therefore when compiling a set of functions, unrelated contexts will never need to be scanned.

In this change we are exploring using nested profile format for CSSPGO. This is expected to work based on an assumption that with a preinliner-computed profile all contexts are precomputed and expected to be inlined by the compiler. Contexts not expected to be inlined will be cut off and returned to corresponding base profiles (for top-level outlined functions). This naturally forms a nested profile where all nested contexts are expected to be inlined. The compiler will less likely optimize on derived contexts that are not precomputed.

A CS-nested profile will look exactly the same with regular nested profile except that each nested profile can come with an attributes. With pseudo probes,  a nested profile shown as below can also have a CFG checksum.

```

main:1968679:12
 2: 24
 3: 28 _Z5funcAi:18
 3.1: 28 _Z5funcBi:30
 3: _Z5funcAi:1467398
  0: 10
  1: 10 _Z8funcLeafi:11
  3: 24
  1: _Z8funcLeafi:1467299
   0: 6
   1: 6
   3: 287884
   4: 287864 _Z3fibi:315608
   15: 23
   !CFGChecksum: 138828622701
   !Attributes: 2
  !CFGChecksum: 281479271677951
  !Attributes: 2
```

Specific work included in this change:
- A recursive profile converter to convert CS flat profile to nested profile.
- Extend function checksum and attribute metadata to be stored in nested way for text profile and extbinary profile.
- Unifiy sample loader inliner path for CS and preinlined nested profile.
 - Changes in the sample loader to support probe-based nested profile.

I've seen promising results regarding build time. A nested profile can result in a 20% shorter build time than a CS flat profile while keep an on-par performance. This is with -duplicate-contexts-into-base=1.

Test Plan:

Reviewed By: wenlei

Differential Revision: https://reviews.llvm.org/D115205
2021-12-14 14:40:25 -08:00
Mingming Liu
09a704c5ef [LTO] Ignore unreachable virtual functions in WPD in hybrid LTO.
Differential Revision: https://reviews.llvm.org/D115492
2021-12-14 20:18:04 +00:00
Kazu Hirata
36b8a4f9f3 [llvm] Use llvm::is_contained (NFC) 2021-12-11 11:42:09 -08:00
Sami Tolvanen
9a74c753fe [ThinLTO][MC] Use conditional assignments for promotion aliases
Inline assembly refererences to static functions with ThinLTO+CFI were
fixed in D104058 by creating aliases for promoted functions. Creating
the aliases unconditionally resulted in an unexpected size increase in
a Chrome helper binary:

https://bugs.chromium.org/p/chromium/issues/detail?id=1261715

This is caused by the compiler being unable to drop unused code now
referenced by the alias in module-level inline assembly. This change
adds a .set_conditional assembly extension, which emits an assignment
only if the target symbol is also emitted, avoiding phantom references
to functions that could have otherwise been dropped.

This is an alternative to the solution proposed in D112761.

Reviewed By: pcc, nickdesaulniers, MaskRay

Differential Revision: https://reviews.llvm.org/D113613
2021-12-10 12:21:37 -08:00