This allows constructing a MemDesc from a MachineMemoryOperand, a pattern that starts to show up more frequently.
Reviewed By: foad
Differential Revision: https://reviews.llvm.org/D109161
This is important as with exceptions enabled, non-POD allocas often have
two lifetime ends: the exception handler, and the normal one.
Reviewed By: eugenis
Differential Revision: https://reviews.llvm.org/D108365
Set up basic infrastructure for 64-bit ARM architecture support in JITLink. It allows for loading a minimal object file and resolving a single relocation. Advanced features like GOT and PLT handling or relaxations were intentionally left out for the moment.
This patch follows the idea to keep implementations for ARM (32-bit) and Aaarch64 (64-bit) separate, because:
* it might be easier to share code with the MachO "arm64" JITLink backend
* LLVM has individual targets for ARM and Aaarch64 as well
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D108986
Add support for ordered directive in the OpenMPIRBuilder.
This patch also modidies clang to use the ordered directive when the
option -fopenmp-enable-irbuilder is enabled.
Also fix one ICE when parsing one canonical for loop with the relational
operator LE or GE in openmp region by replacing unary increment
operation of the expression of the variable "Expr A" minus the variable
"Expr B" (++(Expr A - Expr B)) with binary addition operation of the
experssion of the variable "Expr A" minus the variable "Expr B" and the
expression with constant value "1" (Expr A - Expr B + "1").
Reviewed By: Meinersbur, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D107430
All ExecutorProcessControl subclasses must provide a JITLinkMemoryManager object
that can be used to allocate memory in the executor process. The
EPCGenericJITLinkMemoryManager class provides an off-the-shelf
JITLinkMemoryManager implementation for JITs that do not need (or cannot
provide) a specialized JITLinkMemoryManager implementation. This simplifies the
process of creating new ExecutorProcessControl implementations.
The ExecutionSession versions now just forward to the implementations in
ExecutorProcessControl.
This allows callWrapper / callSPSWrapper to be used while bootstrapping an
ExecutorProcessControl instance.
This adds the following combines:
```
x = ... 0 or 1
c = icmp eq x, 1
->
c = x
```
and
```
x = ... 0 or 1
c = icmp ne x, 0
->
c = x
```
When the target's true value for the relevant types is 1.
This showed up in the following situation:
https://godbolt.org/z/M5jKexWTW
SDAG currently supports the `ne` case, but not the `eq` case. This can probably
be further generalized, but I don't feel like thinking that hard right now.
This gives some minor code size improvements across the board on CTMark at
-Os for AArch64. (0.1% for 7zip and pairlocalalign in particular.)
Differential Revision: https://reviews.llvm.org/D109130
To any downstream users broken by this change, please examine your uses
of these methods and see if you can use a better method. For example,
getAttribute(AttributeList::FunctionIndex) => getFnAttr(), or
addAttribute(AttributeList::FirstArgIndex + ArgNo) =>
addParamAttribute(ArgNo). 0 corresponds to ReturnIndex, ~0 corresponds
to FunctionIndex. This may make future cleanups less painful.
I've made the mistake of assuming that these indexes are for parameters
multiple times, but actually they're based off of a weird indexing
scheme AttributeList::AttrIndex where 0 is the return value and ~0 is
the function. Hopefully renaming these methods will make this clearer.
Ideally users should use more specific methods like
AttributeList::getFnAttr().
This touches all relevant methods in AttributeList, CallBase, and Function.
This hopefully will make easier a future change to cleanup AttrIndex. A
previous worry about cleaning up AttrIndex was that too many downstream
users would have to look through all uses of AttrIndex and relevant
attribute method calls to see if anything was unintentionally hardcoded
(e.g. using 0 instead of ReturnIndex). With this change hopefully
downstream users will look at existing usages of these methods and clean
them up.
Reviewed By: rnk, MaskRay
Differential Revision: https://reviews.llvm.org/D108614
Looks like the MS STL wants StringMapKeyIterator::operator*() to be const.
Return the result by copy instead of reference to do that.
Assigning to a hash map key iterator doesn't make sense anyways.
Also reverts 123f811fe5 which is now hopefully no longer needed.
Differential Revision: https://reviews.llvm.org/D109167
When preinliner is used for CSSPGO, we try to honor global preinliner decision as much as we can except for uninlinable callees. We rely on InlineCost::Never to prevent us from illegal inlining.
However, it turns out that we use InlineCost::Never for both illeagle inlining and some of the "not-so-beneficial" inlining.
The most common one is recursive inlining, while it can bloat size a lot during CGSCC bottom-up inlining, it's less of a problem when recursive inlining is guided by profile and done in top-down manner.
Ideally it'd be better to have a clear separation between inline legality check vs cost-benefit check, but that requires a bigger change.
This change enables InlineCost computation to allow inlining recursive calls, controlled by InlineParams. In SampleLoader, we now enable recursive inlining for CSSPGO when global preinliner decision is used.
With this change, we saw a few perf improvements on SPEC2017 with CSSPGO and preinliner on: 2% for povray_r, 6% for xalancbmk_s, 3% omnetpp_s, while size is about the same (no noticeable perf change for all other benchmarks)
Differential Revision: https://reviews.llvm.org/D109104
Now prints the list of known archs. This requires plumbing a Driver
arg through a few functions.
Also add two more convenience insert() overlods to StringMap.
Differential Revision: https://reviews.llvm.org/D109105
Please refer to
https://lists.llvm.org/pipermail/llvm-dev/2021-September/152440.html
(and that whole thread.)
TLDR: the original patch had no prior RFC, yet it had some changes that
really need a proper RFC discussion. It won't be productive to discuss
such an RFC, once it's actually posted, while said patch is already
committed, because that introduces bias towards already-committed stuff,
and the tree is potentially in broken state meanwhile.
While the end result of discussion may lead back to the current design,
it may also not lead to the current design.
Therefore i take it upon myself
to revert the tree back to last known good state.
This reverts commit 4c4093e6e3.
This reverts commit 0a2b1ba33a.
This reverts commit d9873711cb.
This reverts commit 791006fb8c.
This reverts commit c22b64ef66.
This reverts commit 72ebcd3198.
This reverts commit 5fa6039a5f.
This reverts commit 9efda541bf.
This reverts commit 94d3ff09cf.
Breaks build with -DBUILD_SHARED_LIBS=ON
```
CMake Error: The inter-target dependency graph contains the following strongly connected component (cycle):
"LLVMFrontendOpenMP" of type SHARED_LIBRARY
depends on "LLVMPasses" (weak)
"LLVMipo" of type SHARED_LIBRARY
depends on "LLVMFrontendOpenMP" (weak)
"LLVMCoroutines" of type SHARED_LIBRARY
depends on "LLVMipo" (weak)
"LLVMPasses" of type SHARED_LIBRARY
depends on "LLVMCoroutines" (weak)
depends on "LLVMipo" (weak)
At least one of these targets is not a STATIC_LIBRARY. Cyclic dependencies are allowed only among static libraries.
CMake Generate step failed. Build files cannot be regenerated correctly.
```
This reverts commit 707ce34b06.
llvm.vp.select extends the regular select instruction with an explicit
vector length (%evl).
All lanes with indexes at and above %evl are
undefined. Lanes below %evl are taken from the first input where the
mask is true and from the second input otherwise.
Reviewed By: rogfer01
Differential Revision: https://reviews.llvm.org/D105351
This patch corrects the auto-generated EVL and Mask index positions of
the `VP_LOAD`/`VP_STORE`/`VP_GATHER`/`VP_SCATTER` nodes.
Reviewed By: simoll
Differential Revision: https://reviews.llvm.org/D109063
Add methods for loop unrolling to the OpenMPIRBuilder class and use them in Clang if `-fopenmp-enable-irbuilder` is enabled. The unrolling methods are:
* `unrollLoopFull`
* `unrollLoopPartial`
* `unrollLoopHeuristic`
`unrollLoopPartial` and `unrollLoopHeuristic` can use compiler heuristics to automatically determine the unroll factor. If possible, that is if no CanonicalLoopInfo is required to pass to another method, metadata for LLVM's LoopUnrollPass is added. Otherwise the unroll factor is determined using the same heurstics as user by LoopUnrollPass. Not requiring a CanonicalLoopInfo, especially with `unrollLoopHeuristic` allows greater flexibility.
With full unrolling and partial unrolling with known unroll factor, instead of duplicating instructions by the OpenMPIRBuilder, the full unroll is still delegated to the LoopUnrollPass. In case of partial unrolling the loop is first tiled using the existing `tileLoops` methods, then the inner loop fully unrolled using the same mechanism.
Reviewed By: jdoerfert, kiranchandramohan
Differential Revision: https://reviews.llvm.org/D107764
Added opt option -print-pipeline-passes to print a -passes compatible
string describing the built pass pipeline.
As an example:
$ opt -enable-new-pm=1 -adce -licm -simplifycfg -o /dev/null /dev/null -print-pipeline-passes
verify,function(adce),function(loop-mssa(licm)),function(simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>),verify,BitcodeWriterPass
At the moment this is best-effort only and there are some known
limitations:
- Not all passes accepting parameters will print their parameters
(currently only implemented for simplifycfg).
- Some ClassName to pass-name mappings are not unique.
- Some ClassName to pass-name mappings are missing (e.g.
BitcodeWriterPass).
Differential Revision: https://reviews.llvm.org/D108298
Added opt option -print-pipeline-passes to print a -passes compatible
string describing the built pass pipeline.
As an example:
$ opt -enable-new-pm=1 -adce -licm -simplifycfg -o /dev/null /dev/null -print-pipeline-passes
verify,function(adce),function(loop-mssa(licm)),function(simplifycfg<bonus-inst-threshold=1;no-forward-switch-cond;no-switch-to-lookup;keep-loops;no-hoist-common-insts;no-sink-common-insts>),verify,BitcodeWriterPass
At the moment this is best-effort only and there are some known
limitations:
- Not all passes accepting parameters will print their parameters
(currently only implemented for simplifycfg).
- Some ClassName to pass-name mappings are not unique.
- Some ClassName to pass-name mappings are missing (e.g.
BitcodeWriterPass).
Document 'use-external-name' and the various bits of logic that make it
work, to avoid others having to repeat the archival work (given that I
added getFileRefReturnsCorrectNameForDifferentStatPath to
FileManagerTest, seems possible I understood this once before!).
- b59cf679e8 added 'use-external-name' to
RedirectingFileSystem. This causes `stat`s to return the external
name for a redirected file instead of the name it was accessed by,
leaking it through the VFS.
- d066d4c849 propagated the external name
further through clang::FileManager.
- 4dc5573acc, which added
clang::FileEntryRef to clang::FileManager, has complicated concession
to account for this as well (since refactored a bit).
The goal of 'use-external-name' is to enable Clang to report "real" file
paths to users (via diagnostics) and to external tools (such as
debuggers reading debug info and build systems reading `.d` files).
I've added FIXMEs to look at other channels for communicating the
external names, since the current implementation adds complexity to
FileManager and exposes an inconsistent interface to clients.
Besides that, the FileManager logic appears to be kicking in outside of
'use-external-name'. Seems that *some* vfs::FileSystem implementations
canonicalize some paths returned by `stat` in *some* cases (the bug
isn't fully understood yet). Volodymyr Sapsai is investigating, this at
least better documents what *is* understood.
With the context split work, the context-based (an array of strings) sorting performed at profile load time is way more expansive than single-string-based sorting. This is likely due to auxiliary operations done on each array element, such as indirect references, std::min operations, also likely cache misses. In this change I'm presorting profiles during profile generation time to avoid sorting at compile time.
Compared to the previous context-split work, this effectively cuts down compile time by 20% for one of our large services and brings us closer to non-CS build, with still a small gap in build time.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D109036
Store the used element type in the InductionDescriptor. For typed
pointers, it remains the pointer element type. For opaque pointers,
we always use an i8 element type, such that the step is a simple
offset.
A previous version of this patch instead tried to guess the element
type from an induction GEP, but this is not reliable, as the GEP
may be hidden (see @both in iv_outside_user.ll).
Differential Revision: https://reviews.llvm.org/D104795
This is used by BOLT to do patching of DebugInfo section, and Line Table. Directly by using find, and through getAttrFieldOffsetForUnit.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D107874
This is part one of a couple of patches to fully rename these methods.
I've made the mistake of assuming that these indexes are for parameters
multiple times, but actually they're based off of a weird indexing
scheme AttributeList::AttrIndex where 0 is the return value and ~0 is
the function. Hopefully renaming these methods will make this clearer.
Ideally users should use more specific methods like
AttributeList::getFnAttr().
This patch simply adds the name that we want in the end. This is so the
removal of the methods with the original names happens in a separate
change to make it easier for downstream users.
This touches all relevant methods in AttributeList, CallBase, and Function.
Reviewed By: rnk
Differential Revision: https://reviews.llvm.org/D108788
Adding the compiler support of MD5 CS profile based on pervious context split work D107299. A MD5 CS profile is about 40% smaller than the string-based extbinary profile. As a result, the compilation is 15% faster.
There are a few conversion from real names to md5 names that have been made on the sample loader and context tracker side to get it work.
Reviewed By: wenlei, wmi
Differential Revision: https://reviews.llvm.org/D108342
Setting -fvisibility=hidden when compiling Target libs has the advantage of
not being intrusive on the codebase, but it also sets the visibility of all
functions within header-only component like ADT. In the end, we end up with
some symbols with hidden visibility within llvm dylib (through the target libs),
and some with external visibility (through other libs). This paves the way for
subtle bugs like https://reviews.llvm.org/D101972
This patch explicitly set the visibility of some classes to `default` so that
`llvm::Any` related symbols keep a `default` visibility. Indeed a template
function with `default` visibility parametrized by a type with `hidden`
visibility is granted `hidden` visibility, and we don't want this for the
uniqueness of `llvm::Any::TypeId`.
Differential Revision: https://reviews.llvm.org/D108943
We have a large compile showing occasional non-deterministic behavior
that is due to DIArgList not being properly uniqued in some cases. I
tracked this down to handleChangedOperands, for which there is a custom
implementation for DIArgList, that does not take care of re-uniquing
after updating the DIArgList Args, unlike the default version of
handleChangedOperands for MDNode.
Since the Args in the DIArgList form the key for the store, this seems
to be occasionally breaking the lookup in that DenseSet. Specifically,
when invoking DIArgList::get() from replaceVariableLocationOp, very
occasionally it returns a new DIArgList object, when one already exists
having the same exact Args pointers. This in turn causes a subsequent
call to Instruction::isIdenticalToWhenDefined on those two otherwise
identical DIArgList objects during a later pass to return false, leading
to different IR in those rare cases.
I modified DIArgList::handleChangedOperands to perform similar
re-uniquing as the MDNode version used by other metadata node types.
This also necessitated a change to the context destructor, since in some
cases we end up with DIArgList as distinct nodes: DIArgList is the only
metadata node type to have a custom dropAllReferences, so we need to
invoke that version on DIArgList in the DistinctMDNodes store to clean
it up properly.
Differential Revision: https://reviews.llvm.org/D108968
SCEVPredicate has a friend declaration. The friend can technically call the
protected destructor, so the warning is legitimate. Clang simply doesn't implement
the friend check.
Make the dtor virtual to fix the issue.
Analogous to the TSFlags for machine instructions, this
patch introduces a bit vector for register classes to have
target specific flags that become a tablegened value in
TargetRegisterClass.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D108767
A new LLVM specific TAG DW_TAG_LLVM_annotation is added.
The name is suggested by Paul Robinson ([1]).
Currently, this tag is used to output __attribute__((btf_tag("string")))
annotations in dwarf. The following is an example for a global
variable with two btf_tag attributes:
0x0000002a: DW_TAG_variable
DW_AT_name ("g1")
DW_AT_type (0x00000052 "int")
DW_AT_external (true)
DW_AT_decl_file ("/tmp/home/yhs/work/tests/llvm/btf_tag/t.c")
DW_AT_decl_line (8)
DW_AT_location (DW_OP_addr 0x0)
0x0000003f: DW_TAG_LLVM_annotation
DW_AT_name ("btf_tag")
DW_AT_const_value ("tag1")
0x00000048: DW_TAG_LLVM_annotation
DW_AT_name ("btf_tag")
DW_AT_const_value ("tag2")
0x00000051: NULL
In the future, DW_TAG_LLVM_annotation may encode other type
of non-string const value.
[1] https://lists.llvm.org/pipermail/llvm-dev/2021-June/151250.html
Differential Revision: https://reviews.llvm.org/D106621
Followup to D99355: SDAG support for vector-predicated load/store/gather/scatter.
Reviewed By: frasercrmck
Differential Revision: https://reviews.llvm.org/D105871
To support Virtual Function Elimination to Swift, this PR adds support for Swift
vtables which contain "relative pointers" instead of direct pointer references.
These are in the form of:
@symbol = ... {
i32 trunc (i64 sub (i64 ptrtoint (<type> @target to i64), i64 ptrtoint (... @symbol to i64)) to i32)
}
The PR extends GlobalDCE's way of looking up a vtable offset into a dependency
to be able to see through this expression and find the target symbol.
Differential Revision: https://reviews.llvm.org/D107645
Currently context strings contain a lot of duplicated function names and that significantly increase the profile size. This change split the context into a series of {name, offset, discriminator} tuples so function names used in the context can be replaced by the index into the name table and that significantly reduce the size consumed by context.
A follow-up improvement made in the compiler and profiling tools is to avoid reconstructing full context strings which is time- and memory- consuming. Instead a context vector of `StringRef` is adopted to represent the full context in all scenarios. As a result, the previous prevalent profile map which was implemented as a `StringRef` is now engineered as an unordered map keyed by `SampleContext`. `SampleContext` is reshaped to using an `ArrayRef` to represent a full context for CS profile. For non-CS profile, it falls back to use `StringRef` to represent a contextless function name. Both the `ArrayRef` and `StringRef` objects are underpinned by real array and string objects that are stored in producer buffers. For compiler, they are maintained by the sample reader. For llvm-profgen, they are maintained in `ProfiledBinary` and `ProfileGenerator`. Full context strings can be generated only in those cases of debugging and printing.
When it comes to profile format, nothing has changed to the text format, though internally CS context is implemented as a vector. Extbinary format is only changed for CS profile, with an additional `SecCSNameTable` section which stores all full contexts logically in the form of `vector<int>`, which each element as an offset points to `SecNameTable`. All occurrences of contexts elsewhere are redirected to using the offset of `SecCSNameTable`.
Testing
This is no-diff change in terms of code quality and profile content (for text profile).
For our internal large service (aka ads), the profile generation is cut to half, with a 20x smaller string-based extbinary format generated.
The compile time of ads is dropped by 25%.
Differential Revision: https://reviews.llvm.org/D107299
Currently, the IROutliner uses a simple metric to outline the largest amount
of IR possible to outline first if it fits the cost model. This is model
loses out on smaller blocks of code that have higher reductions in cost that
are contained within larger blocks of IR.
This reverses the order, where we calculate all of the costs first, and then
reorder and extract items based on the calculated results.
Reviewers: paquette
Differential Revision: https://reviews.llvm.org/D106440
The backend generally uses 64-bit immediates (e.g. what
MachineOperand::getImm() returns), so use that for analyzeCompare()
and optimizeCompareInst() as well. This avoids truncation for
targets that support immediates larger 32-bit. In particular, we
can avoid the bugprone value normalization hack in the AArch64
target.
This is a followup to D108076.
Differential Revision: https://reviews.llvm.org/D108875