In [0] we described an algorithm called //BalancedPartitioning// (bp) to consume function traces [1] and compute a function order that reduces the number of page faults during startup.
This patch adds the `order` command to the `llvm-profdata` tool which uses bp to output a function order that can be passed to the linker via `--symbol-ordering-file=`.
Special thanks to Sergey Pupyrev and Julian Mestre for designing this balanced partitioning algorithm.
[0] https://discourse.llvm.org/t/rfc-temporal-profiling-extension-for-irpgo/68068
[1] https://reviews.llvm.org/D147287
Reviewed By: spupyrev
Differential Revision: https://reviews.llvm.org/D147812
As suggested by @erichkeane in
https://reviews.llvm.org/D141451#inline-1429549
There's potential for a lot more cleanups around these APIs. This is
just a start.
Callers need to be more careful about sub-expressions producing strings
that don't outlast the expression using `llvm::demangle`. Add a
release note.
Differential Revision: https://reviews.llvm.org/D149104
This brings the list of extensions supported here up to date
with what is supported by current git versions of binutils.
Also add a comment to AArch64TargetParser to remind people to
consider adding new ones to the list supported in assembly.
In the case of the "rdma" extension, there's a slight surprise:
LLVM knows of the extension under the name "rdm", while binutils
has it named "rdma". However, binutils appears to accept any
abbreviated prefix of an arch extension, so it does accept the
form "rdm" too even if it formally considers it called "rdma".
Support both spellings for the extensions here, for simplicity.
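The two lookup strategies mentioned above can be contrasted with a small sketch. This is an illustrative Python model, not code from LLVM or binutils: the extension list, the function names, and the alias table are assumptions made for the example, and only the `rdm`/`rdma` pair comes from the description.

```python
# Hypothetical list of canonical extension names; only "rdma" is taken from the text.
KNOWN_EXTENSIONS = ["crc", "crypto", "fp", "rdma", "simd"]


def resolve_by_prefix(name, known=KNOWN_EXTENSIONS):
    """Return every known extension that `name` abbreviates (prefix matching)."""
    if not name:
        return []
    return [ext for ext in known if ext.startswith(name)]


# Explicit alias table: both spellings map to the name LLVM uses internally.
ALIASES = {"rdm": "rdm", "rdma": "rdm"}


def resolve_by_alias(name):
    """Return the internal name for a spelling, or None if it is unknown."""
    return ALIASES.get(name)


print(resolve_by_prefix("rdm"))   # the abbreviated form matches "rdma"
print(resolve_by_alias("rdma"))   # the long form maps to "rdm"
```

Listing both spellings explicitly avoids having to decide what an ambiguous prefix such as `cr` should mean.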
Differential Revision: https://reviews.llvm.org/D151981
Define the function @llvm.amdgcn.make.buffer.rsrc, which takes a 64-bit
pointer, the 16-bit stride/swizzling constant that replaces the high 16
bits of an address in a buffer resource, the 32-bit extent/number of
elements, and the 32-bit flags (the latter two being the 3rd and 4th
words of the resource), and combines them into a ptr addrspace(8).
This intrinsic is lowered during the early phases of the backend.
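The packing described above can be modelled roughly as follows. This is a Python sketch of the bit layout as the description states it, not the backend lowering: the function name, parameter names, and the choice to return four 32-bit words are assumptions made for illustration.

```python
def make_buffer_rsrc(base, stride, num_records, flags):
    """Model the 128-bit resource as four 32-bit words (word 0 first)."""
    if not 0 <= base < 1 << 64:
        raise ValueError("base must fit in 64 bits")
    if not 0 <= stride < 1 << 16:
        raise ValueError("stride must fit in 16 bits")
    if not 0 <= num_records < 1 << 32:
        raise ValueError("num_records must fit in 32 bits")
    if not 0 <= flags < 1 << 32:
        raise ValueError("flags must fit in 32 bits")

    word0 = base & 0xFFFFFFFF
    # The stride/swizzling constant replaces the high 16 bits of the address.
    word1 = ((base >> 32) & 0xFFFF) | (stride << 16)
    word2 = num_records  # 3rd word: extent / number of elements
    word3 = flags        # 4th word: flags
    return [word0, word1, word2, word3]


words = make_buffer_rsrc(0x0000_1234_5678_9ABC, 0x0010, 64, 0x27000)
print([hex(w) for w in words])
```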
This intrinsic is needed so that alias analysis can correctly infer
that a certain buffer resource points to the same memory as some
global pointer. Previous methods of constructing buffer resources,
which relied on ptrtoint, would not allow for such an inference.
Depends on D148184
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D148957
In order to enable the LLVM frontend to better analyze buffer
operations (and to potentially enable more precise analyses on the
backend), define versions of the raw and structured buffer intrinsics
that use `ptr addrspace(8)` instead of `<4 x i32>` to represent their
rsrc arguments.
The new intrinsics are named by replacing `buffer.` with `buffer.ptr`.
One advantage to these intrinsic definitions is that, instead of
specifying that a buffer load/store will read/write some memory, we
can indicate that the memory read or written will be based on the
pointer argument. This means that, for example, a read from a
`noalias` buffer can be pulled out of a loop that is modifying a
distinct buffer.
In the future, we will define custom PseudoSourceValues that will
allow us to package up the (buffer, index, offset) triples that buffer
intrinsics contain and allow for more precise backend analysis.
This work also enables creating address space 7, which represents
manipulation of raw buffers using native LLVM load and store
instructions.
Where tests simply used a buffer intrinsic while testing some other
code path (such as the tests for VGPR spills), they have been updated
to use the new intrinsic form. Tests that are "about" buffer
intrinsics (for instance, those that ensure that they codegen as
expected) have been duplicated, either within existing files or into
new ones.
Depends on D145441
Reviewed By: arsenm, #amdgpu
Differential Revision: https://reviews.llvm.org/D147547
The change implements intrinsics 'get_fpenv', 'set_fpenv' and 'reset_fpenv'.
They are used to read the floating-point environment, set it, or reset it to
some default state. They perform the same actions as the C library functions
'fegetenv' and 'fesetenv'. By default these intrinsics are lowered to calls
to these functions.
The new intrinsics specify the FP environment as a value of integer type, which
is convenient for most targets, where the FP state is the content of some
register. Some targets, however, use long representations. On X86 the size
of the FP environment is 256 bits, and even half of this size is not a legal
integer type. To facilitate legalization in such cases, two sets of DAG
nodes are used. Nodes GET_FPENV and SET_FPENV are used when the FP
environment may be represented by a legal integer type. Nodes
GET_FPENV_MEM and SET_FPENV_MEM treat the FP environment as a region in
memory, much like `fesetenv` and `fegetenv` do. They are used when the
target has a long representation for the floating-point state.
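The choice between the two node families can be pictured with a short sketch. This Python model is only an illustration of the rule stated above: the function name and the example set of legal integer widths are assumptions, not values taken from any target description.

```python
def select_fpenv_nodes(fpenv_bits, legal_int_bits):
    """Pick the node pair based on whether the FP environment fits a legal integer."""
    if fpenv_bits in legal_int_bits:
        return ("GET_FPENV", "SET_FPENV")
    # No legal integer type can hold the state, so treat it as a memory region.
    return ("GET_FPENV_MEM", "SET_FPENV_MEM")


# Hypothetical legal integer widths for some target.
LEGAL = {8, 16, 32, 64}

print(select_fpenv_nodes(64, LEGAL))
print(select_fpenv_nodes(256, LEGAL))
```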
Differential Revision: https://reviews.llvm.org/D71742
The comment moved is referring to the --output-asm-syntax flag rather
than the --print-imm-hex flag, but seems to have mistakenly been put
under the definition of that flag due to some misplaced line numbers on
phabricator.
This patch adds LLVM_ENABLE_HTTPLIB to the list of CMake options to make
it clearer exactly what the option does and which specific
project/installation it refers to.
Reviewed By: phosek
Differential Revision: https://reviews.llvm.org/D152060
Unlike every other analysis and transform, simplifyInstruction
permitted operating on instructions which are not inserted
into a function. This created an edge case no other code needs
to really worry about, and limited transforms in cases that
can make use of the context function. Only the inliner and a handful
of other utilities were making use of this, so just fix up these
edge cases. Results in some IR ordering differences since
cloned blocks are inserted eagerly now. Plus some additional
simplifications trigger (e.g. some add 0s now folded out that
previously didn't).
Emit a 4-byte alignment after the .arm directive and a 2-byte alignment
after the .thumb directive. The new behavior matches GNU assembler.
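The effect on the current section offset can be sketched as follows. This is a minimal Python illustration of the alignment arithmetic only; the function and table names are hypothetical and nothing here is taken from the assembler sources.

```python
# Alignment implied by each directive, as described above.
DIRECTIVE_ALIGNMENT = {".arm": 4, ".thumb": 2}


def padding_after_directive(directive, offset):
    """Return how many padding bytes are needed to align `offset`."""
    align = DIRECTIVE_ALIGNMENT[directive]
    return (-offset) % align


print(padding_after_directive(".arm", 6))    # offset 6 needs padding up to 8
print(padding_after_directive(".thumb", 5))  # offset 5 needs padding up to 6
```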
Fixes #53386
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D147763
The partial move from JITTargetAddress to ExecutorAddr in 8b1771bd9f did not
update the ORC or Kaleidoscope documents. This patch fixes the inconsistency.
Reviewed By: lhames
Differential Revision: https://reviews.llvm.org/D150458
This patch proposes to add `!getdagarg` and `!getdagname` bang
operators as the inverse operation of `!dag`. They allow us to examine
the arguments of a given dag.
Reviewed By: simon_tatham
Differential Revision: https://reviews.llvm.org/D151602
This reverts commit d763c6e5e2.
Adds the patch by @hans from
https://github.com/llvm/llvm-project/issues/62719
This patch fixes the Windows build.
d763c6e5e2 reverted the reviews
D144509 [CMake] Bumps minimum version to 3.20.0.
This partly undoes D137724.
This change has been discussed on discourse
https://discourse.llvm.org/t/rfc-upgrading-llvms-minimum-required-cmake-version/66193
Note this does not remove work-arounds for older CMake versions, that
will be done in followup patches.
D150532 [OpenMP] Compile assembly files as ASM, not C
Since CMake 3.20, CMake explicitly passes "-x c" (or equivalent)
when compiling a file which has been set as having the language
C. This behaviour change only takes place if "cmake_minimum_required"
is set to 3.20 or newer, or if the policy CMP0119 is set to new.
Attempting to compile assembly files with "-x c" fails. In many cases
this is worked around, because OpenMP overrides it with
"-x assembler-with-cpp"; however, that override is only added for
non-Windows targets.
Thus, after increasing cmake_minimum_required to 3.20, this breaks
compiling the GNU assembly for Windows targets; the GNU assembly is
used for ARM and AArch64 Windows targets when building with Clang.
This patch unbreaks that.
D150688 [cmake] Set CMP0091 to fix Windows builds after the cmake_minimum_required bump
The build uses another mechanism to select the runtime.
Fixes #62719
Reviewed By: #libc, Mordante
Differential Revision: https://reviews.llvm.org/D151344
The generic implementation is umin(TC, VF * vscale).
Lowering to vsetvli for RISC-V will come in a future patch.
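A rough model of the generic formula, reading TC as the remaining trip count and VF as the vectorization factor, is shown below. The Python function names and the strip-mining loop are illustrative assumptions, not the actual lowering.

```python
def get_vector_length(trip_count, vf, vscale):
    """Generic form: umin(TC, VF * vscale)."""
    return min(trip_count, vf * vscale)


def chunk_lengths(total, vf, vscale):
    """Strip-mine `total` elements, asking for a vector length each iteration."""
    if vf * vscale <= 0:
        raise ValueError("vf * vscale must be positive")
    lengths = []
    remaining = total
    while remaining > 0:
        vl = get_vector_length(remaining, vf, vscale)
        lengths.append(vl)
        remaining -= vl
    return lengths


print(chunk_lengths(10, vf=2, vscale=2))
```

With 10 elements and `VF * vscale = 4`, the loop processes two full chunks and one shorter tail, so no separate scalar remainder loop is needed in this model.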
This patch is a pre-requisite to be able to CodeGen vectorized code from
D99750.
Reviewed By: reames, frasercrmck
Differential Revision: https://reviews.llvm.org/D149916
At the moment, dsymutil drops all remarks without debug location.
There are many cases where debug location may be missing for remarks,
mostly due to LLVM not preserving debug locations. When using bitstream
remarks for statistical analysis, those missed remarks mean we get an
incomplete picture.
The patch flips the default to keeping all remarks and leaving it to
tools that display remarks to filter out remarks without debug locations
as needed.
The new --remarks-drop-without-debug flag can be used to drop remarks
without debug locations, i.e. restore the previous behavior.
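The change in default can be summarized with a small sketch. This Python model is hypothetical: remarks are represented as dictionaries with an assumed `debug_loc` key, and the keyword argument merely mirrors the new flag.

```python
def filter_remarks(remarks, drop_without_debug=False):
    """Keep all remarks by default; optionally drop those lacking a debug location."""
    if not drop_without_debug:
        return list(remarks)
    return [r for r in remarks if r.get("debug_loc") is not None]


remarks = [
    {"name": "inline", "debug_loc": ("a.c", 3, 1)},
    {"name": "licm", "debug_loc": None},
]
print(len(filter_remarks(remarks)))                           # new default keeps both
print(len(filter_remarks(remarks, drop_without_debug=True)))  # previous behavior
```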
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D151089
In D148197, we made the `defvar` statement able to refer to class
template arguments. However, class/multiclass template arguments take
priority over variables defined by `defvar`, which is a little
counterintuitive.
In this patch, we unify the priority of variables. Each pair of
braces introduces a new scope, which may contain some additional
variables like template arguments, loop iterators, etc. We can
define local variables inside this scope via `defvar` and these
variables are of higher priority than additional variables. This
means that `defvar` will shadow additional variables with the same
name. The scope can be nested, and we use the innermost variable.
This gives variables defined by `defvar` priority over class/multiclass
template arguments, loop iterators, etc. The shadowing rules are now:
* `V` in a record body shadows a global `V`.
* `V` in a record body shadows template argument `V`.
* `V` in template arguments shadows a global `V`.
* `V` in a `foreach` statement list shadows any `V` in surrounding record or global scopes.
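The lookup order implied by these rules can be modelled with a small scope chain. This is a Python sketch for illustration only: the `Scope` class and its method names are assumptions and do not correspond to TableGen's implementation.

```python
class Scope:
    """One pair of braces: `defvar` variables plus additional variables."""

    def __init__(self, parent=None, additional=None):
        self.parent = parent
        # Additional variables: template arguments, loop iterators, etc.
        self.additional = dict(additional or {})
        self.defvars = {}

    def defvar(self, name, value):
        self.defvars[name] = value

    def lookup(self, name):
        scope = self
        while scope is not None:
            # Within a scope, `defvar` shadows additional variables.
            if name in scope.defvars:
                return scope.defvars[name]
            if name in scope.additional:
                return scope.additional[name]
            scope = scope.parent  # otherwise fall back to the enclosing scope
        raise KeyError(name)


top = Scope()
top.defvar("V", "global")
record = Scope(parent=top, additional={"V": "template argument"})
print(record.lookup("V"))   # the template argument shadows the global
record.defvar("V", "record body")
print(record.lookup("V"))   # the defvar in the body shadows the template argument
```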
Reviewed By: tra
Differential Revision: https://reviews.llvm.org/D149016
getModuleMatchQuality was removed by:
commit c3719c36e6
Author: Daniel Dunbar <daniel@zuster.org>
Date: Sun Aug 2 23:37:13 2009 +0000
The Hexagon port was later added with getModuleMatchQuality by:
commit 1213a7a57f
Author: Tony Linthicum <tlinth@codeaurora.org>
Date: Mon Dec 12 21:14:40 2011 +0000
While we are at it, this patch removes a reference to
getModuleMatchQuality in the documentation.
This patch-set aims to simplify the existing RVV segment load/store
intrinsics to use a type that represents a tuple of vectors instead.
To achieve this, we first need to relax the current limitation so that an
aggregate type can be a target of load/store/alloca when it contains
homogeneous scalable vector types. We then need to adjust the
prolog of an LLVM function during lowering to clang. Finally, we
redefine the RVV segment load/store intrinsics to use the tuple types.
The pull request under the RVV intrinsic specification is
riscv-non-isa/rvv-intrinsic-doc#198
---
This is the 1st patch of the patch-set. This patch originated from
D98169.
This patch allows an aggregate type (StructType) that contains homogeneous
scalable vector types to be a target of load/store/alloca. The RFC for
this patch was posted in LLVM Discourse.
https://discourse.llvm.org/t/rfc-ir-permit-load-store-alloca-for-struct-of-the-same-scalable-vector-type/69527
The main changes in this patch are:
* Extend `StructLayout::StructSize` from `uint64_t` to `TypeSize` to
  accommodate an expression of scalable size.
* Allow `StructType::isSized` to also return true for homogeneous
  scalable vector types.
* Let `Type::isScalableTy` return true when `Type` is `StructType`
  and contains scalable vectors.
Extra description is added in the LLVM Language Reference Manual on the
relaxation of this patch.
Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Co-Authored-by: eop Chen <eop.chen@sifive.com>
Reviewed By: craig.topper, nikic
Differential Revision: https://reviews.llvm.org/D146872
Provides MC layer support for Zvfbfwma: vector BF16 widening mul-add.
As currently specified, Zvfbfwma does not appear to have a dependency on
Zvfbfmin or Zfbfmin.
Differential Revision: https://reviews.llvm.org/D147612
Provides MC layer support for Zvfbfmin: vector BF16 conversions.
Zvfbfmin does not appear to have a dependency on Zfbfmin as currently
specified.
Differential Revision: https://reviews.llvm.org/D147611
Provides MC layer support for Zfbfmin: scalar BF16 conversions.
As documented, this extension includes FLH, FSH, FMV.H.X, and FMV.X.H as
defined in Zfh/Zfhmin, but doesn't require either extension.
No Zfbfinxmin has been defined (though you would expect one in the
future, for symmetry with Zfhinxmin). See issue
https://github.com/riscv/riscv-bfloat16/issues/27.
Differential Revision: https://reviews.llvm.org/D147610
This reverts commit 65429b9af6.
Broke several projects, see https://reviews.llvm.org/D144509#4347562 onwards.
Also reverts follow-up commit "[OpenMP] Compile assembly files as ASM, not C"
This reverts commit 4072c8aee4.
Also reverts fix attempt "[cmake] Set CMP0091 to fix Windows builds after the cmake_minimum_required bump"
This reverts commit 7d47dac5f8.
Make it clearer minnum(+0, +0) cannot return -0. Also remove
a note about the result always being quiet which is directly
contradicted by the following paragraph.
llvm-exegesis has both a capture mode and an analysis mode that can be
used independently of each other. This patch clarifies in the
documentation that analysis mode works on the other platforms that LLVM
supports, which was unclear before.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D150536
This patch changes two instances of an ampersand to a written-out "and"
for more consistency with the rest of the file and brevity. In addition,
the last `cmake --build` reference is removed, again for consistency
with the rest of the file which shows the ninja invocations. This cmake
invocation also passed in the `--parallel` flag which doesn't make sense
with ninja using all threads by default.
This was changed in the previous patch to touch this line
(https://reviews.llvm.org/D88990), but if we want to change this, it
should be done across the entire file.
Currently, there is no documentation on what platforms and architectures
llvm-exegesis is supported on. This patch adds in user-facing
documentation in the CommandGuide about what architectures are supported
as well as developer facing documentation detailing the technical
reasons for why certain platforms are supported and some aren't.
This is a follow-up after discussion in
https://discourse.llvm.org/t/clarification-on-platform-support-for-llvm-exegesis/70206.
Reviewed By: kpdev42
Differential Revision: https://reviews.llvm.org/D149378
When the coroutine splitter splits Swift coroutines, variables in the new
funclets are now described in terms of the frame pointer, which is always placed
in an ABI-specified register whose contents are valid upon function entry. As
such, debug intrinsics must be prepended by the `entry_value` operation.
Depends on D149778
Differential Revision: https://reviews.llvm.org/D149779
A follow up patch will make the CoroSplit pass introduce such operations in the
IR level when it is safe to do so.
Depends on D149748
Differential Revision: https://reviews.llvm.org/D149778
At the moment, we set the BC bit in DPP for both bound_ctrl:0 and
bound_ctrl:1, for compatibility with sp3 (see PR35397). However, this
hack is only needed for GFX8. For newer GFXs, sp3 behaves as expected,
i.e. it sets the bit when bound_ctrl:1 and clears it when bound_ctrl:0.
This patch updates LLVM to do the same for GFX11 or newer. We preserve
the current behaviour for GFX9 and 10 so we don't break any existing
code.
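The resulting encoding rule can be sketched as a small function. This Python model is illustrative: the function name and the use of a numeric generation are assumptions, and it covers only the case where a `bound_ctrl` operand is written explicitly.

```python
def dpp_bc_bit(gfx_generation, bound_ctrl):
    """Return the BC bit encoded for an explicit bound_ctrl:0 or bound_ctrl:1."""
    if bound_ctrl not in (0, 1):
        raise ValueError("bound_ctrl must be 0 or 1")
    if gfx_generation >= 11:
        # GFX11 or newer: set for bound_ctrl:1, clear for bound_ctrl:0.
        return bound_ctrl
    # Older generations keep the compatibility behaviour: set for both values.
    return 1


for gen in (8, 9, 10, 11):
    print(gen, dpp_bc_bit(gen, 0), dpp_bc_bit(gen, 1))
```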
Differential Revision: https://reviews.llvm.org/D149254
Annotation metadata supports adding singular annotation strings to an annotation block. This patch adds the ability to insert a tuple of strings into the metadata array.
The idea here is that each tuple of strings represents pieces of information that are all related. This makes it easier to parse related metadata, since it is contained in one tuple.
For example, in remarks, any pass that implements annotation remarks can have different types of remarks and pass additional information for each.
The original behaviour of annotation remarks is preserved here and we can mix tuple annotations and single annotations for the same instruction.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D148328