There is no vp.fpclass after FCLASS_VL(D151176), try to support vp.fpclass.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D152993
This is in preparation for using the same convergence verifier for both LLVM IR
and Machine IR.
Reviewed By: yassingh
Differential Revision: https://reviews.llvm.org/D158394
This patch relaxes the verifier when it checks whether an OP_entry_value has a
valid Value associated with it. We now allow undef/poison values as well, since
those may be introduced naturally through optimization.
Differential Revision: https://reviews.llvm.org/D158101
The refactored template can now be used with MachineVerifier.
Resubmitted after fixing build errors:
- Shared libraries build failed with undefined references due to "extern
template" declarations.
- Modules build failed due to a cycle dependence between llvm/ADT and llvm/IR.
The Generic*Impl.h files should be in llvm/IR to prevent this.
Differential Revision: https://reviews.llvm.org/D156522
This restores commit 93a3706711.
Originally reverted in 466bd99811.
This reverts commit 93a3706711.
The "extern template" declaration of CycleInfo caused problems in a shared build
when CycleInfo was removed from Verifier.cpp. There needs to be an explicit
instantiation corresponding to an extern template in every SO.
Resolve https://github.com/llvm/llvm-project/issues/61932. We should add the validation.
LLVM can't handle IR where subprogram definitions are nested within DICompositeType when doing LTO builds, because there's no good way to cross the CU boundary to insert a nested DISubprogram definition in one CU into a type defined in another CU.
The test cases `cross-cu-inlining-2.ll` and `cross-cu-inlining-ranges.ll` can be deleted. In the `cross-cu-inlining-2.ll`, the low pc and high pc of the CU are also incorrect.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D152095
According the discuss on D154953, we need to make the LangRef change
before the optimization relied on the new behaviour:
vscale_range implies vscale is a power-of-two value, parse of the
attribute to reject values that are not a power-of-two.
Thanks nikic for the wonderful summary of discussing on D154953:
To provide a bit more context here. We would like to have power of two vscale exposed in a target-independent way, so we can make use of this in places like ValueTracking, just like we currently do the vscale range. Some options that have been discussed are:
- Remove support for non-power-of-two vscales entirely. (This is my personal preference, but this is hard to undo if it turns out someone does need them.)
- Add an extra attribute vscale_pow2, or a data layout property.
- Make vscale_range imply power-of-two vscale, as a compromise solution (what this patch does). This would be relatively easy to turn into one of the two above at a later point.
Reviewed By: paulwalker-arm, nikic, efriedma
Differential Revision: https://reviews.llvm.org/D155193
This is a reboot of the original design and implementation by
Nicolai Haehnle <nicolai.haehnle@amd.com>:
https://reviews.llvm.org/D85603
This change also obsoletes an earlier attempt at restarting the work on
convergence tokens:
https://reviews.llvm.org/D104504
Changes relative to D85603:
1. Clean up the definition of a "convergent operation", a convergent
call and convergent function.
2. Clean up the relationship between dynamic instances, sets of threads and
convergence tokens.
3. Redistribute the formal rules into the definitions of the convergence
intrinsics.
4. Expand on the semantics of entering a function from outside LLVM,
and the environment-defined outcome of the entry intrinsic.
5. Replace the term "cycle" with "closed path". The static rules are defined
in terms of closed paths, and then a relation is established with cycles.
6. Specify that if a function contains a controlled convergent operation, then
all convergent operations in that function must be controlled.
7. Describe an optional procedure to infer tokens for uncontrolled convergent
operations.
8. Introduce controlled maximal convergence-before and controlled m-converged
property as an update to the original properties in UniformityAnalysis.
9. Additional constraint that a cycle heart can only occur in the header of a
reducible cycle (natural loop).
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D147116
Move `AttributeMask` out of `llvm/IR/Attributes.h` to a new file
`llvm/IR/AttributeMask.h`. After doing this we can remove the
`#include <bitset>` and `#include <set>` directives from `Attributes.h`.
Since there are many headers including `Attributes.h`, but not needing
the definition of `AttributeMask`, this causes unnecessary bloating of
the translation units and slows down compilation.
This commit adds in the include directive for `llvm/IR/AttributeMask.h`
to the handful of source files that need to see the definition.
This reduces the total number of preprocessing tokens across the LLVM
source files in lib from (roughly) 1,917,509,187 to 1,902,982,273 - a
reduction of ~0.76%. This should result in a small improvement in
compilation time.
Differential Revision: https://reviews.llvm.org/D153728
We only check a subset of the constraints in the verifier:
* that we only call the intrinsic from functions with a restricted set of
calling conventions
* that the 'flags' argument is an immediate
Other checks are (probably) more appropriate for codegen.
Differential Revision: https://reviews.llvm.org/D151995
Add the amdgpu_cs_chain and amdgpu_cs_chain_preserve keywords to
LLVM IR and make sure we can parse and print them. Also make sure we
perform some basic checks in the IR verifier - similar to what we check
for many of the other AMDGPU calling conventions, plus the additional
restriction that we can't have direct calls to functions with these
calling conventions.
Differential Revision: https://reviews.llvm.org/D151994
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
Fixed PR51501 (tests from D112337).
1. Reuse of DISubprogram's 'retainedNodes' to track other function-local
entities together with local variables and labels (this patch cares about
function-local import while D144006 and D144008 use the same approach for
local types and static variables). So, effectively this patch moves ownership
of tracking local import from DICompileUnit's 'imports' field to DISubprogram's
'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout
is considered unsupported (DwarfDebug would assert on such debug metadata).
DICompileUnit's 'imports' field is supposed to track global imported
declarations as it does before.
This addresses various FIXMEs and simplifies the next part of the patch.
2. Postpone emission of function-local imported entities from
`DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`.
While in `DwarfDebug::endFunctionImpl()` we do not have all the
information about a parent subprogram or a referring subprogram
(whether a subprogram inlined or not), so we can't guarantee we emit
an imported entity correctly and place it in a proper subprogram tree.
So now, we just gather needed details about the import itself and its
parent entity (either a Subprogram or a LexicalBlock) during
processing in `DwarfDebug::endFunctionImpl()`, but all the real work is
done in `DwarfDebug::endModule()` when we have all the required
information to make proper emission.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Differential Revision: https://reviews.llvm.org/D144004
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
Fixed PR51501 (tests from D112337).
1. Reuse of DISubprogram's 'retainedNodes' to track other function-local
entities together with local variables and labels (this patch cares about
function-local import while D144006 and D144008 use the same approach for
local types and static variables). So, effectively this patch moves ownership
of tracking local import from DICompileUnit's 'imports' field to DISubprogram's
'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout
is considered unsupported (DwarfDebug would assert on such debug metadata).
DICompileUnit's 'imports' field is supposed to track global imported
declarations as it does before.
This addresses various FIXMEs and simplifies the next part of the patch.
2. Postpone emission of function-local imported entities from
`DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`.
While in `DwarfDebug::endFunctionImpl()` we do not have all the
information about a parent subprogram or a referring subprogram
(whether a subprogram inlined or not), so we can't guarantee we emit
an imported entity correctly and place it in a proper subprogram tree.
So now, we just gather needed details about the import itself and its
parent entity (either a Subprogram or a LexicalBlock) during
processing in `DwarfDebug::endFunctionImpl()`, but all the real work is
done in `DwarfDebug::endModule()` when we have all the required
information to make proper emission.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Differential Revision: https://reviews.llvm.org/D144004
RFC https://discourse.llvm.org/t/rfc-dwarfdebug-fix-and-improve-handling-imported-entities-types-and-static-local-in-subprogram-and-lexical-block-scopes/68544
Fixed PR51501 (tests from D112337).
1. Reuse of DISubprogram's 'retainedNodes' to track other function-local
entities together with local variables and labels (this patch cares about
function-local import while D144006 and D144008 use the same approach for
local types and static variables). So, effectively this patch moves ownership
of tracking local import from DICompileUnit's 'imports' field to DISubprogram's
'retainedNodes' and adjusts DWARF emitter for the new layout. The old layout
is considered unsupported (DwarfDebug would assert on such debug metadata).
DICompileUnit's 'imports' field is supposed to track global imported
declarations as it does before.
This addresses various FIXMEs and simplifies the next part of the patch.
2. Postpone emission of function-local imported entities from
`DwarfDebug::endFunctionImpl()` to `DwarfDebug::endModule()`.
While in `DwarfDebug::endFunctionImpl()` we do not have all the
information about a parent subprogram or a referring subprogram
(whether a subprogram inlined or not), so we can't guarantee we emit
an imported entity correctly and place it in a proper subprogram tree.
So now, we just gather needed details about the import itself and its
parent entity (either a Subprogram or a LexicalBlock) during
processing in `DwarfDebug::endFunctionImpl()`, but all the real work is
done in `DwarfDebug::endModule()` when we have all the required
information to make proper emission.
Authored-by: Kristina Bessonova <kbessonova@accesssoftek.com>
Differential Revision: https://reviews.llvm.org/D144004
The generic implementation is umin(TC, VF * vscale).
Lowering to vsetvli for RISC-V will come in a future patch.
This patch is a pre-requisite to be able to CodeGen vectorized code from
D99750.
Reviewed By: reames, frasercrmck
Differential Revision: https://reviews.llvm.org/D149916
This patch-set aims to simplify the existing RVV segment load/store
intrinsics to use a type that represents a tuple of vectors instead.
To achieve this, first we need to relax the current limitation for an
aggregate type to be a target of load/store/alloca when the aggregate
type contains homogeneous scalable vector types. Then to adjust the
prolog of an LLVM function during lowering to clang. Finally we
re-define the RVV segment load/store intrinsics to use the tuple types.
The pull request under the RVV intrinsic specification is
riscv-non-isa/rvv-intrinsic-doc#198
---
This is the 1st patch of the patch-set. This patch is originated from
D98169.
This patch allows aggregate type (StructType) that contains homogeneous
scalable vector types to be a target of load/store/alloca. The RFC of
this patch was posted in LLVM Discourse.
https://discourse.llvm.org/t/rfc-ir-permit-load-store-alloca-for-struct-of-the-same-scalable-vector-type/69527
The main changes in this patch are:
Extend `StructLayout::StructSize` from `uint64_t` to `TypeSize` to
accommodate an expression of scalable size.
Allow `StructType:isSized` to also return true for homogeneous
scalable vector types.
Let `Type::isScalableTy` return true when `Type` is `StructType`
and contains scalable vectors
Extra description is added in the LLVM Language Reference Manual on the
relaxation of this patch.
Authored-by: Hsiangkai Wang <kai.wang@sifive.com>
Co-Authored-by: eop Chen <eop.chen@sifive.com>
Reviewed By: craig.topper, nikic
Differential Revision: https://reviews.llvm.org/D146872
A follow up patch will make the CoroSplit pass introduce such operations in the
IR level when it is safe to do so.
Depends on D149748
Differential Revision: https://reviews.llvm.org/D149778
Annotation metadata supports adding singular annotation strings to annotation block. This patch adds the ability to insert a tuple of strings into the metadata array.
The idea here is that each tuple of strings represents a piece of information that can be all related. It makes it easier to parse through related metadata information given it will be contained in one tuple.
For example in remarks any pass that implements annotation remarks can have different type of remarks and pass additional information for each.
The original behaviour of annotation remarks is preserved here and we can mix tuple annotations and single annotations for the same instruction.
Reviewed By: paquette
Differential Revision: https://reviews.llvm.org/D148328
This patch replaces the uses of PointerUnion.is function by llvm::isa,
PointerUnion.get function by llvm::cast, and PointerUnion.dyn_cast by
llvm::dyn_cast_if_present. This is according to the FIXME in
the definition of the class PointerUnion.
This patch does not remove them as they are being used in other
subprojects.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D148449
Also the patch loose the fixed vector contraint in llvm/lib/IR/Verifier.cpp.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D147380
Extend the verifier to check if the size of the matrix operands of
matrix.multiply match the sizes specified by the numeric arguments.
Reviewed By: thegameg
Differential Revision: https://reviews.llvm.org/D147466
The patch mainly does two things. The first is allowing scalable vector
ISD::STRICT_FP_EXTEND. The second is making RISC-V customized lower
strict_fpextend to riscv_strict_fpextend_vl, the strict version of
riscv_fpextend_vl.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D145548
This carries a bitmask indicating forbidden floating-point value kinds
in the argument or return value. This will enable interprocedural
-ffinite-math-only optimizations. This is primarily to cover the
no-nans and no-infinities cases, but also covers the other floating
point classes for free. Textually, this provides a number of names
corresponding to bits in FPClassTest, e.g.
call nofpclass(nan inf) @must_be_finite()
call nofpclass(snan) @cannot_be_snan()
This is more expressive than the existing nnan and ninf fast math
flags. As an added bonus, you can represent fun things like nanf:
declare nofpclass(inf zero sub norm) float @only_nans()
Compared to nnan/ninf:
- Can be applied to individual call operands as well as the return value
- Can distinguish signaling and quiet nans
- Distinguishes the sign of infinities
- Can be safely propagated since it doesn't imply anything about
other operands.
- Does not apply to FP instructions; it's not a flag
This is one step closer to being able to retire "no-nans-fp-math" and
"no-infs-fp-math". The one remaining situation where we have no way to
represent no-nans/infs is for loads (if we wanted to solve this we
could introduce !nofpclass metadata, following along with
noundef/!noundef).
This is to help simplify the GPU builtin math library
distribution. Currently the library code has explicit finite math only
checks, read from global constants the compiler driver needs to set
based on the compiler flags during linking. We end up having to
internalize the library into each translation unit in case different
linked modules have different math flags. By propagating known-not-nan
and known-not-infinity information, we can automatically prune the
edge case handling in most functions if the function is only reached
from fast math uses.