When constructing vectors from elements, use poison instead of
undef as the base value. These literals always initialize all
elements (padding the remainder with zero), so that the choice
of base value does not affect semantics.
Ever since 98c90a1 (ISel: introduce vector ISD::LRINT, ISD::LLRINT;
custom RISCV lowering) landed, there have been several discussions on
how the lrint and llrint libcalls would lower to LLVM IR via clang on
RV32 and RV64, in an effort to enable vectorization of lrint and llrint
via SLPVectorizer and LoopVectorize. This patch adds a new
math-builtins.c test to the RISC-V target to test the lowering of all
math libcalls, including lrint and llrint.
This patch relaxes the driver logic to permit combinations between
AVX512 and AVX10 options and makes sure we have a unified behavior
between options and features combination.
Here are rules we are following when handle these combinations:
1. evex512 can only be used for avx512xxx options/features. It will be
ignored if used without them;
2. avx512xxx and avx10.xxx are options in two worlds. Avoid to use them
together in any case. It will enable a common super set when they are
used together. E.g., "-mavx512f -mavx10.1-256" euqals "-mavx10.1-512".
Compiler emits warnings when user using combinations like "-mavx512f
-mavx10.1-256" in case they won't get unexpected result silently.
Function target feature attribute follows the same rule now. We have to
add "no-evex512" feature for intrinsics shared between AVX512 and AVX10.
We also add "no-evex512" for early ISAs like AVX etc., because some of
them are called by AVX512 intrinsics.
MSVC allows users to pass structures with required alignments greater
than 4 to variadic functions. It does not pass them indirectly to
correctly align them. Instead, it passes them directly with the usual 4
byte stack alignment.
This change implements the same logic in clang on the passing side. The
receiving side (va_arg) never implemented any of this indirect logic, so
it doesn't need to be updated.
This issue pre-existed, but @aaron.ballman noticed it when we started
passing structs containing aligned fields indirectly in D152752.
This is an alternative of D157485 and a pre-feature to support AVX10.
AVX10 Architecture Specification: https://cdrdv2.intel.com/v1/dl/getContent/784267
AVX10 Technical Paper: https://cdrdv2.intel.com/v1/dl/getContent/784343
RFC: https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661
Based on the feedbacks from LLVM and GCC community, we have agreed to
start from supporting `-m[no-]evex512` on existing AVX512 features.
The option `-mno-evex512` can be used with `-mavx512xxx` to build
binaries that can run on both legacy AVX512 targets and AVX10-256.
There're still arguments about what's the expected behavior when this
option as well as `-mavx512xxx` used together with `-mavx10.1-256`. We
decided to defer the support of `-mavx10.1` after we made consensus.
Or furthermore, we start from supporting AVX10.2 and not providing any
AVX10.1 options.
Reviewed By: RKSimon, skan
Differential Revision: https://reviews.llvm.org/D159250
This is an alternative of D157485 and a pre-feature to support AVX10.
AVX10 Architecture Specification: https://cdrdv2.intel.com/v1/dl/getContent/784267
AVX10 Technical Paper: https://cdrdv2.intel.com/v1/dl/getContent/784343
RFC: https://discourse.llvm.org/t/rfc-design-for-avx10-feature-support/72661
Based on the feedbacks from LLVM and GCC community, we have agreed to
start from supporting `-m[no-]evex512` on existing AVX512 features.
The option `-mno-evex512` can be used with `-mavx512xxx` to build
binaries that can run on both legacy AVX512 targets and AVX10-256.
There're still arguments about what's the expected behavior when this
option as well as `-mavx512xxx` used together with `-mavx10.1-256`. We
decided to defer the support of `-mavx10.1` after we made consensus.
Or furthermore, we start from supporting AVX10.2 and not providing any
AVX10.1 options.
Reviewed By: RKSimon, skan
Differential Revision: https://reviews.llvm.org/D159250
We have a new policy in place making links to private resources
something we try to avoid in source and test files. Normally, we'd
organically switch to the new policy rather than make a sweeping change
across a project. However, Clang is in a somewhat special circumstance
currently: recently, I've had several new contributors run into rdar
links around test code which their patch was changing the behavior of.
This turns out to be a surprisingly bad experience, especially for
newer folks, for a handful of reasons: not understanding what the link
is and feeling intimidated by it, wondering whether their changes are
actually breaking something important to a downstream in some way,
having to hunt down strangers not involved with the patch to impose on
them for help, accidental pressure from asking for potentially private
IP to be made public, etc. Because folks run into these links entirely
by chance (through fixing bugs or working on new features), there's not
really a set of problematic links to focus on -- all of the links have
basically the same potential for causing these problems. As a result,
this is an omnibus patch to remove all such links.
This was not a mechanical change; it was done by manually searching for
rdar, radar, radr, and other variants to find all the various
problematic links. From there, I tried to retain or reword the
surrounding comments so that we would lose as little context as
possible. However, because most links were just a plain link with no
supporting context, the majority of the changes are simple removals.
Differential Review: https://reviews.llvm.org/D158071
Builtin floating-point number classification functions:
- __builtin_isnan,
- __builtin_isinf,
- __builtin_finite, and
- __builtin_isnormal
now are implemented using `llvm.is_fpclass`.
This change makes the target callback `TargetCodeGenInfo::testFPKind`
unneeded. It is preserved in this change and should be removed later.
Differential Revision: https://reviews.llvm.org/D112932
Similar to the existing _mm_load1_pd/_mm_loaddup_pd and broadcast vector loads, these intrinsic should ensure the loads are unaligned and not assume type alignment
Fixes#62325
It's not exactly clear what the meaning of TypeInfo::AlignRequirement
is, so go directly to the ASTRecordLayout for records and check the
required alignment there. Compare that number with the stack alignment
value of 4.
This fixes cases when the alignment attribute does not appear directly
on the record [1], or when the attribute on the record is underaligned
[2].
[1]: `struct Foo { int __declspec(align(16)) x; };`
[2]: `struct __declspec(align(1)) Bar { int x; };`
Fixes https://llvm.org/pr63257
Differential Revision: https://reviews.llvm.org/D152752
This is the consolidation of D151644 and D151943 moved from
InstCombine to FunctionAttrs. This is based on discussion in the above
patches as well as D152081 (Attributor). This patch was written in a
way so it can have an immediate impact in currently active passes
(FunctionAttrs), but should be easy to port elsewhere (Attributor or
Inliner) if that makes more sense later on.
Some function attributes imply the attribute for all/some instructions
in the function. These attributes can be safely propagated to
callsites within the function that are missing the attribute. This can
be useful when 1) analyzing individual instructions in a function
and 2) if the original caller is later inlined, as if the attributes are
not propagated, they will be lost.
This patch implements propagation in a new class/file
`InferCallsiteAttrs` which can hypothetically be included elsewhere.
At the moment this patch infers the following:
Function Attributes:
- mustprogress
- nofree
- willreturn
- All memory attributes (readnone, readonly, writeonly, argmem,
etc...)
- The memory attributes are only propagated IFF the set of
pointers available to the callsite is the same as the set
available outside the caller (i.e no local memory arguments
from alloca or local malloc like functions).
Argument Attributes:
- noundef
- nonnull
- nofree
- readnone
- readonly
- writeonly
- nocapture
- nocapture is only propagated IFF the set of pointers
available to the callsite is the same as the set available
outside the caller and its guranteed that between the
callsite and function return, the state of any capture
pointers will not change (so the nocaptured gurantee of the
caller has been met by the instruction preceding the
callsite and will not changed).
Argument are only propagated to callsite arguments that are also function
arguments, but not derived values.
Return Attributes:
- noundef
- nonnull
Return attributes are only propagated if the callsite's return value
is used as the caller's return and execution is guranteed to pass from
callsite to return.
The compile time hit of this for -O3 and -O3+thinLTO is ~[.02, .37]%
regression. Proper LTO, however, has more significant regressions (up
to 3.92%):
https://llvm-compile-time-tracker.com/compare.php?from=94407e1bba9807193afde61c56b6125c0fc0b1d1&to=79feb6e78b818e33ec69abdc58c5f713d691554f&stat=instructions:u
Differential Revision: https://reviews.llvm.org/D152226
Pursuant to discussions at
https://discourse.llvm.org/t/rfc-c-23-p1467r9-extended-floating-point-types-and-standard-names/70033/22,
this commit enhances the handling of the __bf16 type in Clang.
- Firstly, it upgrades __bf16 from a storage-only type to an arithmetic
type.
- Secondly, it changes the mangling of __bf16 to DF16b on all
architectures except ARM. This change has been made in
accordance with the finalization of the mangling for the
std::bfloat16_t type, as discussed at
https://github.com/itanium-cxx-abi/cxx-abi/pull/147.
- Finally, this commit extends the existing excess precision support to
the __bf16 type. This applies to hardware architectures that do not
natively support bfloat16 arithmetic.
Appropriate tests have been added to verify the effects of these
changes and ensure no regressions in other areas of the compiler.
Reviewed By: rjmccall, pengfei, zahiraam
Differential Revision: https://reviews.llvm.org/D150913
The following intrinsics are currently implemented using a shufflevector with
an undefined mask, this is however incorrect according to intel's semantics for
undefined value which expect an unknown but consistent value.
With __builtin_nondeterministic_value we can now match intel's undefined value.
Differential Revision: https://reviews.llvm.org/D143287
Setting __FLT_EVAL_METHOD__ to -1 with fast-math will set
__GLIBC_FLT_EVAL_METHOD to 2 and long double ends up being used for
float_t and double_t. This creates some ABI breakage with various C libraries.
See details here: https://github.com/llvm/llvm-project/issues/60781
This reverts commit bbf0d1932a.
Ignoring freeze(undef) if it has multiple uses in LowerAVXCONCAT_VECTORS
causes the custom INSERT_SUBVECTOR for vector widening to be ignored.
Differential Revision: https://reviews.llvm.org/D144903
Ignoring freeze(undef) if it has multiple uses in LowerAVXCONCAT_VECTORS
causes the custom INSERT_SUBVECTOR for vector widening to be ignored.
Differential Revision: https://reviews.llvm.org/D144903
Ignoring freeze(undef) if it has multiple uses in LowerAVXCONCAT_VECTORS
causes the custom INSERT_SUBVECTOR for vector widening to be ignored.
Differential Revision: https://reviews.llvm.org/D14490
The macro _mm_test_all_ones(V) was defined as _mm_testc_si128((V), _mm_cmpeq_epi32((V), (V))) - which could cause side effects depending on the source of the V value.
The _mm_cmpeq_epi32((V), (V)) trick was just to materialize an all-ones value, which can be more safely generated with _mm_set1_epi32(-1) .
Fixes#60006
Differential Revision: https://reviews.llvm.org/D142477
"__cmpccxadd_epi*" -> "_cmpccxadd_epi*"
This is to align with other intrinsics to follow single leading "_" style. Gcc
and intrinsic guide website will also apply this change.
Reviewed By: LuoYuanke, skan
Differential Revision: https://reviews.llvm.org/D140281