Clang uses a long-time special handling of the case where 3 element
vector loads and stores are performed as 4 element, and then a
shufflevector is used to extract the used elements. Odd sized vector
codegen should now work reasonably well.
This patch removes the compiler argument `-fpreserve-vec3-type` and adds
a target hook to determine if the special handling of vector type is
needed.
---------
Co-authored-by: Matt Arsenault <Matthew.Arsenault@amd.com>
The `Checked` parameter of `CodeGenFunction::EmitCheck` is of type
`ArrayRef<std::pair<llvm::Value *, SanitizerMask>>`, which is overly
generalized: SanitizerMask can denote that zero or more sanitizers are
enabled, but `EmitCheck` requires that exactly one sanitizer is
specified in the SanitizerMask (e.g.,
`SanitizeTrap.has(Checked[i].second)` enforces that).
This patch replaces SanitizerMask with SanitizerOrdinal in the `Checked`
parameter of `EmitCheck` and code that transitively relies on it. This
should not affect the behavior of UBSan, but it has the advantages that:
- the code is clearer: it avoids ambiguity in EmitCheck about what to do
if multiple bits are set
- specifying the wrong number of sanitizers in `Checked[i].second` will
be detected as a compile-time error, rather than a runtime assertion
failure
Suggested by Vitaly in https://github.com/llvm/llvm-project/pull/122392
as an alternative to adding an explicit runtime assertion that the
SanitizerMask contains exactly one sanitizer.
`CounterPair` can hold `<uint32_t, uint32_t>` instead of current
`unsigned`, to hold also the counter number of SkipPath. For now, this
change provides the skeleton and only `CounterPair::Executed` is used.
Each counter number can have `None` to suppress emitting counter
increment. 2nd element `Skipped` is initialized as `None` by default,
since most `Stmt*` don't have a pair of counters.
This change also provides stubs for the verifier. I'll provide the impl
of verifier for `+Asserts` later.
`markStmtAsUsed(bool, Stmt*)` may be used to inform that other side
counter may not emitted.
`markStmtMaybeUsed(S)` may be used for the `Stmt` and its inner will be
excluded for emission in the case of skipping by constant folding. I put
it into places where I found.
`verifyCounterMap()` will check the coverage map and the counter map,
and can be used to report inconsistency.
These verifier methods shall be eliminated in `-Asserts`.
https://discourse.llvm.org/t/rfc-integrating-singlebytecoverage-with-branch-coverage/82492
-fno-sanitize-merge (introduced in
https://github.com/llvm/llvm-project/pull/120511) duplicates the
functionality of -ubsan-unique-traps but also allows individual checks
to be specified e.g.,
* "-fno-sanitize-merge" without arguments is equivalent to
-ubsan-unique-traps
* "-fno-sanitize-merge=bool,enum" will apply it only to those two checks
Additionally, the naming is more consistent with the rest of the
-fsanitize- family.
This patch therefore removes -ubsan-unique-traps. This breaks backwards
compatibility; we hope that this is acceptable since '-mllvm
-ubsan-unique-traps' was an experimental flag.
This patch also adds negative test examples to bounds-checking.c, and
strengthens the NOOPTARRAY assertion to prevent spurious matches.
"-bounds-checking-unique-traps" is unaffected by this patch.
This reverts commit 2691b96415. This
reapply fixes the buildbot breakage of the original patch, by updating
clang/test/CodeGen/ubsan-trap-debugloc.c to specify -fsanitize-merge
(the default, which is merge, is applied by the driver but not
clang_cc1).
This reapply also expands clang/test/CodeGen/ubsan-trap-merge.c.
----
Original commit message:
'-mllvm -ubsan-unique-traps'
(https://github.com/llvm/llvm-project/pull/65972) applies to all UBSan
checks. This patch introduces -fsanitize-merge (defaults to on,
maintaining the status quo behavior) and -fno-sanitize-merge (equivalent
to '-mllvm -ubsan-unique-traps'), with the option to selectively
applying non-merged handlers to a subset of UBSan checks (e.g.,
-fno-sanitize-merge=bool,enum).
N.B. we do not use "trap" in the argument name since
https://github.com/llvm/llvm-project/pull/119302 has generalized
-ubsan-unique-traps to work for non-trap modes (min-rt and regular rt).
This patch does not remove the -ubsan-unique-traps flag; that will
override -f(no-)sanitize-merge.
'-mllvm -ubsan-unique-traps'
(https://github.com/llvm/llvm-project/pull/65972) applies to all UBSan
checks. This patch introduces -fsanitize-merge (defaults to on,
maintaining the status quo behavior) and -fno-sanitize-merge (equivalent
to '-mllvm -ubsan-unique-traps'), with the option to selectively
applying non-merged handlers to a subset of UBSan checks (e.g.,
-fno-sanitize-merge=bool,enum).
N.B. we do not use "trap" in the argument name since
https://github.com/llvm/llvm-project/pull/119302 has generalized
-ubsan-unique-traps to work for non-trap modes (min-rt and regular rt).
This patch does not remove the -ubsan-unique-traps flag; that will
override -f(no-)sanitize-merge.
UBSan handler calls are sometimes merged by the backend, which complicates debugging. Merging is currently disabled for UBSan traps if -ubsan-unique-traps is specified or if optimization is disabled. This patch applies the same policy to non-trap handler calls.
N.B. "-ubsan-unique-traps" becomes somewhat of a misnomer since it will now apply to non-trap handler calls as well as traps; nonetheless, we keep the naming for backwards compatibility.
Get inout/out parameters working for HLSL Arrays.
Utilizes the fix from #109323, and corrects the assignment behavior
slightly to allow for Non-LValues on the RHS.
Closes#106917
---------
Co-authored-by: Chris B <beanz@abolishcrlf.org>
This commit addresses several Static Analyzer issues related to
potential null dereference by replacing dyn_cast<> with cast<> and
getAs<> with castAs<> in various parts of the codes.
The cast function asserts that the cast is valid, ensuring that the
pointer is not null and preventing null dereference errors.
The changes are made in the following files:
CGBuiltin.cpp: Ensure vector types have exactly 3 elements.
CGExpr.cpp: Ensure member declarations are field declarations.
AnalysisBasedWarnings.cpp: Ensure operations are member expressions.
SemaExprMember.cpp: Ensure base types are extended vector types.
These changes ensure that the types are correctly cast and prevent
potential null dereference issues, improving the robustness and safety
of the code.
The __builtin_counted_by_ref builtin is used on a flexible array
pointer and returns a pointer to the "counted_by" attribute's COUNT
argument, which is a field in the same non-anonymous struct as the
flexible array member. This is useful for automatically setting the
count field without needing the programmer's intervention. Otherwise
it's possible to get this anti-pattern:
ptr = alloc(<ty>, ..., COUNT);
ptr->FAM[9] = 42; /* <<< Sanitizer will complain */
ptr->count = COUNT;
To prevent this anti-pattern, the user can create an allocator that
automatically performs the assignment:
#define alloc(TY, FAM, COUNT) ({ \
TY __p = alloc(get_size(TY, COUNT)); \
if (__builtin_counted_by_ref(__p->FAM)) \
*__builtin_counted_by_ref(__p->FAM) = COUNT; \
__p; \
})
The builtin's behavior is heavily dependent upon the "counted_by"
attribute existing. It's main utility is during allocation to avoid
the above anti-pattern. If the flexible array member doesn't have that
attribute, the builtin becomes a no-op. Therefore, if the flexible
array member has a "count" field not referenced by "counted_by", it
must be set explicitly after the allocation as this builtin will
return a "nullptr" and the assignment will most likely be elided.
---------
Co-authored-by: Bill Wendling <isanbard@gmail.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Done by calling clang::runWithSufficientStackSpace().
Added CodeGenModule::runWithSufficientStackSpace() method similar to the
one in Sema to provide a single warning when this triggers
Fixes: #111699
Fix counted_by attribute for cases where the flexible array member is
accessed through struct pointer inside another struct:
```
struct variable {
int a;
int b;
int length;
short array[] __attribute__((counted_by(length)));
};
struct bucket {
int a;
struct variable *growable;
int b;
};
```
__builtin_dynamic_object_size(p->growable->array, 0);
This commit makes sure that if the StructBase is both a MemberExpr and a
pointer, it is treated as a pointer. Otherwise clang will generate to
code to access the address of p->growable intead of loading the value of
p->growable->length.
Fixes#110385
`Type::getPointerTo()` is to be removed soon.
This also removes the whole code section for "C99 6.5.2.2p6"; It's
essentially a no-op since llvm uses opaque pointers.
On some targets, an FP libcall with argument types such as long double
will be lowered to pass arguments indirectly via pointers. When this is
the case we should not mark the libcall with "int" TBAA as it may lead
to incorrect optimizations.
Currently, this can be seen for long doubles on x86_64-w64-mingw32. The
`load x86_fp80` after the call is (incorrectly) marked with "int" TBAA
(overwriting the previous metadata for "long double").
Nothing seems to break due to this currently as the metadata is being
incorrectly placed on the load and not the call. But if the metadata
is moved to the call (which this patch ensures), LLVM will optimize out
the setup for the arguments.
This patch is the frontend implementation of the coroutine elide
improvement project detailed in this discourse post:
https://discourse.llvm.org/t/language-extension-for-better-more-deterministic-halo-for-c-coroutines/80044
This patch proposes a C++ struct/class attribute
`[[clang::coro_await_elidable]]`. This notion of await elidable task
gives developers and library authors a certainty that coroutine heap
elision happens in a predictable way.
Originally, after we lower a coroutine to LLVM IR, CoroElide is
responsible for analysis of whether an elision can happen. Take this as
an example:
```
Task foo();
Task bar() {
co_await foo();
}
```
For CoroElide to happen, the ramp function of `foo` must be inlined into
`bar`. This inlining happens after `foo` has been split but `bar` is
usually still a presplit coroutine. If `foo` is indeed a coroutine, the
inlined `coro.id` intrinsics of `foo` is visible within `bar`. CoroElide
then runs an analysis to figure out whether the SSA value of
`coro.begin()` of `foo` gets destroyed before `bar` terminates.
`Task` types are rarely simple enough for the destroy logic of the task
to reference the SSA value from `coro.begin()` directly. Hence, the pass
is very ineffective for even the most trivial C++ Task types. Improving
CoroElide by implementing more powerful analyses is possible, however it
doesn't give us the predictability when we expect elision to happen.
The approach we want to take with this language extension generally
originates from the philosophy that library implementations of `Task`
types has the control over the structured concurrency guarantees we
demand for elision to happen. That is, the lifetime for the callee's
frame is shorter to that of the caller.
The ``[[clang::coro_await_elidable]]`` is a class attribute which can be
applied to a coroutine return type.
When a coroutine function that returns such a type calls another
coroutine function, the compiler performs heap allocation elision when
the following conditions are all met:
- callee coroutine function returns a type that is annotated with
``[[clang::coro_await_elidable]]``.
- In caller coroutine, the return value of the callee is a prvalue that
is immediately `co_await`ed.
From the C++ perspective, it makes sense because we can ensure the
lifetime of elided callee cannot exceed that of the caller if we can
guarantee that the caller coroutine is never destroyed earlier than the
callee coroutine. This is not generally true for any C++ programs.
However, the library that implements `Task` types and executors may
provide this guarantee to the compiler, providing the user with
certainty that HALO will work on their programs.
After this patch, when compiling coroutines that return a type with such
attribute, the frontend checks that the type of the operand of
`co_await` expressions (not `operator co_await`). If it's also
attributed with `[[clang::coro_await_elidable]]`, the FE emits metadata
on the call or invoke instruction as a hint for a later middle end pass
to elide the elision.
The original patch version is
https://github.com/llvm/llvm-project/pull/94693 and as suggested, the
patch is split into frontend and middle end solutions into stacked PRs.
The middle end CoroSplit patch can be found at
https://github.com/llvm/llvm-project/pull/99283
The middle end transformation that performs the elide can be found at
https://github.com/llvm/llvm-project/pull/99285
HLSL output parameters are denoted with the `inout` and `out` keywords
in the function declaration. When an argument to an output parameter is
constructed a temporary value is constructed for the argument.
For `inout` pamameters the argument is initialized via copy-initialization
from the argument lvalue expression to the parameter type. For `out`
parameters the argument is not initialized before the call.
In both cases on return of the function the temporary value is written
back to the argument lvalue expression through an implicit assignment
binary operator with casting as required.
This change introduces a new HLSLOutArgExpr ast node which represents
the output argument behavior. The OutArgExpr has three defined children:
- An OpaqueValueExpr of the argument lvalue expression.
- An OpaqueValueExpr of the copy-initialized parameter.
- A BinaryOpExpr assigning the first with the value of the second.
Fixes#87526
---------
Co-authored-by: Damyan Pepper <damyanp@microsoft.com>
Co-authored-by: John McCall <rjmccall@gmail.com>
As per [1] the indices for a matrix element access operator shall have
integral or unscoped enumeration types and be non-negative. At the
moment, the index expression is converted to SizeType irrespective of
the signedness of the index expression. This causes implicit sign
conversion warnings if any of the indices is signed.
As per the spec, using signed types as indices is allowed and should not
cause any warnings. If the index expression is signed, extend to
SignedSizeType to avoid the warning.
[1]
https://clang.llvm.org/docs/MatrixTypes.html#matrix-type-element-access-operator
PR: https://github.com/llvm/llvm-project/pull/103044
Without this patch compiler-rt ubsan library has a bug displaying
incorrect values for variables of the _BitInt (previously called
_ExtInt) type. This patch affects affects both: generation of metadata
inside code generator and runtime part. The runtime part provided only
for i386 and x86_64 runtimes. Other runtimes should be updated to take
full benefit of this patch.
The patch is constructed the way to be backward compatible and int and
float type runtime diagnostics should be unaffected for not yet updated
runtimes.
This patch fixes issue
https://github.com/llvm/llvm-project/issues/64100.
Co-authored-by: Eänolituri Lómitaurë <vladislav.aranov@ericsson.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
Co-authored-by: Paul Kirth <paulkirth@google.com>
As part of the LLVM effort to eliminate debug-info intrinsics, we're
moving to a world where only iterators should be used to insert
instructions. This isn't a problem in clang when instructions get
generated before any debug-info is inserted, however we're planning on
deprecating and removing the instruction-pointer insertion routines.
Scatter some calls to getIterator in a few places, remove a
deref-then-addrof on another iterator, and add an overload for the
createLoadInstBefore utility. Some callers passes a null insertion
point, which we need to handle explicitly now.
1. It fixes the problem that llvm.trap() not getting the nomerge
attribute.
2. It sets nomerge flag for the node if the instruction has nomerge
arrtibute.
This is a copy of https://reviews.llvm.org/D146164. This only attempts
to fix `nomerge` for `__builtin_trap()`, `__debugbreak()`,
`__builtin_verbose_trap()`, not working for non-trap builtins.
Fixes#53011
As with other loops, we need only look at a RecordDecl's FieldDecls.
Convert to using them. In the meantime, we can improve the generation of
the 'counted_by' FieldDecl's GEP by creating one GEP instead of a series
of GEPs.
A FieldDecl that's an empty struct may not show up in CGRecordLayout. Go
ahead and ignore such a field as it shouldn't make a difference to these
calculations.
Fixes: 1f6f97e2b6 ("[Clang] Loop over FieldDecls instead of all Decls (#99574)")
Co-authored-by: Eli Friedman <efriedma@quicinc.com>
Only FieldDecls are important when determining GEP indices. A struct
defined within another struct has the same semantics as if it were
defined outside of the struct. So there's no need to look into
RecordDecls that aren't a field.
See commit 5bcf31ebfa ("[Clang] Loop over FieldDecls instead of all
Decls (#89453)")
Fixes 2039.
Re-signing occurs when function type discrimination is enabled and a
function pointer is converted to another function pointer type that
requires signing using a different discriminator. A function pointer is
re-signed using discriminator zero when it's converted to a pointer to a
non-function type such as `void*`.
---------
Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
Co-authored-by: John McCall <rjmccall@apple.com>
This is a follow-up from the conversation starting at
https://github.com/llvm/llvm-project/pull/93809#issuecomment-2173729801
The root problem that motivated the change are external AST sources that
compute `ASTRecordLayout`s themselves instead of letting Clang compute
them from the AST. One such example is LLDB using DWARF to get the
definitive offsets and sizes of C++ structures. Such layouts should be
considered correct (modulo buggy DWARF), but various assertions and
lowering logic around the `CGRecordLayoutBuilder` relies on the AST
having `[[no_unique_address]]` attached to them. This is a
layout-altering attribute which is not encoded in DWARF. This causes us
LLDB to trip over the various LLVM<->Clang layout consistency checks.
There has been precedent for avoiding such layout-altering attributes
from affecting lowering with externally-provided layouts (e.g., packed
structs).
This patch proposes to replace the `isZeroSize` checks in
`CGRecordLayoutBuilder` (which roughly means "empty field with
[[no_unique_address]]") with checks for
`CodeGen::isEmptyField`/`CodeGen::isEmptyRecord`.
**Details**
The main strategy here was to change the `isZeroSize` check in
`CGRecordLowering::accumulateFields` and
`CGRecordLowering::accumulateBases` to use the `isEmptyXXX` APIs
instead, preventing empty fields from being added to the `Members` and
`Bases` structures. The rest of the changes fall out from here, to
prevent lookups into these structures (for field numbers or base
indices) from failing.
Added `isEmptyRecordForLayout` and `isEmptyFieldForLayout` (open to
better naming suggestions). The main difference to the existing
`isEmptyRecord`/`isEmptyField` APIs, is that the `isEmptyXXXForLayout`
counterparts don't have special treatment for `unnamed bitfields`/arrays
and also treat fields of empty types as if they had
`[[no_unique_address]]` (i.e., just like the `AsIfNoUniqueAddr` in
`isEmptyField` does).
There are two problems with _BitInt prior to this patch:
1. For at least some values of N, we cannot use LLVM's iN for the type
of struct elements, array elements, allocas, global variables, and so
on, because the LLVM layout for that type does not match the high-level
layout of _BitInt(N).
Example: Currently for i128:128 targets correct implementation is
possible either for __int128 or for _BitInt(129+) with lowering to iN,
but not both, since we have now correct implementation of __int128 in
place after a21abc7.
When this happens, opaque [M x i8] types used, where M =
sizeof(_BitInt(N)).
2. LLVM doesn't guarantee any particular extension behavior for integer
types that aren't a multiple of 8. For this reason, all _BitInt types
are now have in-memory representation that is a whole number of bytes.
I.e. for example _BitInt(17) now will have memory layout type i32.
This patch also introduces concept of load/store type and adds an API to
CodeGenTypes that returns the IR type that should be used for load and
store operations. This is particularly useful for the case when a
_BitInt ends up having array of bytes as memory layout type. For
_BitInt(N), let M = sizeof(_BitInt(N)), and let BITS = M * 8. Loads and
stores of iM would both (1) produce far better code from the backends
and (2) be far more optimizable by IR passes than loads and stores of [M
x i8].
Fixes https://github.com/llvm/llvm-project/issues/85139
Fixes https://github.com/llvm/llvm-project/issues/83419
---------
Co-authored-by: John McCall <rjmccall@gmail.com>
When BPF object files are linked with bpftool, every symbol must be
accompanied by BTF info. Ensure that extern functions referenced by
global variable initializers are included in BTF.
The primary motivation is "static" initialization of PROG maps:
```c
extern int elsewhere(struct xdp_md *);
struct {
__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
__uint(max_entries, 1);
__type(key, int);
__type(value, int);
__array(values, int (struct xdp_md *));
} prog_map SEC(".maps") = { .values = { elsewhere } };
```
BPF backend needs debug info to produce BTF. Debug info is not
normally generated for external variables and functions. Previously, it
was solved differently for variables (collecting variable declarations
in ExternalDeclarations vector) and functions (logic invoked during
codegen in CGExpr.cpp).
This patch generalises ExternalDefclarations to include both function
and variable declarations. This change ensures that function references
are not missed no matter the context. Previously external functions
referenced in constant expressions lacked debug info.
Virtual function pointer entries in v-tables are signed with address
discrimination in addition to declaration-based discrimination, where an
integer discriminator the string hash (see
`ptrauth_string_discriminator`) of the mangled name of the overridden
method. This notably provides diversity based on the full signature of
the overridden method, including the method name and parameter types.
This patch introduces ItaniumVTableContext logic to find the original
declaration of the overridden method.
On AArch64, these pointers are signed using the `IA` key (the
process-independent code key.)
V-table pointers can be signed with either no discrimination, or a
similar scheme using address and decl-based discrimination. In this
case, the integer discriminator is the string hash of the mangled
v-table identifier of the class that originally introduced the vtable
pointer.
On AArch64, these pointers are signed using the `DA` key (the
process-independent data key.)
Not using discrimination allows attackers to simply copy valid v-table
pointers from one object to another. However, using a uniform
discriminator of 0 does have positive performance and code-size
implications on AArch64, and diversity for the most important v-table
access pattern (virtual dispatch) is already better assured by the
signing schemas used on the virtual functions. It is also known that
some code in practice copies objects containing v-tables with `memcpy`,
and while this is not permitted formally, it is something that may be
invasive to eliminate.
This is controlled by:
```
-fptrauth-vtable-pointer-type-discrimination
-fptrauth-vtable-pointer-address-discrimination
```
In addition, this provides fine-grained controls in the
ptrauth_vtable_pointer attribute, which allows overriding the default
ptrauth schema for vtable pointers on a given class hierarchy, e.g.:
```
[[clang::ptrauth_vtable_pointer(no_authentication, no_address_discrimination,
no_extra_discrimination)]]
[[clang::ptrauth_vtable_pointer(default_key, default_address_discrimination,
custom_discrimination, 0xf00d)]]
```
The override is then mangled as a parametrized vendor extension:
```
"__vtptrauth" I
<key>
<addressDiscriminated>
<extraDiscriminator>
E
```
To support this attribute, this patch adds a small extension to the
attribute-emitter tablegen backend.
Note that there are known areas where signing is either missing
altogether or can be strengthened. Some will be addressed in later
changes (e.g., member function pointers, some RTTI).
`dynamic_cast` in particular is handled by emitting an artificial
v-table pointer load (in a way that always authenticates it) before the
runtime call itself, as the runtime doesn't have enough information
today to properly authenticate it. Instead, the runtime is currently
expected to strip the v-table pointer.
---------
Co-authored-by: John McCall <rjmccall@apple.com>
Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>
Without this patch compiler-rt ubsan library has a bug displaying
incorrect values for variables of the _BitInt (previously called
_ExtInt) type. This patch affects affects both: generation of metadata
inside code generator and runtime part. The runtime part provided only
for i386 and x86_64 runtimes. Other runtimes should be updated to take
full benefit of this patch.
The patch is constructed the way to be backward compatible and int and
float type runtime diagnostics should be unaffected for not yet updated
runtimes.
This patch fixes issue:
https://github.com/llvm/llvm-project/issues/64100.
Co-authored-by: Vladislav Aranov <vladislav.aranov@ericsson.com>
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
This patch simplifies instruction creation by replacing all overloads of
instruction constructors/Create methods that are identical other than
the Instruction *InsertBefore/BasicBlock *InsertAtEnd/BasicBlock::iterator
InsertBefore argument with a single version that takes an InsertPosition
argument. The InsertPosition class can be implicitly constructed from
any of the above, internally converting them to the appropriate
BasicBlock::iterator value which can then be used to insert the
instruction (or to not insert it if an invalid iterator is passed).
The upshot of this is that code will be deduplicated, and all callsites
will switch to calling the new unified version without any changes
needed to make the compiler happy. There is at least one exception to
this; the construction of InsertPosition is a user-defined conversion,
so any caller that was already relying on a different user-defined
conversion won't work. In all of LLVM and Clang this happens exactly
once: at clang/lib/CodeGen/CGExpr.cpp:123 we try to construct an alloca
with an AssertingVH<Instruction> argument, which must now be cast to an
Instruction* by using `&*`. If this is more common elsewhere, it could
be fixed by adding an appropriate constructor to InsertPosition.
llvm::hash_value is not guaranteed to be deterministic. Use the
deterministic xxh3_64bits. A strong bit mixer isn't necessary. Use a
simpler one that works well with pointers.