Previously, we were propagating storage locations the other way around,
i.e.
from initializers to result objects, using `RecordValue::getLoc()`. This
gave
the wrong behavior in some cases -- see the newly added or fixed tests
in this
patch.
In addition, this patch now unblocks removing the `RecordValue` class
entirely,
as we no longer need `RecordValue::getLoc()`.
With this patch, the test `TransferTest.DifferentReferenceLocInJoin`
started to
fail because the framework now always uses the same storge location for
a
`MaterializeTemporaryExpr`, meaning that the code under test no longer
set up
the desired state where a variable of reference type is mapped to two
different
storage locations in environments being joined. Rather than trying to
modify
this test to set up the test condition again, I have chosen to replace
the test
with an equivalent test in DataflowEnvironmentTest.cpp that sets up the
test
condition directly; because this test is more direct, it will also be
less
brittle in the face of future changes.
The previous API relied on pointer equality of inputs and outputs to
signal whether a change occured. This was too subtle and led to bugs in
practice. It was also very limiting: the override could not return an equivalent (but
not identical) value.
The constructor `Derived(int)` in the newly added test
`ClassDerivedFromOptionalValueConstructor` is not a template, and this
used to
cause an assertion failure in `valueOrConversionHasValue()` because
`F.getTemplateSpecializationArgs()` returns null.
(This is modeled after the `MaybeAlign(Align Value)` constructor, which
similarly causes an assertion failure in the analysis when assigning an
`Align`
to a `MaybeAlign`.)
To fix this, we can simply look at the type of the destination type
which we're
constructing or assigning to (instead of the function template
argument), and
this not only fixes this specific case but actually simplifies the
implementation.
I've added some additional tests for the case of assigning to a nested
optional
because we didn't have coverage for these and I wanted to make sure I
didn't
break anything.
This is currently only used in one place, but I'm working on a patch
that will
use this from a second place. And I think this already improves the
readability
of the one place this is used so far.
In PR #79382, I need to add a new type that derives from
ConstantArrayType. This means that ConstantArrayType can no longer use
`llvm::TrailingObjects` to store the trailing optional Expr*.
This change refactors ConstantArrayType to store a 60-bit integer and
4-bits for the integer size in bytes. This replaces the APInt field
previously in the type but preserves enough information to recreate it
where needed.
To reduce the number of places where the APInt is re-constructed I've
also added some helper methods to the ConstantArrayType to allow some
common use cases that operate on either the stored small integer or the
APInt as appropriate.
Resolves#85124.
Reported by Static Analyzer Tool:
In clang::dataflow::Environment::initialize(): Using the auto keyword
without an & causes the copy of an object of type LambdaCapture
When debugging CSA issues, sometimes it would be useful to have a
dedicated note for the analysis entry point, aka. the function name you
would need to pass as "-analyze-function=XYZ" to reproduce a specific
issue.
One way we use (or will use) this downstream is to provide tooling on
top of creduce to enhance to supercharge productivity by automatically
reduce cases on crashes for example.
This will be added only if the "-analyzer-note-analysis-entry-points" is
set or the "analyzer-display-progress" is on.
This additional entry point marker will be the first "note" if enabled,
with the following message: "[debug] analyzing from XYZ". They are
prefixed by "[debug]" to remind the CSA developer that this is only
meant to be visible for them, for debugging purposes.
CPP-5012
This patch vastly simplifies the code handling terminators, without
changing any
behavior. Additionally, the simplification unblocks our ability to
address a
(simple) FIXME in the code to invoke `transferBranch`, even when builtin
options
are disabled.
`llvm::MaybeAlign` does this, for example.
It's not an option to simply ignore these derived classes because they
get cast
back to the optional classes (for example, simply when calling the
optional
member functions), and our transfer functions will then run on those
optional
classes and therefore require them to be properly initialized.
This is a relatively rare case, but
- It's still nice to get this right,
- We can remove the special case for this in
`VisitCXXOperatorCallExpr()` (that
simply bails out), and
- With this in place, I can avoid having to add a similar special case
in an
upcoming patch.
This expresses better what the class actually does, and it reduces the
number of
`Context`s that we have in the codebase.
A deprecated alias `ControlFlowContext` is available from the old
header.
In https://github.com/llvm/llvm-project/pull/72985, I made a change to
discard
expression state (`ExprToLoc` and `ExprToVal`) at the beginning of each
basic
block. I did so with the claim that "we never need to access entries
from these
maps outside of the current basic block", noting that there are
exceptions to
this claim when control flow happens inside a full-expression (the
operands of
`&&`, `||`, and the conditional operator live in different basic blocks
than the
operator itself) but that we already have a mechanism for retrieving the
values
of these operands from the environment for the block they are computed
in.
It turns out, however, that the operands of these operators aren't the
only
expressions whose values can be accessed from a different basic block;
when
control flow happens within a full-expression, that control flow can be
"interposed" between an expression and its parent. Here is an example:
```cxx
void f(int*, int);
bool cond();
void target() {
int i = 0;
f(&i, cond() ? 1 : 0);
}
```
([godbolt](https://godbolt.org/z/hrbj1Mj3o))
In the CFG[^1] , note how the expression for `&i` is computed in block
B4,
but the parent of this expression (the `CallExpr`) is located in block
B1.
The the argument expression `&i` and the `CallExpr` are essentially
"torn apart"
into different basic blocks by the conditional operator in the second
argument.
In other words, the edge between the `CallExpr` and its argument `&i`
straddles
the boundary between two blocks.
I used to think that this scenario -- where an edge between an
expression and
one of its children straddles a block boundary -- could only happen
between the
expression that triggers the control flow (`&&`, `||`, or the
conditional
operator) and its children, but the example above shows that other
expressions
can be affected as well; the control flow is still triggered by `&&`,
`||` or
the conditional operator, but the expressions affected lie outside these
operators.
Discarding expression state too soon is harmful. For example, an
analysis that
checks the arguments of the `CallExpr` above would not be able to
retrieve a
value for the `&i` argument.
This patch therefore ensures that we don't discard expression state
before the
end of a full-expression. In other cases -- when the evaluation of a
full-expression is complete -- we still want to discard expression state
for the
reasons explained in https://github.com/llvm/llvm-project/pull/72985
(avoid
performing joins on boolean values that are no longer needed, which
unnecessarily extends the flow condition; improve debuggability by
removing
clutter from the expression state).
The impact on performance from this change is about a 1% slowdown in the
Crubit nullability check benchmarks:
```
name old cpu/op new cpu/op delta
BM_PointerAnalysisCopyPointer 71.9µs ± 1% 71.9µs ± 2% ~ (p=0.987 n=15+20)
BM_PointerAnalysisIntLoop 190µs ± 1% 192µs ± 2% +1.06% (p=0.000 n=14+16)
BM_PointerAnalysisPointerLoop 325µs ± 5% 324µs ± 4% ~ (p=0.496 n=18+20)
BM_PointerAnalysisBranch 193µs ± 0% 192µs ± 4% ~ (p=0.488 n=14+18)
BM_PointerAnalysisLoopAndBranch 521µs ± 1% 525µs ± 3% +0.94% (p=0.017 n=18+19)
BM_PointerAnalysisTwoLoops 337µs ± 1% 341µs ± 3% +1.19% (p=0.004 n=17+19)
BM_PointerAnalysisJoinFilePath 1.62ms ± 2% 1.64ms ± 3% +0.92% (p=0.021 n=20+20)
BM_PointerAnalysisCallInLoop 1.14ms ± 1% 1.15ms ± 4% ~ (p=0.135 n=16+18)
```
[^1]:
```
[B5 (ENTRY)]
Succs (1): B4
[B1]
1: [B4.9] ? [B2.1] : [B3.1]
2: [B4.4]([B4.6], [B1.1])
Preds (2): B2 B3
Succs (1): B0
[B2]
1: 1
Preds (1): B4
Succs (1): B1
[B3]
1: 0
Preds (1): B4
Succs (1): B1
[B4]
1: 0
2: int i = 0;
3: f
4: [B4.3] (ImplicitCastExpr, FunctionToPointerDecay, void (*)(int *, int))
5: i
6: &[B4.5]
7: cond
8: [B4.7] (ImplicitCastExpr, FunctionToPointerDecay, _Bool (*)(void))
9: [B4.8]()
T: [B4.9] ? ... : ...
Preds (1): B5
Succs (2): B2 B3
[B0 (EXIT)]
Preds (1): B1
```
Clang returns an error when compiling this file with c++20
```
error: ISO C++20 does not permit initialization of char array with UTF-8 string literal
```
It seems like c++20 treats u8strings differently than strings (probably
needs char8_t).
Make this a string to fix the error.
This fixes a crash introduced by
https://github.com/llvm/llvm-project/pull/82348
but also adds additional handling to make sure that we treat empty
initializer
lists for both unions and structs/classes correctly (see tests added in
this
patch).
I called this function when investigating the issue
(https://github.com/llvm/llvm-project/issues/78131), and I was surprised
to see the definition is commented out.
I think it makes sense to provide the definition even though the
implementation is not stable.
Reverts llvm/llvm-project#82348, which caused crashes when analyzing
empty InitListExprs for unions, e.g.
```cc
union U {
double double_value;
int int_value;
};
void target() {
U value;
value = {};
}
```
Co-authored-by: Samira Bazuzi <bazuzi@users.noreply.github.com>
See the comments added to the code for details on the inaccuracies that
have
now been fixed.
The patch adds tests that fail with the old implementation.
Without the fix gcc warned like
../../clang/lib/Analysis/UnsafeBufferUsage.cpp:2203:26: warning: unused variable 'CArrTy' [-Wunused-variable]
2203 | } else if (const auto *CArrTy = Ctx.getAsConstantArrayType(
| ^~~~~~
Example:
int * const my_var = my_initializer;
Currently when transforming my_var to std::span the fixits:
- replace "int * const my_var = " with "std::span<int> const my_var {"
- add ", SIZE}" after "my_initializer" where SIZE is either inferred or
a placeholder
This patch makes that behavior less intrusive by not modifying variable
cv-qualifiers and initialization syntax.
The new behavior is:
- replace "int *" with "std::span<int>"
- add "{" before "my_initializer"
- add ", SIZE}" after "my_initializer"
This is an improvement on its own - since we don't touch the identifier,
we automatically can handle macros in them.
It also simplifies future work on initializer fixits.
Example:
int arr[10];
int * ptr = arr;
If ptr is unsafe and we transform it to std::span then the fixit we'd
currently provide transforms the code to:
std::span<int> ptr{arr, 10};
That's suboptimal as that repeats the size of the array in the code.
The idiomatic transformation should rely on the span constructor
that takes just the array argument and relies on template
parameter autodeduction to set the span size.
The transformed code should look like:
std::span<int> ptr = arr;
Note that it just should not change the initializer at all and that also
works for other forms of initialization like:
int * ptr {arr};
becoming:
std::span<int> ptr{arr};
This patch changes the initializer handling to the desired (empty)
fixit.
Introducing CArrayToPtrAssignment gadget and implementing fixits for some cases
of array being assigned to pointer.
Key observations:
- const size array can be assigned to std::span and bounds are propagated
- const size array can't be on LHS of assignment
This means array to pointer assignment has no strategy implications.
Fixits are implemented for cases where one of the variables in the assignment is
safe. For assignment of a safe array to unsafe pointer we know that the RHS will
never be transformed since it's safe and can immediately emit the optimal fixit.
Similarly for assignment of unsafe array to safe pointer.
(Obviously this is not and can't be future-proof in regards to what
variables we consider unsafe and that is fine.)
Fixits for assignment from unsafe array to unsafe pointer (from Array to Span
strategy) are not implemented in this patch as that needs to be properly designed
first - we might possibly implement optimal fixits for partially transformed
cases, put both variables in a single fixit group or do something else.
[-Wunsafe-buffer-usage] Ignore safe array subscripts
Don't emit warnings for array subscripts on constant size arrays where the index is constant and within bounds.
Example:
int arr[10];
arr[5] = 0; //safe, no warning
This patch recognizes only array indices that are integer literals - it doesn't understand more complex expressions (arithmetic on constants, etc.).
-Warray-bounds implemented in Sema::CheckArrayAccess() already solves a similar
(opposite) problem, handles complex expressions and is battle-tested.
Adding -Wunsafe-buffer-usage diagnostics to Sema is a non-starter as we need to emit
both the warnings and fixits and the performance impact of the fixit machine is
unacceptable for Sema.
CheckArrayAccess() as is doesn't distinguish between "safe" and "unknown" array
accesses. It also mixes the analysis that decides if an index is out of bounds
with crafting the diagnostics.
A refactor of CheckArrayAccess() might serve both the original purpose
and help us avoid false-positive with -Wunsafe-buffer-usage on constant
size arrrays.
Currently we ignore calls on function pointers (unlike direct calls of
functions and class methods). This patch adds support for function pointers as
well.
The change is to simply replace use of forEachArgumentWithParam matcher in UPC
gadget with forEachArgumentWithParamType.
from the documentation of forEachArgumentWithParamType:
/// Matches all arguments and their respective types for a \c CallExpr or
/// \c CXXConstructExpr. It is very similar to \c forEachArgumentWithParam but
/// it works on calls through function pointers as well.
Currently the matcher also uses hasPointerType() which checks that the
canonical type of an argument is pointer and won't match on arrays decayed to
pointer. Replacing hasPointerType() with isAnyPointerType() which allows
implicit casts allows for the arrays to be matched as well and this way we get
fixits for array arguments to function pointer calls too.
Covers cases where DeclRefExpr referring to a const-size array decays to a
pointer and is used "as a pointer" (e. g. passed to a pointer type
parameter).
Since std::array<T, N> doesn't implicitly convert to pointer to its element
type T* the cast needs to be done explicitly as part of the fixit
when we retrofit std::array to code that previously worked with constant
size array. std::array::data() method is used for the explicit
cast.
In terms of the fixit machine this covers the UPC(DRE) case for Array fixit strategy.
The emitted fixit inserts call to std::array::data() method similarly to
analogous fixit for Span strategy.
Although in a normal implementation the assumption is reasonable, it
seems that some esoteric implementation are not returning a T&. This
should be handled correctly and the values be propagated.
---------
Co-authored-by: martinboehme <mboehme@google.com>
This function will be useful when we change the behavior of record-type
prvalues
so that they directly initialize the associated result object. See also
the
comment here for more details:
9e73656af5/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h (L354)
As part of this patch, we document and assert that synthetic fields may
not have
reference type.
There is no practical use case for this: A `StorageLocation` may not
have
reference type, and a synthetic field of the corresponding non-reference
type
can serve the same purpose.
Array subscript on a const size array is not bounds-checked. The idiomatic
replacement is std::array which is bounds-safe in hardened mode of libc++.
This commit extends the fixit-producing machine to consider std::array as a
transformation target type and teaches it to handle the array subscript on const
size arrays with a trivial (empty) fixit.
This occurs in rewritten candidates for binary operators (a C++20
feature).
The patch modifies UncheckedOptionalAccessModelTest to run in C++20 mode
(as
well as C++17 mode, as before) and to use rewritten candidates. The
modified
test fails without the newly added support for
`CXXRewrittenBinaryOperator`.
Debug notes for unclaimed DeclRefExpr should report any DRE of an unsafe
variable that is not covered by a Fixable (i. e. fixit for the
particular AST pattern isn't implemented for whatever reason). Currently
not all unclaimed DeclRefExpr-s are reported which is a bug. The debug
notes report only those DREs where the referred VarDecl has at least one
other DeclRefExpr which is claimed (covered by a fixit). If there is an
unsafe VarDecl that has exactly one DRE and the DRE isn't claimed then
the debug note about missing fixit won't be emitted. That is because the
debug note is emitted from within a loop over set of successfully
matched FixableGadgets which by-definition is missing those DRE that are
not matched at all.
The new code simply iterates over all unsafe VarDecls and all of their
unclaimed DREs.
This patch adds a new interface for the join operation, now properly
called `join`. Originally, the framework offered a single `merge`
operation, which could serve either as a join or a widening. In
practice, though we found this conflation didn't work for non-trivial
anlyses, and split of the widening operation (`widen`). This change
completes the transition by introducing a proper `join` with strict join
semantics.
In the process, it drops an odd (and often misused) aspect of `merge`
wherein callees could implictly instruct the framework to drop the
current entry by returning `false`. This features was never used
correctly in analyses and doesn't belong in a join operation, so it is
omitted.
---------
Co-authored-by: Dmitri Gribenko <gribozavr@gmail.com>
Co-authored-by: martinboehme <mboehme@google.com>
This fixes#69219.
Consider an example:
```
CoTask my_coroutine() {
std::abort();
co_return 1; // unreachable code warning.
}
```
Clang emits a CFG-based unreachable warning on the `co_return` statement
(precisely the `1` subexpr). If we remove this statement, the program
semantic is changed (my_coroutine is not a coroutine anymore).
This patch fixes this issue by never considering coroutine statements as
dead statements.