Commit Graph

437 Commits

Author SHA1 Message Date
martinboehme
7549b45825 Revert "[clang][dataflow] Propagate locations from result objects to initializers." (#88315)
Reverts llvm/llvm-project#87320

This is causing buildbots to fail because
`isOriginalRecordConstructor()` is now unused.
2024-04-10 21:27:10 +02:00
martinboehme
21009f466e [clang][dataflow] Propagate locations from result objects to initializers. (#87320)
Previously, we were propagating storage locations the other way around, i.e.
from initializers to result objects, using `RecordValue::getLoc()`. This gave
the wrong behavior in some cases -- see the newly added or fixed tests in this
patch.

In addition, this patch now unblocks removing the `RecordValue` class entirely,
as we no longer need `RecordValue::getLoc()`.

With this patch, the test `TransferTest.DifferentReferenceLocInJoin` started to
fail because the framework now always uses the same storage location for a
`MaterializeTemporaryExpr`, meaning that the code under test no longer set up
the desired state where a variable of reference type is mapped to two different
storage locations in environments being joined. Rather than trying to modify
this test to set up the test condition again, I have chosen to replace the test
with an equivalent test in DataflowEnvironmentTest.cpp that sets up the test
condition directly; because this test is more direct, it will also be less
brittle in the face of future changes.
2024-04-10 20:03:35 +02:00
NAKAMURA Takumi
d08a76d1ac Fix warnings discovered by #87348 [-Wunused-but-set-variable] 2024-04-07 11:02:08 +09:00
Yitzhak Mandelbaum
bbd259af0a [clang][dataflow] Refactor widen API to be explicit about change effect. (#87233)
The previous API relied on pointer equality of inputs and outputs to
signal whether a change occurred. This was too subtle and led to bugs in
practice. It was also very limiting: the override could not return an
equivalent (but not identical) value.
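
As a rough illustration of the idea (a toy model, not the actual Clang API), a
widening operation can return its result together with an explicit change flag
instead of relying on pointer identity. The names `Effect`, `Bound`,
`WidenOutcome` and `widen` below are hypothetical and exist only in this sketch:

```cpp
#include <cassert>

// Hypothetical stand-ins for this sketch; not the Clang dataflow types.
enum class Effect { Unchanged, Changed };

// Toy abstract value: an upper bound on a counter, or "unbounded" (top).
struct Bound {
  bool IsTop;
  int Max; // Meaningful only when IsTop is false.
};

struct WidenOutcome {
  Bound Result;
  Effect Change; // States explicitly whether Result differs from Prev.
};

// Widens Prev with Current. The effect is reported explicitly, so the function
// is free to return a fresh object that is merely equivalent to Prev.
WidenOutcome widen(const Bound &Prev, const Bound &Current) {
  if (Prev.IsTop)
    return {Prev, Effect::Unchanged};
  if (!Current.IsTop && Current.Max <= Prev.Max)
    return {Bound{false, Prev.Max}, Effect::Unchanged}; // Equivalent copy.
  return {Bound{true, 0}, Effect::Changed};             // Jump to top.
}
```

The caller reads `Change` to decide whether to keep iterating, so it never has
to compare addresses of the input and the output.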
2024-04-04 08:39:51 -04:00
martinboehme
ae280281ce [clang][dataflow] Fix for value constructor in class derived from optional. (#86942)
The constructor `Derived(int)` in the newly added test
`ClassDerivedFromOptionalValueConstructor` is not a template, and this used to
cause an assertion failure in `valueOrConversionHasValue()` because
`F.getTemplateSpecializationArgs()` returns null.

(This is modeled after the `MaybeAlign(Align Value)` constructor, which
similarly causes an assertion failure in the analysis when assigning an `Align`
to a `MaybeAlign`.)

To fix this, we can simply look at the destination type which we're
constructing or assigning to (instead of the function template argument), and
this not only fixes this specific case but actually simplifies the
implementation.

I've added some additional tests for the case of assigning to a nested optional
because we didn't have coverage for these and I wanted to make sure I didn't
break anything.
2024-03-28 16:05:11 +01:00
martinboehme
8d77d362af [clang][dataflow] Introduce a helper class for handling record initializer lists. (#86675)
This is currently only used in one place, but I'm working on a patch that will
use this from a second place. And I think this already improves the readability
of the one place this is used so far.
2024-03-28 10:12:45 +01:00
smanna12
4c4ea249ce [NFC][CLANG] Fix static analyzer bugs about unnecessary object copies with auto keyword (#85962)
Reported by Static Analyzer Tool:
In clang::dataflow::Environment::initialize(): Using the auto keyword
without an & causes the copy of an object of type LambdaCapture
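
The effect the analyzer reports can be reproduced with a small sketch. The
`Capture` type below is a hypothetical stand-in (it is not `LambdaCapture`)
that simply counts how often it is copied:

```cpp
#include <cassert>
#include <vector>

// Hypothetical element type that counts copy constructions.
struct Capture {
  static int Copies;
  Capture() = default;
  Capture(const Capture &) { ++Copies; }
  Capture &operator=(const Capture &) = default;
};
int Capture::Copies = 0;

// Iterates by value: each element is copy-constructed into `C`.
int copiesWithAuto(const std::vector<Capture> &Captures) {
  Capture::Copies = 0;
  for (auto C : Captures)
    (void)C;
  return Capture::Copies;
}

// Iterates by const reference: no element is copied.
int copiesWithAutoRef(const std::vector<Capture> &Captures) {
  Capture::Copies = 0;
  for (const auto &C : Captures)
    (void)C;
  return Capture::Copies;
}
```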
2024-03-26 12:05:40 -05:00
martinboehme
e6f63a942a [clang][dataflow] Bail out if input is Objective-C++. (#86479)
We only ever intended to support C++, but the condition we were testing allowed
Objective-C++ code by mistake.
2024-03-25 14:08:25 +01:00
Eric Li
a6a6066290 [clang][dataflow] Fix crash when analyzing a coroutine (#85957)
A coroutine function body (`CoroutineBodyStmt`) may have null children,
which causes `isa` to segfault.
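
A minimal sketch of the failure mode, using a toy node type rather than
`clang::Stmt`: a kind test must tolerate null children before it dereferences
them. `Node`, `isExpr` and `countExprChildren` are hypothetical names. (LLVM
itself offers null-tolerant casts such as `isa_and_nonnull`; whether the fix
uses one is not stated here.)

```cpp
#include <cassert>
#include <vector>

// Toy AST node for illustration only.
struct Node {
  enum Kind { Expr, Other } K;
  std::vector<const Node *> Children; // Entries may be null.
};

// Null-tolerant kind test: checks the pointer before reading the kind.
bool isExpr(const Node *N) { return N != nullptr && N->K == Node::Expr; }

// Counts the children that are expressions, skipping null entries safely.
int countExprChildren(const Node &Parent) {
  int Count = 0;
  for (const Node *Child : Parent.Children)
    if (isExpr(Child))
      ++Count;
  return Count;
}
```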
2024-03-20 12:45:30 -04:00
Yitzhak Mandelbaum
c1328db9d8 [clang][dataflow] Refactor processing of terminator element (#84499)
This patch vastly simplifies the code handling terminators, without changing any
behavior. Additionally, the simplification unblocks our ability to address a
(simple) FIXME in the code to invoke `transferBranch`, even when builtin options
are disabled.
2024-03-19 09:41:29 -04:00
martinboehme
d712c5ed8f [clang][dataflow] Make optional checker work for types derived from optional. (#84138)
`llvm::MaybeAlign` does this, for example.

It's not an option to simply ignore these derived classes because they get cast
back to the optional classes (for example, simply when calling the optional
member functions), and our transfer functions will then run on those optional
classes and therefore require them to be properly initialized.
2024-03-19 12:53:50 +01:00
martinboehme
b788e4655c [clang][dataflow] Model assignment to derived class from base. (#85064)
This is a relatively rare case, but

- It's still nice to get this right,
- We can remove the special case for this in `VisitCXXOperatorCallExpr()` (that
  simply bails out), and
- With this in place, I can avoid having to add a similar special case in an
  upcoming patch.
2024-03-19 09:22:35 +01:00
martinboehme
59ff3adcc1 [clang][dataflow][NFC] Rename ControlFlowContext to AdornedCFG. (#85640)
This expresses better what the class actually does, and it reduces the number of
`Context`s that we have in the codebase.

A deprecated alias `ControlFlowContext` is available from the old header.
2024-03-19 08:44:08 +01:00
martinboehme
27d504998e [clang][dataflow] Fix getResultObjectLocation() on CXXDefaultArgExpr. (#85072)
This patch includes a test that causes an assertion failure without the other
changes in this patch.
2024-03-18 13:36:20 +01:00
martinboehme
2d539db246 [clang][dataflow] When analyzing ctors, don't initialize fields of *this with values. (#84164)
This is the constructor's job, and we want to be able to test that it
does this.
2024-03-08 08:19:02 +01:00
martinboehme
d5aecf0c19 [clang][nullability] Don't discard expression state before end of full-expression. (#82611)
In https://github.com/llvm/llvm-project/pull/72985, I made a change to discard
expression state (`ExprToLoc` and `ExprToVal`) at the beginning of each basic
block. I did so with the claim that "we never need to access entries from these
maps outside of the current basic block", noting that there are exceptions to
this claim when control flow happens inside a full-expression (the operands of
`&&`, `||`, and the conditional operator live in different basic blocks than the
operator itself) but that we already have a mechanism for retrieving the values
of these operands from the environment for the block they are computed in.

It turns out, however, that the operands of these operators aren't the only
expressions whose values can be accessed from a different basic block; when
control flow happens within a full-expression, that control flow can be
"interposed" between an expression and its parent. Here is an example:

```cxx
void f(int*, int);
bool cond();

void target() {
  int i = 0;
  f(&i, cond() ? 1 : 0);
}
```

([godbolt](https://godbolt.org/z/hrbj1Mj3o))

In the CFG[^1], note how the expression for `&i` is computed in block B4,
but the parent of this expression (the `CallExpr`) is located in block B1.
The argument expression `&i` and the `CallExpr` are essentially "torn apart"
into different basic blocks by the conditional operator in the second argument.
In other words, the edge between the `CallExpr` and its argument `&i` straddles
the boundary between two blocks.

I used to think that this scenario -- where an edge between an expression and
one of its children straddles a block boundary -- could only happen between the
expression that triggers the control flow (`&&`, `||`, or the conditional
operator) and its children, but the example above shows that other expressions
can be affected as well; the control flow is still triggered by `&&`, `||` or
the conditional operator, but the expressions affected lie outside these
operators.

Discarding expression state too soon is harmful. For example, an analysis that
checks the arguments of the `CallExpr` above would not be able to retrieve a
value for the `&i` argument.

This patch therefore ensures that we don't discard expression state before the
end of a full-expression. In other cases -- when the evaluation of a
full-expression is complete -- we still want to discard expression state for the
reasons explained in https://github.com/llvm/llvm-project/pull/72985 (avoid
performing joins on boolean values that are no longer needed, which
unnecessarily extends the flow condition; improve debuggability by removing
clutter from the expression state).

The impact on performance from this change is about a 1% slowdown in the
Crubit nullability check benchmarks:

```
name                              old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer     71.9µs ± 1%  71.9µs ± 2%    ~     (p=0.987 n=15+20)
BM_PointerAnalysisIntLoop          190µs ± 1%   192µs ± 2%  +1.06%  (p=0.000 n=14+16)
BM_PointerAnalysisPointerLoop      325µs ± 5%   324µs ± 4%    ~     (p=0.496 n=18+20)
BM_PointerAnalysisBranch           193µs ± 0%   192µs ± 4%    ~     (p=0.488 n=14+18)
BM_PointerAnalysisLoopAndBranch    521µs ± 1%   525µs ± 3%  +0.94%  (p=0.017 n=18+19)
BM_PointerAnalysisTwoLoops         337µs ± 1%   341µs ± 3%  +1.19%  (p=0.004 n=17+19)
BM_PointerAnalysisJoinFilePath    1.62ms ± 2%  1.64ms ± 3%  +0.92%  (p=0.021 n=20+20)
BM_PointerAnalysisCallInLoop      1.14ms ± 1%  1.15ms ± 4%    ~     (p=0.135 n=16+18)
```

[^1]:
```
 [B5 (ENTRY)]
   Succs (1): B4

 [B1]
   1: [B4.9] ? [B2.1] : [B3.1]
   2: [B4.4]([B4.6], [B1.1])
   Preds (2): B2 B3
   Succs (1): B0

 [B2]
   1: 1
   Preds (1): B4
   Succs (1): B1

 [B3]
   1: 0
   Preds (1): B4
   Succs (1): B1

 [B4]
   1: 0
   2: int i = 0;
   3: f
   4: [B4.3] (ImplicitCastExpr, FunctionToPointerDecay, void (*)(int *, int))
   5: i
   6: &[B4.5]
   7: cond
   8: [B4.7] (ImplicitCastExpr, FunctionToPointerDecay, _Bool (*)(void))
   9: [B4.8]()
   T: [B4.9] ? ... : ...
   Preds (1): B5
   Succs (2): B2 B3

 [B0 (EXIT)]
   Preds (1): B1
```
2024-03-07 13:31:23 +01:00
martinboehme
a11ab139e4 [clang][dataflow] Fix u8 string error with C++20. (#84302)
See also discussion on https://github.com/llvm/llvm-project/pull/84291.
2024-03-07 12:53:26 +01:00
martinboehme
5830d1a2df Revert "[dataflow][nfc] Fix u8 string usage with c++20" (#84301)
Reverts llvm/llvm-project#84291

The patch broke Windows builds.
2024-03-07 11:48:51 +01:00
Vincent Lee
6e79f77adb [dataflow][nfc] Fix u8 string usage with c++20 (#84291)
Clang returns an error when compiling this file with c++20
```
error: ISO C++20 does not permit initialization of char array with UTF-8 string literal
```
It seems like c++20 treats u8strings differently than strings (probably
needs char8_t).
Make this a string to fix the error.
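
A small sketch of the language rule involved, with hypothetical variable
names: a plain narrow literal initializes a `char` array in every standard
mode, whereas a `u8` literal has element type `char8_t` once that type is
available, so it needs a `char8_t` array.

```cpp
#include <cassert>
#include <cstring>

// A plain literal is a valid initializer for a char array in all modes.
const char Label[] = "dataflow";

#if defined(__cpp_char8_t)
// With char8_t enabled (C++20), a u8 literal initializes a char8_t array.
// Writing `const char Bad[] = u8"dataflow";` here would be rejected.
const char8_t Label8[] = u8"dataflow";
#endif

// Returns the length of the plain label, excluding the terminator.
std::size_t labelLength() { return std::strlen(Label); }
```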
2024-03-07 10:46:36 +01:00
martinboehme
128780b06f [clang][dataflow] Correctly treat empty initializer lists for unions. (#82986)
This fixes a crash introduced by
https://github.com/llvm/llvm-project/pull/82348
but also adds additional handling to make sure that we treat empty initializer
lists for both unions and structs/classes correctly (see tests added in this
patch).
2024-03-01 09:27:59 +01:00
Samira Bazuzi
2730a5c68c [clang][dataflow] Skip array types when handling InitListExprs. (#83013)
Crashes resulted from single-element InitListExprs for arrays with
elements of a record type after #80970.
2024-02-26 10:53:33 -05:00
Samira Bazuzi
c4e94633e8 Revert "[clang][dataflow] Correctly handle InitListExpr of union type." (#82856)
Reverts llvm/llvm-project#82348, which caused crashes when analyzing
empty InitListExprs for unions, e.g.

```cc
union U {
  double double_value;
  int int_value;
};

void target() {
  U value;
  value = {};
}
```

Co-authored-by: Samira Bazuzi <bazuzi@users.noreply.github.com>
2024-02-26 14:23:46 +01:00
martinboehme
e899641df2 [clang][dataflow] Fix inaccuracies in buildStmtToBasicBlockMap(). (#82496)
See the comments added to the code for details on the inaccuracies that have
now been fixed.

The patch adds tests that fail with the old implementation.
2024-02-22 09:00:20 +01:00
martinboehme
4725993f1a [clang][dataflow] Correctly handle InitListExpr of union type. (#82348) 2024-02-21 10:10:25 +01:00
martinboehme
5911334650 [clang][dataflow][NFC] Add a FIXME to handling of union initialization. (#82239)
We want to make it clear that the current behavior doesn't yet handle unions
properly.
2024-02-20 08:05:47 +01:00
Yitzhak Mandelbaum
60cb09ba4f [clang][dataflow] Fix crash on unions introduced in ba279934c6 (#81918)
The commit was itself a crash fix, but inadvertently changed the
behavior for unions, which results in crashes.
2024-02-15 16:19:10 -05:00
Paul Semel
ba279934c6 [dataflow] Fix crash when InitListExpr is not a prvalue (#80970) 2024-02-15 10:59:51 +01:00
Paul Semel
a8fb0dcc41 [dataflow] CXXOperatorCallExpr equal operator might not be a glvalue (#80991)
Although in a normal implementation the assumption is reasonable, it
seems that some esoteric implementations do not return a `T&`. This
should be handled correctly and the values be propagated.
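
For illustration, the language permits assignment operators with unusual
return types. The classes below are hypothetical examples; the
`static_assert`s show that `a = b` is an lvalue only for the conventional
signature:

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Conventional: `a = b` yields an lvalue of type Conventional&.
struct Conventional {
  int V = 0;
  Conventional &operator=(const Conventional &) = default;
};

// Unusual but legal: assignment returns by value, so `a = b` is a prvalue.
struct ReturnsByValue {
  int V = 0;
  ReturnsByValue() = default;
  ReturnsByValue(const ReturnsByValue &) = default;
  ReturnsByValue operator=(const ReturnsByValue &Other) {
    V = Other.V;
    return *this; // Copies the updated object into the result.
  }
};

// Type of the expression `t = u` for an lvalue `t` and a const lvalue `u`.
template <typename T>
using AssignResult = decltype(std::declval<T &>() = std::declval<const T &>());

static_assert(std::is_same<AssignResult<Conventional>, Conventional &>::value,
              "conventional assignment is an lvalue");
static_assert(std::is_same<AssignResult<ReturnsByValue>, ReturnsByValue>::value,
              "by-value assignment is a prvalue");
```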

---------

Co-authored-by: martinboehme <mboehme@google.com>
2024-02-13 11:39:27 +01:00
Antonio Frighetto
8c6e96d9eb [clang][Dataflow] Fix unnecessary copy in initializeFieldsWithValues (NFC) 2024-02-13 11:05:26 +01:00
martinboehme
270f2c5575 [clang][dataflow] Add Environment::initializeFieldsWithValues(). (#81239)
This function will be useful when we change the behavior of record-type prvalues
so that they directly initialize the associated result object. See also the
comment here for more details:


9e73656af5/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h (L354)

As part of this patch, we document and assert that synthetic fields may not have
reference type.

There is no practical use case for this: A `StorageLocation` may not have
reference type, and a synthetic field of the corresponding non-reference type
can serve the same purpose.
2024-02-13 10:01:25 +01:00
martinboehme
a446c9bf69 [clang][dataflow] Add support for CXXRewrittenBinaryOperator. (#81086)
This occurs in rewritten candidates for binary operators (a C++20 feature).

The patch modifies UncheckedOptionalAccessModelTest to run in C++20 mode (as
well as C++17 mode, as before) and to use rewritten candidates. The modified
test fails without the newly added support for `CXXRewrittenBinaryOperator`.
2024-02-08 08:38:35 +01:00
Yitzhak Mandelbaum
672fb27b26 [clang][dataflow] Add new join API and replace existing merge implementations. (#80361)
This patch adds a new interface for the join operation, now properly
called `join`. Originally, the framework offered a single `merge`
operation, which could serve either as a join or a widening. In
practice, though, we found this conflation didn't work for non-trivial
analyses, and split off the widening operation (`widen`). This change
completes the transition by introducing a proper `join` with strict join
semantics.

In the process, it drops an odd (and often misused) aspect of `merge`
wherein callees could implicitly instruct the framework to drop the
current entry by returning `false`. This feature was never used
correctly in analyses and doesn't belong in a join operation, so it is
omitted.
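
To make "strict join semantics" concrete, here is a sketch of a join on a toy
flat constant lattice. It is an assumption-laden illustration, not the
framework's interface: `ConstLattice` and this `join` are hypothetical. The
point is that a join always yields a least upper bound and has no side channel
for dropping an entry.

```cpp
#include <cassert>

// Toy flat lattice: Bottom < every constant < Top.
struct ConstLattice {
  enum Kind { Bottom, Const, Top } K;
  int Val; // Meaningful only when K == Const.
};

// Least upper bound of A and B. It always produces a value.
ConstLattice join(const ConstLattice &A, const ConstLattice &B) {
  if (A.K == ConstLattice::Bottom)
    return B;
  if (B.K == ConstLattice::Bottom)
    return A;
  if (A.K == ConstLattice::Const && B.K == ConstLattice::Const &&
      A.Val == B.Val)
    return A;
  return ConstLattice{ConstLattice::Top, 0};
}
```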

---------

Co-authored-by: Dmitri Gribenko <gribozavr@gmail.com>
Co-authored-by: martinboehme <mboehme@google.com>
2024-02-06 15:38:56 -05:00
martinboehme
0c361270af [clang][dataflow] Display line numbers in the HTML logger timeline. (#80130)
This makes it easier to count how many iterations an analysis takes to complete.
It also makes it easier to compare how a change to the analysis code affects
the timeline.

Here's a sample screenshot:


![image](https://github.com/llvm/llvm-project/assets/29098113/b3f44b4d-7037-4f28-9532-5418663250e1)
2024-02-01 05:33:03 +01:00
martinboehme
994493ce05 [clang][dataflow][NFC] Rename a confusingly named variable. (#80182) 2024-02-01 05:31:32 +01:00
Paul Semel
5c2da289d2 [clang][dataflow] fix assert in Environment::getResultObjectLocation (#79608)
When calling `Environment::getResultObjectLocation` with a
CXXOperatorCallExpr that is a prvalue, we just hit an assert because no
record was ever created.

---------

Co-authored-by: martinboehme <mboehme@google.com>
2024-01-31 17:18:16 +01:00
martinboehme
82324bc991 [clang][dataflow] In the CFG visualization, mark converged blocks. (#79999)
Here's an example of the output:


![image](https://github.com/llvm/llvm-project/assets/29098113/63cd509e-c2a7-4794-b758-ea73812ff09f)
2024-01-31 08:31:08 +01:00
martinboehme
c83ec847ac [clang][dataflow] Extend debug output for Environment. (#79982)
*  Print `ReturnLoc`, `ReturnVal`, and `ThisPointeeLoc` if applicable.

* For entries in `LocToVal` that correspond to declarations, print the names
   of the declarations next to them.

I've removed the FIXME because all relevant fields are now being dumped. I'm
not sure we actually need the capability for the caller to specify which fields
to dump, so I've simply deleted this part of the comment.

Some examples of the output:


![image](https://github.com/llvm/llvm-project/assets/29098113/17d0978f-b86d-4555-8a61-d1f2021f8d59)


![image](https://github.com/llvm/llvm-project/assets/29098113/021dbb24-5fe2-4720-8a08-f48dcf4b88f8)
2024-01-31 08:11:13 +01:00
Yitzhak Mandelbaum
a385c379f8 [clang][dataflow] Drop block-relative cap on worklist iterations. (#80033)
As per the FIXME, this cap never really served its purpose. This patch
simplifies to a single, caller-specified, absolute cap.
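
A minimal sketch of a worklist loop with a single absolute cap supplied by the
caller. The CFG model, the saturating transfer function and all names
(`Block`, `runToFixpoint`, `MaxBlockVisits`) are assumptions made for this
example and do not mirror the framework's code:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Toy CFG block: only successor indices matter here. Block 0 is the entry.
struct Block {
  std::vector<std::size_t> Succs;
};

// Propagates a saturating counter through the CFG. Returns false if the total
// number of block visits would exceed the caller-specified MaxBlockVisits.
bool runToFixpoint(const std::vector<Block> &CFG, int Ceiling,
                   std::size_t MaxBlockVisits, std::vector<int> &State) {
  assert(!CFG.empty());
  State.assign(CFG.size(), 0);
  std::deque<std::size_t> Worklist{0};
  std::size_t Visits = 0;
  while (!Worklist.empty()) {
    if (++Visits > MaxBlockVisits)
      return false; // Cap reached before convergence.
    std::size_t B = Worklist.front();
    Worklist.pop_front();
    int Out = std::min(State[B] + 1, Ceiling);
    for (std::size_t S : CFG[B].Succs) {
      if (Out > State[S]) { // Successor state changed: revisit it.
        State[S] = Out;
        Worklist.push_back(S);
      }
    }
  }
  return true;
}
```

Because the cap is absolute, the caller picks one number without having to
reason about how many blocks the function under analysis contains.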
2024-01-30 16:05:25 -05:00
martinboehme
7a6c2628e9 [clang][dataflow] Eliminate two uses of RecordValue::getLoc(). (#79163)
This is a small step towards eventually eliminating `RecordValue` entirely.
2024-01-24 08:06:32 +01:00
martinboehme
ccf1e322bd [clang][dataflow] Process terminator condition within transferCFGBlock(). (#78127)
In particular, it's important that we create the "fallback" atomic at this point
(which we produce if the transfer function didn't produce a value for the
expression) so that it is placed in the correct environment.

Previously, we processed the terminator condition in the `TerminatorVisitor`,
which put the fallback atomic in a copy of the environment that is produced as
input for the _successor_ block, rather than the environment for the block
containing the expression for which we produce the fallback atomic.

As a result, we produce different fallback atomics every time we process the
successor block, and hence we don't have a consistent representation of the
terminator condition in the flow condition.

This patch includes a test (authored by ymand@) that fails without the fix.
2024-01-23 10:19:06 +01:00
Yitzhak Mandelbaum
f3dd8f10c7 [clang][dataflow] Make cap on block visits configurable by caller. (#77481)
Previously, we hard-coded the cap on block visits inside the framework. This
patch enables the caller to specify the cap in the APIs for running an analysis.
2024-01-22 22:41:48 -05:00
martinboehme
a2caa4929e [clang][dataflow] Treat comma operator correctly in getResultObjectLocation(). (#78427) 2024-01-22 09:23:06 +01:00
martinboehme
f1226eea52 [clang][dataflow] Consider CXXDefaultInitExpr to be an "original record ctor". (#78423)
The CFG doesn't contain a CFGElement for the `CXXDefaultInitExpr::getInit()`, so
it makes sense to consider the `CXXDefaultInitExpr` to be the expression that
originally constructs the object.
2024-01-18 08:59:26 +01:00
martinboehme
1b1b525147 [clang][dataflow] Use Formula::isLiteral() in a couple more places. (#78404) 2024-01-18 08:58:47 +01:00
Yitzhak Mandelbaum
65ecbdf61f [clang][dataflow] Fix bug in Value comparison. (#76746)
Makes value equivalence require that the values have no properties, except in
the case of equivalence by pointer equality (if the pointers are equal, nothing
else is checked).
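
The rule can be sketched on a toy value type. `ToyValue` and `areEquivalent`
are hypothetical, and comparing `Kind` and `Payload` is an assumption standing
in for whatever kind-specific comparison the framework performs; only the
pointer-equality shortcut and the "no properties" requirement come from the
description above.

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy value with optional named properties.
struct ToyValue {
  int Kind;
  int Payload; // Stand-in for kind-specific content.
  std::map<std::string, bool> Props;
};

bool areEquivalent(const ToyValue &A, const ToyValue &B) {
  if (&A == &B)
    return true; // Same object: nothing else is checked.
  // Distinct objects are equivalent only if neither carries properties.
  return A.Props.empty() && B.Props.empty() && A.Kind == B.Kind &&
         A.Payload == B.Payload;
}
```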

Fixes issue #76459.
2024-01-16 13:03:49 -05:00
martinboehme
23bfc271a3 [clang][dataflow] Use ignoreCFGOmittedNodes() in setValue(). (#78245)
This is to be consistent with `getValue()`, which also uses
`ignoreCFGOmittedNodes()`.

Before this fix, it was not possible to retrieve a `Value` from a "CFG omitted"
node that had previously been set using `setValue()`; see the accompanying test,
which fails without the fix.
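
The underlying invariant is that the setter and the getter must normalize
their key the same way. A sketch with a toy expression type follows;
`ToyExpr`, `ignoreOmitted` and `ToyEnv` are hypothetical, and the parenthesis
wrapper merely stands in for a node that the CFG omits:

```cpp
#include <cassert>
#include <unordered_map>

// Toy expression: a wrapper node (IsParen) around an inner expression.
struct ToyExpr {
  bool IsParen;
  const ToyExpr *Inner; // Wrapped expression when IsParen is true.
};

// Strips wrapper nodes, returning the expression that is actually stored.
const ToyExpr &ignoreOmitted(const ToyExpr &E) {
  const ToyExpr *Cur = &E;
  while (Cur->IsParen && Cur->Inner != nullptr)
    Cur = Cur->Inner;
  return *Cur;
}

class ToyEnv {
  std::unordered_map<const ToyExpr *, int> ExprToVal;

public:
  // Both accessors normalize the key, so a value set through a wrapper can be
  // read back through the wrapper or through the wrapped expression.
  void setValue(const ToyExpr &E, int V) { ExprToVal[&ignoreOmitted(E)] = V; }
  const int *getValue(const ToyExpr &E) const {
    auto It = ExprToVal.find(&ignoreOmitted(E));
    return It == ExprToVal.end() ? nullptr : &It->second;
  }
};
```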

I discovered this issue while running internal integration tests on
https://github.com/llvm/llvm-project/pull/78127.
2024-01-16 15:48:44 +01:00
martinboehme
c19cacfa34 [clang][dataflow] Tighten checking for existence of a function body. (#78163)
In various places, we would previously call `FunctionDecl::hasBody()` (which
checks whether any redeclaration of the function has a body, not necessarily the
one on which `hasBody()` is being called).

This is bug-prone, as a recent bug in Crubit's nullability checker has shown
([fix](4b01ed0f14), [fix for the fix](e0c5d8ddd7)).

Instead, we now use `FunctionDecl::doesThisDeclarationHaveABody()` which, as the
name implies, checks whether the specific redeclaration it is being called on
has a body.
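
The difference between the two queries can be modeled without Clang. In this
sketch, `ToyDecl`, `RedeclChain`, `anyRedeclHasBody` and `thisDeclHasBody` are
hypothetical names that mimic the two behaviors described above:

```cpp
#include <cassert>
#include <vector>

// Toy redeclaration: records only whether this declaration carries the body.
struct ToyDecl {
  bool HasOwnBody;
};

// All declarations of one function, in source order.
using RedeclChain = std::vector<ToyDecl>;

// Mimics the lenient query: true if any redeclaration has a body.
bool anyRedeclHasBody(const RedeclChain &Chain) {
  for (const ToyDecl &D : Chain)
    if (D.HasOwnBody)
      return true;
  return false;
}

// Mimics the strict query: true only for the declaration that has the body.
bool thisDeclHasBody(const ToyDecl &D) { return D.HasOwnBody; }
```

For a forward declaration followed by a definition, the lenient query is true
for the chain as a whole, while the strict query is true only for the
definition.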

Alternatively, I considered being more lenient and "canonicalizing" to the
`FunctionDecl` that has the body if the `FunctionDecl` being passed is a
different redeclaration. However, this also risks hiding bugs: A caller might
inadvertently perform the analysis for all redeclarations of a function and end
up duplicating work without realizing it. By accepting only the redeclaration
that contains the body, we prevent this.

I've checked, and all clients that I'm aware of do currently pass in the
redeclaration that contains the function body. Typically this is because they
use the `ast_matchers::hasBody()` matcher which, unlike
`FunctionDecl::hasBody()`, only matches for the redeclaration containing the
body.
2024-01-16 12:52:55 +01:00
martinboehme
af1463d403 [clang][dataflow] Add an early-out to flowConditionImplies() / flowConditionAllows(). (#78172)
This saves having to assemble the set of constraints and run the SAT solver in
the trivial case of `flowConditionImplies(true)` or
`flowConditionAllows(false)`.

This is an update / reland of my previous reverted
[#77453](https://github.com/llvm/llvm-project/pull/77453). That PR contained a
logic bug -- the early-out for `flowConditionAllows()` was wrong because my
intuition about the logic was wrong. (In particular, note that
`flowConditionImplies(F)` does not imply `flowConditionAllows(F)`, even though
this may run counter to intuition.)
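
The relationship between the two queries can be checked on a toy model with a
brute-force solver. Everything below (`Formula`, `ToyFlowCondition`, the
`solverCalls()` counter) is hypothetical and assumes the usual definitions:
the flow condition implies `F` when "flow condition and not `F`" is
unsatisfiable, and allows `F` when "flow condition and `F`" is satisfiable.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

using Assignment = unsigned; // Bit i holds the truth value of atom i.

struct Formula {
  enum Kind { LiteralTrue, LiteralFalse, General } K;
  std::function<bool(Assignment)> Eval; // Used only when K == General.
  bool eval(Assignment A) const {
    if (K == LiteralTrue) return true;
    if (K == LiteralFalse) return false;
    return Eval(A);
  }
};

class ToyFlowCondition {
public:
  ToyFlowCondition(unsigned NumAtoms, std::vector<Formula> Constraints)
      : NumAtoms(NumAtoms), Constraints(std::move(Constraints)) {}

  bool implies(const Formula &F) const {
    if (F.K == Formula::LiteralTrue)
      return true; // Early-out: everything implies `true`.
    return !satisfiable(F, /*Negate=*/true);
  }
  bool allows(const Formula &F) const {
    if (F.K == Formula::LiteralFalse)
      return false; // Early-out: nothing allows `false`.
    return satisfiable(F, /*Negate=*/false);
  }
  unsigned solverCalls() const { return SolverCalls; }

private:
  // Brute force: is (constraints && F) satisfiable, or with Negate set,
  // (constraints && !F)?
  bool satisfiable(const Formula &F, bool Negate) const {
    ++SolverCalls;
    for (Assignment A = 0; A < (1u << NumAtoms); ++A) {
      bool Ok = F.eval(A) != Negate;
      for (const Formula &C : Constraints)
        Ok = Ok && C.eval(A);
      if (Ok)
        return true;
    }
    return false;
  }
  unsigned NumAtoms;
  std::vector<Formula> Constraints;
  mutable unsigned SolverCalls = 0;
};
```

With contradictory constraints, `implies(F)` holds vacuously for every `F`
while `allows(F)` is false for every `F`, including the literal `true`. In
this model that is why `allows(true)` still has to consult the solver.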

I've now done what I should have done on the first iteration and added more
tests. These pass both with and without my early-outs.

This patch is a performance win on the benchmarks for the Crubit nullability
checker, except for one slight regression on a relatively short benchmark:

```
name                              old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer     68.5µs ± 7%  67.6µs ± 4%    ~     (p=0.159 n=18+19)
BM_PointerAnalysisIntLoop          173µs ± 3%   162µs ± 4%  -6.40%  (p=0.000 n=19+20)
BM_PointerAnalysisPointerLoop      307µs ± 2%   312µs ± 4%  +1.56%  (p=0.013 n=18+20)
BM_PointerAnalysisBranch           199µs ± 4%   181µs ± 4%  -8.81%  (p=0.000 n=20+20)
BM_PointerAnalysisLoopAndBranch    503µs ± 3%   508µs ± 2%    ~     (p=0.081 n=18+19)
BM_PointerAnalysisTwoLoops         304µs ± 4%   286µs ± 2%  -6.04%  (p=0.000 n=19+20)
BM_PointerAnalysisJoinFilePath    4.78ms ± 3%  4.54ms ± 4%  -4.97%  (p=0.000 n=20+20)
BM_PointerAnalysisCallInLoop      3.05ms ± 3%  2.90ms ± 4%  -5.05%  (p=0.000 n=19+20)
```

When running clang-tidy on real-world code, the results are less clear. In
three runs, averaged, on an arbitrarily chosen input file, I get 11.60 s of user
time without this patch and 11.40 s with it, though with considerable
measurement noise (I'm seeing up to 0.2 s of variation between runs).

Still, this is a very simple change, and it is a clear win in benchmarks, so I
think it is worth making.
2024-01-16 08:25:21 +01:00
Jie Fu
fbac3b0db5 Revert "[clang][dataflow] Remove unused private field 'StmtToEnv' (NFC)"
Revert it after 1aacdfe473
2024-01-12 17:17:39 +08:00
martinboehme
1aacdfe473 Revert "[clang][dataflow] Process terminator condition within transferCFGBlock()." (#77895)
Reverts llvm/llvm-project#77750
2024-01-12 09:54:50 +01:00