Commit Graph

65 Commits

Author SHA1 Message Date
martinboehme
23ae482bd0 [clang][dataflow] Allow DataflowAnalysisContext to use a non-owned Solver. (#91316)
For some callers (see change in DataflowAnalysis.h), this is more
convenient.
2024-05-08 08:36:53 +02:00
Samira Bazuzi
9ec8c96166 [clang][dataflow] Expose getReferencedDecls and relocate free functions. (#88754)
Moves free functions from DataflowEnvironment.h/cc and
DataflowAnalysisContext.h/cc to RecordOps and a new ASTOps and exposes
them as needed for current use and to expose getReferencedDecls for
out-of-tree use.

Minimal change in functionality, only to modify the return type of
getReferenceDecls to return the collected decls instead of using output
params.

Tested with `ninja check-clang-tooling`.
2024-04-16 14:46:05 -04:00
martinboehme
59ff3adcc1 [clang][dataflow][NFC] Rename ControlFlowContext to AdornedCFG. (#85640)
This expresses better what the class actually does, and it reduces the
number of
`Context`s that we have in the codebase.

A deprecated alias `ControlFlowContext` is available from the old
header.
2024-03-19 08:44:08 +01:00
martinboehme
c19cacfa34 [clang][dataflow] Tighten checking for existence of a function body. (#78163)
In various places, we would previously call `FunctionDecl::hasBody()`
(which
checks whether any redeclaration of the function has a body, not
necessarily the
one on which `hasBody()` is being called).

This is bug-prone, as a recent bug in Crubit's nullability checker has
shown

([fix](4b01ed0f14),
[fix for the
fix](e0c5d8ddd7)).

Instead, we now use `FunctionDecl::doesThisDeclarationHaveABody()`
which, as the
name implies, checks whether the specific redeclaration it is being
called on
has a body.

Alternatively, I considered being more lenient and "canonicalizing" to
the
`FunctionDecl` that has the body if the `FunctionDecl` being passed is a
different redeclaration. However, this also risks hiding bugs: A caller
might
inadverently perform the analysis for all redeclarations of a function
and end
up duplicating work without realizing it. By accepting only the
redeclaration
that contains the body, we prevent this.

I've checked, and all clients that I'm aware of do currently pass in the
redeclaration that contains the function body. Typically this is because
they
use the `ast_matchers::hasBody()` matcher which, unlike
`FunctionDecl::hasBody()`, only matches for the redeclaration containing
the
body.
2024-01-16 12:52:55 +01:00
martinboehme
af1463d403 [clang][dataflow] Add an early-out to flowConditionImplies() / flowConditionAllows(). (#78172)
This saves having to assemble the set of constraints and run the SAT
solver in
the trivial case of `flowConditionImplies(true)` or
`flowConditionAllows(false)`.

This is an update / reland of my previous reverted
[#77453](https://github.com/llvm/llvm-project/pull/77453). That PR
contained a
logic bug -- the early-out for `flowConditionAllows()` was wrong because
my
intuition about the logic was wrong. (In particular, note that
`flowConditionImplies(F)` does not imply `flowConditionAllows(F)`, even
though
this may run counter to intuition.)

I've now done what I should have done on the first iteration and added
more
tests. These pass both with and without my early-outs.

This patch is a performance win on the benchmarks for the Crubit
nullability
checker, except for one slight regression on a relatively short
benchmark:

```
name                              old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer     68.5µs ± 7%  67.6µs ± 4%    ~     (p=0.159 n=18+19)
BM_PointerAnalysisIntLoop          173µs ± 3%   162µs ± 4%  -6.40%  (p=0.000 n=19+20)
BM_PointerAnalysisPointerLoop      307µs ± 2%   312µs ± 4%  +1.56%  (p=0.013 n=18+20)
BM_PointerAnalysisBranch           199µs ± 4%   181µs ± 4%  -8.81%  (p=0.000 n=20+20)
BM_PointerAnalysisLoopAndBranch    503µs ± 3%   508µs ± 2%    ~     (p=0.081 n=18+19)
BM_PointerAnalysisTwoLoops         304µs ± 4%   286µs ± 2%  -6.04%  (p=0.000 n=19+20)
BM_PointerAnalysisJoinFilePath    4.78ms ± 3%  4.54ms ± 4%  -4.97%  (p=0.000 n=20+20)
BM_PointerAnalysisCallInLoop      3.05ms ± 3%  2.90ms ± 4%  -5.05%  (p=0.000 n=19+20)
```

When running clang-tidy on real-world code, the results are less clear.
In
three runs, averaged, on an arbitrarily chosen input file, I get 11.60 s
of user
time without this patch and 11.40 s with it, though with considerable
measurement noise (I'm seeing up to 0.2 s of variation between runs).

Still, this is a very simple change, and it is a clear win in
benchmarks, so I
think it is worth making.
2024-01-16 08:25:21 +01:00
martinboehme
7ce010f2fb Revert "[clang][dataflow] Add an early-out to flowConditionImplies() / flowConditionAllows()." (#77570)
Reverts llvm/llvm-project#77453
2024-01-10 10:50:16 +01:00
martinboehme
03a0bfa96a [clang][dataflow] Add an early-out to flowConditionImplies() / flowConditionAllows(). (#77453)
This saves having to assemble the set of constraints and run the SAT
solver in
the trivial case where `F` is true.

This is a performance win on the benchmarks for the Crubit nullability
checker:

```
name                              old cpu/op   new cpu/op   delta
BM_PointerAnalysisCopyPointer     64.1µs ± 5%  63.1µs ± 0%   -1.56%  (p=0.000 n=20+17)
BM_PointerAnalysisIntLoop          172µs ± 2%   171µs ± 0%     ~     (p=0.752 n=20+17)
BM_PointerAnalysisPointerLoop      408µs ± 3%   355µs ± 0%  -12.99%  (p=0.000 n=20+17)
BM_PointerAnalysisBranch           201µs ± 2%   184µs ± 0%   -8.28%  (p=0.000 n=20+19)
BM_PointerAnalysisLoopAndBranch    684µs ± 2%   613µs ± 2%  -10.38%  (p=0.000 n=20+19)
BM_PointerAnalysisTwoLoops         309µs ± 2%   308µs ± 2%     ~     (p=0.728 n=20+19)
BM_PointerAnalysisJoinFilePath    37.9ms ± 2%  37.9ms ± 2%   +0.06%  (p=0.041 n=20+19)
BM_PointerAnalysisCallInLoop      26.5ms ± 2%  26.4ms ± 4%   -0.59%  (p=0.024 n=20+20)
```

When running clang-tidy on real-world code, the results are less clear.
In
three runs, averaged, on an arbitrarily chosen input file, I get 11.91 s
of user
time without this patch and 11.81 s with it, though with considerable
measurement noise (I'm seeing up to 0.2 s of variation between runs).

Still, this is a very simple change, and it is a clear win in
benchmarks, so I
think it is worth making.
2024-01-09 19:18:54 +01:00
martinboehme
71f2ec2db1 [clang][dataflow] Add synthetic fields to RecordStorageLocation (#73860)
Synthetic fields are intended to model the internal state of a class
(e.g. the value stored in a `std::optional`) without having to depend on
that class's implementation details.

Today, this is typically done with properties on `RecordValue`s, but
these have several drawbacks:

* Care must be taken to call `refreshRecordValue()` before modifying a
property so that the modified property values aren’t seen by other
environments that may have access to the same `RecordValue`.

* Properties aren’t associated with a storage location. If an analysis
needs to associate a location with the value stored in a property (e.g.
to model the reference returned by `std::optional::value()`), it needs
to manually add an indirection using a `PointerValue`. (See for example
the way this is done in UncheckedOptionalAccessModel.cpp, specifically
in `maybeInitializeOptionalValueMember()`.)

* Properties don’t participate in the builtin compare, join, and widen
operations. If an analysis needs to apply these operations to
properties, it needs to override the corresponding methods of
`ValueModel`.

* Longer-term, we plan to eliminate `RecordValue`, as by-value
operations on records aren’t really “a thing” in C++ (see
https://discourse.llvm.org/t/70086#changed-structvalue-api-14). This
would obviously eliminate the ability to set properties on
`RecordValue`s.

To demonstrate the advantages of synthetic fields, this patch converts
UncheckedOptionalAccessModel.cpp to synthetic fields. This greatly
simplifies the implementation of the check.

This PR is pretty big; to make it easier to review, I have broken it
down into a stack of three commits, each of which contains a set of
logically related changes. I considered submitting each of these as a
separate PR, but the commits only really make sense when taken together.

To review, I suggest first looking at the changes in
UncheckedOptionalAccessModel.cpp. This gives a flavor for how the
various API changes work together in the context of an analysis. Then,
review the rest of the changes.
2023-12-04 09:29:22 +01:00
Jie Fu
a737a33eb9 [clang][dataflow] Fix -Wrange-loop-construct in DataflowAnalysisContext.cpp (NFC)
/llvm-project/clang/lib/Analysis/FlowSensitive/DataflowAnalysisContext.cpp:247:10: note: use reference type 'const llvm::Sma
llVector<Atom> &' to prevent copying
    for (const llvm::SmallVector<Atom> Class : Info.EquivalentAtoms)
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                                       &
1 error generated.
2023-11-07 22:32:11 +08:00
martinboehme
7c636728c0 [clang][dataflow] Simplify flow conditions displayed in HTMLLogger. (#70848)
This can make the flow condition significantly easier to interpret; see
below
for an example.

I had hoped that adding the simplification as a preprocessing step
before the
SAT solver (in `DataflowAnalysisContext::querySolver()`) would also
speed up SAT
solving and maybe even eliminate SAT solver timeouts, but in my testing,
this
actually turns out to be a pessimization. It appears that these
simplifications
are easy enough for the SAT solver to perform itself.

Nevertheless, the improvement in debugging alone makes this a worthwhile
change.

Example of flow condition output with these changes:

```
Flow condition token: V37
Constraints:
(V16 = (((V15 & (V19 = V12)) & V22) & V25))
(V15 = ((V12 & ((V14 = V9) | (V14 = V4))) & (V13 = V14)))
True atoms: (V0, V1, V2, V5, V6, V7, V29, V30, V32, V34, V35, V37)
False atoms: (V3, V8, V17)
Equivalent atoms:
(V11, V15)

Flow condition constraints before simplification:
V37
((!V3 & !V8) & !V17)
(V37 = V34)
(V34 = (V29 & (V35 = V30)))
(V29 = (((V16 | V2) & V32) & (V30 = V32)))
(V16 = (((V15 & (V19 = V12)) & V22) & V25))
(V15 = V11)
(V11 = ((((V7 | V2) & V12) & ((V7 & (V14 = V9)) | (V2 & (V14 = V4)))) & (V13 = V14)))
(V2 = V1)
(V1 = V0)
V0
(V7 = V6)
(V6 = V5)
(V5 = V2)
```
2023-11-07 15:18:34 +01:00
martinboehme
d1f59544cf [clang][dataflow] Add Environment::allows(). (#70046)
This allows querying whether, given the flow condition, a certain
formula still
has a solution (though it is not necessarily implied by the flow
condition, as
`flowConditionImplies()` would check).

This can be checked today, but only with a double negation, i.e. to
check
whether, given the flow condition, a formula F has a solution, you can
check
`!Env.flowConditionImplies(Arena.makeNot(F))`. The double negation makes
this
hard to reason about, and it would be nicer to have a way of directly
checking
this.

For consistency, this patch also renames `flowConditionImplies()` to
`proves()`;
the old name is kept around for compatibility but deprecated.
2023-10-25 16:02:22 +02:00
martinboehme
3eed23d31c [clang][dataflow] Remove DataflowAnalysisContext::flowConditionIsTautology(). (#69601)
It's only used in its own unit tests.
2023-10-23 08:18:12 +02:00
Sam McCall
7338eb561c Reapply "[dataflow] use true/false literals in formulas, rather than variables"
This reverts commit 3353f7dd3d.

Fixed test bug (unspecified order of arg evaluation)
2023-10-19 11:34:08 +02:00
Stanislav Gatev
52d0696355 [clang][dataflow] Add support for lambda captures (#68558)
This adds support for copy, ref, and this lambda captures to the core
framework and also adds relevant tests in UncheckedOptionalAccessTest.
2023-10-11 22:18:46 +02:00
Douglas Yung
3353f7dd3d Revert "[dataflow] use true/false literals in formulas, rather than variables"
This reverts commit 36bd5bd888.

This change is causing a test failure on several build bots:
- https://lab.llvm.org/buildbot/#/builders/139/builds/50255
- https://lab.llvm.org/buildbot/#/builders/216/builds/27735
- https://lab.llvm.org/buildbot/#/builders/247/builds/9334
2023-09-22 11:43:27 -07:00
Sam McCall
36bd5bd888 [dataflow] use true/false literals in formulas, rather than variables
And simplify formulas containing true/false
It's unclear to me how useful this is, it does make formulas more
conveniently self-contained now (we can usefully print them without
carrying around the "true/false" labels)

(while here, simplify !!X to X, too)

Differential Revision: https://reviews.llvm.org/D153485
2023-09-22 17:12:20 +02:00
Sam McCall
21ab252f97 [dataflow] Add global invariant condition to DataflowAnalysisContext (#65949)
This records facts that are not sensitive to the current flow condition,
and should apply to all environments.

The motivating case is recording information about where a Value
originated, such as nullability:
 - we may see the same Value for multiple expressions (e.g. reads of the
   same field) in multiple environments (multiple blocks or iterations)
 - we want to record information only when we first see the Value
   (e.g. Nullability annotations on fields only add information if we
   don't know where the value came from)
 - this information should be expressible as a SAT condition
 - we must add this SAT condition to every environment where the
   Value may appear

We solve this by recording the information in the global condition.
This doesn't seem particularly elegant, but solves the problem and is
a fairly small and natural extension of the Environment.

Alternatives considered:
 - store the constraint directly as a property on the Value.
   But it's more composable for such properties to always be variables
   (AtomicBoolValue), and constrain them with SAT conditions.
 - add a hook whenever values are created, giving the analysis the
   chance to populate them.
   However the framework relies on/provides the ability to construct
   values in arbitrary places without providing the context such a hook
   would need, this would be a very invasive change.
2023-09-14 19:30:04 +02:00
Martin Braenne
aef05a1232 [clang][dataflow][NFC] Eliminate getStorageLocation() / setStorageLocation() in DataflowAnalysisContext.
Instead, inline them into the `getStableStorageLocation()` overloads, which is the only place they were called from (and should be called from).

`getStorageLocation()` / `setStorageLocation()` were confusing because neither their name nor their documentation made reference to the fact that the storage location is stable.

It didn't make sense to keep these as private member functions either. The code for the two `getStableStorageLocation()` overloads has become only marginally more complex by inlining these functions, and the `Expr` version is actually more efficient because we only call `ignoreCFGOmittedNodes()` once instead of twice.

Reviewed By: ymandel, xazax.hun

Differential Revision: https://reviews.llvm.org/D158981
2023-08-29 06:53:48 +00:00
Martin Braenne
9ecdbe3855 [clang][dataflow] Rename AggregateStorageLocation to RecordStorageLocation and StructValue to RecordValue.
- Both of these constructs are used to represent structs, classes, and unions;
  Clang uses the collective term "record" for these.

- The term "aggregate" in `AggregateStorageLocation` implies that, at some
  point, the intention may have been to use it also for arrays, but it don't
  think it's possible to use it for arrays. Records and arrays are very
  different and therefore need to be modeled differently. Records have a fixed
  set of named fields, which can have different type; arrays have a variable
  number of elements, but they all have the same type.

- Futhermore, "aggregate" has a very specific meaning in C++
  (https://en.cppreference.com/w/cpp/language/aggregate_initialization).
  Aggregates of class type may not have any user-declared or inherited
  constructors, no private or protected non-static data members, no virtual
  member functions, and so on, but we use `AggregateStorageLocations` to model all objects of class type.

In addition, for consistency, we also rename the following:

- `getAggregateLoc()` (in `RecordValue`, formerly known as `StructValue`) to
  simply `getLoc()`.

- `refreshStructValue()` to `refreshRecordValue()`

We keep the old names around as deprecated synonyms to enable clients to be migrated to the new names.

Reviewed By: ymandel, xazax.hun

Differential Revision: https://reviews.llvm.org/D156788
2023-08-01 20:29:40 +00:00
Martin Braenne
44f98d0101 [clang][dataflow] Eliminate duplication between AggregateStorageLocation and StructValue.
After this change, `StructValue` is just a wrapper for an `AggregateStorageLocation`. For the wider context, see https://discourse.llvm.org/t/70086.

## How to review

- Start by looking at the comments added / changed in Value.h, StorageLocation.h,
  and DataflowEnvironment.h. This will give you a good overview of the semantic
  changes.

- Look at the corresponding .cpp files that implement the semantic changes.

- Transfer.cpp, TypeErasedDataflowAnalysis.cpp, and RecordOps.cpp show how the
  core of the framework is affected by the semantic changes.

- UncheckedOptionalAccessModel.cpp shows how this complex model is affected by
  the changes.

- Many of the changes in the rest of the patch are mechanical in nature.

Reviewed By: ymandel, xazax.hun

Differential Revision: https://reviews.llvm.org/D155446
2023-07-24 13:20:01 +00:00
Martin Braenne
6d768548ec [clang][dataflow] Add DataflowEnvironment::createObject().
This consolidates the code used in various places to initialize objects (usually for variables) into one central location.

It will also help reduce the number of changes needed when we make the upcoming API changes to `AggregateStorageLocation` and `StructValue`.

Depends On D155074

Reviewed By: ymandel, xazax.hun

Differential Revision: https://reviews.llvm.org/D155075
2023-07-17 07:26:10 +00:00
Nikolas Klauser
2a3322bab0 [clang] Create a buildkite-pipeline.yml file for clang
This moves the formatting job to a shell script, which should also fix
the clang pre-commit CI.

Differential Revision: https://reviews.llvm.org/D153920
2023-07-12 08:33:22 -04:00
Martin Braenne
e8a1560d1d [clang][dataflow] Various changes to handling of modeled fields.
- Rename `getReferencedFields()` to `getModeledFields()`. Move the logic that
  returns all object fields when doing a context-sensitive analysis to here from
  `DataflowAnalysisContext::createStorageLocation()`. I think all callers of
  the previous `getReferencedFields()` should use this logic; the fact that they
  were not doing so looks like a bug.

- Make `getModeledFields()` public. I have an upcoming patch that will need to
  use this function from Transfer.cpp, and there doesn't seem to be any reason
  why this function should not be public.

- Use a `SmallSetVector` to get deterministic iteration order. I have a pending
  patch where I'm getting flaky tests because
  `Environment::createValueUnlessSelfReferential()` is non-deterministically
  populating different fields depending on the iteration order. This change
  fixes those flaky tests.

Reviewed By: gribozavr2

Differential Revision: https://reviews.llvm.org/D154586
2023-07-10 06:45:53 +00:00
Sam McCall
fc9821a877 Reland "[dataflow] Add dedicated representation of boolean formulas"
This reverts commit 7a72ce9822.

Test problems were due to unspecified order of function arg evaluation.

Reland "[dataflow] Replace most BoolValue subclasses with references to Formula (and AtomicBoolValue => Atom and BoolValue => Formula where appropriate)"

This properly frees the Value hierarchy from managing boolean formulas.

We still distinguish AtomicBoolValue; this type is used in client code.
However we expect to convert such uses to BoolValue (where the
distinction is not needed) or Atom (where atomic identity is intended),
and then fold AtomicBoolValue into FormulaBoolValue.

We also distinguish TopBoolValue; this has distinct rules for
widen/join/equivalence, and top-ness is not represented in Formula.
It'd be nice to find a cleaner representation (e.g. the absence of a
formula), but no immediate plans.

For now, BoolValues with the same Formula are deduplicated. This doesn't
seem desirable, as Values are mutable by their creators (properties).
We can probably drop this for FormulaBoolValue immediately (not in this
patch, to isolate changes). For AtomicBoolValue we first need to update
clients to stop using value pointers for atom identity.

The data structures around flow conditions are updated:
- flow condition tokens are Atom, rather than AtomicBoolValue*
- conditions are Formula, rather than BoolValue
Most APIs were changed directly, some with many clients had a
new version added and the existing one deprecated.

The factories for BoolValues in Environment keep their existing
signatures for now (e.g. makeOr(BoolValue, BoolValue) => BoolValue)
and are not deprecated. These have very many clients and finding the
most ergonomic API & migration path still needs some thought.

Differential Revision: https://reviews.llvm.org/D153469
2023-07-07 11:58:33 +02:00
Sam McCall
2d8cd19512 Revert "Reland "[dataflow] Add dedicated representation of boolean formulas" and followups
These changes are OK, but they break downstream stuff that needs more time to adapt :-(

This reverts commit 71579569f4.
This reverts commit 5e4ad816bf.
This reverts commit 1c3ac8dfa1.
2023-07-05 14:32:25 +02:00
Sam McCall
71579569f4 [dataflow] use true/false literals in formulas, rather than variables
And simplify formulas containing true/false
It's unclear to me how useful this is, it does make formulas more
conveniently self-contained now (we can usefully print them without
carrying around the "true/false" labels)

(while here, simplify !!X to X, too)

Differential Revision: https://reviews.llvm.org/D153485
2023-07-05 14:06:48 +02:00
Sam McCall
5e4ad816bf [dataflow] Replace most BoolValue subclasses with references to Formula (and AtomicBoolValue => Atom and BoolValue => Formula where appropriate)
This properly frees the Value hierarchy from managing boolean formulas.

We still distinguish AtomicBoolValue; this type is used in client code.
However we expect to convert such uses to BoolValue (where the
distinction is not needed) or Atom (where atomic identity is intended),
and then fold AtomicBoolValue into FormulaBoolValue.

We also distinguish TopBoolValue; this has distinct rules for
widen/join/equivalence, and top-ness is not represented in Formula.
It'd be nice to find a cleaner representation (e.g. the absence of a
formula), but no immediate plans.

For now, BoolValues with the same Formula are deduplicated. This doesn't
seem desirable, as Values are mutable by their creators (properties).
We can probably drop this for FormulaBoolValue immediately (not in this
patch, to isolate changes). For AtomicBoolValue we first need to update
clients to stop using value pointers for atom identity.

The data structures around flow conditions are updated:
- flow condition tokens are Atom, rather than AtomicBoolValue*
- conditions are Formula, rather than BoolValue
Most APIs were changed directly, some with many clients had a
new version added and the existing one deprecated.

The factories for BoolValues in Environment keep their existing
signatures for now (e.g. makeOr(BoolValue, BoolValue) => BoolValue)
and are not deprecated. These have very many clients and finding the
most ergonomic API & migration path still needs some thought.

Differential Revision: https://reviews.llvm.org/D153469
2023-07-05 13:54:32 +02:00
Sam McCall
1c3ac8dfa1 Reland "[dataflow] Add dedicated representation of boolean formulas"
This reverts commit 7a72ce9822.

Test problems were due to unspecified order of function arg evaluation.
2023-07-05 13:35:16 +02:00
Tom Weaver
7a72ce9822 Revert "[dataflow] Add dedicated representation of boolean formulas"
This reverts commit 2fd614efc1.

Commit caused failures on the following two build bots:
  http://45.33.8.238/win/80815/step_7.txt
  https://lab.llvm.org/buildbot/#/builders/139/builds/44269
2023-07-04 14:05:54 +01:00
Sam McCall
2fd614efc1 [dataflow] Add dedicated representation of boolean formulas
This is the first step in untangling the two current jobs of BoolValue.

=== Desired end-state: ===

- BoolValue will model C++ booleans e.g. held in StorageLocations.
  this includes describing uncertainty (e.g. "top" is a Value concern)
- Formula describes analysis-level assertions in terms of SAT atoms.

These can still be linked together: a BoolValue may have a corresponding
SAT atom which is constrained by formulas.

=== Done in this patch: ===

BoolValue is left intact, Formula is just the input type to the
SAT solver, and we build formulas as needed to invoke the solver.

=== Incidental changes to debug string printing: ===

- variables renamed from B0 etc to V0 etc
  B0 collides with the names of basic blocks, which is confusing when
  debugging flow conditions.
- debug printing of formulas (Formula and Atom) uses operator<<
  rather than debugString(), so works with gtest.
  Therefore moved out of DebugSupport.h
- Did the same to Solver::Result, and some helper changes to SolverTest,
  so that we get useful messages on unit test failures
- formulas are now printed as infix expressions on one line, rather than
  wrapped/indented S-exprs. My experience is that this is easier to scan
  FCs for small examples, and large ones are unreadable either way.
- most of the several debugString() functions for constraints/results
  are unused, so removed them rather than updating tests.
  Inlined the one that was actually used into its callsite.

Differential Revision: https://reviews.llvm.org/D153366
2023-07-04 12:19:44 +02:00
Sam McCall
d85c233e4b [dataflow] Make SAT solver deterministic
The SAT solver imported its constraints by iterating over an unordered DenseSet.
The path taken, and ultimately the runtime, the specific solution found, and
whether it would time out or complete could depend on the iteration order.

Instead, have the caller specify an ordered collection of constraints.
If this is built in a deterministic way, the system can have deterministic
behavior.
(The main alternative is to sort the constraints by value, but this option
is simpler today).

A lot of nondeterminism still appears to be remain in the framework, so today
the solver's inputs themselves are not deterministic yet.

Differential Revision: https://reviews.llvm.org/D153584
2023-06-26 21:16:25 +02:00
Martin Braenne
5ddb623952 [clang][dataflow] Remove unnecessary ASTContext parameter from ControlFlowContext::build overload.
When introducing this new overload in https://reviews.llvm.org/D151183, I didn't consider that the `ASTContext` parameter was unnecessary because it could also be obtained from the `FunctionDecl`.

Reviewed By: gribozavr2, xazax.hun

Differential Revision: https://reviews.llvm.org/D151549
2023-05-30 07:05:35 +00:00
Martin Braenne
246626a8cf [clang][dataflow] Add a ControlFlowContext::build() overload taking a FunctionDecl.
This is the most common use case, so it makes sense to have a specific overload for it.

Reviewed By: xazax.hun

Differential Revision: https://reviews.llvm.org/D151183
2023-05-25 07:19:12 +00:00
Martin Braenne
1a42f79558 [clang][dataflow] Don't analyze templated declarations.
Attempting to analyze templated code doesn't have a good cost-benefit ratio. We
have so far done a best-effort attempt at this, but maintaining this support has
an ongoing high maintenance cost because the AST for templates can violate a lot
of the invariants that otherwise hold for the AST of concrete code. As just one
example, in concrete code the operand of a UnaryOperator '*' is always a prvalue
(https://godbolt.org/z/s3e5xxMd1), but in templates this isn't true
(https://godbolt.org/z/6W9xxGvoM).

Further rationale for not analyzing templates:

* The semantics of a template itself are weakly defined; semantics can depend
  strongly on the concrete template arguments. Analyzing the template itself (as
  opposed to an instantiation) therefore has limited value.

* Analyzing templates requires a lot of special-case code that isn't necessary
  for concrete code because dependent types are hard to deal with and the AST
  violates invariants that otherwise hold for concrete code (see above).

* There's precedent in that neither Clang Static Analyzer nor the flow-sensitive
  warnings in Clang (such as uninitialized variables) support analyzing
  templates.

Reviewed By: gribozavr2, xazax.hun

Differential Revision: https://reviews.llvm.org/D150352
2023-05-15 11:04:51 +00:00
Sam McCall
a443b3d18e [dataflow] add HTML logger: browse code/cfg/analysis timeline/state
With -dataflow-log=/dir we will write /dir/0.html etc for each
function analyzed.

These files show the function's code and CFG, and the path through
the CFG taken by the analysis. At each analysis point we can see the
lattice state.

Currently the lattice state dump is not terribly useful but we can
improve this: showing values associated with the current Expr,
simplifying flow condition, highlighting changes etc.

(Trying not to let this patch scope-creep too much, so I ripped out the
half-finished features)

Demo: 9718fdd484/analysis.html

Differential Revision: https://reviews.llvm.org/D146591
2023-04-19 15:37:06 +02:00
Sam McCall
bf47c1ed85 [dataflow] Extract arena for Value/StorageLocation out of DataflowAnalysisContext
DataflowAnalysisContext has a few too many responsibilities, this narrows them.

It also allows the Arena to be shared with analysis steps, which need to create
Values, without exposing the whole DACtx API (flow conditions etc).
This means Environment no longer needs to proxy all these methods.
(For now it still does, because there are many callsites to update, and maybe
if we separate bool formulas from values we can avoid churning them twice)

In future, if we untangle the concepts of Values from boolean formulas/atoms,
Arena would also be responsible for creating formulas and managing atom IDs.

Differential Revision: https://reviews.llvm.org/D148554
2023-04-19 14:32:13 +02:00
Martin Braenne
effdfa7d5f [clang][dataflow] Use isRecordType() where appropriate.
This is less verbose than checking for class, struct, and union individually,
and I believe it's also more efficient (not that that should be the overriding
concern).

Reviewed By: sammccall, xazax.hun

Differential Revision: https://reviews.llvm.org/D147603
2023-04-06 07:26:24 +00:00
Jie Fu
c107231cde [clang][dataflow] Fix -Wdeprecated-declarations after D147302 (NFC)
Replace:
 1. createAtomicBoolValue() --> create<AtomicBoolValue>()
 2. createTopBoolValue()    --> create<TopBoolValue>()

/Users/jiefu/llvm-project/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h:386:19: error: 'createAtomicBoolValue' is deprecated: use create<AtomicBoolValue> instead [-Werror,-Wdeprecated-declarations]
    return DACtx->createAtomicBoolValue();
                  ^~~~~~~~~~~~~~~~~~~~~
                  create<AtomicBoolValue>
/Users/jiefu/llvm-project/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h:215:3: note: 'createAtomicBoolValue' has been explicitly marked deprecated here
  LLVM_DEPRECATED("use create<AtomicBoolValue> instead",
  ^
/Users/jiefu/llvm-project/llvm/include/llvm/Support/Compiler.h:143:50: note: expanded from macro 'LLVM_DEPRECATED'
                                                 ^
In file included from /Users/jiefu/llvm-project/clang/lib/Analysis/FlowSensitive/Transfer.cpp:14:
In file included from /Users/jiefu/llvm-project/clang/include/clang/Analysis/FlowSensitive/Transfer.h:19:
/Users/jiefu/llvm-project/clang/include/clang/Analysis/FlowSensitive/DataflowEnvironment.h:391:19: error: 'createTopBoolValue' is deprecated: use create<TopBoolValue> instead [-Werror,-Wdeprecated-declarations]
    return DACtx->createTopBoolValue();
                  ^~~~~~~~~~~~~~~~~~
                  create<TopBoolValue>
/Users/jiefu/llvm-project/clang/include/clang/Analysis/FlowSensitive/DataflowAnalysisContext.h:227:3: note: 'createTopBoolValue' has been explicitly marked deprecated here
  LLVM_DEPRECATED("use create<TopBoolValue> instead", "create<TopBoolValue>")
  ^
/Users/jiefu/llvm-project/llvm/include/llvm/Support/Compiler.h:143:50: note: expanded from macro 'LLVM_DEPRECATED'
                                                 ^
2 errors generated.
2023-04-04 16:17:25 +08:00
Martin Braenne
745a957f9d [clang][dataflow] Add create<T>() methods to Environment and DataflowAnalysisContext.
These methods provide a less verbose way of allocating `StorageLocation`s and
`Value`s than the existing `takeOwnership(make_unique(...))` pattern.

In addition, because allocation of `StorageLocation`s and `Value`s now happens
within the `DataflowAnalysisContext`, the `create<T>()` open up the possibility
of using `BumpPtrAllocator` to allocate these objects if it turns out this
helps performance.

Reviewed By: ymandel, xazax.hun, gribozavr2

Differential Revision: https://reviews.llvm.org/D147302
2023-04-04 07:13:44 +00:00
Sam McCall
48f97e5751 [FlowSensitive] Log analysis progress for debugging purposes
The goal is to be able to understand how the analysis executes, and what its
incremental and final findings are, by enabling logging and reading the logs.
This should include both framework and analysis-specific information.

Ad-hoc printf-debugging doesn't seem sufficient for my understanding, at least.
Being able to check in logging, turn it on in a production binary, and quickly
find particular analysis steps within complex functions seem important.

This can be enabled programmatically through DataflowAnalysisOptions, or
via the flag -dataflow-log. (Works in unittests, clang-tidy, standalone
tools...)

Important missing pieces here:
 - a logger implementation that produces an interactive report (HTML file)
   which can be navigated via timeline/code/CFG.
   (I think the Logger interface is sufficient for this, but need to prototype).
 - display of the application-specific lattice
 - more useful display for the built-in environment
   (e.g. meaningful & consistent names for values, hiding redundant variables in
   the flow condition, hiding unreachable expressions)

Differential Revision: https://reviews.llvm.org/D144730
2023-03-23 11:36:21 +01:00
Sam McCall
ee2cd606ab [dataflow] Log flow condition to the correct stream.
Differential Revision: https://reviews.llvm.org/D146527
2023-03-22 10:57:21 +01:00
Kazu Hirata
7eaa7b0553 [clang] Use *{Map,Set}::contains (NFC) 2023-03-15 18:06:34 -07:00
Yitzhak Mandelbaum
089a54469f [clang][dataflow][NFC] Refine names and comments for field filtering.
Tweaks elements of the new API for filtering the set of modeled fields.

Differential Revision: https://reviews.llvm.org/D141319
2023-01-10 14:28:45 +00:00
Yitzhak Mandelbaum
264976d98e [clang][dataflow] Unify TransferOptions and DataflowAnalysisContext::Options.
Merges `TransferOptions` into the newly-introduced
`DataflowAnalysisContext::Options` and removes explicit parameter for
`TransferOptions`, relying instead on the common options carried by the analysis
context. Given that there was no intent to allow different options between calls
to `transfer`, a common value for the options is preferable.

Differential Revision: https://reviews.llvm.org/D140703
2023-01-10 14:17:25 +00:00
Yitzhak Mandelbaum
01ccf7b3ce Revert "Revert "[clang][dataflow] Only model struct fields that are used in the function being analyzed.""
This reverts commit 2b1a517a92. It's a fix forward
with two memory errors fixed, one of which was the cause of the build breakage
in the buildbots.

Original message:

Previously, the model for structs modeled all fields in a struct when
`createValue` was called for that type. This patch adds a prepass on the
function under analysis to discover the fields referenced in the scope and then
limits modeling to only those fields. This reduces wasted memory usage
(modeling unused fields) which can be important for programs that use large
structs.

Note: This patch obviates the need for https://reviews.llvm.org/D123032.
2023-01-09 19:32:10 +00:00
Yitzhak Mandelbaum
2b1a517a92 Revert "[clang][dataflow] Only model struct fields that are used in the function being analyzed."
This reverts commit 5e8f597c2f. It caused msan and ubsan breakages.
2023-01-06 01:07:28 +00:00
Yitzhak Mandelbaum
5e8f597c2f [clang][dataflow] Only model struct fields that are used in the function being analyzed.
Previously, the model for structs modeled all fields in a struct when
`createValue` was called for that type. This patch adds a prepass on the
function under analysis to discover the fields referenced in the scope and then
limits modeling to only those fields.  This reduces wasted memory usage
(modeling unused fields) which can be important for programss that use large
structs.

Note: This patch obviates the need for https://reviews.llvm.org/D123032.

Differential Revision: https://reviews.llvm.org/D140694
2023-01-05 21:46:39 +00:00
Yitzhak Mandelbaum
692e03039d [clang][dataflow] Add cache of ControlFlowContexts for function decls.
This patch modifies context-sensitive analysis of functions to use a cache,
rather than recreate the `ControlFlowContext` from a function decl on each
encounter. However, this is just step 1 (of N) in adding support for a
configurable map of "modeled" function decls (see issue #56879). The map will go
from the actual function decl to the `ControlFlowContext` used to model it. Only
functions pre-configured in the map will be modeled in a context-sensitive way.

We start with a cache because it introduces the desired map, while retaining the
current behavior. Here, functions are mapped to their actual implementations
(when available).

Differential Revision: https://reviews.llvm.org/D131039
2022-08-03 15:17:49 +00:00
Stanislav Gatev
817dd5e3fd [clang][dataflow] Rename member to make it clear that it isn't stable
Rename `DataflowAnalysisContext::getStableStorageLocation(QualType)`
to `createStorageLocation`, to make it clear that it doesn't return
a stable storage location.

Differential Revision: https://reviews.llvm.org/D131021

Reviewed-by: ymandel, xazax.hun, gribozavr2
2022-08-03 06:25:02 +00:00
Dmitri Gribenko
b5e3dac33d [clang][dataflow] Add explicit "AST" nodes for implications and iff
Previously we used to desugar implications and biconditionals into
equivalent CNF/DNF as soon as possible. However, this desugaring makes
debug output (Environment::dump()) less readable than it could be.
Therefore, it makes sense to keep the sugared representation of a
boolean formula, and desugar it in the solver.

Reviewed By: sgatev, xazax.hun, wyt

Differential Revision: https://reviews.llvm.org/D130519
2022-07-26 14:19:22 +02:00