The checker now creates failure branches for stream write operations
only if the new option "pedantic" is set to true.
The result of write operations is often not checked in typical code. If
failure branches are created, the checker will warn about unchecked write
operations and generate a lot of "false positives" (these are valid
warnings, but the programmer does not care about this problem).
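To illustrate the pattern in question, here is a minimal sketch that contrasts a typical unchecked write with a checked one. The helper names `writeUnchecked` and `writeChecked` are hypothetical and exist only for this example; how exactly the checker reports the first one depends on the surrounding code.

```cpp
#include <cassert>
#include <cstdio>

// Typical code: the result of fputs() is ignored. This is the kind of
// call for which a failure branch is only modeled in "pedantic" mode.
inline void writeUnchecked(std::FILE *F, const char *Text) {
  assert(F && Text);
  std::fputs(Text, F);
}

// The failure of the write is handled explicitly by the caller.
inline bool writeChecked(std::FILE *F, const char *Text) {
  assert(F && Text);
  return std::fputs(Text, F) != EOF;
}
```

Code in the style of `writeChecked` stays quiet in both modes, because the failure branch is handled.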
The class `KnownSVal` was a very magical abstract class within the `SVal`
class hierarchy: with a hacky `classof` method it acted as if it was the
common ancestor of the classes `UndefinedSVal` and `DefinedSVal`.
However, it was only used in two `getAs<KnownSVal>()` calls and the
signatures of two methods, which does not "pay for" its weird behavior,
so I created this commit that removes it and replaces its use with more
straightforward solutions.
In builds that use source hardening (-D_FORTIFY_SOURCE), many standard
functions are implemented as macros that expand to calls of hardened
functions that take one additional argument compared to the "usual"
variant and perform additional input validation. For example, a `memcpy`
call may expand to `__memcpy_chk()` or `__builtin___memcpy_chk()`.
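A minimal sketch of what such an expansion looks like, assuming a GCC- or Clang-compatible compiler that provides `__builtin___memcpy_chk` and a C library that supplies the runtime fallback. The wrapper name `copyHardened` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Copies N bytes using the builtin that a fortified memcpy() may expand
// to. Note the fourth argument: the size of the destination object.
inline void copyHardened(char *Dst, const char *Src, std::size_t N,
                         std::size_t DstSize) {
  assert(N <= DstSize);
  __builtin___memcpy_chk(Dst, Src, N, DstSize);
}
```

The extra argument is what makes the argument count of the hardened call differ from that of plain `memcpy`.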
Before this commit, `CallDescription`s created with the matching mode
`CDM::CLibrary` automatically matched these hardened variants (in
addition to the "usual" function) with a fairly lenient heuristic.
Unfortunately this heuristic meant that the `CLibrary` matching mode was
only usable by checkers that were prepared to handle matches with an
unusual number of arguments.
This commit limits the recognition of the hardened functions to a
separate matching mode `CDM::CLibraryMaybeHardened` and applies this
mode for functions that have hardened variants and were previously
recognized with `CDM::CLibrary`.
This way checkers that are prepared to handle the hardened variants will
be able to detect them easily, while other checkers can simply use
`CDM::CLibrary` for matching C library functions (and they won't
encounter surprising argument counts).
The initial motivation for refactoring this area was that previously
`CDM::CLibrary` accepted calls with more arguments/parameters than the
expected number, so I wasn't able to use it for `malloc` without
accidentally matching calls to the 3-argument BSD kernel malloc.
After this commit this "may have more args/params" logic will only
activate when we're actually matching a hardened variant function (in
`CDM::CLibraryMaybeHardened` mode). The recognition of "sprintf()" and
"snprintf()" in CStringChecker was refactored, because previously it was
abusing the behavior that extra arguments are accepted even if the
matched function is not a hardened variant.
This commit also fixes the oversight that the old code would've
recognized e.g. `__wmemcpy_chk` as a hardened variant of `memcpy`.
After this commit I'm planning to create several follow-up commits that
ensure that checkers looking for C library functions use `CDM::CLibrary`
as a "sane default" matching mode.
This commit is not truly NFC (it eliminates some buggy corner cases),
but it does not intentionally modify the behavior of CSA on real-world
non-crazy code.
As a minor unrelated change I'm eliminating the argument/variable
"IsBuiltin" from the evalSprintf function family in CStringChecker,
because it was completely unused.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
Until now the function `fseek` returned nonzero on error; this is changed
to return -1 only. It also does not produce an EOF error any more.
This complies better with the POSIX standard.
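As a small sketch of the behavior being modeled: a failed `fseek` is expected to return exactly -1. The helper `seekToInvalidOffset` is a hypothetical name, and the example assumes a POSIX-like C library where seeking to a negative absolute offset fails.

```cpp
#include <cassert>
#include <cstdio>

// Returns the raw result of seeking to a deliberately invalid negative
// absolute offset, which is expected to fail.
inline int seekToInvalidOffset(std::FILE *F) {
  assert(F);
  return std::fseek(F, -1L, SEEK_SET);
}
```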
HLSL constant sized array function parameters do not decay to pointers.
Instead constant sized array types are preserved as unique types for
overload resolution, template instantiation and name mangling.
This implements the change by adding a new `ArrayParameterType` which
represents a non-decaying `ConstantArrayType`. The new type behaves the
same as `ConstantArrayType` except that it does not decay to a pointer.
Values of `ConstantArrayType` in HLSL decay during overload resolution
via a new `HLSLArrayRValue` cast to `ArrayParameterType`.
`ArrayParameterType` values are passed indirectly by-value to functions
in IR generation, resulting in callee-generated memcpy instructions.
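The difference between decaying and non-decaying array parameters can be sketched in plain C++. This is only an analogy, not HLSL: `std::array` is used here as a stand-in for the by-value semantics described above, and both function names are hypothetical.

```cpp
#include <array>
#include <cassert>

// C++ array parameters decay: this function receives a pointer and writes
// through it into the caller's array.
inline void zeroFirstDecayed(int Values[4]) { Values[0] = 0; }

// std::array is passed by value, which resembles the non-decaying
// behavior: the callee modifies its own copy.
inline int zeroFirstByValue(std::array<int, 4> Values) {
  Values[0] = 0;
  return Values[0];
}
```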
The behavior of HLSL function calls is documented in the [draft language
specification](https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf)
under the Expr.Post.Call heading.
Additionally, the design of this implementation approach is documented in
[Clang's
documentation](https://clang.llvm.org/docs/HLSL/FunctionCalls.html).
Resolves #70123
In PR #79382, I need to add a new type that derives from
ConstantArrayType. This means that ConstantArrayType can no longer use
`llvm::TrailingObjects` to store the trailing optional Expr*.
This change refactors ConstantArrayType to store a 60-bit integer and
4 bits for the integer size in bytes. This replaces the APInt field
previously in the type but preserves enough information to recreate it
where needed.
To reduce the number of places where the APInt is re-constructed I've
also added some helper methods to the ConstantArrayType to allow some
common use cases that operate on either the stored small integer or the
APInt as appropriate.
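The packing idea can be sketched as follows. This is a simplified illustration under my own assumptions, not the actual Clang data structure: `PackedSize`, `pack` and `bitWidth` are hypothetical names, and the sketch only covers values that fit into 60 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the packed storage: 60 bits of value plus
// 4 bits recording the width of the original integer in bytes.
struct PackedSize {
  std::uint64_t Value : 60;
  std::uint64_t WidthInBytes : 4;
};

constexpr std::uint64_t MaxPackedValue = (std::uint64_t{1} << 60) - 1;

inline PackedSize pack(std::uint64_t Value, unsigned WidthInBytes) {
  assert(Value <= MaxPackedValue && "value needs more than 60 bits");
  assert(WidthInBytes <= 15 && "width needs more than 4 bits");
  PackedSize P;
  P.Value = Value;
  P.WidthInBytes = WidthInBytes;
  return P;
}

// Recovers the bit width that an arbitrary-precision integer would need
// in order to be recreated from the packed form.
inline unsigned bitWidth(PackedSize P) { return P.WidthInBytes * 8u; }
```

The stored value and the recovered bit width together are enough to rebuild an arbitrary-precision integer where one is needed.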
Resolves #85124.
When debugging CSA issues, sometimes it would be useful to have a
dedicated note for the analysis entry point, aka. the function name you
would need to pass as "-analyze-function=XYZ" to reproduce a specific
issue.
One way we use (or will use) this downstream is to provide tooling on
top of creduce that supercharges productivity by automatically
reducing test cases, for example on crashes.
This will be added only if the "-analyzer-note-analysis-entry-points" is
set or the "analyzer-display-progress" is on.
This additional entry point marker will be the first "note" if enabled,
with the following message: "[debug] analyzing from XYZ". The message is
prefixed by "[debug]" to remind the CSA developer that this is only
meant to be visible for them, for debugging purposes.
CPP-5012
This reapplies 80ab8234ac again, after
fixing a name collision warning in the unit tests (see the revert commit
13ccaf9b9d for details).
In addition to the previously applied changes, this commit also clarifies the
code in MallocChecker that distinguishes POSIX "getline()" and C++ standard
library "std::getline()" (which are two completely different functions). Note
that "std::getline()" was (accidentally) handled correctly even without this
clarification; but it's better to explicitly handle and test this corner case.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
The checker finds a type of undefined behavior, where if the type of a
pointer to an object-array is different from the objects' underlying
type, calling `delete[]` is undefined, as the size of the two objects
might be different.
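A minimal sketch of the pattern, with hypothetical types `Base` and `Derived`. The undefined variant is left in a comment so that the example itself stays well-defined.

```cpp
#include <cassert>
#include <cstddef>

struct Base {
  virtual ~Base() = default;
  int X = 0;
};

struct Derived : Base {
  int Extra[4] = {};
};

// The element sizes differ, so delete[] through a Base* could not step
// over the Derived objects correctly.
static_assert(sizeof(Derived) > sizeof(Base), "element sizes differ");

inline std::size_t allocateAndRelease(std::size_t Count) {
  assert(Count > 0);
  Derived *Objects = new Derived[Count];
  // Base *Sliced = Objects; delete[] Sliced;   // undefined behavior
  delete[] Objects; // OK: static type matches the allocated element type
  return Count * sizeof(Derived);
}
```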
The checker has been in alpha for a while now, it is a simple checker
that causes no crashes, and considering the severity of the issue, it
has a low result-count on open-source projects (in my last test-run on
my usual projects, it had 0 results).
This commit cleans up the documentation and adds docs for the limitation
related to tracking through references, in addition to moving it to
`cplusplus`.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
Co-authored-by: whisperity <whisperity@gmail.com>
According to POSIX 2018:
1. `lineptr`, `n` and `stream` cannot be NULL.
2. If `*n` is non-zero, `*lineptr` must point to a region of at least `*n`
bytes, or be a NULL pointer.
Additionally, if `*lineptr` is not NULL, `*n` must not be undefined.
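A short sketch of a call that satisfies these preconditions, assuming a POSIX system where `getline` is declared in `<stdio.h>`. The helper name `readFirstLine` is hypothetical.

```cpp
#include <cassert>
#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <sys/types.h>

// Reads one line with POSIX getline(). The buffer pointer starts as NULL
// and the size as 0, so both are initialized and getline() performs the
// allocation itself.
inline std::string readFirstLine(FILE *Stream) {
  assert(Stream);
  char *Line = nullptr;
  size_t Capacity = 0;
  std::string Result;
  ssize_t Length = getline(&Line, &Capacity, Stream);
  if (Length >= 0)
    Result.assign(Line, static_cast<size_t>(Length));
  free(Line); // the buffer should be freed even if getline() failed
  return Result;
}
```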
Inside the ExprEngine when we process the initializers, we create a
PostInitializer program-point, which will refer to the field being
initialized, see `FieldLoc` inside `ExprEngine::ProcessInitializer`.
When a constructor (of which we evaluate the initializer-list) is
analyzed in top-level context, then the `this` pointer will be
represented by a `SymbolicRegion`, (as it should be).
This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the
initialized region.
```c++
class Bear {
public:
  void brum() const;
};
class Door {
public:
  // PostInitializer would refer to "FieldRegion{SymRegion{this}}"
  // whereas in the store and everywhere else it would be:
  // "FieldRegion{ElementRegion{SymRegion{Ty*, this}, 0, Ty}}".
  Door() : ptr(nullptr) {
    ptr->brum(); // Bug
  }
private:
  Bear* ptr;
};
```
We (as CSA folks) decided to avoid the creation of FieldRegions directly
on top of symbolic regions in the past:
f8643a9b31
---
In this patch, I propose to also canonicalize it as in the mentioned
patch, into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0,
Ty}}`
This would mean that FieldRegions will/should never simply wrap a
SymbolicRegion directly, but rather an ElementRegion that is sitting in
between.
This patch should have practically no observable effects, as the store
(due to the mentioned patch) was made resilient to this issue, but we
use `PostInitializer::getLocationValue()` for an alternative reporting,
where we faced this issue.
Note that in really rare cases it now suppresses dereference bugs, as
demonstrated in the test. This is because in the past we failed to follow
the region of the PostInitializer inside the StoreSiteFinder visitor,
because it was using this code:
```c++
// If this is a post initializer expression, initializing the region, we
// should track the initializer expression.
if (std::optional<PostInitializer> PIP =
        Pred->getLocationAs<PostInitializer>()) {
  const MemRegion *FieldReg = (const MemRegion *)PIP->getLocationValue();
  if (FieldReg == R) {
    StoreSite = Pred;
    InitE = PIP->getInitializer()->getInit();
  }
}
```
Notice that the equality check didn't pass for the regions I'm
canonicalizing in this patch.
Given the nature of this change, we would rather upstream this patch.
CPP-4954
The checker alpha.security.ArrayBoundV2 performs bounds checking in two
steps: first it checks for underflow, and if it isn't guaranteed then it
assumes that there is no underflow. After this, it checks for overflow,
and if that's guaranteed or the index is tainted then it reports it.
This meant that in situations where overflow and underflow are both
possible (but the index is either tainted or guaranteed to be invalid),
the checker was reporting just an overflow error.
This commit modifies the messages printed in these cases to mention the
possibility of an underflow.
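The two-step logic can be sketched with a toy model in which the index is only known to lie in an interval. Everything here is a hypothetical simplification: `IndexRange`, `classify` and the returned strings are made up for the example and are not the checker's real data structures or diagnostic texts.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical, simplified model: the index is only known to lie in the
// closed interval [Lo, Hi]; Extent is the number of valid elements.
struct IndexRange {
  long long Lo, Hi;
  bool Tainted;
};

inline std::string classify(IndexRange Idx, long long Extent) {
  assert(Idx.Lo <= Idx.Hi && Extent >= 0);
  // Step 1: report an underflow only when it is guaranteed.
  if (Idx.Hi < 0)
    return "underflow";
  // Otherwise continue under the assumption that there is no underflow.
  const bool UnderflowWasPossible = Idx.Lo < 0;
  const long long Lo = std::max(Idx.Lo, 0LL);
  // Step 2: report an overflow when it is guaranteed, or when the index is
  // tainted and an overflow is possible.
  const bool Guaranteed = Lo >= Extent;
  const bool Possible = Idx.Hi >= Extent;
  if (Guaranteed || (Idx.Tainted && Possible))
    return UnderflowWasPossible ? "overflow (or underflow)" : "overflow";
  return "no report";
}
```

For a tainted index in [-5, 20] and an extent of 10, this toy model returns the combined message, which corresponds to the case this commit addresses.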
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
* Add support for multiple, potentially overlapping critical sections:
The checker can now simultaneously handle the critical sections of
several mutexes without confusing them.
* Implement the handling of recursive mutexes:
By identifying the lock events, recursive mutexes are now supported.
A lock event is a pair of a lock expression and the SVal of the mutex
that it locks, so even multiple locks of the same mutex (and even by
the same expression) are now supported.
* Refine the note tags generated by the checker:
The note tags are now shown only for mutexes that are
active at the point of error, and multiple acquisitions of the same mutex
are also noted.
StaticAnalyzer didn't check whether the variable is declared in the
`CompoundStmt` under a `SwitchStmt`, which made the static analyzer reach
the root without finding the declaration.
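A minimal sketch of the kind of code this describes, where the declaration sits in the body of the switch rather than in an enclosing block. The function name is hypothetical, and this is not necessarily the reproducer from the linked issue.

```cpp
#include <cassert>

// The declaration of Y sits directly in the CompoundStmt that forms the
// body of the switch, before the first case label.
inline int declaredInsideSwitchBody(int Selector) {
  switch (Selector) {
    int Y; // no initializer, so jumping past it to a case label is allowed
  case 1:
    Y = 10;
    return Y;
  default:
    Y = 20;
    return Y;
  }
}
```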
Fixes #68819
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
This reverts commit e48d5a838f.
Fails to build on x86-64 w/gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
with the following message:
../llvm-project/clang/unittests/StaticAnalyzer/IsCLibraryFunctionTest.cpp:41:28: error: declaration of ‘std::unique_ptr<clang::ASTUnit> IsCLibraryFunctionTest::ASTUnit’ changes meaning of ‘ASTUnit’ [-fpermissive]
41 | std::unique_ptr<ASTUnit> ASTUnit;
| ^~~~~~~
In file included from ../llvm-project/clang/unittests/StaticAnalyzer/IsCLibraryFunctionTest.cpp:4:
../llvm-project/clang/include/clang/Frontend/ASTUnit.h:89:7: note: ‘ASTUnit’ declared here as ‘class clang::ASTUnit’
89 | class ASTUnit {
| ^~~~~~~
From issue #73088. I changed `NodeBuilderContext` into a class.
Additionally, there were some other mentions of the former being a
struct which I also changed into a class. This is my first time working
with an issue so I will be open to hearing any advice or changes that
need to be done.
Previously, the function `isCLibraryFunction()` and logic relying on it
only accepted functions that are declared directly within a TU (i.e. not
in a namespace or a class). However C++ headers like <cstdlib> declare
many C standard library functions within the namespace `std`, so this
commit ensures that functions within the namespace `std` are also
accepted.
After this commit it will be possible to match functions like `malloc`
or `free` with `CallDescription::Mode::CLibrary`.
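For illustration, this is the kind of call that is now covered: the functions are reached through the `std` namespace declarations from `<cstdlib>`. The helper name `roundTripThroughHeap` is hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Uses the declarations from <cstdlib>, which live in namespace std.
inline bool roundTripThroughHeap(const char *Text) {
  assert(Text);
  const std::size_t Size = std::strlen(Text) + 1;
  char *Copy = static_cast<char *>(std::malloc(Size));
  if (!Copy)
    return false;
  std::memcpy(Copy, Text, Size);
  const bool Same = std::strcmp(Copy, Text) == 0;
  std::free(Copy);
  return Same;
}
```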
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
These functions should not be allowed if the file position is
indeterminate (they return the file position).
This condition is now checked, and tests are improved to check it.
This PR makes alpha.webkit.UncountedLocalVarsChecker ignore raw
references and pointers to a ref counted type which appear within
"trivial" statements. To do this, this PR extends TrivialFunctionAnalysis
so that it can also analyze the "triviality" of statements as well as that
of functions. Each Visit* function is now augmented with
withCachedResult, which is responsible for looking up and updating the
cache for each Visit* function.
As this PR dramatically improves the false positive rate of the checker,
it also deletes the code to ignore raw pointers and references within if
and for statements.
`getdelim` and `getline` may free, allocate, or re-allocate the input
buffer, ensuring its size is enough to hold the incoming line, the
delimiter, and the null terminator.
`*lineptr` must be a valid argument to `free`, which means it can be
either
1. `NULL`, in which case these functions perform an allocation
equivalent to a call to `malloc` even on failure.
2. A pointer returned by the `malloc` family of functions. Other
pointers are UB (`alloca`, a pointer to a static, to a stack variable, etc.)
This commit adds a testcase which highlights the current incorrect
behavior of the CSA diagnostic generation: it produces a note which says
"Assuming 'arg' is >= 0" in a situation where this is not a fresh
assumption because 'arg' is an unsigned integer.
I also created ticket 78440 to track this bug.
`va_list` is a platform-specific type. On some platforms it is a struct
instead of a pointer to a struct, so `lookupFn` was ignoring calls to
`vfprintf` and `vfscanf`.
`stream.c` now runs on four different platforms to make sure the logic
works across targets.
In PR #83677 I was surprised to see that outdated checker callback
signatures are a problem. It turns out, we need the `registerChecker...`
function to invoke the `Mgr.registerChecker<>()` which would instantiate
the `_register` calls, that would take the address of the defined
checker callbacks. Consequently, if the expected signatures mismatch, it
won't compile from now on, so we have a static guarantee that this issue
never pops up again.
Given we need the `register` call, at this point we could just hook this
checker into the `debug` package and make it never registered. It
shouldn't hurt anyone :)
The class `CallDescription` is used to define patterns that are used for
matching `CallEvent`s. For example, a
`CallDescription{{"std", "find_if"}, 3}`
matches a call to `std::find_if` with 3 arguments.
However, these patterns are somewhat fuzzy, so this pattern could also
match something like `std::__1::find_if` (with an additional namespace
layer), or, unfortunately, a `CallDescription` for the well-known
function `free()` can match a C++ method named `free()`:
https://github.com/llvm/llvm-project/issues/81597
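The ambiguity can be sketched with a class that happens to have a method named `free()`. The class `Pool` and the function `useBoth` are hypothetical and only serve to show the two kinds of calls side by side.

```cpp
#include <cassert>
#include <cstdlib>

// A hypothetical class with a method that happens to be named free().
class Pool {
public:
  void free(void *) { ++Released; } // unrelated to the C library function
  int released() const { return Released; }

private:
  int Released = 0;
};

inline int useBoth() {
  Pool P;
  void *Block = std::malloc(16);
  P.free(Block);    // a C++ method call; a name-only pattern may match it
  std::free(Block); // the C library function the pattern was meant for
  return P.released();
}
```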
To prevent this kind of ambiguity this commit introduces the enum
`CallDescription::Mode` which can limit the pattern matching to
non-method function calls (or method calls etc.). After this NFC change,
one or more follow-up commits will apply the right pattern matching
modes in the ~30 checkers that use `CallDescription`s.
Note that `CallDescription` previously had a `Flags` field which had
only two supported values:
- `CDF_None` was the default "match anything" mode,
- `CDF_MaybeBuiltin` was a "match only C library functions and accept
some inexact matches" mode.
This commit preserves `CDF_MaybeBuiltin` under the more descriptive
name `CallDescription::Mode::CLibrary` (or `CDM::CLibrary`).
Instead of this "Flags" model I'm switching to a plain enumeration
because I don't think that there is a natural use case to combine the
different matching modes. (Except for the default "match anything" mode,
which is currently kept for compatibility, but will be phased out in the
follow-up commits.)
If a stream operation fails, the file position can become "indeterminate".
This may cause a warning from the checker at a later operation. The new
note tag shows the place where the position becomes "indeterminate",
which is where the failure occurred.
Model `getc` and `putc` as equivalent to `fgetc` and `fputc` respectively.
Model `vfscanf` and `vfprintf` as `fscanf` and `fprintf`, except that
`vfscanf` can not invalidate the parameters due to the indirection via a
`va_list`. Nevertheless, we can still track EOF and errors as for `fscanf`.
The checker reported a false positive on this code:

    void testTaintedSanitizedVLASize(void) {
      int x;
      scanf("%d", &x);
      if (x<1)
        return;
      int vla[x]; // no-warning
    }

After the fix, the checker only emits a tainted-size warning if the VLA
size is coming from a tainted source and it cannot prove that it is
positive.
Specific arguments passed to stream handling functions are changed by
the function, which means that they should be invalidated ("escaped") by
the analyzer. This change adds the argument invalidation (in specific
cases) to the checker.
To fix https://github.com/llvm/llvm-project/issues/81597, I'm planning
to refactor the usage of CallDescription; and as I was preparing for
this I noticed that there are two superfluous references to this header.
A memory access is an out-of-bounds error if the offset is not < the extent
of the memory region. Notice that here "<" is a _mathematical_
comparison between two numbers and NOT a C/C++ operator that compares
two typed C++ values: for example -1 < 1000 is true in mathematics, but
if the `-1` is an `int` and the `1000` is a `size_t` value, then
evaluating the C/C++ operator `<` will return false because the `-1`
will be converted to `SIZE_MAX` by the automatic type conversions.
This means that it's incorrect to perform a bounds check with
`evalBinOpNN(State, BO_LT, ...)` which performs automatic conversions
and can produce wildly incorrect results.
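The difference between the two comparisons can be sketched in a few lines. The function names are hypothetical; `languageLess` spells out the conversion that the built-in `<` performs implicitly when an `int` is compared with a `size_t`.

```cpp
#include <cassert>
#include <cstddef>

// What the C/C++ operator computes once the usual arithmetic conversions
// turn the signed offset into a std::size_t.
inline bool languageLess(int Offset, std::size_t Extent) {
  return static_cast<std::size_t>(Offset) < Extent;
}

// The mathematical comparison: a negative offset is below any extent.
inline bool mathematicalLess(int Offset, std::size_t Extent) {
  if (Offset < 0)
    return true;
  return static_cast<std::size_t>(Offset) < Extent;
}
```

With an offset of -1 and an extent of 1000, `languageLess` yields false while `mathematicalLess` yields true.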
ArrayBoundsCheckerV2 already had a special case where it avoided calling
`evalBinOpNN` in a situation where it would have performed an automatic
conversion; this commit replaces that code with a more general one that
covers more situations. (It's still not perfect, but it's better than
the previous version and I think it will cover practically all
real-world code.)
Note that this is not a limitation/bug of the simplification algorithm
defined in `getSimplifiedOffsets()`: the simplification is not applied in
the test case `test_comparison_with_extent_symbol` (because the `Extent`
is not a concrete int), but without the new code it would still run into
a `-1 < UNSIGNED` comparison that evaluates to false because
`evalBinOpNN` performs an automatic type conversion.
The function `fileno` fails only if an invalid pointer is passed, which
is a case that is often ignored in source code. The failure case leads to
many "false positive" reports when `fileno` returns -1 and this is not
checked in the program. Because of this, the function is now assumed
to not fail (that is, the passed file pointer is assumed to be correct).
The change affects `StdCLibraryFunctionsChecker` and
`StreamChecker`.
This PR adds support for WebKit's RefAllowingPartiallyDestroyed and
RefPtrAllowingPartiallyDestroyed, which are smart pointer types that
may be used after the destructor has started running.
This PR makes the checker ignore / skip calls to methods of Web Template
Platform's container types such as HashMap, HashSet, WeakHashSet,
WeakHashMap, Vector, etc.