The checker may create failure branches for all stream write operations
only if the new option "pedantic" is set to true.
Result of the write operations is often not checked in typical code. If
failure branches are created the checker will warn for unchecked write
operations and generate a lot of "false positives" (these are valid
warnings but the programmer does not care about this problem).
The class `KnownSVal` was very magical abstract class within the `SVal`
class hierarchy: with a hacky `classof` method it acted as if it was the
common ancestor of the classes `UndefinedSVal` and `DefinedSVal`.
However, it was only used in two `getAs<KnownSVal>()` calls and the
signatures of two methods, which does not "pay for" its weird behavior,
so I created this commit that removes it and replaces its use with more
straightforward solutions.
In builds that use source hardening (-D_FORTIFY_SOURCE), many standard
functions are implemented as macros that expand to calls of hardened
functions that take one additional argument compared to the "usual"
variant and perform additional input validation. For example, a `memcpy`
call may expand to `__memcpy_chk()` or `__builtin___memcpy_chk()`.
Before this commit, `CallDescription`s created with the matching mode
`CDM::CLibrary` automatically matched these hardened variants (in a
addition to the "usual" function) with a fairly lenient heuristic.
Unfortunately this heuristic meant that the `CLibrary` matching mode was
only usable by checkers that were prepared to handle matches with an
unusual number of arguments.
This commit limits the recognition of the hardened functions to a
separate matching mode `CDM::CLibraryMaybeHardened` and applies this
mode for functions that have hardened variants and were previously
recognized with `CDM::CLibrary`.
This way checkers that are prepared to handle the hardened variants will
be able to detect them easily; while other checkers can simply use
`CDM::CLibrary` for matching C library functions (and they won't
encounter surprising argument counts).
The initial motivation for refactoring this area was that previously
`CDM::CLibrary` accepted calls with more arguments/parameters than the
expected number, so I wasn't able to use it for `malloc` without
accidentally matching calls to the 3-argument BSD kernel malloc.
After this commit this "may have more args/params" logic will only
activate when we're actually matching a hardened variant function (in
`CDM::CLibraryMaybeHardened` mode). The recognition of "sprintf()" and
"snprintf()" in CStringChecker was refactored, because previously it was
abusing the behavior that extra arguments are accepted even if the
matched function is not a hardened variant.
This commit also fixes the oversight that the old code would've
recognized e.g. `__wmemcpy_chk` as a hardened variant of `memcpy`.
After this commit I'm planning to create several follow-up commits that
ensure that checkers looking for C library functions use `CDM::CLibrary`
as a "sane default" matching mode.
This commit is not truly NFC (it eliminates some buggy corner cases),
but it does not intentionally modify the behavior of CSA on real-world
non-crazy code.
As a minor unrelated change I'm eliminating the argument/variable
"IsBuiltin" from the evalSprintf function family in CStringChecker,
because it was completely unused.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
Until now function `fseek` returned nonzero on error, this is changed to
-1 only. And it does not produce EOF error any more.
This complies better with the POSIX standard.
In PR #79382, I need to add a new type that derives from
ConstantArrayType. This means that ConstantArrayType can no longer use
`llvm::TrailingObjects` to store the trailing optional Expr*.
This change refactors ConstantArrayType to store a 60-bit integer and
4-bits for the integer size in bytes. This replaces the APInt field
previously in the type but preserves enough information to recreate it
where needed.
To reduce the number of places where the APInt is re-constructed I've
also added some helper methods to the ConstantArrayType to allow some
common use cases that operate on either the stored small integer or the
APInt as appropriate.
Resolves#85124.
This reapplies 80ab8234ac again, after
fixing a name collision warning in the unit tests (see the revert commit
13ccaf9b9d for details).
In addition to the previously applied changes, this commit also clarifies the
code in MallocChecker that distinguishes POSIX "getline()" and C++ standard
library "std::getline()" (which are two completely different functions). Note
that "std::getline()" was (accidentally) handled correctly even without this
clarification; but it's better to explicitly handle and test this corner case.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
The checker finds a type of undefined behavior, where if the type of a
pointer to an object-array is different from the objects' underlying
type, calling `delete[]` is undefined, as the size of the two objects
might be different.
The checker has been in alpha for a while now, it is a simple checker
that causes no crashes, and considering the severity of the issue, it
has a low result-count on open-source projects (in my last test-run on
my usual projects, it had 0 results).
This commit cleans up the documentation and adds docs for the limitation
related to tracking through references, in addition to moving it to
`cplusplus`.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
Co-authored-by: whisperity <whisperity@gmail.com>
According to POSIX 2018.
1. lineptr, n and stream can not be NULL.
2. If *n is non-zero, *lineptr must point to a region of at least *n
bytes, or be a NULL pointer.
Additionally, if *lineptr is not NULL, *n must not be undefined.
The checker alpha.security.ArrayBoundV2 performs bounds checking in two
steps: first it checks for underflow, and if it isn't guaranteed then it
assumes that there is no underflow. After this, it checks for overflow,
and if that's guaranteed or the index is tainted then it reports it.
This meant that in situations where overflow and underflow are both
possible (but the index is either tainted or guaranteed to be invalid),
the checker was reporting just an overflow error.
This commit modifies the messages printed in these cases to mention the
possibility of an underflow.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
* Add support for multiple, potentially overlapping critical sections:
The checker can now simultaneously handle several mutex's critical
sections without confusing them.
* Implement the handling of recursive mutexes:
By identifying the lock events, recursive mutexes are now supported.
A lock event is a pair of a lock expression, and the SVal of the mutex
that it locks, so even multiple locks of the same mutex (and even by
the same expression) is now supported.
* Refine the note tags generated by the checker:
The note tags now correctly show just for mutexes that are
active at the point of error, and multiple acquisitions of the same mutex
are also noted.
These functions should not be allowed if the file position is
indeterminate (they return the file position).
This condition is now checked, and tests are improved to check it.
This PR makes alpha.webkit.UncountedLocalVarsChecker ignore raw
references and pointers to a ref counted type which appears within
"trival" statements. To do this, this PR extends TrivialFunctionAnalysis
so that it can also analyze "triviality" of statements as well as that
of functions Each Visit* function is now augmented with
withCachedResult, which is responsible for looking up and updating the
cache for each Visit* functions.
As this PR dramatically improves the false positive rate of the checker,
it also deletes the code to ignore raw pointers and references within if
and for statements.
`getdelim` and `getline` may free, allocate, or re-allocate the input
buffer, ensuring its size is enough to hold the incoming line, the
delimiter, and the null terminator.
`*lineptr` must be a valid argument to `free`, which means it can be
either
1. `NULL`, in which case these functions perform an allocation
equivalent to a call to `malloc` even on failure.
2. A pointer returned by the `malloc` family of functions. Other
pointers are UB (`alloca`, a pointer to a static, to a stack variable, etc.)
`va_list` is a platform-specific type. On some, it is a struct instead
of a pointer to a struct, so `lookupFn` was ignoring calls to `vfprintf`
and `vfscanf`.
`stream.c` now runs in four different platforms to make sure the logic
works across targets.
In PR #83677 I was surprised to see that outdated checker callback
signatures are a problem. It turns out, we need the `registerChecker...`
function to invoke the `Mgr.registerChecker<>()` which would instantiate
the `_register` calls, that would take the address of the defined
checker callbacks. Consequently, if the expected signatures mismatch, it
won't compile from now on, so we have static guarantee that this issue
never pops up again.
Given we need the `register` call, at this point we could just hook this
checker into the `debug` package and make it never registered. It
shouldn't hurt anyone :)
The class `CallDescription` is used to define patterns that are used for
matching `CallEvent`s. For example, a
`CallDescription{{"std", "find_if"}, 3}`
matches a call to `std::find_if` with 3 arguments.
However, these patterns are somewhat fuzzy, so this pattern could also
match something like `std::__1::find_if` (with an additional namespace
layer), or, unfortunately, a `CallDescription` for the well-known
function `free()` can match a C++ method named `free()`:
https://github.com/llvm/llvm-project/issues/81597
To prevent this kind of ambiguity this commit introduces the enum
`CallDescription::Mode` which can limit the pattern matching to
non-method function calls (or method calls etc.). After this NFC change,
one or more follow-up commits will apply the right pattern matching
modes in the ~30 checkers that use `CallDescription`s.
Note that `CallDescription` previously had a `Flags` field which had
only two supported values:
- `CDF_None` was the default "match anything" mode,
- `CDF_MaybeBuiltin` was a "match only C library functions and accept
some inexact matches" mode.
This commit preserves `CDF_MaybeBuiltin` under the more descriptive
name `CallDescription::Mode::CLibrary` (or `CDM::CLibrary`).
Instead of this "Flags" model I'm switching to a plain enumeration
becasue I don't think that there is a natural usecase to combine the
different matching modes. (Except for the default "match anything" mode,
which is currently kept for compatibility, but will be phased out in the
follow-up commits.)
If a stream operation fails the position can become "indeterminate".
This may cause warning from the checker at a later operation. The new
note tag shows the place where the position becomes "indeterminate",
this is where a failure occurred.
Model `getc` and `putc` as equivalent to `fgetc` and `fputc` respectively.
Model `vfscanf` and `vfprintf` as `fscanf` and `fprintf`, except that
`vfscanf` can not invalidate the parameters due to the indirection via a
`va_list`. Nevertheless, we can still track EOF and errors as for `fscanf`.
The checker reported a false positive on this code
void testTaintedSanitizedVLASize(void) {
int x;
scanf("%d", &x);
if (x<1)
return;
int vla[x]; // no-warning
}
After the fix, the checker only emits tainted warning if the vla size is
coming from a tainted source and it cannot prove that it is positive.
Specific arguments passed to stream handling functions are changed by
the function, this means these should be invalidated ("escaped") by the
analyzer. This change adds the argument invalidation (in specific cases)
to the checker.
To fix https://github.com/llvm/llvm-project/issues/81597, I'm planning
to refactor the usage of CallDescription; and as I was preparing for
this I noticed that there are two superfluous references to this header.
A memory access is an out of bounds error if the offset is < the extent
of the memory region. Notice that here "<" is a _mathematical_
comparison between two numbers and NOT a C/C++ operator that compares
two typed C++ values: for example -1 < 1000 is true in mathematics, but
if the `-1` is an `int` and the `1000` is a `size_t` value, then
evaluating the C/C++ operator `<` will return false because the `-1`
will be converted to `SIZE_MAX` by the automatic type conversions.
This means that it's incorrect to perform a bounds check with
`evalBinOpNN(State, BO_LT, ...)` which performs automatic conversions
and can produce wildly incorrect results.
ArrayBoundsCheckerV2 already had a special case where it avoided calling
`evalBinOpNN` in a situation where it would have performed an automatic
conversion; this commit replaces that code with a more general one that
covers more situations. (It's still not perfect, but it's better than
the previous version and I think it will cover practically all
real-world code.)
Note that this is not a limitation/bug of the simplification algorithm
defined in `getSimplifedOffsets()`: the simplification is not applied in
the test case `test_comparison_with_extent_symbol` (because the `Extent`
is not a concrete int), but without the new code it would still run into
a `-1 < UNSIGNED` comparison that evaluates to false because
`evalBinOpNN` performs an automatic type conversion.
Function 'fileno' fails only if invalid pointer is passed, this is a
case that is often ignored in source code. The failure case leads to
many "false positive" reports when `fileno` returns -1 and this is not
checked in the program. Because this, the function is now assumed
to not fail (this is assumption that the passed file pointer is correct).
The change affects `StdCLibraryFunctionsChecker` and
`StreamChecker`.
This PR adds the support for WebKit's RefAllowingPartiallyDestroyed and
RefPtrAllowingPartiallyDestroyed, which are smart pointer types which
may be used after the destructor had started running.
This PR makes the checker ignore / skip calls to methods of Web Template
Platform's container types such as HashMap, HashSet, WeakHashSet,
WeakHashMap, Vector, etc...
Continuation of commit 42b5037, apply changes to the remaining
functions.
Code for function `fflush` was not changed, because it is more special
compared to the others.
Now calling `open` with the `O_CREAT` flag and no mode parameter will
raise an issue in any system that defines `O_CREAT`.
The value for this flag is obtained after the full source code has been
parsed, leveraging `checkASTDecl`.
Hence, any `#define` or `#undefine` of `O_CREAT` following an `open` may
alter the results. Nevertheless, since redefining reserved identifiers
is UB, this is probably ok.
A class is added that contains common functions and data members that are used in many of the "eval" functions. This results in shorter "eval" functions and less code repetition.
Allow address-of operator (&), enum constant, and a reference to
constant as well as materializing temporqary expression and an
expression with cleanups to appear within a trivial function.
This PR introduces the concept of a "trivial function" which applies to
a function that only calls other trivial functions and contain literals
and expressions that don't result in heap mutations (specifically it
does not call deref). This is implemented using ConstStmtVisitor and
checking each statement and expression's trivialness.
This PR also introduces the concept of a "ingleton function", which is a
static member function or a free standing function which ends with the
suffix "singleton". Such a function's return value is understood to be
safe to call any function with.
This PR makes the checker not emit warning when a function is called
with a return value of another function when the return value is of type
Ref<T> or RefPtr<T>.
This PR makes alpha.webkit.UncountedCallArgsChecker eplicitly check the
safety of the object argument in a member function call. It also removes
the exemption of local variables from this checker so that each local
variable's safety is checked if it's used in a function call instead of
relying on the local variable checker to find those since local variable
checker currently has exemption for "for" and "if" statements.
The attribute is now allowed on an assortment of declarations, to
suppress warnings related to declarations themselves, or all warnings in
the lexical scope of the declaration.
I don't necessarily see a reason to have a list at all, but it does look
as if some of those more niche items aren't properly supported by the
compiler itself so let's maintain a short safe list for now.
The initial implementation raised a question whether the attribute
should apply to lexical declaration context vs. "actual" declaration
context. I'm using "lexical" here because it results in less warnings
suppressed, which is the conservative behavior: we can always expand it
later if we think this is wrong, without breaking any existing code. I
also think that this is the correct behavior that we will probably never
want to change, given that the user typically desires to keep the
suppressions as localized as possible.
This PR aligns the evaluation of default arguments with other kinds of
arguments by extracting the expressions within them as argument values
to be evaluated.
The bug was caused by isRefCountable erroneously returning false for a
class with both ref() and deref() functions defined because we were not
resetting the base paths results between looking for "ref()" and
"deref()"