The class `KnownSVal` was very magical abstract class within the `SVal`
class hierarchy: with a hacky `classof` method it acted as if it was the
common ancestor of the classes `UndefinedSVal` and `DefinedSVal`.
However, it was only used in two `getAs<KnownSVal>()` calls and the
signatures of two methods, which does not "pay for" its weird behavior,
so I created this commit that removes it and replaces its use with more
straightforward solutions.
In builds that use source hardening (-D_FORTIFY_SOURCE), many standard
functions are implemented as macros that expand to calls of hardened
functions that take one additional argument compared to the "usual"
variant and perform additional input validation. For example, a `memcpy`
call may expand to `__memcpy_chk()` or `__builtin___memcpy_chk()`.
Before this commit, `CallDescription`s created with the matching mode
`CDM::CLibrary` automatically matched these hardened variants (in a
addition to the "usual" function) with a fairly lenient heuristic.
Unfortunately this heuristic meant that the `CLibrary` matching mode was
only usable by checkers that were prepared to handle matches with an
unusual number of arguments.
This commit limits the recognition of the hardened functions to a
separate matching mode `CDM::CLibraryMaybeHardened` and applies this
mode for functions that have hardened variants and were previously
recognized with `CDM::CLibrary`.
This way checkers that are prepared to handle the hardened variants will
be able to detect them easily; while other checkers can simply use
`CDM::CLibrary` for matching C library functions (and they won't
encounter surprising argument counts).
The initial motivation for refactoring this area was that previously
`CDM::CLibrary` accepted calls with more arguments/parameters than the
expected number, so I wasn't able to use it for `malloc` without
accidentally matching calls to the 3-argument BSD kernel malloc.
After this commit this "may have more args/params" logic will only
activate when we're actually matching a hardened variant function (in
`CDM::CLibraryMaybeHardened` mode). The recognition of "sprintf()" and
"snprintf()" in CStringChecker was refactored, because previously it was
abusing the behavior that extra arguments are accepted even if the
matched function is not a hardened variant.
This commit also fixes the oversight that the old code would've
recognized e.g. `__wmemcpy_chk` as a hardened variant of `memcpy`.
After this commit I'm planning to create several follow-up commits that
ensure that checkers looking for C library functions use `CDM::CLibrary`
as a "sane default" matching mode.
This commit is not truly NFC (it eliminates some buggy corner cases),
but it does not intentionally modify the behavior of CSA on real-world
non-crazy code.
As a minor unrelated change I'm eliminating the argument/variable
"IsBuiltin" from the evalSprintf function family in CStringChecker,
because it was completely unused.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
HLSL constant sized array function parameters do not decay to pointers.
Instead constant sized array types are preserved as unique types for
overload resolution, template instantiation and name mangling.
This implements the change by adding a new `ArrayParameterType` which
represents a non-decaying `ConstantArrayType`. The new type behaves the
same as `ConstantArrayType` except that it does not decay to a pointer.
Values of `ConstantArrayType` in HLSL decay during overload resolution
via a new `HLSLArrayRValue` cast to `ArrayParameterType`.
`ArrayParamterType` values are passed indirectly by-value to functions
in IR generation resulting in callee generated memcpy instructions.
The behavior of HLSL function calls is documented in the [draft language
specification](https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf)
under the Expr.Post.Call heading.
Additionally the design of this implementation approach is documented in
[Clang's
documentation](https://clang.llvm.org/docs/HLSL/FunctionCalls.html)
Resolves#70123
In PR #79382, I need to add a new type that derives from
ConstantArrayType. This means that ConstantArrayType can no longer use
`llvm::TrailingObjects` to store the trailing optional Expr*.
This change refactors ConstantArrayType to store a 60-bit integer and
4-bits for the integer size in bytes. This replaces the APInt field
previously in the type but preserves enough information to recreate it
where needed.
To reduce the number of places where the APInt is re-constructed I've
also added some helper methods to the ConstantArrayType to allow some
common use cases that operate on either the stored small integer or the
APInt as appropriate.
Resolves#85124.
When debugging CSA issues, sometimes it would be useful to have a
dedicated note for the analysis entry point, aka. the function name you
would need to pass as "-analyze-function=XYZ" to reproduce a specific
issue.
One way we use (or will use) this downstream is to provide tooling on
top of creduce to enhance to supercharge productivity by automatically
reduce cases on crashes for example.
This will be added only if the "-analyzer-note-analysis-entry-points" is
set or the "analyzer-display-progress" is on.
This additional entry point marker will be the first "note" if enabled,
with the following message: "[debug] analyzing from XYZ". They are
prefixed by "[debug]" to remind the CSA developer that this is only
meant to be visible for them, for debugging purposes.
CPP-5012
This reapplies 80ab8234ac again, after
fixing a name collision warning in the unit tests (see the revert commit
13ccaf9b9d for details).
In addition to the previously applied changes, this commit also clarifies the
code in MallocChecker that distinguishes POSIX "getline()" and C++ standard
library "std::getline()" (which are two completely different functions). Note
that "std::getline()" was (accidentally) handled correctly even without this
clarification; but it's better to explicitly handle and test this corner case.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
According to POSIX 2018.
1. lineptr, n and stream can not be NULL.
2. If *n is non-zero, *lineptr must point to a region of at least *n
bytes, or be a NULL pointer.
Additionally, if *lineptr is not NULL, *n must not be undefined.
Inside the ExprEngine when we process the initializers, we create a
PostInitializer program-point, which will refer to the field being
initialized, see `FieldLoc` inside `ExprEngine::ProcessInitializer`.
When a constructor (of which we evaluate the initializer-list) is
analyzed in top-level context, then the `this` pointer will be
represented by a `SymbolicRegion`, (as it should be).
This means that we will form a `FieldRegion{SymbolicRegion{.}}` as the
initialized region.
```c++
class Bear {
public:
void brum() const;
};
class Door {
public:
// PostInitializer would refer to "FieldRegion{SymRegion{this}}"
// whereas in the store and everywhere else it would be:
// "FieldRegion{ELementRegion{SymRegion{Ty*, this}, 0, Ty}".
Door() : ptr(nullptr) {
ptr->brum(); // Bug
}
private:
Bear* ptr;
};
```
We (as CSA folks) decided to avoid the creation of FieldRegions directly
of symbolic regions in the past:
f8643a9b31
---
In this patch, I propose to also canonicalize it as in the mentioned
patch, into this: `FieldRegion{ElementRegion{SymbolicRegion{Ty*, .}, 0,
Ty}`
This would mean that FieldRegions will/should never simply wrap a
SymbolicRegion directly, but rather an ElementRegion that is sitting in
between.
This patch should have practically no observable effects, as the store
(due to the mentioned patch) was made resilient to this issue, but we
use `PostInitializer::getLocationValue()` for an alternative reporting,
where we faced this issue.
Note that in really rare cases it suppresses now dereference bugs, as
demonstrated in the test. It is because in the past we failed to follow
the region of the PostInitializer inside the StoreSiteFinder visitor -
because it was using this code:
```c++
// If this is a post initializer expression, initializing the region, we
// should track the initializer expression.
if (std::optional<PostInitializer> PIP =
Pred->getLocationAs<PostInitializer>()) {
const MemRegion *FieldReg = (const MemRegion *)PIP->getLocationValue();
if (FieldReg == R) {
StoreSite = Pred;
InitE = PIP->getInitializer()->getInit();
}
}
```
Notice that the equality check didn't pass for the regions I'm
canonicalizing in this patch.
Given the nature of this change, we would rather upstream this patch.
CPP-4954
StaticAnalyzer didn't check if the variable is declared in
`CompoundStmt` under `SwitchStmt`, which make static analyzer reach root
without finding the declaration.
Fixes#68819
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
This reverts commit e48d5a838f.
Fails to build on x86-64 w/gcc version 11.4.0 (Ubuntu 11.4.0-1ubuntu1~22.04)
with the following message:
../llvm-project/clang/unittests/StaticAnalyzer/IsCLibraryFunctionTest.cpp:41:28: error: declaration of ‘std::unique_ptr<clang::ASTUnit> IsCLibraryFunctionTest::ASTUnit’ changes meaning of ‘ASTUnit’ [-fpermissive]
41 | std::unique_ptr<ASTUnit> ASTUnit;
| ^~~~~~~
In file included from ../llvm-project/clang/unittests/StaticAnalyzer/IsCLibraryFunctionTest.cpp:4:
../llvm-project/clang/include/clang/Frontend/ASTUnit.h:89:7: note: ‘ASTUnit’ declared here as ‘class clang::ASTUnit’
89 | class ASTUnit {
| ^~~~~~~
From issue #73088. I changed `NodeBuilderContext` into a class.
Additionally, there were some other mentions of the former being a
struct which I also changed into a class. This is my first time working
with an issue so I will be open to hearing any advice or changes that
need to be done.
Previously, the function `isCLibraryFunction()` and logic relying on it
only accepted functions that are declared directly within a TU (i.e. not
in a namespace or a class). However C++ headers like <cstdlib> declare
many C standard library functions within the namespace `std`, so this
commit ensures that functions within the namespace `std` are also
accepted.
After this commit it will be possible to match functions like `malloc`
or `free` with `CallDescription::Mode::CLibrary`.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
`getdelim` and `getline` may free, allocate, or re-allocate the input
buffer, ensuring its size is enough to hold the incoming line, the
delimiter, and the null terminator.
`*lineptr` must be a valid argument to `free`, which means it can be
either
1. `NULL`, in which case these functions perform an allocation
equivalent to a call to `malloc` even on failure.
2. A pointer returned by the `malloc` family of functions. Other
pointers are UB (`alloca`, a pointer to a static, to a stack variable, etc.)
This commit adds a testcase which highlights the current incorrect
behavior of the CSA diagnostic generation: it produces a note which says
"Assuming 'arg' is >= 0" in a situation where this is not a fresh
assumption because 'arg' is an unsigned integer.
I also created ticket 78440 to track this bug.
The class `CallDescription` is used to define patterns that are used for
matching `CallEvent`s. For example, a
`CallDescription{{"std", "find_if"}, 3}`
matches a call to `std::find_if` with 3 arguments.
However, these patterns are somewhat fuzzy, so this pattern could also
match something like `std::__1::find_if` (with an additional namespace
layer), or, unfortunately, a `CallDescription` for the well-known
function `free()` can match a C++ method named `free()`:
https://github.com/llvm/llvm-project/issues/81597
To prevent this kind of ambiguity this commit introduces the enum
`CallDescription::Mode` which can limit the pattern matching to
non-method function calls (or method calls etc.). After this NFC change,
one or more follow-up commits will apply the right pattern matching
modes in the ~30 checkers that use `CallDescription`s.
Note that `CallDescription` previously had a `Flags` field which had
only two supported values:
- `CDF_None` was the default "match anything" mode,
- `CDF_MaybeBuiltin` was a "match only C library functions and accept
some inexact matches" mode.
This commit preserves `CDF_MaybeBuiltin` under the more descriptive
name `CallDescription::Mode::CLibrary` (or `CDM::CLibrary`).
Instead of this "Flags" model I'm switching to a plain enumeration
becasue I don't think that there is a natural usecase to combine the
different matching modes. (Except for the default "match anything" mode,
which is currently kept for compatibility, but will be phased out in the
follow-up commits.)
HLSL supports vector truncation and element conversions as part of
standard conversion sequences. The vector truncation conversion is a C++
second conversion in the conversion sequence. If a vector truncation is
in a conversion sequence an element conversion may occur after it before
the standard C++ third conversion.
Vector element conversions can be boolean conversions, floating point or
integral conversions or promotions.
[HLSL Draft
Specification](https://microsoft.github.io/hlsl-specs/specs/hlsl.pdf)
---------
Co-authored-by: Aaron Ballman <aaron@aaronballman.com>
The attribute is now allowed on an assortment of declarations, to
suppress warnings related to declarations themselves, or all warnings in
the lexical scope of the declaration.
I don't necessarily see a reason to have a list at all, but it does look
as if some of those more niche items aren't properly supported by the
compiler itself so let's maintain a short safe list for now.
The initial implementation raised a question whether the attribute
should apply to lexical declaration context vs. "actual" declaration
context. I'm using "lexical" here because it results in less warnings
suppressed, which is the conservative behavior: we can always expand it
later if we think this is wrong, without breaking any existing code. I
also think that this is the correct behavior that we will probably never
want to change, given that the user typically desires to keep the
suppressions as localized as possible.
'serial', 'parallel', and 'kernel' constructs are all considered
'Compute' constructs. This patch creates the AST type, plus the required
infrastructure for such a type, plus some base types that will be useful
in the future for breaking this up.
The only difference between the three is the 'kind'( plus some minor
clause legalization rules, but those can be differentiated easily
enough), so rather than representing them as separate AST nodes, it
seems
to make sense to make them the same.
Additionally, no clause AST functionality is being implemented yet, as
that fits better in a separate patch, and this is enough to get the
'naked' constructs implemented.
This is otherwise an 'NFC' patch, as it doesn't alter execution at all,
so there aren't any tests. I did this to break up the review workload
and to get feedback on the layout.
This is a follow-up for 721dd3bc2 [analyzer] NFC: Don't regenerate
duplicate HTML reports.
Because HTMLRewriter re-runs the Lexer for syntax highlighting and macro
expansion purposes, it may get fairly expensive when the rewriter is
invoked multiple times on the same file. In the static analyzer (which
uses HTMLRewriter for HTML output mode) we only get away with this
because there are usually very few reports emitted per file. But if loud
checkers are enabled, such as `webkit.*`, this may explode in complexity
and even cause the compiler to run over the 32-bit SourceLocation
addressing limit.
This patch caches intermediate results so that re-lexing only needed to
happen once.
As the clever __COUNTER__ test demonstrates, "once" is still too many.
Ideally we shouldn't re-lex anything at all, which remains a TODO.
There are currently a few checkers that don't fill in the bug report's
"decl-with-issue" field (typically a function in which the bug is
found).
The new attribute `[[clang::suppress]]` uses decl-with-issue to reduce
the size of the suppression source range map so that it didn't need to
do that for the entire translation unit.
I'm already seeing a few problems with this approach so I'll probably
redesign it in some point as it looks like a premature optimization. Not
only checkers shouldn't be required to pass decl-with-issue (consider
clang-tidy checkers that never had such notion), but also it's not
necessarily uniquely determined (consider leak suppressions at
allocation site).
For now I'm adding a simple stop-gap solution that falls back to
building the suppression map for the entire TU whenever decl-with-issue
isn't specified. Which won't happen in the default setup because luckily
all default checkers do provide decl-with-issue.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
`OpaqueValueExpr` doesn't necessarily contain a source expression.
Particularly, after #78041, it is used to carry the type and the value
kind of a non-type template argument of floating-point type or referring
to a subobject (those are so called `StructuralValue` arguments).
This fixes#79575.
Implements https://isocpp.org/files/papers/P2662R3.pdf
The feature is exposed as an extension in older language modes.
Mangling is not yet supported and that is something we will have to do before release.
Previously the function `RangeConstraintManager::printValue()` crashed
when it encountered an empty rangeset (because `RangeSet::getBitwidth()`
and `RangeSet::isUnsigned()` assert that the rangeset is not empty).
This commit adds a special case that avoids this behavior.
As `printValue()` is only used by the checker debug.ExprInspection (and
during manual debugging), the impacts of this commit are very limited.
---------
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
This is a performance optimization for HTML diagnostics output mode.
Currently they're incredibly inefficient:
* The HTMLRewriter is re-run from scratch on every file on every report.
Each such re-run involves re-lexing the entire file and producing
a syntax-highlighted webpage of the entire file, with text behind macros
duplicated as pop-up macro expansion tooltips. Then, warning and note
bubbles are injected into the page. Only the bubble part is different
across reports; everything else can theoretically be cached.
* Additionally, if duplicate reports are emitted (with the same issue hash),
HTMLRewriter will be re-run even though the output file is going to be
discarded due to filename collision. This is mostly an issue for
path-insensitive bug reports because path-sensitive bug reports
are already deduplicated by the BugReporter as part of searching
for the shortest bug path. But on some translation units almost 80% of
bug reports are dry-run here.
We only get away with all this because there are usually very few reports
emitted per file. But if loud checkers are enabled, such as `webkit.*`,
this may explode in complexity and even cause the compiler to run over
the 32-bit SourceLocation addressing limit. (We're re-lexing everything
each time, remember?)
This patch hotfixes the *second* problem. Adds a FIXME for the first problem,
which will require more yak shaving to solve.
rdar://120801986
Cleanup most of the lazy-init `BugType` legacy.
Some will be preserved, as those are slightly more complicated to
refactor.
Notice, that the default category for `BugType` is `LogicError`. I
omitted setting this explicitly where I could.
Please, actually have a look at the diff. I did this manually, and we
rarely check the bug type descriptions and stuff in tests, so the
testing might be shallow on this one.
The new attribute can be placed on statements in order to suppress
arbitrary warnings produced by static analysis tools at those statements.
Previously such suppressions were implemented as either informal comments
(eg. clang-tidy `// NOLINT:`) or with preprocessor macros (eg.
clang static analyzer's `#ifdef __clang_analyzer__`). The attribute
provides a universal, formal, flexible and neat-looking suppression mechanism.
Implement support for the new attribute in the clang static analyzer;
clang-tidy coming soon.
The attribute allows specifying which specific warnings to suppress,
in the form of free-form strings that are intended to be specific to
the tools, but currently none are actually supported; so this is also
going to be a future improvement.
Differential Revision: https://reviews.llvm.org/D93110
This patch replaces uses of StringRef::{starts,ends}with with
StringRef::{starts,ends}_with for consistency with
std::{string,string_view}::{starts,ends}_with in C++20.
I'm planning to deprecate and eventually remove
StringRef::{starts,ends}with.
This commit extends the class `SValBuilder` with the methods
`getMinValue()` and `getMaxValue()` to that work like
`SValBuilder::getKnownValue()` but return the minimal/maximal possible
value the `SVal` is not perfectly constrained.
This extension of the ConstraintManager API is discussed at:
https://discourse.llvm.org/t/expose-the-inferred-range-information-in-warning-messages/75192
As a simple proof-of-concept application of this new API, this commit
extends a message from `core.BitwiseShift` with some range information
that reports the assumptions of the analyzer.
My main motivation for adding these methods is that I'll also want to
use them in `ArrayBoundCheckerV2` to make the error messages less
awkward, but I'm starting with this simpler and less important usecase
because I want to avoid merge conflicts with my other commit
https://github.com/llvm/llvm-project/pull/72107 which is currently under
review.
The testcase `too_large_right_operand_compound()` shows a situation
where querying the range information does not work (and the extra
information is not added to the error message). This also affects the
debug utility `clang_analyzer_value()`, so the problem isn't in the
fresh code. I'll do some investigations to resolve this, but I think
that this commit is a step forward even with this limitation.
...to model the results of alloca() and _alloca() calls. Previously it
acted as if these functions were returning memory from the heap, which
led to alpha.security.ArrayBoundV2 producing incorrect messages.
As my BSc thesis I've implemented a checker for std::variant and
std::any, and in the following weeks I'll upload a revised version of
them here.
# Prelude
@Szelethus and I sent out an email with our initial plans here:
https://discourse.llvm.org/t/analyzer-new-checker-for-std-any-as-a-bsc-thesis/65613/2
We also created a stub checker patch here:
https://reviews.llvm.org/D142354.
Upon the recommendation of @haoNoQ , we explored an option where instead
of writing a checker, we tried to improve on how the analyzer natively
inlined the methods of std::variant and std::any. Our attempt is in this
patch https://reviews.llvm.org/D145069, but in a nutshell, this is what
happened: The analyzer was able to model much of what happened inside
those classes, but our false positive suppression machinery erroneously
suppressed it. After months of trying, we could not find a satisfying
enhancement on the heuristic without introducing an allowlist/denylist
of which functions to not suppress.
As a result (and partly on the encouragement of @Xazax-hun) I wrote a
dedicated checker!
The advantage of the checker is that it is not dependent on the
standard's implementation and won't put warnings in the standard library
definitions. Also without the checker it would be difficult to create
nice user-friendly warnings and NoteTags -- as per the standard's
specification, the analysis is sinked by an exception, which we don't
model well now.
# Design ideas
The working of the checker is straightforward: We find the creation of
an std::variant instance, store the type of the variable we want to
store in it, then save this type for the instance. When retrieving type
from the instance we check what type we want to retrieve as, and compare
it to the actual type. If the two don't march we emit an error.
Distinguishing variants by instance (e.g. MemRegion *) is not the most
optimal way. Other checkers, like MallocChecker uses a symbol-to-trait
map instead of region-to-trait. The upside of using symbols (which would
be the value of a variant, not the variant itself itself) is that the
analyzer would take care of modeling copies, moves, invalidation, etc,
out of the box. The problem is that for compound types, the analyzer
doesn't create a symbol as a result of a constructor call that is fit
for this job. MallocChecker in contrast manipulates simple pointers.
My colleges and I considered the option of making adjustments directly
to the memory model of the analyzer, but for the time being decided
against it, and go with the bit more cumbersome, but immediately viable
option of simply using MemRegions.
# Current state and review plan
This patch contains an already working checker that can find and report
certain variant/any misuses, but still lands it in alpha. I plan to
upload the rest of the checker in later patches.
The full checker is also able to "follow" the symbolic value held by the
std::variant and updates the program state whenever we assign the value
stored in the variant. I have also built a library that is meant to
model union-like types similar to variant, hence some functions being a
bit more multipurpose then is immediately needed.
I also intend to publish my std::any checker in a later commit.
---------
Co-authored-by: Gabor Spaits <gabor.spaits@ericsson.com>
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
This patch converts `ImplicitParamDecl::ImplicitParamKind` into a scoped enum at namespace scope, making it eligible for forward declaring. This is useful for `preferred_type` annotations on bit-fields.
This patch converts `CXXConstructExpr::ConstructionKind` into a scoped enum in namespace scope, making it eligible for forward declaring. This is useful in cases like annotating bit-fields with `preferred_type`.
The goal of this patch is to refine how the `SVal` base and sub-kinds
are represented by forming one unified enum describing the possible
SVals. This means that the `unsigned SVal::Kind` and the attached
bit-packing semantics would be replaced by a single unified enum. This
is more conventional and leads to a better debugging experience by
default. This eases the need of using debug pretty-printers, or the use
of runtime functions doing the printing for us like we do today by
calling `Val.dump()` whenever we inspect the values.
Previously, the first 2 bits of the `unsigned SVal::Kind` discriminated
the following quartet: `UndefinedVal`, `UnknownVal`, `Loc`, or `NonLoc`.
The rest of the upper bits represented the sub-kind, where the value
represented the index among only the `Loc`s or `NonLoc`s, effectively
attaching 2 meanings of the upper bits depending on the base-kind. We
don't need to pack these bits, as we have plenty even if we would use
just a plan-old `unsigned char`.
Consequently, in this patch, I propose to lay out all the (non-abstract)
`SVal` kinds into a single enum, along with some metadata (`BEGIN_Loc`,
`END_Loc`, `BEGIN_NonLoc`, `END_NonLoc`) artificial enum values, similar
how we do with the `MemRegions`.
Note that in the unified `SVal::Kind` enum, to differentiate
`nonloc::ConcreteInt` from `loc::ConcreteInt`, I had to prefix them with
`Loc` and `NonLoc` to resolve this ambiguity.
This should not surface in general, because I'm replacing the
`nonloc::Kind` enum items with `inline constexpr` global constants to
mimic the original behavior - and offer nicer spelling to these enum
values.
Some `SVal` constructors were not marked explicit, which I now mark as
such to follow best practices, and marked others as `/*implicit*/` to
clarify the intent.
During refactoring, I also found at least one function not marked
`LLVM_ATTRIBUTE_RETURNS_NONNULL`, so I did that.
The `TypeRetrievingVisitor` visitor had some accidental dead code,
namely: `VisitNonLocConcreteInt` and `VisitLocConcreteInt`.
Previously, the `SValVisitor` expected visit handlers of
`VisitNonLocXXXXX(nonloc::XXXXX)` and `VisitLocXXXXX(loc::XXXXX)`, where
I felt that envoding `NonLoc` and `Loc` in the name is not necessary as
the type of the parameter would select the right overload anyways, so I
simplified the naming of those visit functions.
The rest of the diff is a lot of times just formatting, because
`getKind()` by nature, frequently appears in switches, which means that
the whole switch gets automatically reformatted. I could probably undo
the formatting, but I didn't want to deviate from the rule unless
explicitly requested.
Workaround the case when the `this` pointer is actually a `NonLoc`, by
returning `Unknown` instead.
The solution isn't ideal, as `this` should be really a `Loc`, but due to
how casts work, I feel this is our easiest and best option.
As this patch presents, I'm evaluating a cast to transform the `NonLoc`.
However, given that `evalCast()` can't be cast from `NonLoc` to a
pointer type thingy (`Loc`), we end up with `Unknown`.
It is because `EvalCastVisitor::VisitNonLocSymbolVal()` only evaluates
casts that happen from NonLoc to NonLocs.
When I tried to actually implement that case, I figured:
1) Create a `SymbolicRegion` from that `nonloc::SymbolVal`; but
`SymbolRegion` ctor expects a pointer type for the symbol.
2) Okay, just have a `SymbolCast`, getting us the pointer type; but
`SymbolRegion` expects `SymbolData` symbols, not generic `SymExpr`s, as
stated:
> // Because pointer arithmetic is represented by ElementRegion layers,
> // the base symbol here should not contain any arithmetic.
3) We can't use `ElementRegion`s to perform this cast because to have an
`ElementRegion`, you already have to have a `SubRegion` that you want to
cast, but the point is that we don't have that.
At this point, I gave up, and just left a FIXME instead, while still
returning `Unknown` on that path.
IMO this is still better than having a crash.
Fixes#69922
Fixes#70464
When ctor is not declared in the base class, initializing the base class
with the initializer list will not trigger a proper assignment of the
base region, as a CXXConstructExpr doing that is not available in the
AST.
This patch checks whether the init expr is an InitListExpr under a base
initializer, and adds a binding if so.
Static analyze can't report diagnose when statement after a
CXXForRangeStmt and enable widen, because
`ExprEngine::processCFGBlockEntrance` lacks of CXXForRangeStmt and when
`AMgr.options.maxBlockVisitOnPath - 1` equals to `blockCount`, it can't
widen. After next iteration, `BlockCount >=
AMgr.options.maxBlockVisitOnPath` holds and generate a sink node. Add
`CXXForRangeStmt` makes it work.
Co-authored-by: huqizhi <836744285@qq.com>
In the following code:
```cpp
int main() {
struct Wrapper {char c; int &ref; };
Wrapper w = {.c = 'a', .ref = *(int *)0 };
w.ref = 1;
}
```
The clang static analyzer will produce the following warnings and notes:
```
test.cpp:12:11: warning: Dereference of null pointer [core.NullDereference]
12 | w.ref = 1;
| ~~~~~~^~~
test.cpp:11:5: note: 'w' initialized here
11 | Wrapper w = {.c = 'a', .ref = *(int *)0 };
| ^~~~~~~~~
test.cpp:12:11: note: Dereference of null pointer
12 | w.ref = 1;
| ~~~~~~^~~
1 warning generated.
```
In the line where `w` is created, the note gives information about the
initialization of `w` instead of `w.ref`. Let's compare it to a similar
case where a null pointer dereference happens to a pointer member:
```cpp
int main() {
struct Wrapper {char c; int *ptr; };
Wrapper w = {.c = 'a', .ptr = nullptr };
*w.ptr = 1;
}
```
Here the following error and notes are seen:
```
test.cpp:18:12: warning: Dereference of null pointer (loaded from field 'ptr') [core.NullDereference]
18 | *w.ptr = 1;
| ~~~ ^
test.cpp:17:5: note: 'w.ptr' initialized to a null pointer value
17 | Wrapper w = {.c = 'a', .ptr = nullptr };
| ^~~~~~~~~
test.cpp:18:12: note: Dereference of null pointer (loaded from field 'ptr')
18 | *w.ptr = 1;
| ~~~ ^
1 warning generated.
```
Here the note that shows the initialization the initialization of
`w.ptr` in shown instead of `w`.
This commit is here to achieve similar notes for member reference as the
notes of member pointers, so the report looks like the following:
```
test.cpp:12:11: warning: Dereference of null pointer [core.NullDereference]
12 | w.ref = 1;
| ~~~~~~^~~
test.cpp:11:5: note: 'w.ref' initialized to a null pointer value
11 | Wrapper w = {.c = 'a', .ref = *(int *)0 };
| ^~~~~~~~~
test.cpp:12:11: note: Dereference of null pointer
12 | w.ref = 1;
| ~~~~~~^~~
1 warning generated.
```
Here the initialization of `w.ref` is shown instead of `w`.
---------
Authored-by: Gábor Spaits <gabor.spaits@ericsson.com>
Reviewed-by: Donát Nagy <donat.nagy@ericsson.com>
This change avoids a crash in BasicValueFactory by checking the bit
width of an APSInt to avoid calling getZExtValue if greater than
64-bits. This was caught by our internal, randomized test generator.
Clang invocation
clang -cc1 -analyzer-checker=optin.portability.UnixAPI case.c
<src-root>/llvm/include/llvm/ADT/APInt.h:1488:
uint64_t llvm::APInt::getZExtValue() const: Assertion `getActiveBits()
<= 64
&& "Too many bits for uint64_t"' failed.
...
#9 <address> llvm::APInt::getZExtValue() const
<src-root>/llvm/include/llvm/ADT/APInt.h:1488:5
clang::BinaryOperatorKind, llvm::APSInt const&, llvm::APSInt const&)
<src-root>/clang/lib/StaticAnalyzer/Core/BasicValueFactory.cpp:307:37
llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>,
clang::BinaryOperatorKind, clang::ento::NonLoc, clang::ento::NonLoc,
clang::QualType)
<src-root>/clang/lib/StaticAnalyzer/Core/SimpleSValBuilder.cpp:531:31
llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>,
clang::BinaryOperatorKind, clang::ento::SVal, clang::ento::SVal,
clang::QualType)
<src-root>/clang/lib/StaticAnalyzer/Core/SValBuilder.cpp:532:26
...
Previously, bitwise shifts with constant operands were validated by the
checker `core.UndefinedBinaryOperatorResult`. However, this logic was
unreliable, and commit 25b9696b61 added
the dedicated checker `core.BitwiseShift` which validated the
preconditions of all bitwise shifts with a more accurate logic (that
uses the real types from the AST instead of the unreliable type
information encoded in `APSInt` objects).
This commit disables the inaccurate logic that could mark bitwise shifts
as 'undefined' and removes the redundant shift-related warning messages
from core.UndefinedBinaryOperatorResult. The tests that were validating
this logic are also deleted by this commit; but I verified that those
testcases trigger the expected bug reports from `core.BitwiseShift`. (I
didn't convert them to tests of `core.BitwiseShift`, because that
checker already has its own extensive test suite with many analogous
testcases.)
I hope that there will be a time when the constant folding will be
reliable, but until then we need hacky solutions like this improve the
quality of results.
evalIntegralCast was using makeIntVal, and when _BitInt() types were
introduced this exposed a crash in evalIntegralCast as a result.
This is a reapply of a previous patch that failed post merge on the arm
buildbots, because arm cannot handle large
BitInts. Pinning the triple for the testcase solves that problem.
Improve evalIntegralCast to use makeIntVal more efficiently to avoid the
crash exposed by use of _BitInt.
This was caught with our internal randomized testing.
<src-root>/llvm/include/llvm/ADT/APInt.h:1510:
int64_t llvm::APInt::getSExtValue() const: Assertion
`getSignificantBits() <= 64 && "Too many bits for int64_t"' failed.a
...
#9 <address> llvm::APInt::getSExtValue() const
<src-root>/llvm/include/llvm/ADT/APInt.h:1510:5
llvm::IntrusiveRefCntPtr<clang::ento::ProgramState const>,
clang::ento::SVal, clang::QualType, clang::QualType)
<src-root>/clang/lib/StaticAnalyzer/Core/SValBuilder.cpp:607:24
clang::Expr const*, clang::ento::ExplodedNode*,
clang::ento::ExplodedNodeSet&)
<src-root>/clang/lib/StaticAnalyzer/Core/ExprEngineC.cpp:413:61
...
Fixes: https://github.com/llvm/llvm-project/issues/61960
Reviewed By: donat.nagy