Commit Graph

126 Commits

Author SHA1 Message Date
Daniel Krupp
6002e2fd49 [analyzer] Split TaintPropagation checker into reporting and modeling checkers (#98157)
Taint propagation is a a generic modeling feature of the Clang Static
Analyzer which many other checkers depend on. Therefore
GenericTaintChecker is split into a TaintPropagation modeling checker
and a GenericTaint reporting checker.
2024-07-10 17:54:53 +02:00
Donát Nagy
5ffd154cf6 [analyzer] Clean up list of taint propagation functions (#91635)
This commit refactors GenericTaintChecker and performs various
improvements in the list of taint propagation functions:

1. The matching mode (usually `CDM::CLibrary` or
`CDM::CLibraryMaybeHardened`) was specified to avoid matching e.g. C++
methods or functions from a user-defined namespace that happen to share
the name of a well-known library function.
2. With these matching modes, a `CallDescription` can automatically
match builtin variants of the functions, so entries that explicitly
specified a builtin function were removed. This eliminated
inconsistencies where the "normal" and the builtin variant of the same
function was handled differently (e.g. `__builtin_strlcat` was covered,
while plain `strlcat` wasn't; while `__builtin_memcpy` and `memcpy` were
both on the list with different propagation rules).
3. The modeling of the functions `strlcat` and `strncat` was updated to
propagate taint from the first argument (index 0), because a tainted
string should remain tainted even if we append something else to it.
Note that this was already applied to `strcat` and `wcsncat` by commit
6ceb1c0ef9.
4. Some functions were updated to propagate taint from a size/length
argument to the result: e.g. `memcmp(p, q, get_tainted_int())` will now
return a tainted value (because the attacker can manipulate it). This
principle was already present in some propagation rules (e.g.
`__builtin_memcpy` was handled this way), and even after this commit
there are still some functions where it isn't applied. (I only aimed for
consistency within the same function family.)
5. Functions that have hardened `__FOO_chk()` variants are matched in
`CDM:CLibraryMaybeHardened` to ensure consistent handling of the
"normal" and the hardened variant. I added special handling for the
hardened variants of "sprintf" and "snprintf" because there the extra
parameters are inserted into the middle of the parameter list.
6. Modeling of `sscanf_s` was added, to complete the group of `fscanf`,
`fscanf_s` and `sscanf`.
7. The `Source()` specifications for `gets`, `gets_s` and `wgetch` were
ill-formed: they were specifying variadic arguments starting at argument
index `ReturnValueIndex`. (That is, in addition to the return value they
were propagating taint to all arguments.)
8. Functions that were related to each other were grouped together. (I
know that this makes the diff harder to read, but I felt that the full
list is unreadable without some reorganization.)
9. I spotted and removed some redundant curly braces. Perhaps would be
good to switch to a cleaner layout with less nested braces...
10. I updated some obsolete comments and added two TODOs for issues that
should be fixed in followup commits.
2024-05-16 12:10:56 +02:00
Kazu Hirata
deffae5da1 [clang] Use StringRef::operator== instead of StringRef::equals (NFC) (#91844)
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.

- StringRef::operator==/!= outnumber StringRef::equals by a factor of
  24 under clang/ in terms of their usage.

- The elimination of StringRef::equals brings StringRef closer to
  std::string_view, which has operator== but not equals.

- S == "foo" is more readable than S.equals("foo"), especially for
  !Long.Expression.equals("str") vs Long.Expression != "str".
2024-05-11 11:38:52 -07:00
Daniel Krupp
6ceb1c0ef9 [analyzer] Remove untrusted buffer size warning in the TaintPropagation checker (#68607)
Before this commit the the checker alpha.security.taint.TaintPropagation always reported warnings when the size argument of a memcpy-like or malloc-like function was tainted. However, this produced false positive reports in situations where the size was tainted, but correctly performed bound checks guaranteed the safety of the call.
 
This commit removes the rough "always warn if the size argument is tainted" heuristic; but it would be good to add a more refined "warns if the size argument is tainted and can be too large" heuristic in follow-up commits. That logic would belong to CStringChecker and MallocChecker, because those are the checkers responsible for the more detailed modeling of memcpy-like and malloc-like functions. To mark this plan, TODO comments are added in those two checkers.
 
There were several test cases that used these sinks to test generic properties of taint tracking; those were adapted to use different logic.
 
As a minor unrelated change, this commit ensures that strcat (and its wide variant, wcsncat) propagates taint from the first argument to the first argument, i.e. a tainted string remains tainted if we concatenate it with another string. This change was required because the adapted variant of multipleTaintedArgs is relying on strncat to compose a value that combines taint from two different sources.
2024-05-02 16:46:41 +02:00
NagyDonat
fb299cae51 [analyzer] Make recognition of hardened __FOO_chk functions explicit (#86536)
In builds that use source hardening (-D_FORTIFY_SOURCE), many standard
functions are implemented as macros that expand to calls of hardened
functions that take one additional argument compared to the "usual"
variant and perform additional input validation. For example, a `memcpy`
call may expand to `__memcpy_chk()` or `__builtin___memcpy_chk()`.

Before this commit, `CallDescription`s created with the matching mode
`CDM::CLibrary` automatically matched these hardened variants (in a
addition to the "usual" function) with a fairly lenient heuristic.

Unfortunately this heuristic meant that the `CLibrary` matching mode was
only usable by checkers that were prepared to handle matches with an
unusual number of arguments.

This commit limits the recognition of the hardened functions to a
separate matching mode `CDM::CLibraryMaybeHardened` and applies this
mode for functions that have hardened variants and were previously
recognized with `CDM::CLibrary`.

This way checkers that are prepared to handle the hardened variants will
be able to detect them easily; while other checkers can simply use
`CDM::CLibrary` for matching C library functions (and they won't
encounter surprising argument counts).

The initial motivation for refactoring this area was that previously
`CDM::CLibrary` accepted calls with more arguments/parameters than the
expected number, so I wasn't able to use it for `malloc` without
accidentally matching calls to the 3-argument BSD kernel malloc.

After this commit this "may have more args/params" logic will only
activate when we're actually matching a hardened variant function (in
`CDM::CLibraryMaybeHardened` mode). The recognition of "sprintf()" and
"snprintf()" in CStringChecker was refactored, because previously it was
abusing the behavior that extra arguments are accepted even if the
matched function is not a hardened variant.

This commit also fixes the oversight that the old code would've
recognized e.g. `__wmemcpy_chk` as a hardened variant of `memcpy`.

After this commit I'm planning to create several follow-up commits that
ensure that checkers looking for C library functions use `CDM::CLibrary`
as a "sane default" matching mode.

This commit is not truly NFC (it eliminates some buggy corner cases),
but it does not intentionally modify the behavior of CSA on real-world
non-crazy code.

As a minor unrelated change I'm eliminating the argument/variable
"IsBuiltin" from the evalSprintf function family in CStringChecker,
because it was completely unused.

---------

Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
2024-04-05 11:20:27 +02:00
NagyDonat
52a460f9d4 [analyzer] Refactor CallDescription match mode (NFC) (#83432)
The class `CallDescription` is used to define patterns that are used for
matching `CallEvent`s. For example, a
`CallDescription{{"std", "find_if"}, 3}`
matches a call to `std::find_if` with 3 arguments.

However, these patterns are somewhat fuzzy, so this pattern could also
match something like `std::__1::find_if` (with an additional namespace
layer), or, unfortunately, a `CallDescription` for the well-known
function `free()` can match a C++ method named `free()`:
https://github.com/llvm/llvm-project/issues/81597

To prevent this kind of ambiguity this commit introduces the enum
`CallDescription::Mode` which can limit the pattern matching to
non-method function calls (or method calls etc.). After this NFC change,
one or more follow-up commits will apply the right pattern matching
modes in the ~30 checkers that use `CallDescription`s.

Note that `CallDescription` previously had a `Flags` field which had
only two supported values:
 - `CDF_None` was the default "match anything" mode,
 - `CDF_MaybeBuiltin` was a "match only C library functions and accept
some inexact matches" mode.
This commit preserves `CDF_MaybeBuiltin` under the more descriptive
name `CallDescription::Mode::CLibrary` (or `CDM::CLibrary`).

Instead of this "Flags" model I'm switching to a plain enumeration
becasue I don't think that there is a natural usecase to combine the
different matching modes. (Except for the default "match anything" mode,
which is currently kept for compatibility, but will be phased out in the
follow-up commits.)
2024-03-04 15:43:37 +01:00
Balazs Benics
f90e063308 [analyzer] Fix taint sink rules for exec-like functions (#66358)
Variadic arguments were not considered as taint sink arguments. I also
decided to extend the list of exec-like functions.

(Juliet CWE78_OS_Command_Injection__char_connect_socket_execl)
2023-09-22 07:14:32 +02:00
Daniel Krupp
97495d3159 [analyzer] TaintPropagation checker strlen() should not propagate (#66086)
strlen(..) call should not propagate taintedness,
because it brings in many false positive findings. It is a common
pattern to copy user provided input to another buffer. In these cases we
always
get warnings about tainted data used as the malloc parameter:

buf = malloc(strlen(tainted_txt) + 1); // false warning

This pattern can lead to a denial of service attack only, when the
attacker can directly specify the size of the allocated area as an
arbitrary large number (e.g. the value is converted from a user provided
string).

Later, we could reintroduce strlen() as a taint propagating function
with the consideration not to emit warnings when the tainted value
cannot be "arbitrarily large" (such as the size of an already allocated
memory block).

The change has been evaluated on the following open source projects:

- memcached: [1 lost false
positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline&newcheck=memcached_1.6.8_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)

- tmux: 0 lost reports
- twin [3 lost false
positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=twin_v0.8.1_ednikru_taint_nostrlen_baseline&newcheck=twin_v0.8.1_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)
- vim [1 lost false
positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline&newcheck=vim_v8.2.1920_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)
- openssl 0 lost reports
- sqliste [2 lost false
positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=sqlite_version-3.33.0_ednikru_taint_nostrlen_baseline&newcheck=sqlite_version-3.33.0_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)
- ffmpeg 0 lost repots
- postgresql [3 lost false
positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=postgres_REL_13_0_ednikru_taint_nostrlen_baseline&newcheck=postgres_REL_13_0_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved)
- tinyxml 0 lost reports
- libwebm 0 lost reports
- xerces 0 lost reports

In all cases the lost reports are originating from copying untrusted
environment variables into another buffer.

There are 2 types of lost false positive reports:
1) [Where the warning is emitted at the malloc call by the
TaintPropagation Checker
](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline&newcheck=memcached_1.6.8_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved&report-id=2648506&report-hash=2079221954026f17e1ecb614f5f054db&report-filepath=%2amemcached.c)
`
            len = strlen(portnumber_filename)+4+1;
            temp_portnumber_filename = malloc(len);
`

2) When pointers are set based on the length of the tainted string by
the ArrayOutofBoundsv2 checker.
For example [this
](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline&newcheck=vim_v8.2.1920_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved&report-id=2649310&report-hash=79dc8522d2cd34ca8e1b2dc2db64b2df&report-filepath=%2aos_unix.c)case.
2023-09-19 11:04:50 +02:00
Balazs Benics
61924da630 [analyzer] Propagate taint for wchar variants of some APIs (#66074)
Functions like `fgets`, `strlen`, `strcat` propagate taint.
However, their `wchar_t` variants don't. This patch fixes that.

Notice, that there could be many more APIs missing.
This patch intends to fix those that so far surfaced,
instead of exhaustively fixing this issue.

https://github.com/llvm/llvm-project/pull/66074
2023-09-14 11:55:10 +02:00
Balazs Benics
8243bc4045 [analyzer] Make socket accept() propagate taint (#66074)
This allows to track taint on real code from `socket()`
to reading into a buffer using `recv()`.

https://github.com/llvm/llvm-project/pull/66074
2023-09-14 11:55:10 +02:00
Balazs Benics
909c963999 [analyzer] Fix stdin declaration in C++ tests (#66074)
The `stdin` declaration should be within `extern "C" {...}`, in C++
mode. In addition, it should be also marked `extern` in both C and
C++ modes.

I tightened the check to ensure we only accept `stdin` if both of these
match. However, from the Juliet test suite's perspective, this commit
should not matter.

https://github.com/llvm/llvm-project/pull/66074
2023-09-14 11:55:10 +02:00
Daniel Krupp
26b19a67e5 [clang][analyzer]Fix non-effective taint sanitation
There was a bug in alpha.security.taint.TaintPropagation checker
in Clang Static Analyzer.
Taint filtering could only sanitize const arguments.
After this patch, taint filtering is effective also
on non-const parameters.

Differential Revision: https://reviews.llvm.org/D155848
2023-07-21 15:11:13 +02:00
Jordan Rupprecht
8b39527535 [NFC] Wrap entire debug logging loop in LLVM_DEBUG 2023-04-26 05:28:15 -07:00
Daniel Krupp
343bdb1094 [analyzer] Show taint origin and propagation correctly
This patch improves the diagnostics of the alpha.security.taint.TaintPropagation
checker and taint related checkers by showing the "Taint originated here" note
at the correct place, where the attacker may inject it. This greatly improves
the understandability of the taint reports.

In the baseline the taint source was pointing to an invalid location, typically
somewhere between the real taint source and sink.

After the fix, the "Taint originated here" tag is correctly shown at the taint
source. This is the function call where the attacker can inject a malicious data
(e.g. reading from environment variable, reading from file, reading from
standard input etc.).

This patch removes the BugVisitor from the implementation and replaces it with 2
new NoteTags. One, in the taintOriginTrackerTag() prints the "taint originated
here" Note and the other in taintPropagationExplainerTag() explaining how the
taintedness is propagating from argument to argument or to the return value
("Taint propagated to the Xth argument"). This implementation uses the
interestingess BugReport utility to track back the tainted symbols through
propagating function calls to the point where the taintedness was introduced by
a source function call.

The checker which wishes to emit a Taint related diagnostic must use the
categories::TaintedData BugType category and must mark the tainted symbols as
interesting. Then the TaintPropagationChecker will automatically generate the
"Taint originated here" and the "Taint propagated to..." diagnostic notes.
2023-04-26 12:43:36 +02:00
Kazu Hirata
6ad0788c33 [clang] Use std::optional instead of llvm::Optional (NFC)
This patch replaces (llvm::|)Optional< with std::optional<.  I'll post
a separate patch to remove #include "llvm/ADT/Optional.h".

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2023-01-14 12:31:01 -08:00
serge-sans-paille
d9ab3e82f3 [clang] Use a StringRef instead of a raw char pointer to store builtin and call information
This avoids recomputing string length that is already known at compile time.

It has a slight impact on preprocessing / compile time, see

https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u

This a recommit of e953ae5bbc and the subsequent fixes caa713559b and 06b90e2e9c.

The above patchset caused some version of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in aa171833ab.
The fix is to make BuiltinInfo tables a compilation unit static variable, instead of a private static variable.

Differential Revision: https://reviews.llvm.org/D139881
2022-12-27 09:55:19 +01:00
Vitaly Buka
aa171833ab Revert "[clang] Use a StringRef instead of a raw char pointer to store builtin and call information"
Revert "Fix lldb option handling since e953ae5bbc (part 2)"
Revert "Fix lldb option handling since e953ae5bbc313fd0cc980ce021d487e5b5199ea4"

GCC build hangs on this bot https://lab.llvm.org/buildbot/#/builders/37/builds/19104
compiling CMakeFiles/obj.clangBasic.dir/Targets/AArch64.cpp.d

The bot uses GNU 11.3.0, but I can reproduce locally with gcc (Debian 12.2.0-3) 12.2.0.

This reverts commit caa713559b.
This reverts commit 06b90e2e9c.
This reverts commit e953ae5bbc.
2022-12-25 23:12:47 -08:00
serge-sans-paille
e953ae5bbc [clang] Use a StringRef instead of a raw char pointer to store builtin and call information
This avoids recomputing string length that is already known at compile
time.

It has a slight impact on preprocessing / compile time, see

https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u

This is a recommit of 719d98dfa8 that into
account a GGC issue (probably
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92181) when dealing with
intiailizer_list and constant expressions.

Workaround this by avoiding initializer list, at the expense of a
temporary plain old array.

Differential Revision: https://reviews.llvm.org/D139881
2022-12-24 10:25:06 +01:00
serge-sans-paille
07d9ab9aa5 Revert "[clang] Use a StringRef instead of a raw char pointer to store builtin and call information"
There are still remaining issues with GCC 12, see for instance

https://lab.llvm.org/buildbot/#/builders/93/builds/12669

This reverts commit 5ce4e92264.
2022-12-23 13:29:21 +01:00
serge-sans-paille
5ce4e92264 [clang] Use a StringRef instead of a raw char pointer to store builtin and call information
This avoids recomputing string length that is already known at compile
time.

It has a slight impact on preprocessing / compile time, see

https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u

This is a recommit of 719d98dfa8 with a
change to llvm/utils/TableGen/OptParserEmitter.cpp to cope with GCC bug
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108158

Differential Revision: https://reviews.llvm.org/D139881
2022-12-23 12:48:17 +01:00
serge-sans-paille
b7065a31b5 Revert "[clang] Use a StringRef instead of a raw char pointer to store builtin and call information"
Failing builds: https://lab.llvm.org/buildbot#builders/9/builds/19030
This is GCC specific and has been reported upstream: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108158

This reverts commit 719d98dfa8.
2022-12-23 11:36:56 +01:00
serge-sans-paille
719d98dfa8 [clang] Use a StringRef instead of a raw char pointer to store builtin and call information
This avoids recomputing string length that is already known at compile
time.

It has a slight impact on preprocessing / compile time, see

https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u

Differential Revision: https://reviews.llvm.org/D139881
2022-12-23 10:31:47 +01:00
Kazu Hirata
6c8b8a6a2a [Checkers] Use std::optional in GenericTaintChecker.cpp (NFC)
This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-10 08:00:24 -08:00
Kazu Hirata
180600660b [StaticAnalyzer] Use std::nullopt instead of None (NFC)
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated.  The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.

This is part of an effort to migrate from llvm::Optional to
std::optional:

https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
2022-12-03 11:34:24 -08:00
Kazu Hirata
ca4af13e48 [clang] Don't use Optional::getValue (NFC) 2022-06-20 22:59:26 -07:00
Kazu Hirata
064a08cd95 Don't use Optional::hasValue (NFC) 2022-06-20 20:05:16 -07:00
Kazu Hirata
06decd0b41 [clang] Use value_or instead of getValueOr (NFC) 2022-06-18 23:21:34 -07:00
Balazs Benics
f4fc3f6ba3 [analyzer] Treat system globals as mutable if they are not const
Previously, system globals were treated as immutable regions, unless it
was the `errno` which is known to be frequently modified.

D124244 wants to add a check for stores to immutable regions.
It would basically turn all stores to system globals into an error even
though we have no reason to believe that those mutable sys globals
should be treated as if they were immutable. And this leads to
false-positives if we apply D124244.

In this patch, I'm proposing to treat mutable sys globals actually
mutable, hence allocate them into the `GlobalSystemSpaceRegion`, UNLESS
they were declared as `const` (and a primitive arithmetic type), in
which case, we should use `GlobalImmutableSpaceRegion`.

In any other cases, I'm using the `GlobalInternalSpaceRegion`, which is
no different than the previous behavior.

---

In the tests I added, only the last `expected-warning` was different, compared to the baseline.
Which is this:
```lang=C++
void test_my_mutable_system_global_constraint() {
  assert(my_mutable_system_global > 2);
  clang_analyzer_eval(my_mutable_system_global > 2); // expected-warning {{TRUE}}
  invalidate_globals();
  clang_analyzer_eval(my_mutable_system_global > 2); // expected-warning {{UNKNOWN}} It was previously TRUE.
}
void test_my_mutable_system_global_assign(int x) {
  my_mutable_system_global = x;
  clang_analyzer_eval(my_mutable_system_global == x); // expected-warning {{TRUE}}
  invalidate_globals();
  clang_analyzer_eval(my_mutable_system_global == x); // expected-warning {{UNKNOWN}} It was previously TRUE.
}
```

---

Unfortunately, the taint checker will be also affected.
The `stdin` global variable is a pointer, which is assumed to be a taint
source, and the rest of the taint propagation rules will propagate from
it.
However, since mutable variables are no longer treated immutable, they
also get invalidated, when an opaque function call happens, such as the
first `scanf(stdin, ...)`. This would effectively remove taint from the
pointer, consequently disable all the rest of the taint propagations
down the line from the `stdin` variable.

All that said, I decided to look through `DerivedSymbol`s as well, to
acquire the memregion in that case as well. This should preserve the
previously existing taint reports.

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D127306
2022-06-15 17:08:27 +02:00
Balazs Benics
96ccb690a0 [analyzer][NFC] Prefer using isa<> instead getAs<> in conditions
Depends on D125709

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D127742
2022-06-15 16:58:13 +02:00
Balazs Benics
f1b18a79b7 [analyzer][NFC] Remove dead code and modernize surroundings
Thanks @kazu for helping me clean these parts in D127799.

I'm leaving the dump methods, along with the unused visitor handlers and
the forwarding methods.

The dead parts actually helped to uncover two bugs, to which I'm going
to post separate patches.

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D127836
2022-06-15 16:50:12 +02:00
Tom Ritter
82f3ed9904 [analyzer] Expose Taint.h to plugins
Reviewed By: NoQ, xazax.hun, steakhal

Differential Revision: https://reviews.llvm.org/D123155
2022-04-19 16:55:01 +02:00
Endre Fülöp
4fd6c6e65a [analyzer] Add more propagations to Taint analysis
Add more functions as taint propators to GenericTaintChecker.

Reviewed By: steakhal

Differential Revision: https://reviews.llvm.org/D120369
2022-03-07 13:18:54 +01:00
Endre Fülöp
34a7387986 [analyzer] Add more sources to Taint analysis
Add more functions as taint sources to GenericTaintChecker.

Reviewed By: steakhal

Differential Revision: https://reviews.llvm.org/D120236
2022-02-28 11:33:02 +01:00
Fangrui Song
ecff9b65b5 [analyzer] Just use default capture after 7fd60ee6e0 2022-02-24 10:06:11 -08:00
Fangrui Song
7fd60ee6e0 [analyzer] Fix -Wunused-lambda-capture in -DLLVM_ENABLE_ASSERTIONS=off builds 2022-02-24 00:13:13 -08:00
Balazs Benics
7036413dc2 Revert "Revert "[analyzer] Fix taint rule of fgets and setproctitle_init""
This reverts commit 2acead35c1.

Let's try `REQUIRES: asserts`.
2022-02-23 12:55:31 +01:00
Balazs Benics
a848a5cf2f Revert "Revert "[analyzer] Fix taint propagation by remembering to the location context""
This reverts commit d16c5f4192.

Let's try `REQUIRES: asserts`.
2022-02-23 12:53:07 +01:00
Balazs Benics
fa0a80e017 Revert "Revert "[analyzer] Add failing test case demonstrating buggy taint propagation""
This reverts commit b8ae323cca.

Let's try `REQUIRES: asserts`.
2022-02-23 10:48:06 +01:00
Balazs Benics
b8ae323cca Revert "[analyzer] Add failing test case demonstrating buggy taint propagation"
This reverts commit 744745ae19.

I'm reverting this since this patch caused a build breakage.

https://lab.llvm.org/buildbot/#/builders/91/builds/3818
2022-02-14 18:45:46 +01:00
Balazs Benics
d16c5f4192 Revert "[analyzer] Fix taint propagation by remembering to the location context"
This reverts commit b099e1e562.

I'm reverting this since the head of the patch stack caused a build
breakage.

https://lab.llvm.org/buildbot/#/builders/91/builds/3818
2022-02-14 18:45:46 +01:00
Balazs Benics
2acead35c1 Revert "[analyzer] Fix taint rule of fgets and setproctitle_init"
This reverts commit bf5963bf19.

I'm reverting this since the head of the patch stack caused a build
breakage.

https://lab.llvm.org/buildbot/#/builders/91/builds/3818
2022-02-14 18:45:46 +01:00
Balazs Benics
bf5963bf19 [analyzer] Fix taint rule of fgets and setproctitle_init
There was a typo in the rule.
`{{0}, ReturnValueIndex}` meant that the discrete index is `0` and the
variadic index is `-1`.
What we wanted instead is that both `0` and `-1` are in the discrete index
list.

Instead of this, we wanted to express that both `0` and the
`ReturnValueIndex` is in the discrete arg list.

The manual inspection revealed that `setproctitle_init` also suffered a
probably incomplete propagation rule.

Reviewed By: Szelethus, gamesh411

Differential Revision: https://reviews.llvm.org/D119129
2022-02-14 16:55:55 +01:00
Balazs Benics
b099e1e562 [analyzer] Fix taint propagation by remembering to the location context
Fixes the issue D118987 by mapping the propagation to the callsite's
LocationContext.
This way we can keep track of the in-flight propagations.

Note that empty propagation sets won't be inserted.

Reviewed By: NoQ, Szelethus

Differential Revision: https://reviews.llvm.org/D119128
2022-02-14 16:55:55 +01:00
Balazs Benics
744745ae19 [analyzer] Add failing test case demonstrating buggy taint propagation
Recently we uncovered a serious bug in the `GenericTaintChecker`.
It was already flawed before D116025, but that was the patch that turned
this silent bug into a crash.

It happens if the `GenericTaintChecker` has a rule for a function, which
also has a definition.

  char *fgets(char *s, int n, FILE *fp) {
    nested_call();   // no parameters!
    return (char *)0;
  }

  // Within some function:
  fgets(..., tainted_fd);

When the engine inlines the definition and finds a function call within
that, the `PostCall` event for the call will get triggered sooner than the
`PostCall` for the original function.
This mismatch violates the assumption of the `GenericTaintChecker` which
wants to propagate taint information from the `PreCall` event to the
`PostCall` event, where it can actually bind taint to the return value
**of the same call**.

Let's get back to the example and go through step-by-step.
The `GenericTaintChecker` will see the `PreCall<fgets(..., tainted_fd)>`
event, so it would 'remember' that it needs to taint the return value
and the buffer, from the `PostCall` handler, where it has access to the
return value symbol.
However, the engine will inline fgets and the `nested_call()` gets
evaluated subsequently, which produces an unimportant
`PreCall<nested_call()>`, then a `PostCall<nested_call()>` event, which is
observed by the `GenericTaintChecker`, which will unconditionally mark
tainted the 'remembered' arg indexes, trying to access a non-existing
argument, resulting in a crash.
If it doesn't crash, it will behave completely unintuitively, by marking
completely unrelated memory regions tainted, which is even worse.

The resulting assertion is something like this:
  Expr.h: const Expr *CallExpr::getArg(unsigned int) const: Assertion
          `Arg < getNumArgs() && "Arg access out of range!"' failed.

The gist of the backtrace:
  CallExpr::getArg(unsigned int) const
  SimpleFunctionCall::getArgExpr(unsigned int)
  CallEvent::getArgSVal(unsigned int) const
  GenericTaintChecker::checkPostCall(const CallEvent &, CheckerContext&) const

Prior to D116025, there was a check for the argument count before it
applied taint, however, it still suffered from the same underlying
issue/bug regarding propagation.

This path does not intend to fix the bug, rather start a discussion on
how to fix this.

---

Let me elaborate on how I see this problem.

This pre-call, post-call juggling is just a workaround.
The engine should by itself propagate taint where necessary right where
it invalidates regions.
For the tracked values, which potentially escape, we need to erase the
information we know about them; and this is exactly what is done by
invalidation.
However, in the case of taint, we basically want to approximate from the
opposite side of the spectrum.
We want to preserve taint in most cases, rather than cleansing them.

Now, we basically sanitize all escaping tainted regions implicitly,
since invalidation binds a fresh conjured symbol for the given region,
and that has not been associated with taint.

IMO this is a bad default behavior, we should be more aggressive about
preserving taint if not further spreading taint to the reachable
regions.

We have a couple of options for dealing with it (let's call it //tainting
policy//):
  1) Taint only the parameters which were tainted prior to the call.
  2) Taint the return value of the call, since it likely depends on the
     tainted input - if any arguments were tainted.
  3) Taint all escaped regions - (maybe transitively using the cluster
     algorithm) - if any arguments were tainted.
  4) Not taint anything - this is what we do right now :D

The `ExprEngine` should not deal with taint on its own. It should be done
by a checker, such as the `GenericTaintChecker`.
However, the `Pre`-`PostCall` checker callbacks are not designed for this.
`RegionChanges` would be a much better fit for modeling taint propagation.
What we would need in the `RegionChanges` callback is the `State` prior
invalidation, the `State` after the invalidation, and a `CheckerContext` in
which the checker can create transitions, where it would place `NoteTags`
for the modeled taint propagations and report errors if a taint sink
rule gets violated.
In this callback, we could query from the prior State, if the given
value was tainted; then act and taint if necessary according to the
checker's tainting policy.

By using RegionChanges for this, we would 'fix' the mentioned
propagation bug 'by-design'.

Reviewed By: Szelethus

Differential Revision: https://reviews.llvm.org/D118987
2022-02-14 16:55:55 +01:00
Tres Popp
262cc74e0b Fix pair construction with an implicit constructor inside. 2022-01-18 18:01:52 +01:00
Endre Fülöp
17f74240e6 [analyzer][NFC] Refactor GenericTaintChecker to use CallDescriptionMap
GenericTaintChecker now uses CallDescriptionMap to describe the possible
operation in code which trigger the introduction (sources), the removal
(filters), the passing along (propagations) and detection (sinks) of
tainted values.

Reviewed By: steakhal, NoQ

Differential Revision: https://reviews.llvm.org/D116025
2022-01-18 16:04:04 +01:00
Balazs Benics
49285f43e5 [analyzer] sprintf is a taint propagator not a source
Due to a typo, `sprintf()` was recognized as a taint source instead of a
taint propagator. It was because an empty taint source list - which is
the first parameter of the `TaintPropagationRule` - encoded the
unconditional taint sources.
This typo effectively turned the `sprintf()` into an unconditional taint
source.

This patch fixes that typo and demonstrated the correct behavior with
tests.

Reviewed By: martong

Differential Revision: https://reviews.llvm.org/D112558
2021-10-28 11:03:02 +02:00
Kazu Hirata
0abb5d293c [Sema, StaticAnalyzer] Use StringRef::contains (NFC) 2021-10-20 08:02:36 -07:00
Kazu Hirata
e567f37dab [clang] Use llvm::is_contained (NFC) 2021-10-13 20:41:55 -07:00
Balazs Benics
edde4efc66 [analyzer] Introduce the assume-controlled-environment config option
If the `assume-controlled-environment` is `true`, we should expect `getenv()`
to succeed, and the result should not be considered tainted.
By default, the option will be `false`.

Reviewed By: NoQ, martong

Differential Revision: https://reviews.llvm.org/D111296
2021-10-13 10:50:26 +02:00