clang-p2996

Author	SHA1	Message	Date
Daniel Krupp	6002e2fd49	[analyzer] Split TaintPropagation checker into reporting and modeling checkers (#98157 ) Taint propagation is a a generic modeling feature of the Clang Static Analyzer which many other checkers depend on. Therefore GenericTaintChecker is split into a TaintPropagation modeling checker and a GenericTaint reporting checker.	2024-07-10 17:54:53 +02:00
Donát Nagy	5ffd154cf6	[analyzer] Clean up list of taint propagation functions (#91635 ) This commit refactors GenericTaintChecker and performs various improvements in the list of taint propagation functions: 1. The matching mode (usually `CDM::CLibrary` or `CDM::CLibraryMaybeHardened`) was specified to avoid matching e.g. C++ methods or functions from a user-defined namespace that happen to share the name of a well-known library function. 2. With these matching modes, a `CallDescription` can automatically match builtin variants of the functions, so entries that explicitly specified a builtin function were removed. This eliminated inconsistencies where the "normal" and the builtin variant of the same function was handled differently (e.g. `__builtin_strlcat` was covered, while plain `strlcat` wasn't; while `__builtin_memcpy` and `memcpy` were both on the list with different propagation rules). 3. The modeling of the functions `strlcat` and `strncat` was updated to propagate taint from the first argument (index 0), because a tainted string should remain tainted even if we append something else to it. Note that this was already applied to `strcat` and `wcsncat` by commit `6ceb1c0ef9`. 4. Some functions were updated to propagate taint from a size/length argument to the result: e.g. `memcmp(p, q, get_tainted_int())` will now return a tainted value (because the attacker can manipulate it). This principle was already present in some propagation rules (e.g. `__builtin_memcpy` was handled this way), and even after this commit there are still some functions where it isn't applied. (I only aimed for consistency within the same function family.) 5. Functions that have hardened `__FOO_chk()` variants are matched in `CDM:CLibraryMaybeHardened` to ensure consistent handling of the "normal" and the hardened variant. I added special handling for the hardened variants of "sprintf" and "snprintf" because there the extra parameters are inserted into the middle of the parameter list. 6. Modeling of `sscanf_s` was added, to complete the group of `fscanf`, `fscanf_s` and `sscanf`. 7. The `Source()` specifications for `gets`, `gets_s` and `wgetch` were ill-formed: they were specifying variadic arguments starting at argument index `ReturnValueIndex`. (That is, in addition to the return value they were propagating taint to all arguments.) 8. Functions that were related to each other were grouped together. (I know that this makes the diff harder to read, but I felt that the full list is unreadable without some reorganization.) 9. I spotted and removed some redundant curly braces. Perhaps would be good to switch to a cleaner layout with less nested braces... 10. I updated some obsolete comments and added two TODOs for issues that should be fixed in followup commits.	2024-05-16 12:10:56 +02:00
Kazu Hirata	deffae5da1	[clang] Use StringRef::operator== instead of StringRef::equals (NFC) (#91844 ) I'm planning to remove StringRef::equals in favor of StringRef::operator==. - StringRef::operator==/!= outnumber StringRef::equals by a factor of 24 under clang/ in terms of their usage. - The elimination of StringRef::equals brings StringRef closer to std::string_view, which has operator== but not equals. - S == "foo" is more readable than S.equals("foo"), especially for !Long.Expression.equals("str") vs Long.Expression != "str".	2024-05-11 11:38:52 -07:00
Daniel Krupp	6ceb1c0ef9	[analyzer] Remove untrusted buffer size warning in the TaintPropagation checker (#68607 ) Before this commit the the checker alpha.security.taint.TaintPropagation always reported warnings when the size argument of a memcpy-like or malloc-like function was tainted. However, this produced false positive reports in situations where the size was tainted, but correctly performed bound checks guaranteed the safety of the call. This commit removes the rough "always warn if the size argument is tainted" heuristic; but it would be good to add a more refined "warns if the size argument is tainted and can be too large" heuristic in follow-up commits. That logic would belong to CStringChecker and MallocChecker, because those are the checkers responsible for the more detailed modeling of memcpy-like and malloc-like functions. To mark this plan, TODO comments are added in those two checkers. There were several test cases that used these sinks to test generic properties of taint tracking; those were adapted to use different logic. As a minor unrelated change, this commit ensures that strcat (and its wide variant, wcsncat) propagates taint from the first argument to the first argument, i.e. a tainted string remains tainted if we concatenate it with another string. This change was required because the adapted variant of multipleTaintedArgs is relying on strncat to compose a value that combines taint from two different sources.	2024-05-02 16:46:41 +02:00
NagyDonat	fb299cae51	[analyzer] Make recognition of hardened __FOO_chk functions explicit (#86536 ) In builds that use source hardening (-D_FORTIFY_SOURCE), many standard functions are implemented as macros that expand to calls of hardened functions that take one additional argument compared to the "usual" variant and perform additional input validation. For example, a `memcpy` call may expand to `__memcpy_chk()` or `__builtin___memcpy_chk()`. Before this commit, `CallDescription`s created with the matching mode `CDM::CLibrary` automatically matched these hardened variants (in a addition to the "usual" function) with a fairly lenient heuristic. Unfortunately this heuristic meant that the `CLibrary` matching mode was only usable by checkers that were prepared to handle matches with an unusual number of arguments. This commit limits the recognition of the hardened functions to a separate matching mode `CDM::CLibraryMaybeHardened` and applies this mode for functions that have hardened variants and were previously recognized with `CDM::CLibrary`. This way checkers that are prepared to handle the hardened variants will be able to detect them easily; while other checkers can simply use `CDM::CLibrary` for matching C library functions (and they won't encounter surprising argument counts). The initial motivation for refactoring this area was that previously `CDM::CLibrary` accepted calls with more arguments/parameters than the expected number, so I wasn't able to use it for `malloc` without accidentally matching calls to the 3-argument BSD kernel malloc. After this commit this "may have more args/params" logic will only activate when we're actually matching a hardened variant function (in `CDM::CLibraryMaybeHardened` mode). The recognition of "sprintf()" and "snprintf()" in CStringChecker was refactored, because previously it was abusing the behavior that extra arguments are accepted even if the matched function is not a hardened variant. This commit also fixes the oversight that the old code would've recognized e.g. `__wmemcpy_chk` as a hardened variant of `memcpy`. After this commit I'm planning to create several follow-up commits that ensure that checkers looking for C library functions use `CDM::CLibrary` as a "sane default" matching mode. This commit is not truly NFC (it eliminates some buggy corner cases), but it does not intentionally modify the behavior of CSA on real-world non-crazy code. As a minor unrelated change I'm eliminating the argument/variable "IsBuiltin" from the evalSprintf function family in CStringChecker, because it was completely unused. --------- Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>	2024-04-05 11:20:27 +02:00
NagyDonat	52a460f9d4	[analyzer] Refactor CallDescription match mode (NFC) (#83432 ) The class `CallDescription` is used to define patterns that are used for matching `CallEvent`s. For example, a `CallDescription{{"std", "find_if"}, 3}` matches a call to `std::find_if` with 3 arguments. However, these patterns are somewhat fuzzy, so this pattern could also match something like `std::__1::find_if` (with an additional namespace layer), or, unfortunately, a `CallDescription` for the well-known function `free()` can match a C++ method named `free()`: https://github.com/llvm/llvm-project/issues/81597 To prevent this kind of ambiguity this commit introduces the enum `CallDescription::Mode` which can limit the pattern matching to non-method function calls (or method calls etc.). After this NFC change, one or more follow-up commits will apply the right pattern matching modes in the ~30 checkers that use `CallDescription`s. Note that `CallDescription` previously had a `Flags` field which had only two supported values: - `CDF_None` was the default "match anything" mode, - `CDF_MaybeBuiltin` was a "match only C library functions and accept some inexact matches" mode. This commit preserves `CDF_MaybeBuiltin` under the more descriptive name `CallDescription::Mode::CLibrary` (or `CDM::CLibrary`). Instead of this "Flags" model I'm switching to a plain enumeration becasue I don't think that there is a natural usecase to combine the different matching modes. (Except for the default "match anything" mode, which is currently kept for compatibility, but will be phased out in the follow-up commits.)	2024-03-04 15:43:37 +01:00
Balazs Benics	f90e063308	[analyzer] Fix taint sink rules for exec-like functions (#66358 ) Variadic arguments were not considered as taint sink arguments. I also decided to extend the list of exec-like functions. (Juliet CWE78_OS_Command_Injection__char_connect_socket_execl)	2023-09-22 07:14:32 +02:00
Daniel Krupp	97495d3159	[analyzer] TaintPropagation checker strlen() should not propagate (#66086 ) strlen(..) call should not propagate taintedness, because it brings in many false positive findings. It is a common pattern to copy user provided input to another buffer. In these cases we always get warnings about tainted data used as the malloc parameter: buf = malloc(strlen(tainted_txt) + 1); // false warning This pattern can lead to a denial of service attack only, when the attacker can directly specify the size of the allocated area as an arbitrary large number (e.g. the value is converted from a user provided string). Later, we could reintroduce strlen() as a taint propagating function with the consideration not to emit warnings when the tainted value cannot be "arbitrarily large" (such as the size of an already allocated memory block). The change has been evaluated on the following open source projects: - memcached: [1 lost false positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline&newcheck=memcached_1.6.8_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved) - tmux: 0 lost reports - twin [3 lost false positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=twin_v0.8.1_ednikru_taint_nostrlen_baseline&newcheck=twin_v0.8.1_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved) - vim [1 lost false positive](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline&newcheck=vim_v8.2.1920_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved) - openssl 0 lost reports - sqliste [2 lost false positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=sqlite_version-3.33.0_ednikru_taint_nostrlen_baseline&newcheck=sqlite_version-3.33.0_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved) - ffmpeg 0 lost repots - postgresql [3 lost false positives](https://codechecker-demo.eastus.cloudapp.azure.com/Default/reports?run=postgres_REL_13_0_ednikru_taint_nostrlen_baseline&newcheck=postgres_REL_13_0_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved) - tinyxml 0 lost reports - libwebm 0 lost reports - xerces 0 lost reports In all cases the lost reports are originating from copying untrusted environment variables into another buffer. There are 2 types of lost false positive reports: 1) [Where the warning is emitted at the malloc call by the TaintPropagation Checker ](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=memcached_1.6.8_ednikru_taint_nostrlen_baseline&newcheck=memcached_1.6.8_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved&report-id=2648506&report-hash=2079221954026f17e1ecb614f5f054db&report-filepath=%2amemcached.c) ` len = strlen(portnumber_filename)+4+1; temp_portnumber_filename = malloc(len); ` 2) When pointers are set based on the length of the tainted string by the ArrayOutofBoundsv2 checker. For example [this ](https://codechecker-demo.eastus.cloudapp.azure.com/Default/report-detail?run=vim_v8.2.1920_ednikru_taint_nostrlen_baseline&newcheck=vim_v8.2.1920_ednikru_taint_nostrlen_new&is-unique=on&diff-type=Resolved&report-id=2649310&report-hash=79dc8522d2cd34ca8e1b2dc2db64b2df&report-filepath=%2aos_unix.c)case.	2023-09-19 11:04:50 +02:00
Balazs Benics	61924da630	[analyzer] Propagate taint for wchar variants of some APIs (#66074 ) Functions like `fgets`, `strlen`, `strcat` propagate taint. However, their `wchar_t` variants don't. This patch fixes that. Notice, that there could be many more APIs missing. This patch intends to fix those that so far surfaced, instead of exhaustively fixing this issue. https://github.com/llvm/llvm-project/pull/66074	2023-09-14 11:55:10 +02:00
Balazs Benics	8243bc4045	[analyzer] Make socket `accept()` propagate taint (#66074 ) This allows to track taint on real code from `socket()` to reading into a buffer using `recv()`. https://github.com/llvm/llvm-project/pull/66074	2023-09-14 11:55:10 +02:00
Balazs Benics	909c963999	[analyzer] Fix stdin declaration in C++ tests (#66074 ) The `stdin` declaration should be within `extern "C" {...}`, in C++ mode. In addition, it should be also marked `extern` in both C and C++ modes. I tightened the check to ensure we only accept `stdin` if both of these match. However, from the Juliet test suite's perspective, this commit should not matter. https://github.com/llvm/llvm-project/pull/66074	2023-09-14 11:55:10 +02:00
Daniel Krupp	26b19a67e5	[clang][analyzer]Fix non-effective taint sanitation There was a bug in alpha.security.taint.TaintPropagation checker in Clang Static Analyzer. Taint filtering could only sanitize const arguments. After this patch, taint filtering is effective also on non-const parameters. Differential Revision: https://reviews.llvm.org/D155848	2023-07-21 15:11:13 +02:00
Jordan Rupprecht	8b39527535	[NFC] Wrap entire debug logging loop in LLVM_DEBUG	2023-04-26 05:28:15 -07:00
Daniel Krupp	343bdb1094	[analyzer] Show taint origin and propagation correctly This patch improves the diagnostics of the alpha.security.taint.TaintPropagation checker and taint related checkers by showing the "Taint originated here" note at the correct place, where the attacker may inject it. This greatly improves the understandability of the taint reports. In the baseline the taint source was pointing to an invalid location, typically somewhere between the real taint source and sink. After the fix, the "Taint originated here" tag is correctly shown at the taint source. This is the function call where the attacker can inject a malicious data (e.g. reading from environment variable, reading from file, reading from standard input etc.). This patch removes the BugVisitor from the implementation and replaces it with 2 new NoteTags. One, in the taintOriginTrackerTag() prints the "taint originated here" Note and the other in taintPropagationExplainerTag() explaining how the taintedness is propagating from argument to argument or to the return value ("Taint propagated to the Xth argument"). This implementation uses the interestingess BugReport utility to track back the tainted symbols through propagating function calls to the point where the taintedness was introduced by a source function call. The checker which wishes to emit a Taint related diagnostic must use the categories::TaintedData BugType category and must mark the tainted symbols as interesting. Then the TaintPropagationChecker will automatically generate the "Taint originated here" and the "Taint propagated to..." diagnostic notes.	2023-04-26 12:43:36 +02:00
Kazu Hirata	6ad0788c33	[clang] Use std::optional instead of llvm::Optional (NFC) This patch replaces (llvm::\|)Optional< with std::optional<. I'll post a separate patch to remove #include "llvm/ADT/Optional.h". This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2023-01-14 12:31:01 -08:00
serge-sans-paille	d9ab3e82f3	[clang] Use a StringRef instead of a raw char pointer to store builtin and call information This avoids recomputing string length that is already known at compile time. It has a slight impact on preprocessing / compile time, see https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u This a recommit of `e953ae5bbc` and the subsequent fixes `caa713559b` and `06b90e2e9c`. The above patchset caused some version of GCC to take eons to compile clang/lib/Basic/Targets/AArch64.cpp, as spotted in `aa171833ab`. The fix is to make BuiltinInfo tables a compilation unit static variable, instead of a private static variable. Differential Revision: https://reviews.llvm.org/D139881	2022-12-27 09:55:19 +01:00
Vitaly Buka	aa171833ab	Revert "[clang] Use a StringRef instead of a raw char pointer to store builtin and call information" Revert "Fix lldb option handling since `e953ae5bbc` (part 2)" Revert "Fix lldb option handling since e953ae5bbc313fd0cc980ce021d487e5b5199ea4" GCC build hangs on this bot https://lab.llvm.org/buildbot/#/builders/37/builds/19104 compiling CMakeFiles/obj.clangBasic.dir/Targets/AArch64.cpp.d The bot uses GNU 11.3.0, but I can reproduce locally with gcc (Debian 12.2.0-3) 12.2.0. This reverts commit `caa713559b`. This reverts commit `06b90e2e9c`. This reverts commit `e953ae5bbc`.	2022-12-25 23:12:47 -08:00
serge-sans-paille	e953ae5bbc	[clang] Use a StringRef instead of a raw char pointer to store builtin and call information This avoids recomputing string length that is already known at compile time. It has a slight impact on preprocessing / compile time, see https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u This is a recommit of `719d98dfa8` that into account a GGC issue (probably https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92181) when dealing with intiailizer_list and constant expressions. Workaround this by avoiding initializer list, at the expense of a temporary plain old array. Differential Revision: https://reviews.llvm.org/D139881	2022-12-24 10:25:06 +01:00
serge-sans-paille	07d9ab9aa5	Revert "[clang] Use a StringRef instead of a raw char pointer to store builtin and call information" There are still remaining issues with GCC 12, see for instance https://lab.llvm.org/buildbot/#/builders/93/builds/12669 This reverts commit `5ce4e92264`.	2022-12-23 13:29:21 +01:00
serge-sans-paille	5ce4e92264	[clang] Use a StringRef instead of a raw char pointer to store builtin and call information This avoids recomputing string length that is already known at compile time. It has a slight impact on preprocessing / compile time, see https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u This is a recommit of `719d98dfa8` with a change to llvm/utils/TableGen/OptParserEmitter.cpp to cope with GCC bug https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108158 Differential Revision: https://reviews.llvm.org/D139881	2022-12-23 12:48:17 +01:00
serge-sans-paille	b7065a31b5	Revert "[clang] Use a StringRef instead of a raw char pointer to store builtin and call information" Failing builds: https://lab.llvm.org/buildbot#builders/9/builds/19030 This is GCC specific and has been reported upstream: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108158 This reverts commit `719d98dfa8`.	2022-12-23 11:36:56 +01:00
serge-sans-paille	719d98dfa8	[clang] Use a StringRef instead of a raw char pointer to store builtin and call information This avoids recomputing string length that is already known at compile time. It has a slight impact on preprocessing / compile time, see https://llvm-compile-time-tracker.com/compare.php?from=3f36d2d579d8b0e8824d9dd99bfa79f456858f88&to=e49640c507ddc6615b5e503144301c8e41f8f434&stat=instructions:u Differential Revision: https://reviews.llvm.org/D139881	2022-12-23 10:31:47 +01:00
Kazu Hirata	6c8b8a6a2a	[Checkers] Use std::optional in GenericTaintChecker.cpp (NFC) This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-10 08:00:24 -08:00
Kazu Hirata	180600660b	[StaticAnalyzer] Use std::nullopt instead of None (NFC) This patch mechanically replaces None with std::nullopt where the compiler would warn if None were deprecated. The intent is to reduce the amount of manual work required in migrating from Optional to std::optional. This is part of an effort to migrate from llvm::Optional to std::optional: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716	2022-12-03 11:34:24 -08:00
Kazu Hirata	ca4af13e48	[clang] Don't use Optional::getValue (NFC)	2022-06-20 22:59:26 -07:00
Kazu Hirata	064a08cd95	Don't use Optional::hasValue (NFC)	2022-06-20 20:05:16 -07:00
Kazu Hirata	06decd0b41	[clang] Use value_or instead of getValueOr (NFC)	2022-06-18 23:21:34 -07:00
Balazs Benics	f4fc3f6ba3	[analyzer] Treat system globals as mutable if they are not const Previously, system globals were treated as immutable regions, unless it was the `errno` which is known to be frequently modified. D124244 wants to add a check for stores to immutable regions. It would basically turn all stores to system globals into an error even though we have no reason to believe that those mutable sys globals should be treated as if they were immutable. And this leads to false-positives if we apply D124244. In this patch, I'm proposing to treat mutable sys globals actually mutable, hence allocate them into the `GlobalSystemSpaceRegion`, UNLESS they were declared as `const` (and a primitive arithmetic type), in which case, we should use `GlobalImmutableSpaceRegion`. In any other cases, I'm using the `GlobalInternalSpaceRegion`, which is no different than the previous behavior. --- In the tests I added, only the last `expected-warning` was different, compared to the baseline. Which is this: ```lang=C++ void test_my_mutable_system_global_constraint() { assert(my_mutable_system_global > 2); clang_analyzer_eval(my_mutable_system_global > 2); // expected-warning {{TRUE}} invalidate_globals(); clang_analyzer_eval(my_mutable_system_global > 2); // expected-warning {{UNKNOWN}} It was previously TRUE. } void test_my_mutable_system_global_assign(int x) { my_mutable_system_global = x; clang_analyzer_eval(my_mutable_system_global == x); // expected-warning {{TRUE}} invalidate_globals(); clang_analyzer_eval(my_mutable_system_global == x); // expected-warning {{UNKNOWN}} It was previously TRUE. } ``` --- Unfortunately, the taint checker will be also affected. The `stdin` global variable is a pointer, which is assumed to be a taint source, and the rest of the taint propagation rules will propagate from it. However, since mutable variables are no longer treated immutable, they also get invalidated, when an opaque function call happens, such as the first `scanf(stdin, ...)`. This would effectively remove taint from the pointer, consequently disable all the rest of the taint propagations down the line from the `stdin` variable. All that said, I decided to look through `DerivedSymbol`s as well, to acquire the memregion in that case as well. This should preserve the previously existing taint reports. Reviewed By: martong Differential Revision: https://reviews.llvm.org/D127306	2022-06-15 17:08:27 +02:00
Balazs Benics	96ccb690a0	[analyzer][NFC] Prefer using isa<> instead getAs<> in conditions Depends on D125709 Reviewed By: martong Differential Revision: https://reviews.llvm.org/D127742	2022-06-15 16:58:13 +02:00
Balazs Benics	f1b18a79b7	[analyzer][NFC] Remove dead code and modernize surroundings Thanks @kazu for helping me clean these parts in D127799. I'm leaving the dump methods, along with the unused visitor handlers and the forwarding methods. The dead parts actually helped to uncover two bugs, to which I'm going to post separate patches. Reviewed By: martong Differential Revision: https://reviews.llvm.org/D127836	2022-06-15 16:50:12 +02:00
Tom Ritter	82f3ed9904	[analyzer] Expose Taint.h to plugins Reviewed By: NoQ, xazax.hun, steakhal Differential Revision: https://reviews.llvm.org/D123155	2022-04-19 16:55:01 +02:00
Endre Fülöp	4fd6c6e65a	[analyzer] Add more propagations to Taint analysis Add more functions as taint propators to GenericTaintChecker. Reviewed By: steakhal Differential Revision: https://reviews.llvm.org/D120369	2022-03-07 13:18:54 +01:00
Endre Fülöp	34a7387986	[analyzer] Add more sources to Taint analysis Add more functions as taint sources to GenericTaintChecker. Reviewed By: steakhal Differential Revision: https://reviews.llvm.org/D120236	2022-02-28 11:33:02 +01:00
Fangrui Song	ecff9b65b5	[analyzer] Just use default capture after `7fd60ee6e0`	2022-02-24 10:06:11 -08:00
Fangrui Song	7fd60ee6e0	[analyzer] Fix -Wunused-lambda-capture in -DLLVM_ENABLE_ASSERTIONS=off builds	2022-02-24 00:13:13 -08:00
Balazs Benics	7036413dc2	Revert "Revert "[analyzer] Fix taint rule of fgets and setproctitle_init"" This reverts commit `2acead35c1`. Let's try `REQUIRES: asserts`.	2022-02-23 12:55:31 +01:00
Balazs Benics	a848a5cf2f	Revert "Revert "[analyzer] Fix taint propagation by remembering to the location context"" This reverts commit `d16c5f4192`. Let's try `REQUIRES: asserts`.	2022-02-23 12:53:07 +01:00
Balazs Benics	fa0a80e017	Revert "Revert "[analyzer] Add failing test case demonstrating buggy taint propagation"" This reverts commit `b8ae323cca`. Let's try `REQUIRES: asserts`.	2022-02-23 10:48:06 +01:00
Balazs Benics	b8ae323cca	Revert "[analyzer] Add failing test case demonstrating buggy taint propagation" This reverts commit `744745ae19`. I'm reverting this since this patch caused a build breakage. https://lab.llvm.org/buildbot/#/builders/91/builds/3818	2022-02-14 18:45:46 +01:00
Balazs Benics	d16c5f4192	Revert "[analyzer] Fix taint propagation by remembering to the location context" This reverts commit `b099e1e562`. I'm reverting this since the head of the patch stack caused a build breakage. https://lab.llvm.org/buildbot/#/builders/91/builds/3818	2022-02-14 18:45:46 +01:00
Balazs Benics	2acead35c1	Revert "[analyzer] Fix taint rule of fgets and setproctitle_init" This reverts commit `bf5963bf19`. I'm reverting this since the head of the patch stack caused a build breakage. https://lab.llvm.org/buildbot/#/builders/91/builds/3818	2022-02-14 18:45:46 +01:00
Balazs Benics	bf5963bf19	[analyzer] Fix taint rule of fgets and setproctitle_init There was a typo in the rule. `{{0}, ReturnValueIndex}` meant that the discrete index is `0` and the variadic index is `-1`. What we wanted instead is that both `0` and `-1` are in the discrete index list. Instead of this, we wanted to express that both `0` and the `ReturnValueIndex` is in the discrete arg list. The manual inspection revealed that `setproctitle_init` also suffered a probably incomplete propagation rule. Reviewed By: Szelethus, gamesh411 Differential Revision: https://reviews.llvm.org/D119129	2022-02-14 16:55:55 +01:00
Balazs Benics	b099e1e562	[analyzer] Fix taint propagation by remembering to the location context Fixes the issue D118987 by mapping the propagation to the callsite's LocationContext. This way we can keep track of the in-flight propagations. Note that empty propagation sets won't be inserted. Reviewed By: NoQ, Szelethus Differential Revision: https://reviews.llvm.org/D119128	2022-02-14 16:55:55 +01:00
Balazs Benics	744745ae19	[analyzer] Add failing test case demonstrating buggy taint propagation Recently we uncovered a serious bug in the `GenericTaintChecker`. It was already flawed before D116025, but that was the patch that turned this silent bug into a crash. It happens if the `GenericTaintChecker` has a rule for a function, which also has a definition. char fgets(char s, int n, FILE fp) { nested_call(); // no parameters! return (char )0; } // Within some function: fgets(..., tainted_fd); When the engine inlines the definition and finds a function call within that, the `PostCall` event for the call will get triggered sooner than the `PostCall` for the original function. This mismatch violates the assumption of the `GenericTaintChecker` which wants to propagate taint information from the `PreCall` event to the `PostCall` event, where it can actually bind taint to the return value of the same call. Let's get back to the example and go through step-by-step. The `GenericTaintChecker` will see the `PreCall<fgets(..., tainted_fd)>` event, so it would 'remember' that it needs to taint the return value and the buffer, from the `PostCall` handler, where it has access to the return value symbol. However, the engine will inline fgets and the `nested_call()` gets evaluated subsequently, which produces an unimportant `PreCall<nested_call()>`, then a `PostCall<nested_call()>` event, which is observed by the `GenericTaintChecker`, which will unconditionally mark tainted the 'remembered' arg indexes, trying to access a non-existing argument, resulting in a crash. If it doesn't crash, it will behave completely unintuitively, by marking completely unrelated memory regions tainted, which is even worse. The resulting assertion is something like this: Expr.h: const Expr *CallExpr::getArg(unsigned int) const: Assertion `Arg < getNumArgs() && "Arg access out of range!"' failed. The gist of the backtrace: CallExpr::getArg(unsigned int) const SimpleFunctionCall::getArgExpr(unsigned int) CallEvent::getArgSVal(unsigned int) const GenericTaintChecker::checkPostCall(const CallEvent &, CheckerContext&) const Prior to D116025, there was a check for the argument count before it applied taint, however, it still suffered from the same underlying issue/bug regarding propagation. This path does not intend to fix the bug, rather start a discussion on how to fix this. --- Let me elaborate on how I see this problem. This pre-call, post-call juggling is just a workaround. The engine should by itself propagate taint where necessary right where it invalidates regions. For the tracked values, which potentially escape, we need to erase the information we know about them; and this is exactly what is done by invalidation. However, in the case of taint, we basically want to approximate from the opposite side of the spectrum. We want to preserve taint in most cases, rather than cleansing them. Now, we basically sanitize all escaping tainted regions implicitly, since invalidation binds a fresh conjured symbol for the given region, and that has not been associated with taint. IMO this is a bad default behavior, we should be more aggressive about preserving taint if not further spreading taint to the reachable regions. We have a couple of options for dealing with it (let's call it //tainting policy//): 1) Taint only the parameters which were tainted prior to the call. 2) Taint the return value of the call, since it likely depends on the tainted input - if any arguments were tainted. 3) Taint all escaped regions - (maybe transitively using the cluster algorithm) - if any arguments were tainted. 4) Not taint anything - this is what we do right now :D The `ExprEngine` should not deal with taint on its own. It should be done by a checker, such as the `GenericTaintChecker`. However, the `Pre`-`PostCall` checker callbacks are not designed for this. `RegionChanges` would be a much better fit for modeling taint propagation. What we would need in the `RegionChanges` callback is the `State` prior invalidation, the `State` after the invalidation, and a `CheckerContext` in which the checker can create transitions, where it would place `NoteTags` for the modeled taint propagations and report errors if a taint sink rule gets violated. In this callback, we could query from the prior State, if the given value was tainted; then act and taint if necessary according to the checker's tainting policy. By using RegionChanges for this, we would 'fix' the mentioned propagation bug 'by-design'. Reviewed By: Szelethus Differential Revision: https://reviews.llvm.org/D118987	2022-02-14 16:55:55 +01:00
Tres Popp	262cc74e0b	Fix pair construction with an implicit constructor inside.	2022-01-18 18:01:52 +01:00
Endre Fülöp	17f74240e6	[analyzer][NFC] Refactor GenericTaintChecker to use CallDescriptionMap GenericTaintChecker now uses CallDescriptionMap to describe the possible operation in code which trigger the introduction (sources), the removal (filters), the passing along (propagations) and detection (sinks) of tainted values. Reviewed By: steakhal, NoQ Differential Revision: https://reviews.llvm.org/D116025	2022-01-18 16:04:04 +01:00
Balazs Benics	49285f43e5	[analyzer] sprintf is a taint propagator not a source Due to a typo, `sprintf()` was recognized as a taint source instead of a taint propagator. It was because an empty taint source list - which is the first parameter of the `TaintPropagationRule` - encoded the unconditional taint sources. This typo effectively turned the `sprintf()` into an unconditional taint source. This patch fixes that typo and demonstrated the correct behavior with tests. Reviewed By: martong Differential Revision: https://reviews.llvm.org/D112558	2021-10-28 11:03:02 +02:00
Kazu Hirata	0abb5d293c	[Sema, StaticAnalyzer] Use StringRef::contains (NFC)	2021-10-20 08:02:36 -07:00
Kazu Hirata	e567f37dab	[clang] Use llvm::is_contained (NFC)	2021-10-13 20:41:55 -07:00
Balazs Benics	edde4efc66	[analyzer] Introduce the assume-controlled-environment config option If the `assume-controlled-environment` is `true`, we should expect `getenv()` to succeed, and the result should not be considered tainted. By default, the option will be `false`. Reviewed By: NoQ, martong Differential Revision: https://reviews.llvm.org/D111296	2021-10-13 10:50:26 +02:00

1 2 3

126 Commits