Files
clang-p2996/clang/lib/StaticAnalyzer/Checkers/StdVariantChecker.cpp
Gábor Spaits 527fcb8e5d [analyzer] Add std::variant checker (#66481)
As my BSc thesis I've implemented a checker for std::variant and
std::any, and in the following weeks I'll upload a revised version of
them here.

# Prelude

@Szelethus and I sent out an email with our initial plans here:
https://discourse.llvm.org/t/analyzer-new-checker-for-std-any-as-a-bsc-thesis/65613/2
We also created a stub checker patch here:
https://reviews.llvm.org/D142354.

Upon the recommendation of @haoNoQ , we explored an option where instead
of writing a checker, we tried to improve on how the analyzer natively
inlined the methods of std::variant and std::any. Our attempt is in this
patch https://reviews.llvm.org/D145069, but in a nutshell, this is what
happened: The analyzer was able to model much of what happened inside
those classes, but our false positive suppression machinery erroneously
suppressed it. After months of trying, we could not find a satisfying
enhancement on the heuristic without introducing an allowlist/denylist
of which functions to not suppress.

As a result (and partly on the encouragement of @Xazax-hun) I wrote a
dedicated checker!

The advantage of the checker is that it is not dependent on the
standard's implementation and won't put warnings in the standard library
definitions. Also without the checker it would be difficult to create
nice user-friendly warnings and NoteTags -- as per the standard's
specification, the analysis is sinked by an exception, which we don't
model well now.

# Design ideas

The working of the checker is straightforward: We find the creation of
an std::variant instance, store the type of the variable we want to
store in it, then save this type for the instance. When retrieving type
from the instance we check what type we want to retrieve as, and compare
it to the actual type. If the two don't march we emit an error.

Distinguishing variants by instance (e.g. MemRegion *) is not the most
optimal way. Other checkers, like MallocChecker uses a symbol-to-trait
map instead of region-to-trait. The upside of using symbols (which would
be the value of a variant, not the variant itself itself) is that the
analyzer would take care of modeling copies, moves, invalidation, etc,
out of the box. The problem is that for compound types, the analyzer
doesn't create a symbol as a result of a constructor call that is fit
for this job. MallocChecker in contrast manipulates simple pointers.

My colleges and I considered the option of making adjustments directly
to the memory model of the analyzer, but for the time being decided
against it, and go with the bit more cumbersome, but immediately viable
option of simply using MemRegions.

# Current state and review plan

This patch contains an already working checker that can find and report
certain variant/any misuses, but still lands it in alpha. I plan to
upload the rest of the checker in later patches.

The full checker is also able to "follow" the symbolic value held by the
std::variant and updates the program state whenever we assign the value
stored in the variant. I have also built a library that is meant to
model union-like types similar to variant, hence some functions being a
bit more multipurpose then is immediately needed.

I also intend to publish my std::any checker in a later commit.

---------

Co-authored-by: Gabor Spaits <gabor.spaits@ericsson.com>
Co-authored-by: Balazs Benics <benicsbalazs@gmail.com>
2023-11-21 14:02:22 +01:00

298 lines
10 KiB
C++

//===- StdVariantChecker.cpp -------------------------------------*- C++ -*-==//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
#include "clang/AST/Type.h"
#include "clang/StaticAnalyzer/Checkers/BuiltinCheckerRegistration.h"
#include "clang/StaticAnalyzer/Core/BugReporter/BugType.h"
#include "clang/StaticAnalyzer/Core/Checker.h"
#include "clang/StaticAnalyzer/Core/CheckerManager.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CallDescription.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CallEvent.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/CheckerContext.h"
#include "clang/StaticAnalyzer/Core/PathSensitive/SVals.h"
#include "llvm/ADT/FoldingSet.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/Casting.h"
#include <optional>
#include <string_view>
#include "TaggedUnionModeling.h"
using namespace clang;
using namespace ento;
using namespace tagged_union_modeling;
REGISTER_MAP_WITH_PROGRAMSTATE(VariantHeldTypeMap, const MemRegion *, QualType)
namespace clang::ento::tagged_union_modeling {
const CXXConstructorDecl *
getConstructorDeclarationForCall(const CallEvent &Call) {
const auto *ConstructorCall = dyn_cast<CXXConstructorCall>(&Call);
if (!ConstructorCall)
return nullptr;
return ConstructorCall->getDecl();
}
bool isCopyConstructorCall(const CallEvent &Call) {
if (const CXXConstructorDecl *ConstructorDecl =
getConstructorDeclarationForCall(Call))
return ConstructorDecl->isCopyConstructor();
return false;
}
bool isCopyAssignmentCall(const CallEvent &Call) {
const Decl *CopyAssignmentDecl = Call.getDecl();
if (const auto *AsMethodDecl =
dyn_cast_or_null<CXXMethodDecl>(CopyAssignmentDecl))
return AsMethodDecl->isCopyAssignmentOperator();
return false;
}
bool isMoveConstructorCall(const CallEvent &Call) {
const CXXConstructorDecl *ConstructorDecl =
getConstructorDeclarationForCall(Call);
if (!ConstructorDecl)
return false;
return ConstructorDecl->isMoveConstructor();
}
bool isMoveAssignmentCall(const CallEvent &Call) {
const Decl *CopyAssignmentDecl = Call.getDecl();
const auto *AsMethodDecl =
dyn_cast_or_null<CXXMethodDecl>(CopyAssignmentDecl);
if (!AsMethodDecl)
return false;
return AsMethodDecl->isMoveAssignmentOperator();
}
bool isStdType(const Type *Type, llvm::StringRef TypeName) {
auto *Decl = Type->getAsRecordDecl();
if (!Decl)
return false;
return (Decl->getName() == TypeName) && Decl->isInStdNamespace();
}
bool isStdVariant(const Type *Type) {
return isStdType(Type, llvm::StringLiteral("variant"));
}
} // end of namespace clang::ento::tagged_union_modeling
static std::optional<ArrayRef<TemplateArgument>>
getTemplateArgsFromVariant(const Type *VariantType) {
const auto *TempSpecType = VariantType->getAs<TemplateSpecializationType>();
if (!TempSpecType)
return {};
return TempSpecType->template_arguments();
}
static std::optional<QualType>
getNthTemplateTypeArgFromVariant(const Type *varType, unsigned i) {
std::optional<ArrayRef<TemplateArgument>> VariantTemplates =
getTemplateArgsFromVariant(varType);
if (!VariantTemplates)
return {};
return (*VariantTemplates)[i].getAsType();
}
static bool isVowel(char a) {
switch (a) {
case 'a':
case 'e':
case 'i':
case 'o':
case 'u':
return true;
default:
return false;
}
}
static llvm::StringRef indefiniteArticleBasedOnVowel(char a) {
if (isVowel(a))
return "an";
return "a";
}
class StdVariantChecker : public Checker<eval::Call, check::RegionChanges> {
// Call descriptors to find relevant calls
CallDescription VariantConstructor{{"std", "variant", "variant"}};
CallDescription VariantAssignmentOperator{{"std", "variant", "operator="}};
CallDescription StdGet{{"std", "get"}, 1, 1};
BugType BadVariantType{this, "BadVariantType", "BadVariantType"};
public:
ProgramStateRef checkRegionChanges(ProgramStateRef State,
const InvalidatedSymbols *,
ArrayRef<const MemRegion *>,
ArrayRef<const MemRegion *> Regions,
const LocationContext *,
const CallEvent *Call) const {
if (!Call)
return State;
return removeInformationStoredForDeadInstances<VariantHeldTypeMap>(
*Call, State, Regions);
}
bool evalCall(const CallEvent &Call, CheckerContext &C) const {
// Check if the call was not made from a system header. If it was then
// we do an early return because it is part of the implementation.
if (Call.isCalledFromSystemHeader())
return false;
if (StdGet.matches(Call))
return handleStdGetCall(Call, C);
// First check if a constructor call is happening. If it is a
// constructor call, check if it is an std::variant constructor call.
bool IsVariantConstructor =
isa<CXXConstructorCall>(Call) && VariantConstructor.matches(Call);
bool IsVariantAssignmentOperatorCall =
isa<CXXMemberOperatorCall>(Call) &&
VariantAssignmentOperator.matches(Call);
if (IsVariantConstructor || IsVariantAssignmentOperatorCall) {
if (Call.getNumArgs() == 0 && IsVariantConstructor) {
handleDefaultConstructor(cast<CXXConstructorCall>(&Call), C);
return true;
}
// FIXME Later this checker should be extended to handle constructors
// with multiple arguments.
if (Call.getNumArgs() != 1)
return false;
SVal ThisSVal;
if (IsVariantConstructor) {
const auto &AsConstructorCall = cast<CXXConstructorCall>(Call);
ThisSVal = AsConstructorCall.getCXXThisVal();
} else if (IsVariantAssignmentOperatorCall) {
const auto &AsMemberOpCall = cast<CXXMemberOperatorCall>(Call);
ThisSVal = AsMemberOpCall.getCXXThisVal();
} else {
return false;
}
handleConstructorAndAssignment<VariantHeldTypeMap>(Call, C, ThisSVal);
return true;
}
return false;
}
private:
// The default constructed std::variant must be handled separately
// by default the std::variant is going to hold a default constructed instance
// of the first type of the possible types
void handleDefaultConstructor(const CXXConstructorCall *ConstructorCall,
CheckerContext &C) const {
SVal ThisSVal = ConstructorCall->getCXXThisVal();
const auto *const ThisMemRegion = ThisSVal.getAsRegion();
if (!ThisMemRegion)
return;
std::optional<QualType> DefaultType = getNthTemplateTypeArgFromVariant(
ThisSVal.getType(C.getASTContext())->getPointeeType().getTypePtr(), 0);
if (!DefaultType)
return;
ProgramStateRef State = ConstructorCall->getState();
State = State->set<VariantHeldTypeMap>(ThisMemRegion, *DefaultType);
C.addTransition(State);
}
bool handleStdGetCall(const CallEvent &Call, CheckerContext &C) const {
ProgramStateRef State = Call.getState();
const auto &ArgType = Call.getArgSVal(0)
.getType(C.getASTContext())
->getPointeeType()
.getTypePtr();
// We have to make sure that the argument is an std::variant.
// There is another std::get with std::pair argument
if (!isStdVariant(ArgType))
return false;
// Get the mem region of the argument std::variant and look up the type
// information that we know about it.
const MemRegion *ArgMemRegion = Call.getArgSVal(0).getAsRegion();
const QualType *StoredType = State->get<VariantHeldTypeMap>(ArgMemRegion);
if (!StoredType)
return false;
const CallExpr *CE = cast<CallExpr>(Call.getOriginExpr());
const FunctionDecl *FD = CE->getDirectCallee();
if (FD->getTemplateSpecializationArgs()->size() < 1)
return false;
const auto &TypeOut = FD->getTemplateSpecializationArgs()->asArray()[0];
// std::get's first template parameter can be the type we want to get
// out of the std::variant or a natural number which is the position of
// the requested type in the argument type list of the std::variant's
// argument.
QualType RetrievedType;
switch (TypeOut.getKind()) {
case TemplateArgument::ArgKind::Type:
RetrievedType = TypeOut.getAsType();
break;
case TemplateArgument::ArgKind::Integral:
// In the natural number case we look up which type corresponds to the
// number.
if (std::optional<QualType> NthTemplate =
getNthTemplateTypeArgFromVariant(
ArgType, TypeOut.getAsIntegral().getSExtValue())) {
RetrievedType = *NthTemplate;
break;
}
[[fallthrough]];
default:
return false;
}
QualType RetrievedCanonicalType = RetrievedType.getCanonicalType();
QualType StoredCanonicalType = StoredType->getCanonicalType();
if (RetrievedCanonicalType == StoredCanonicalType)
return true;
ExplodedNode *ErrNode = C.generateNonFatalErrorNode();
if (!ErrNode)
return false;
llvm::SmallString<128> Str;
llvm::raw_svector_ostream OS(Str);
std::string StoredTypeName = StoredType->getAsString();
std::string RetrievedTypeName = RetrievedType.getAsString();
OS << "std::variant " << ArgMemRegion->getDescriptiveName() << " held "
<< indefiniteArticleBasedOnVowel(StoredTypeName[0]) << " \'"
<< StoredTypeName << "\', not "
<< indefiniteArticleBasedOnVowel(RetrievedTypeName[0]) << " \'"
<< RetrievedTypeName << "\'";
auto R = std::make_unique<PathSensitiveBugReport>(BadVariantType, OS.str(),
ErrNode);
C.emitReport(std::move(R));
return true;
}
};
bool clang::ento::shouldRegisterStdVariantChecker(
clang::ento::CheckerManager const &mgr) {
return true;
}
void clang::ento::registerStdVariantChecker(clang::ento::CheckerManager &mgr) {
mgr.registerChecker<StdVariantChecker>();
}