In D114095, `HeaderFileInfo::NumIncludes` was moved into `Preprocessor`. This still makes sense, because we want to track this on the granularity of submodules (D112915, D114173), but the way this information is serialized is not ideal. In `ASTWriter`, the set of included files gets deserialized eagerly, issuing lots of calls to `FileManager::getFile()` for input files the PCM consumer might not be interested in.
This patch makes the information part of the header file info table, taking advantage of its lazy deserialization which typically happens when a file is about to be included.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D155131
This is recommit of 98390ccb80, reverted
in 82a3969d71, because it caused
https://github.com/llvm/llvm-project/issues/63542. Although the problem
described in the issue is independent of the reverted patch, fail of
PCH/late-parsed-instantiations.cpp indeed obseved on PowerPC and is
likely to be caused by wrong serialization of `LateParsedTemplate`
objects. In this patch the serialization is fixed.
Original commit message is below.
Previously function template instantiations occurred with FP options
that were in effect at the end of translation unit. It was a problem
for late template parsing as these FP options were used as attributes of
AST nodes and may result in crash. To fix it FP options are set to the
state of the point of template definition.
Differential Revision: https://reviews.llvm.org/D143241
In `clang-scan-deps` contexts, the number of interesting identifiers in PCM files is fairly low (only macros), while the number of identifiers in the importing instance is high (builtins). Marking the whole identifier table out-of-date triggers lots of benign and expensive calls to `ASTReader::updateOutOfDateIdentifiers()`. (That unfortunately happens even for unused identifiers due to `SemaRef.IdResolver.begin(II)` line in `ASTWriter::WriteASTCore()`.)
This patch makes the main code path more similar to C++ modules, where the PCM files have `INTERESTING_IDENTIFIERS` section which lists identifiers that get created in the identifier table of the importing instance and marked as out-of-date. The only difference is that the main code path doesn't *create* identifiers in the table and relies on the importing instance calling `ASTReader::get()` when creating new identifier on-demand. It only marks existing identifiers as out-of-date.
This speeds up `clang-scan-deps` by 5-10%.
Reviewed By: Bigcheese, benlangmuir
Differential Revision: https://reviews.llvm.org/D151277
When writing a pcm, we serialize diagnostic mappings in order to
accurately reproduce the diagnostic environment inside any headers from
that module. However, the diagnostic state mapping table contains
entries for every diagnostic ID ever accessed, while we only want to
serialize the ones that are actually modified from their default value.
Futher, we need to serialize them in a deterministic order.
rdar://111477511
Differential Revision: https://reviews.llvm.org/D154016
In D152070 we added many new intrinsic types required for the RISC-V
Vector Extension.
This was crashing when loading the AST as those types are intrinsically
added to the AST (they don't come from the disk).
The total number required now by clang exceeds 400 so increasing the
value to 500 solves the problem. This value was already increased in
D92715 but I assume this has some impact on the on-disk format.
Also add a static assert to avoid this happening again in the future.
Differential Revision: https://reviews.llvm.org/D153111
This patch swaps out the `void *` behind `CXFile` from `FileEntry *` to `FileEntryRef::MapEntry *`. This allows us to remove some deprecated uses of `FileEntry::getName()`.
Depends on D151854.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D151938
This patch forbids to write comment to BMIs for C++20 Named Modules.
Originally I thought this was helpful for language services like clangd.
But I found clangd don't want the BMI to contain comments actually. So
it is meaningless for C++20 Named Modules to keep such comments in
their BMI.
It is simple to enable this when someday we found we want this actually.
_Generic accepts an expression operand whose type is matched against a
list of associations. The expression operand is unevaluated, but the
type matched is the type after lvalue conversion. This conversion loses
type information, which makes it more difficult to match against
qualified or incomplete types.
This extension allows _Generic to accept a type operand instead of an
expression operand. The type operand form does not undergo any
conversions and is matched directly against the association list.
This extension is also supported in C++ as we already supported
_Generic selection expressions there.
The RFC for this extension can be found at:
https://discourse.llvm.org/t/rfc-generic-selection-expression-with-a-type-operand/70388
Differential Revision: https://reviews.llvm.org/D149904
This mimics the `ModuleMap` API and enables D151854, where the `AllowCreation = true` function needs `FileEntryRef` but `AllowCreation = false` functions is happy with plain `FileEntry`. No functional change intended.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D151853
Platform-specific language extensions often want to provide a way of
indicating that certain functions should be called in a different way,
compiled in a different way, or otherwise treated differently from a
“normal” function. Honoring these indications is often required for
correctness, rather being than an optimization/QoI thing.
If a function declaration has a property P that matters for correctness,
it will be ODR-incompatible with a function that does not have property P.
If a function type has a property P that affects the calling convention,
it will not be two-way compatible with a function type that does not
have property P. These properties therefore affect language semantics.
That in turn means that they cannot be treated as standard [[]]
attributes.
Until now, many of these properties have been specified using GNU-style
attributes instead. GNU attributes have traditionally been more lax
than standard attributes, with many of them having semantic meaning.
Examples include calling conventions and the vector_size attribute.
However, there is a big drawback to using GNU attributes for semantic
information: compilers that don't understand the attributes will
(by default) emit a warning rather than an error. They will go on to
compile the code as though the attributes weren't present, which will
inevitably lead to wrong code in most cases. For users who live
dangerously and disable the warning, this wrong code could even be
generated silently.
A more robust approach would be to specify the properties using
keywords, which older compilers would then reject. Some vendor-specific
extensions have already taken this approach. But traditionally, each
such keyword has been treated as a language extension in its own right.
This has three major drawbacks:
(1) The parsing rules need to be kept up-to-date as the language evolves.
(2) There are often corner cases that similar extensions handle differently.
(3) Each extension requires more custom code than a standard attribute.
The underlying problem for all three is that, unlike for true attributes,
there is no established template that extensions can reuse. The purpose
of this patch series is to try to provide such a template.
One option would have been to pick an existing keyword and do whatever
that keyword does. The problem with that is that most keywords only
apply to specific kinds of types, kinds of decls, etc., and so the
parsing rules are (for good reason) not generally applicable to all
types and decls.
Really, the “only” thing wrong with using standard attributes is that
standard attributes cannot affect semantics. In all other respects
they provide exactly what we need: a well-defined grammar that evolves
with the language, clear rules about what an attribute appertains to,
and so on.
This series therefore adds keyword “attributes” that can appear
exactly where a standard attribute can appear and that appertain
to exactly what a standard attribute would appertain to. The link is
mechanical and no opt-outs or variations are allowed. This should
make the keywords predictable for programmers who are already
familiar with standard attributes.
This does mean that these keywords will be accepted for parsing purposes
in many more places than necessary. Inappropriate uses will then be
diagnosed during semantic analysis. However, the compiler would need
to reject the keywords in those positions whatever happens, and treating
them as ostensible attributes shouldn't be any worse than the alternative.
In some cases it might even be better. For example, SME's
__arm_streaming attribute would make conceptual sense as a statement
attribute, so someone who takes a “try-it-and-see” approach might write:
__arm_streaming { …block-of-code…; }
In fact, we did consider supporting this originally. The reason for
rejecting it was that it was too difficult to implement, rather than
because it didn't make conceptual sense.
One slight disadvantage of the keyword-based approach is that it isn't
possible to use #pragma clang attribute with the keywords. Perhaps we
could add support for that in future, if it turns out to be useful.
For want of a better term, I've called the new attributes "regular"
keyword attributes (in the sense that their parsing is regular wrt
standard attributes), as opposed to "custom" keyword attributes that
have their own parsing rules.
This patch adds the Attr.td support for regular keyword attributes.
Adding an attribute with a RegularKeyword spelling causes tablegen
to define the associated tokens and to record that attributes created
with that syntax are regular keyword attributes rather than custom
keyword attributes.
A follow-on patch contains the main Parse and Sema support,
which is enabled automatically by the Attr.td definition.
Other notes:
* The series does not allow regular keyword attributes to take
arguments, but this could be added in future.
* I wondered about trying to use tablegen for
TypePrinter::printAttributedAfter too, but decided against it.
RegularKeyword is really a spelling-level classification rather
than an attribute-level classification, and in general, an attribute
could have both GNU and RegularKeyword spellings. In contrast,
printAttributedAfter is only given the attribute kind and the type
that results from applying the attribute. AFAIK, it doesn't have
access to the original attribute spelling. This means that some
attribute-specific or type-specific knowledge might be needed
to print the attribute in the best way.
* Generating the tokens automatically from Attr.td means that
pseudo's libgrammar does now depend on tablegen.
* The patch uses the SME __arm_streaming attribute as an example
for testing purposes. The attribute does not do anything at this
stage. Later SME-specific patches will add proper semantics for it,
and add other SME-related keyword attributes.
Differential Revision: https://reviews.llvm.org/D148700
Most users of `Module::Header` already assume its `Entry` is populated. Enforce this assumption in the type system and handle the only case where this is not the case by wrapping the whole struct in `std::optional`. Do the same for `Module::DirectoryName`.
Depends on D151584.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D151586
The function was renamed to ReadTemplateKWAndArgsInfo, but the
original declaration remained:
commit 7945c981b9
Author: Abramo Bagnara <abramo.bagnara@gmail.com>
Date: Fri Jan 27 09:46:47 2012 +0000
For modules with umbrellas, we track how they were written in the module map. Unfortunately, the getter for the umbrella directory conflates the "as written" directory and the "effective" directory (either the written one or the parent of the written umbrella header).
This patch makes the distinction between "as written" and "effective" umbrella directories clearer. No functional change intended.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D151581
Close https://github.com/llvm/llvm-project/issues/62796.
Previously, we didn't serialize the evaluated result for VarDecl. This
caused the compilation of template metaprogramming become slower than
expect. This patch fixes the issue.
This is a recommit tested with asan built clang.
Close https://github.com/llvm/llvm-project/issues/62796.
Previously, we didn't serialize the evaluated result for VarDecl. This
caused the compilation of template metaprogramming become slower than
expect. This patch fixes the issue.
A step to address https://github.com/llvm/llvm-project/issues/62707.
It is not user friendly enough to drop the implicitly generated path
directly. Let's emit the warning first and drop it in the next version.
In 9c254184 `ASTWriter` stopped writing identifiers that are not interesting. Taking it a bit further, we don't need to sort the whole identifier table, just the interesting identifiers. This reduces the size of sorted vector from ~10k (including lots of builtins) to 2 (`__VA_ARGS__` and `__VA_OPT__`) in a typical Xcode project, improving `clang-scan-deps` performance.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D150494
This patch adds missing copy/move assignment operator to the class which has user-defined copy/move constructor.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D149718
ASTReader after we start writing
This is intended to mitigate
https://github.com/llvm/llvm-project/issues/61447.
Before the patch, it takes 5s to compile test.cppm in the above
reproducer. After the patch it takes 3s to compile it. Although this
patch didn't solve the problem completely, it should mitigate the
problem for sure. Noted that the behavior of the patch is consistent
with the comment of the originally empty function
ASTReader::finalizeForWriting. So the change should be consistent with
the original design.
Clang currently updates the mtime of .timestamp files on each load of the corresponding .pcm file. This is not necessary. In a given build session, Clang only needs to write the .timestamp file once, when we first validate the input files. This patch makes it so that we only touch the .timestamp file when it's older than the build session, alleviating some filesystem contention in clang-scan-deps.
Reviewed By: benlangmuir
Differential Revision: https://reviews.llvm.org/D149802
Avoid inferring new submodules for headers in ASTWriter's collection of
affecting modulemap files, since we don't want to pick up dependencies
that didn't actually exist during parsing.
rdar://109112624
Differential Revision: https://reviews.llvm.org/D150151
Close https://github.com/llvm/llvm-project/issues/62269
Currently, the compiler will emit errors when we compile C++20 modules
if the referenced files changed or got removed. This is because we reuse
the existing logic from Clang implicit modules. It is helpful for clang
implicit modules since it is implicit and we want to be sure things
don't go wrong. But it is not necessary for C++20 modules. The C++20
modules is explicit and it is build systems' responsibility to maintain
the dependencies. So the check in the compiler side may be an overkill.
Reported by Coverity:
1. Inside "ASTReader.cpp" file, in clang::ASTReader::FindExternalLexicalDecls(clang::DeclContext const *, llvm::function_ref<bool (clang::Decl::Kind)>, llvm::SmallVectorImpl<clang::Decl *> &): Using the auto keyword without an & causes a copy.
auto_causes_copy: Using the auto keyword without an & causes the copy of an object of type pair.
2. Inside "ASTReader.cpp" file, in clang::ASTReader::ReadAST(llvm::StringRef, clang::serialization::ModuleKind, clang::SourceLocation, unsigned int, llvm::SmallVectorImpl<clang::ASTReader::ImportedSubmodule> *): Using the auto keyword without an & causes a copy.
auto_causes_copy: Using the auto keyword without an & causes the copy of an object of type DenseMapPair.
3. Inside "CGOpenMPRuntimeGPU.cpp" file, in clang::CodeGen::CGOpenMPRuntimeGPU::emitGenericVarsEpilog(clang::CodeGen::CodeGenFunction &, bool): Using the auto keyword without an & causes a copy.
auto_causes_copy: Using the auto keyword without an & causes the copy of an object of type pair.
4. Inside "ASTWriter.cpp" file, in clang::ASTWriter::WriteHeaderSearch(clang::HeaderSearch const &): Using the auto keyword without an & causes a copy.
auto_causes_copy: Using the auto keyword without an & causes the copy of an object of type UnresolvedHeaderDirective.
Reviewed By: tahonermann
Differential Revision: https://reviews.llvm.org/D149461
We have no use for debug info for the scanner modules, and writing raw
ast files speeds up scanning ~15% in some cases. Note that the compile
commands produced by the scanner will still build the obj format (if
requested), and the scanner can *read* obj format pcms, e.g. from a PCH.
rdar://108807592
Differential Revision: https://reviews.llvm.org/D149693
Fix several problems related to serialization causing command line
defines to be reported as being built-in defines:
* When serializing the <built-in> and <command line> files don't
convert them into absolute paths.
* When deserializing SM_SLOC_BUFFER_ENTRY we need to call
setHasLineDirectives in the same way as we do for
SM_SLOC_FILE_ENTRY.
* When created suggested predefines based on the current command line
options we need to add line markers in the same way that
InitializePreprocessor does.
* Adjust a place in clangd where it was implicitly relying on command
line defines being treated as builtin.
Differential Revision: https://reviews.llvm.org/D144651
In file `clang/lib/Basic/Module.cpp` the `Module` class had `submodule_begin()` and `submodule_end()` functions to retrieve corresponding iterators for private vector of Modules. This commit removes mentioned functions, and replaces all of theirs usages with `submodules()` function and range-based for-loops.
Differential Revision: https://reviews.llvm.org/D148954
This reverts commit 67b298f6d8.
We got linker errors with undefined symbols during a compiler release
and tracked it down to this change. I am in the process of understanding
what is happening and getting a reproducer.
Sorry for reverting this again.
I will reopen#61065 until we fix this.
We only want to make PCH imports visible once for the the TU, not
repeatedly after every subsequent import. This causes some incorrect
behaviour with submodule visibility, and causes us to get extra module
dependencies in the scanner. So far I have only seen obviously incorrect
behaviour when building with -fmodule-name to cause a submodule to be
textually included when using the PCH, though the old behaviour seems
wrong regardless.
rdar://107449644
Differential Revision: https://reviews.llvm.org/D148176
The "getField" method is a bit confusing considering we also have a
"getFieldName" method. Instead, use "getFieldDecl" rather than
"getField".
Differential Revision: https://reviews.llvm.org/D147743