Remove the testing for std::optional - it was originally for
llvm::Optional, but now that that doesn't exist and we use
std::optional, testing for that pretty printer should live, wherever the
pretty printer lives, not here in LLVM.
And the PointerIntPair pretty printer bit rotted due to changes in
PointerIntPair, 875391728c.
`rethrow` instruction is a terminator, but when when its DAG is built in
`SelectionDAGBuilder` in a custom routine, it was NOT treated as such.
```ll
rethrow: ; preds = %catch.start
invoke void @llvm.wasm.rethrow() #1 [ "funclet"(token %1) ]
to label %unreachable unwind label %ehcleanup
ehcleanup: ; preds = %rethrow, %catch.dispatch
%tmp = phi i32 [ 10, %catch.dispatch ], [ 20, %rethrow ]
...
```
In this bitcode, because of the `phi`, a `CONST_I32` will be created in
the `rethrow` BB. Without this patch, the DAG for the `rethrow` BB looks
like this:
```
t0: ch,glue = EntryToken
t3: ch = CopyToReg t0, Register:i32 %9, Constant:i32<20>
t5: ch = llvm.wasm.rethrow t0, TargetConstant:i32<12161>
t6: ch = TokenFactor t3, t5
t8: ch = br t6, BasicBlock:ch<unreachable 0x562532e43c50>
```
Note that `CopyToReg` and `llvm.wasm.rethrow` don't have dependence so
either can come first in the selected code, which can result in the code
like
```mir
bb.3.rethrow:
RETHROW 0, implicit-def dead $arguments
%9:i32 = CONST_I32 20, implicit-def dead $arguments
BR %bb.6, implicit-def dead $arguments
```
After this patch, `llvm.wasm.rethrow` is treated as a terminator, and
the DAG will look like
```
t0: ch,glue = EntryToken
t3: ch = CopyToReg t0, Register:i32 %9, Constant:i32<20>
t5: ch = llvm.wasm.rethrow t3, TargetConstant:i32<12161>
t7: ch = br t5, BasicBlock:ch<unreachable 0x5555e3d32c70>
```
Note that now `rethrow` takes a token from `CopyToReg`, so `rethrow` has
to come after `CopyToReg`. And the resulting code will be
```mir
bb.3.rethrow:
%9:i32 = CONST_I32 20, implicit-def dead $arguments
RETHROW 0, implicit-def dead $arguments
BR %bb.6, implicit-def dead $arguments
```
I'm not very familiar with the internals of `getRoot` vs.
`getControlRoot`, but other terminator instructions seem to use the
latter, and using it for `rethrow` too worked.
This patch adds a new builtin type for AMDGPU's buffer rsrc data type,
which is effectively an AS 8 pointer. This is needed because we'd like
to expose certain intrinsics to users via builtins which take buffer
rsrc as argument.
# Added/changed options
The following options are **added** to the `statistics dump` command:
* `--targets=bool`: Boolean. Dumps the `targets` section.
* `--modules=bool`: Boolean. Dumps the `modules` section.
When both options are given, the field `moduleIdentifiers` will be
dumped for each target in the `targets` section.
The following options are **changed**:
* `--transcript=bool`: Changed to a boolean. Dumps the `transcript`
section.
# Behavior of `statistics dump` with various options
The behavior is **backward compatible**:
- When no options are provided, `statistics dump` dumps all sections.
- When `--summary` is provided, only dumps the summary info.
**New** behavior:
- `--targets=bool`, `--modules=bool`, `--transcript=bool` overrides the
above "default".
For **example**:
- `statistics dump --modules=false` dumps summary + targets +
transcript. No modules.
- `statistics dump --summary --targets=true --transcript=true` dumps
summary + targets (in summary mode) + transcript.
# Added options into public API
In `SBStatisticsOptions`, add:
* `Set/GetIncludeTargets`
* `Set/GetIncludeModules`
* `Set/GetIncludeTranscript`
**Alternative considered**: Thought about adding
`Set/GetIncludeSections(string sections_spec)`, which receives a
comma-separated list of section names to be included ("targets",
"modules", "transcript"). The **benefit** of this approach is that the
API is more future-proof when it comes to possible adding/changing of
section names. **However**, I feel the section names are likely to
remain unchanged for a while - it's not like we plan to make big changes
to the output of `statistics dump` any time soon. The **downsides** of
this approach are: 1\ the readability of the API is worse (requires
reading doc to understand what string can be accepted), 2\ string input
are more prone to human error (e.g. typo "target" instead of expected
"targets").
# Tests
```
bin/llvm-lit -sv ../external/llvm-project/lldb/test/API/commands/statistics/basic/TestStats.py
```
```
./tools/lldb/unittests/Interpreter/InterpreterTests
```
New test cases have been added to verify:
* Different sections are dumped/not dumped when different
`StatisticsOptions` are given through command line (CLI or
`HandleCommand`; see `test_sections_existence_through_command`) or API
(see `test_sections_existence_through_api`).
* The order in which the options are given in command line does not
matter (see `test_order_of_options_do_not_matter`).
---------
Co-authored-by: Roy Shi <royshi@meta.com>
DenseMap iteration order is not guaranteed to be deterministic.
Without the change,
llvm/test/Transforms/CodeGenPrepare/X86/statepoint-relocate.ll would
fail when `combineHashValue` changes (#95970).
Fixes: dba7329ebb
I'm planning to use IndexedMemProfData in MemProfReader and beyond.
Before I do so, this patch renames the members of IndexedMemProfData
as MemProfData.FrameData is a bit mouthful with "Data" repeated twice.
Note that MemProfReader currently has a trio -- IdToFrame,
CSIdToCallStack, and FunctionProfileData. Replacing them with an
instance of IndexedMemProfData allows us to use the move semantics
from the reader to the writer context. More importantly, treating the
profile data as one package makes the maintenance easier. In the
past, forgetting to update a place dealing with the trio has resulted
in a bug where we totally forgot to emit call stacks into the indexed
profile.
Fixes:
llvm-project/libc/src/__support/OSUtil/linux/fcntl.cpp:63:26: error:
implicit conversion loses integer precision: '__off64_t' (aka 'long
long')
to '__off_t' (aka 'long') [-Werror,-Wshorten-64-to-32]
flk->l_start = flk64.l_start;
~ ~~~~~~^~~~~~~
llvm-project/libc/src/__support/OSUtil/linux/fcntl.cpp:64:24: error:
implicit conversion loses integer precision: '__off64_t' (aka 'long
long')
to '__off_t' (aka 'long') [-Werror,-Wshorten-64-to-32]
flk->l_len = flk64.l_len;
~ ~~~~~~^~~~~
We already have an overflow check, just need the cast to be explicit.
This
warning was observed on the 32b ARM build in overlay mode.
This is the second attempt. When parsing the target attribute
we should be letting cc1 features which don't correspond to
Extensions pass through to avoid errors like the following:
% cat neon.c
__attribute__((target("arch=armv8-a")))
uint64x2_t foo(uint64x2_t a, uint64x2_t b) { return veorq_u64(a, b); }
% clang --target=aarch64-linux-gnu -c neon.c
error: always_inline function 'veorq_u64' requires target feature
'outline-atomics', but would be inlined into function 'foo'
that is compiled without support for 'outline-atomics'
Co-authored-by: Tomas Matheson <Tomas.Matheson@arm.com>
Some of the OpenMP code can change the instruction pointed at by the
insertion point. This leads to an assert in the compiler about
BB->getParent() and IP->getParent() not matching.
The fix is to rebuild the insertionpoint from the block, rather than use
builder.restoreIP.
Also, move some of the alloca generation, rather than skipping back and
forth between insert points (and ensure all the allocas are done before
their users are created).
A simple test, mainly to ensure the minimal reproducer doesn't fail to
compile in the future is also added.
validateRecord ensures that all the values are unique except for
IPVK_IndirectCallTarget and IPVK_VTableTarget. The problem is that we
exclude them in the innermost loop.
This patch pulls the loop invariant out of the loop. While I am at
it, this patch migrates a use of getValueForSite to
getValueArrayForSite.
All of the IEEE_SUPPORT_xxx() intrinsic functions must fold to constant
logical values when they have constant arguments; and since they fold to
.TRUE. for currently support architectures, always fold them. But also
put in the infrastructure whereby a driver can initialize Evaluate's
target information to set some of them to .FALSE. if that becomes
necessary.
Alternative instructions in the Linux kernel may modify control flow in
a function. As such, it is unsafe to optimize functions with alternative
instructions until we properly support CFG alternatives.
Previously, we marked functions with alt instructions before the
emission, but that could be too late if we remove or replace
instructions with alternatives. We could have marked functions as
non-simple immediately after reading .altinstructions, but it's nice to
be able to view functions after CFG is built. Thus assign the non-simple
status after building CFG.
This patch adds a new option for `ilist`, `ilist_parent<ParentTy>`, that
enables storing an additional pointer in the `ilist_node_base` type to a
specified "parent" type, and uses that option for `Instruction`.
This is distinct from the `ilist_node_with_parent` class, despite their
apparent similarities. The `ilist_node_with_parent` class is a subclass
of `ilist_node` that requires its own subclasses to define a `getParent`
method, which is then used by the owning `ilist` for some of its
operations; it is purely an interface declaration. The `ilist_parent`
option on the other hand is concerned with data, adding a parent field
to the `ilist_node_base` class.
Currently, we can use `BasicBlock::iterator` to insert instructions,
_except_ when either the iterator is invalid (`NodePtr=0x0`), or when
the iterator points to a sentinel value (`BasicBlock::end()`). This patch
results in the sentinel value also having a valid pointer to its owning
basic block, which allows us to use iterators for all insertions,
without needing to store or pass an extra `BasicBlock *BB` argument
alongside it.
We should have done this for the f32/f64 case a long time ago. Now that
codegen handles atomicrmw selection for the v2f16/v2bf16 case, start emitting
it instead.
This also does upgrade the behavior to respect a volatile qualified pointer,
which was previously ignored (for the cases that don't have an explicit
volatile argument).
Fixes#62925.
The following code:
```cpp
#include <map>
int main() {
std::map m1 = {std::pair{"foo", 2}, {"bar", 3}}; // guide #2
std::map m2(m1.begin(), m1.end()); // guide #1
}
```
Is rejected by clang, but accepted by both gcc and msvc:
https://godbolt.org/z/6v4fvabb5 .
So basically CTAD with copy-list-initialization is rejected.
Note that this exact code is also used in a cppreference article:
https://en.cppreference.com/w/cpp/container/map/deduction_guides
I checked the C++11 and C++20 standard drafts to see whether suppressing
user conversion is the correct thing to do for user conversions. Based
on the standard I don't think that it is correct.
```
13.3.1.4 Copy-initialization of class by user-defined conversion [over.match.copy]
Under the conditions specified in 8.5, as part of a copy-initialization of an object of class type, a user-defined
conversion can be invoked to convert an initializer expression to the type of the object being initialized.
Overload resolution is used to select the user-defined conversion to be invoked
```
So we could use user defined conversions according to the standard.
```
If a narrowing conversion is required to initialize any of the elements, the
program is ill-formed.
```
We should not do narrowing.
```
In copy-list-initialization, if an explicit constructor is chosen, the initialization is ill-formed.
```
We should not use explicit constructors.
Fixes https://github.com/llvm/llvm-project/issues/95638
The check was `if(unsigned_num >= 0)` which will always be true. The
intent was to check for zero, but the `for` loop inside the `if` was
already doing that.
According to [temp.expl.spec] p2:
> The declaration in an _explicit-specialization_ shall not be an
_export-declaration_. An explicit specialization shall not use a
_storage-class-specifier_ other than `thread_local`.
Clang partially implements this, but a number of issues exist:
1. We don't diagnose class scope explicit specializations of variable
templates with _storage-class-specifiers_, e.g.
```
struct A
{
template<typename T>
static constexpr int x = 0;
template<>
static constexpr int x<void> = 1; // ill-formed, but clang accepts
};
````
2. We incorrectly reject class scope explicit specializations of
variable templates when `static` is not used, e.g.
```
struct A
{
template<typename T>
static constexpr int x = 0;
template<>
constexpr int x<void> = 1; // error: non-static data member cannot be
constexpr; did you intend to make it static?
};
````
3. We don't diagnose dependent class scope explicit specializations of
function templates with storage class specifiers, e.g.
```
template<typename T>
struct A
{
template<typename U>
static void f();
template<>
static void f<int>(); // ill-formed, but clang accepts
};
````
This patch addresses these issues as follows:
- # 1 is fixed by issuing a diagnostic when an explicit
specialization of a variable template has storage class specifier
- # 2 is fixed by considering any non-function declaration with any
template parameter lists at class scope to be a static data member. This
also allows for better error recovery (it's more likely the user
intended to declare a variable template than a "field template").
- # 3 is fixed by checking whether a function template explicit
specialization has a storage class specifier even when the primary
template is not yet known.
One thing to note is that it would be far simpler to diagnose this when
parsing the _decl-specifier-seq_, but such an implementation would
necessitate a refactor of `ParsedTemplateInfo` which I believe to be
outside the scope of this patch.
Implements HLSL availability diagnostics' strict mode.
HLSL availability diagnostics emits errors or warning when unavailable
shader APIs are used. Unavailable shader APIs are APIs that are exposed
in HLSL code but are not available in the target shader stage or shader
model version.
In the strict mode the compiler emits an error when an unavailable API
is found in any function regardless of whether it is reachable from the
shader entry point or not. This mode is enabled by
``-fhlsl-strict-availability``.
See HLSL Availability Diagnostics design doc
[here](https://github.com/llvm/llvm-project/blob/main/clang/docs/HLSL/AvailabilityDiagnostics.rst)
for more details.
Fixes#90096
Global variables that reference themselves alongside a function that is
called indirectly can cause an infinite loop in
`AAGlobalValueInfoFloating`. The recursive reference is continually
pushed back into the workload, causing the attributor to hang
indefinitely.
The `TilingInterface` methods have return values that allow the
interface implementation to return multiple operations, and also return
tiled values explicitly. This is to avoid the assumption that the
interface needs to return a single operation and this operations result
are the expected tiled values. Make the
`PartialReductionOpInterface::tileToPartialReduction` return
`TilingResult` as well for the same reason.
Similarly make the `PartialReductionOpInterface::mergeReductions` also
return a list of generated operations and values to use as replacements.
This is just a refactoring to allow for deprecation of
`linalg::tileReductionUsingForall` with `scf::tileReductionUsingSCF`
method.
The following warning is technically correct, but pretty much useless,
since there aren't any frame variables that we'd expect the debugger to
understand.
> This version of LLDB has no plugin for the language "assembler".
> Inspection of frame variables will be limited.
This message is useful in the general case but should be suppressed for
the "assembler" case.
rdar://92745462