AArch64 uses $d and $x symbols to delimit data embedded in code.
However, sometimes we see $d symbols, typically in .eh_frame, with
addresses that belong to different sections. These occasionally fall
inside .text functions and cause BOLT to stop disassembling, which in
turn causes DWARF CFA processing to fail.
As a workaround, we just ignore symbols with addresses outside the
section they belong to. This behaviour is consistent with objdump and
similar tools.
This paper added case ranges in switch statements, which is a GNU
extension Clang has supported since at least Clang 3.0.
It updates the diagnostics to no longer call this a GNU extension except
in C++ mode.
CopyHints set has been collecting duplicates of a register with
increasing weight and then deduplicated with HintedRegs set. Let's stop
collecting duplicates at the first place.
A urem recurrence has the property that the result can never exceed the
start value. A udiv recurrence has the property that the result can
never exceed either the start value or the numerator, whichever is
greater. Implement a simplification based on these properties.
The MVE VADC and VSBC instructions read and write a carry bit in FPSCR,
which is exposed through the intrinsics. This makes it possible to write
code which has the FPSCR live across a function call, or which uses the
same value twice, so it needs to be possible to spill and reload it.
There is a missed optimisation in one of the test cases, where we reload
the FPSCR from the stack despite it still being live, I've not found a
simple way to prevent the register allocator from doing this.
In DWARF 4 and earlier `static const` members of structs, classes and
unions have an entry tag `DW_TAG_member`, and are also tagged as
`DW_AT_declaration`, but otherwise follow the same rules as
`DW_TAG_variable`.
Claiming AArch64 support for llvm-exegesis is a bit of a stretch in my
opinion as only a couple of opcodes with GPR64 operands will work for
snippet benchmarking, so I propose to clarify that AArch64 support is
very experimental. Also added some clarifications about its libpfm4
dependency.
This patch adds NVVM intrinsics and NVPTX codegen for:
* cp.async.bulk.tensor.S2G.1D -> 5D variants, supporting both Tile and
Im2Col modes. These intrinsics optionally support cache_hints as
indicated by the boolean flag argument.
* cp.async.bulk.tensor.G2S.1D -> 5D variants, with support for both Tile
and Im2Col modes. The Im2Col variants have an extra set of offsets as
parameters. These intrinsics optionally support multicast and cache_hints,
as indicated by the boolean arguments at the end of the intrinsics.
* The backend looks through these flag arguments and lowers to the
appropriate PTX instruction.
* Lit tests are added for all combinations of these intrinsics in
cp-async-bulk-tensor-g2s/s2g.ll.
* The generated PTX is verified with a 12.3 ptxas executable.
* Added docs for these intrinsics in NVPTXUsage.rst file.
* PTX Spec reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-cp-async-bulk-tensor
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Using LazyValueInfo, it is possible to compute valuable information for
allocation functions, GEP and alloca, even in the presence of dynamic
information.
llvm.objectsize plays an important role in _FORTIFY_SOURCE definitions,
so improving its diagnostic in turns improves the security of compiled
application.
As a side note, as a result of recent optimization improvements, clang
no longer passes
https://github.com/serge-sans-paille/builtin_object_size-test-suite This
commit restores the situation and greatly improves the scope of code
handled by the static version of __builtin_object_size.
As defined in LangRef, branching on `undef` is undefined behavior.
This PR aims to remove undefined behavior from tests. As UB tests break
Alive2 and may be the root cause of breaking future optimizations.
Here's an Alive2 proof for one of the examples:
https://alive2.llvm.org/ce/z/TncxhP
In 50e5411e4, we preserved the pack substitution index within
SubstTemplateTypeParmType nodes and performed in-place expansions of
packs such that type constraints on a lambda that serve as a pattern of
a fold expression could be evaluated if the type constraints contain any
packs that are expanded by the fold expression.
However, we made an incorrect assumption of the condition under which
in-place expansion should occur. For example, a SizeOfPackExpr case
relies on SubstTemplateTypeParmType nodes being transformed to
SubstTemplateTypeParmPackTypes rather than expanding them immediately in
place.
This fixes that by adding a flag to SubstTemplateTypeParmType to
discriminate such in-place expansion situations.
Fixes https://github.com/llvm/llvm-project/issues/113518
This is also split off from the zvfhmin/zvfbfmin
isLegalElementTypeForRVV work.
Enabling this will cause SLP and RISCVGatherScatterLowering to emit
@llvm.experimental.vp.strided.{load,store} intrinsics, and codegen
support for this was added in #109387 and #114750.
This patch adds support for re-scheduling already scheduled
instructions. For now this will clear and rebuild the DAG, and will
reschedule the code using the new DAG.
Fixes a copy-paste typo.
The typo resulted in producing bad v_perm based operands for the v_dot4
combine. When adding a corresponding byte pair to the v_dot byte pair
chains, we must take note of the byte position in the corresponding
source nodes. These byte positions are used to ensure we extract the
correct DWord from the ultimate source, and formulate a correct
perm_mask from the extracted DWord.
With the typo, we the S0 byte would used the DWord offset for the
corresponding S1 byte. If this offset was not the same as the true DWord
offset for the S0 byte, we would extract and use the wrong byte for S0
in the v_dot.
Fixes https://github.com/llvm/llvm-project/issues/112941
This is another piece split off from the work to add zvfhmin/zvfbfmin to
isLegalElementTypeForRVV.
This is needed to get InterleavedAccessPass to lower [de]interleaves to
segment load/stores.
This PR adds a folder for extracting an element from a vector shuffle.
It turns something like:
```
%shuffle = vector.shuffle %a, %b [0, 8, 7, 15]
: vector<8xf32>, vector<8xf32>
%extract = vector.extract %shuffle[3] : f32 from vector<4xf32>
```
into:
```
%extract = vector.extract %b[7] : f32 from vector<8xf32>
```
This patch fixes:
third-party/unittest/googletest/include/gtest/gtest.h:1379:11:
error: comparison of integers of different signs: 'const unsigned
int' and 'const int' [-Werror,-Wsign-compare]