This pattern can be often met in Flang generated LLVM IR,
for example, for the counts of the loops generated for array
expressions like: `a(x:x+y)` or `a(x+z:x+z)` or their variations.
In order to compute the loop count, Flang needs to subtract
the lower bound of the array slice from the upper bound
of the array slice. To avoid the sign wraps, it sign extends
the original values (that may be of any user data type)
to `i64`.
This peephole is really helpful in CPU2017/548.exchange2,
where we have multiple following statements like this:
```
block(row+1:row+2, 7:9, i7) = block(row+1:row+2, 7:9, i7) - 10
```
While this is just a 2x3 iterations loop nest, LLVM cannot
figure it out, ending up vectorizing the inner loop really
hard (with a vector epilog and scalar remainder). This, in turn,
causes problems for LSR that ends up creating too many loop-carried
values in the loop containing the above statement, which are then
causing too many spills/reloads.
Alive2: https://alive2.llvm.org/ce/z/gLgfYX
Related to #143219.
This patch updates the following ops to use `result` (instead of `res`)
as the name for their result argument:
* `vector.scalable.insert`
* `vector.scalable.extract`
* `vector.insert_strided_slice`
This change ensures naming consistency with other ops in the `vector`
dialect. It addresses part of:
* https://github.com/llvm/llvm-project/issues/131602
Relands this feature after several fixes:
* Force fake uses to be emitted before musttail calls (#136867)
* Added soften-float legalization for fake uses (#142714)
* Treat fake uses as size-less instructions in a SystemZ assert (#144390)
If further issues with fake uses are found then this may be reverted again,
but all currently-known issues are resolved.
This reverts commit 2dc6e98169.
If X is known non-negative, that's still true if we fold the truncate
to create a smaller zext.
In the i128 tests, SelectionDAGBuilder aggressively truncates the
`zext nneg` to i64 to match `getShiftAmountTy`. If we don't preserve
the `nneg` we can't see that the shift amount argument being `signext`
means we don't need to do any extension
This patch updates the RISC-V SpacemiT X60 scheduling model with latency
values collected from the X60 hardware. The previous values were
empirically derived but were slightly off.
Changes:
- LoadLatency (baseline for load instructions): 5 --> 3 cycles
- Memory operations: unified at 4 cycles
- Atomic loads/stores: 5 --> 8 cycles
- Atomic RMW operations: 5 --> 12 cycles
Hardware-measured values provide more accurate instruction scheduling
for the in-order X60 core. Testing shows NFC across benchmarks except
for 523.xalancbmk_r (known to be noisy).
https://lnt.lukelau.me/db_default/v4/nts/663?compare_to=657
The Arith dialect includes patterns that canonicalize a sequence of:
- trunci(shrui(mul(sext(x), sext(y)), c)) -> mulsi_extended(x, y)
- trunci(shrui(mul(zext(x), zext(y)), c)) -> mului_extended(x, y)
These patterns return the high word of an extended multiplication, which
assumes that the shift amount is equal to the bit width of the original
operands. This check was missing, leading to incorrect canonicalizations
when the shift amount was less than the bit width.
For example, the following code:
```
%x = arith.extui %a: i32 to i33
%y = arith.extui %b: i32 to i33
%m = arith.muli %x, %y: i33
%c1 = arith.constant 1: i33
%sh = arith.shrui %m, %c1 : i33
%hi = arith.trunci %sh: i33 to i32
```
would incorrectly be canonicalized to:
```
_, %hi = arith.mului_extended %a, %b : i32
```
This commit removes the faulty canonicalizations since they are not
believed to be generally beneficial (c.f., the discussion of the
alternative https://github.com/llvm/llvm-project/pull/144787 which fixes
the canonicalizations).
In OpenMP 5.2, the `target enter data` and `target exit data` constructs
now have default map types if the user does not define them in the Map
clause. For `target enter data`, this is `to` and `target exit data`
this is `from`. This behaviour is now enabled when OpenMP 5.2 or greater
is used when compiling. To enable this, the default value is now set in
the `processMap` clause, with any previous behaviour being maintained
for either older versions of OpenMP or other directives.
See also #110008
Summary:
The AMDGPU toolchain uses a different set of tools than the standard
SPIR-V toolchain. The linker wrapper prefers to invoke a linker via a
clang toolchain. To make that work we introduce
`--target=spirv64-amd-amdhsa` so that it creates the linking phases that
HIP prefers. Additionally, this can be used to make LLVM-IR / SPIR-V
from C/C++ that can be linked with the HIP output.
Before this PR, users had to pass the "old" block argument when
replacing the uses of a block argument in a newly converted block. Users
can now pass the actual block argument that should be replaced.
Note for LLVM integration: Make sure to pass the current block argument
instead of the old one.
https://github.com/llvm/llvm-project/pull/141152 causes an issue in
v_s_xxx_f16 lowering in both true16/fake16 flow.
V_S_XXX_F16 are special insts which has scalar input/output but in VALU
VOP3 format. Need to keep the srcmod/clamp/omod when lower it to its
corresponding VALU inst with vector input/output.
1. The PR proceeds with a backend target hook to allow front-ends to
determine what target features are available in a compilation based on
the CPU name.
2. Fix a backend target feature bug that supports HTM for
Power8/9/10/11. However, HTM is only supported on Power8/9 according to
the ISA.
3. All target features that are hardcoded in PPC.cpp can be retrieved
from the backend target feature. I have double-checked that the
hardcoded logic for inferring target features from the CPU in the
frontend(PPC.cpp) is the same as in PPC.td.
The reland patch addressed the comment
https://github.com/llvm/llvm-project/pull/137670#discussion_r2143541120
Some instruction-printing code used under LLVM_DEBUG does not handle CFI
instructions well. While CFI instructions seem to be harmless for the
correctness of the analysis results, they do not convey any useful
information to the analysis either, so skip them early.
Summary:
This is a weird point of divergence that was not updated when the new
driver
switched to using the CUID method, which is also apparently critical
for SPIR-V compilation not failing? Somehow if we don't emit this global
than the `llvm.compiler.used` global uses AS(0) which makes SPIR-V
unhappy, but with this global it's AS(4) which makes it happy. Either
way, this should be fixed.
My goal is to remove the `object_pointer` member on
`ParsedDWARFTypeAttributes` since it's duplicating information that we
retrieve with `GetCXXObjectParameter` anyway. To continue having
coverage for the `DW_AT_object_pointer` code-paths, instead of checking
the
`attrs.object_pointer` I'm now calling `GetCXXObjectParameter` directly.
We could find some very roundabout way to go via the Clang AST to check
that the object parameter was parsed correctly, but that quickly became
quite painful.
Depends on https://github.com/llvm/llvm-project/pull/144876
Update handling of ReductionStartVector in VPlanUnroll for partial
reductions. The new code makes sure all parts are properly set to the
cloned ReductionStartVector.
Fixes a mis-compile reported for
https://github.com/llvm/llvm-project/pull/142290.
This put the onus on the caller to ensure the result type is big enough.
In the unlikely event a cropped result is required then explicitly
truncate a safe value.
The previous approach rewrote the atomic constructs in the AST based on
the REQUIRES ATOMIC_DEFAULT_MEM_ORDER directives. The new approach
checks for incorrect uses of REQUIRED ADMO in the semantic analysis, and
applies it in lowering, eliminating the need for a separate
tree-rewriting procedure.
Implement the detection of authentication instructions whose results can
be inspected by an attacker to know whether authentication succeeded.
As the properties of output registers of authentication instructions are
inspected, add a second set of analysis-related classes to iterate over
the instructions in reverse order.
This commit converts the class DynamicTypePropagation to a very simple
checker family, which has only one checker frontend -- but also supports
enabling the backend ("modeling checker") without the frontend.
As a tangentially related change, this commit adds the backend of
DynamicTypePropagation as a dependency of alpha.core.DynamicTypeChecker
in Checkers.td, because the header comment of DynamicTypeChecker.cpp
claims that it depends on DynamicTypePropagation and the source code
seems to confirm this.
(The lack of this dependency relationship didn't cause problems, because
'core.DynamicTypePropagation' is in the group 'core', so it is
practically always active. However, explicitly declaring the dependency
clarifies the fact that the separate existence of the modeling checker
is warranted.)
I'm trying to call `GetCXXObjectParameter` from unit-tests in a
follow-up patch and taking a `DWARFDIE` instead of `clang::DeclContext`
makes that much simpler. These should be equivalent, since all we're
trying to check is that the parent context is a record type.
Make HashRecognize a non-PassManager analysis that can be called to get
the result on-demand, creating a new getResult() entry-point. The issue
was discovered when attempting to use the analysis to perform a
transform in LoopIdiomRecognize.
We need the full set of ABI options to accurately compute
the full set of libcalls. This partially resolves missing
information required to compute the set of ARM calls.
This patch adds a test that shows incorrect branch weights being set in
function
EpilogueVectorizerEpilogueLoop::emitMinimumVectorEpilogueIterCountCheck