This is to enable `CooperativeMatrixType` to be used with
`DenseElementsAttr`, so that a `spirv.Constant` can be easily built from
`OpConstantComposite`. For example:
```mlir
%cst = spirv.Constant dense<0.000000e+00> : !spirv.coopmatrix<1x1xf32, Subgroup, MatrixAcc>
```
Constraints of arithmetic operations are changed, as
`SameOperandsAndResultType` can no longer fully verify CoopMatrices.
This is because for shaped types the verifier only checks element type
and shapes, whereas for any other arbitrary type it looks for an exact
match.
This patch does not enable the actual deserialization. This will be
done in a subsequent PR.
According to the OpenMP standard, variables with static storage duration
are predetermined as shared.
Add a check when creating implicit symbols for OpenMP to fix them
erroneously getting set to firstprivate.
Fixes llvm#140732.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
This PR adds the WebKit checker support for
[[clang::annotate_type("webkit.pointerconversion")]].
When this attribute is set on the return value of a function, the
function is treated as safe to call anywhere and the return value's
pointer origin is the argument.`
It's safe for a member function of a class or struct to call a function
or allocate a local variable with a pointer or a reference to a member
variable since "this" pointer, and therefore all its members, will be
kept alive by its caller so recognize as such.
Those tests set nvptx64 in IR but doesn't require the target. The optimization
now needs TTI such that if nvptx is not registered, it just uses whatever
default target is, which will cause the check lines mismatch.
- try_emplace(Key) is shorter than insert({Key, nullptr}).
- try_emplace performs value initialization without value parameters.
- We overwrite values on successful insertion anyway.
We have recently added the partial_reduce_smla and partial_reduce_umla
nodes to represent Acc += ext(b) * ext(b) where the two extends have to
have the same source type, and have the same extend kind.
For riscv64 w/zvqdotq, we have the vqdot and vqdotu instructions which
correspond to the existing nodes, but we also have vqdotsu which
represents the case where the two extends are sign and zero respective
(i.e. not the same type of extend).
This patch adds a partial_reduce_sumla node which has sign extension for
A, and zero extension for B. The addition is somewhat mechanical.
[KeyInstr] MDNodeKeyImpl<DILocation> skip zero values for hash
Hashing AtomGroup and AtomRank substantially impacts performance whether Key
Instructions is enabled or not. We can't detect whether it's enabled here
cheaply; avoiding hashing zero values is a good approximation. This affects
Key Instruction builds too, but any potential costs incurred by messing with
the hash distribution (hash_combine(x) != hash_combine(x, 0)) appear to still
be massively outweighed by the overall compile time savings by performing this
check.
See PR for compile-time-tracker numbers.
`AArch64Subtarget.h` uses the complete type of `Triple`, but had only
forward declared the class, which happend to be included through the
following bottom-up path for example: "llvm/TargetParser/Triple.h"
"llvm/MC/MCSubtargetInfo.h"
"llvm/CodeGen/TargetSubtargetInfo.h"
Undo the vectorcombine canonicalisation as SSE has awful vXi8 shift
support, but can easily splat the MSB using the PCMPGTB(0,x) trick.
Alternative to #143106 which could cause infinite loops between srl/sra
conversions
Fixes#130549
I think that the issue width for neoverse-v2 CPUs is set too
high and does not properly reflect the dispatch constraints.
I tested various values of IssueWidth (16, 8 and 6) with runs
of SPEC2017 on a neoverse-v2 machine and I got the highest
overall geomean score with an issue width of 6, although it's
only a marginal 0.14% improvement. I also observed a 1-2%
improvement when testing the Gromacs application with some
workloads. Here are some notable changes in SPEC2017 ref
runtimes, i.e. has a ~0.5% change or greater ('-' means
faster):
548.exchange2: -1.7%
510.parest: -0.78%
538.imagick: -0.73%
500.perlbench: -0.57%
525.x264: -0.55%
507.cactuBSSN: -0.5%
520.omnetpp: -0.48%
511.povray: +0.57%
544.nab: +0.65%
503.bwaves: +0.68%
526.blender: +0.75%
If this patch causes any major regressions post-commit it can
be easily reverted, but I think it should be an overall
improvement.
This adds another puzzle piece for the support of OpenMP DECLARE
REDUCTION functionality.
This adds support for operators with derived types, as well as declaring
multiple different types with the same name or operator.
A new detail class for UserReductionDetials is introduced to hold the
list of types supported for a given reduction declaration.
Tests for parsing and symbol generation added.
Declare reduction is still not supported to lowering, it will generate a
"Not yet implemented" fatal error.
Fixes#141306Fixes#97241Fixes#92832Fixes#66453
---------
Co-authored-by: Mats Petersson <mats.petersson@arm.com>
When testing LLDB, we want to make sure to use the same Python as the
one we used to build it.
This patch used the CMake variable `Python3_ROOT_DIR` to set the
`PYTHONHOME` env variable in LLDB lit tests, in order to ensure of this.
Please see https://github.com/swiftlang/swift/pull/82063 for the
original issue.
This patch should be mostly obvious, but in one place, this patch
changes:
const auto &it = std::find(...)
to:
auto it = llvm::find(...)
We do not need to bind to a temporary with const ref.
Different object file formats support DWARF sections (COFF, ELF, MachO,
PE/COFF, WASM). COFF and PE/COFF only matched a subset. This caused some
GCC executables produced on MinGW to have issue later on when debugging.
One example is that `.debug_rnglists` was not matched, which caused
range-extraction to fail when printing a backtrace.
This unifies the parsing of section names in
`ObjectFile::GetDWARFSectionTypeFromName`, so all file formats can use
the same naming convention. Since the prefixes are different,
`GetDWARFSectionTypeFromName` only matches the suffixes (i.e. `.debug_`
needs to be stripped before).
I added two tests to ensure the sections are correctly identified on
Windows executables.
Previously PatLeafs could not be imported, causing the following
warnings to be emitted when running tblgen with
`-warn-on-skipped-patterns:`
```
/work/clean/llvm/lib/Target/AArch64/AArch64InstrInfo.td:2631:1: warning: Skipped pattern: Src pattern child has unsupported predicate
def : Pat<(i64 (mul top32Zero:$Rn, top32Zero:$Rm)),
^
```
These changes allow the patterns to now be imported successfully.
When the original predicate is ordered and both operands are non-NaN,
`Ordered` should be set to true. This variable still matters even if
both operands are non-NaN because FMF only applies to select, not fcmp.
Closes https://github.com/llvm/llvm-project/issues/143123.
We currently combine (AES (EOR (A, B)), 0) into (AES A, B) for Neon
intrinsics when the zero operand appears in the RHS of the AES
instruction.
This patch extends the combine to support AES SVE intrinsics and
the case where the zero operand appears in the LHS of the AES
instructions.
Some shuffles containing undefs can match multiple instructions, such as
<3,u,u,u> being either a duplane or a rev. This changes the order that
different shuffles are considered, so that duplane is preferred which is
simpler and more likely to lead to further combines.
Without this patch, we implement 4-byte and 8-byte popcount as
structs.
This patch replaces the template trick with constexpr if, putting the
entire logic in the body of popcount.
The LoongArch psABI recently added __bf16 type support. Now we can
enable this new type in clang.
Currently, bf16 operations are automatically supported by promoting to
float. This patch adds bf16 support by ensuring that load extension /
truncate store operations are properly expanded.
And this commit implements support for bf16 truncate/extend on hard FP
targets. The extend operation is implemented by a shift just as in the
standard legalization. This requires custom lowering of the truncate
libcall on hard float ABIs (the normal libcall code path is used on soft
ABIs).
Parsed branches and fall-throughs are validated in `doBranch` and
`doTrace` respectively. Simplify parseLBRSample by omitting the
validation. This also speeds up perf data processing as checks are only
done once for aggregated branches/fall-throughs and not individual LBR
entries.
Since invalid/external addresses are no longer sanitized during parsing,
sanitize them in `doBranch`.
Test Plan: updated X86/pre-aggregated-perf.test
Aggregated branch data has two containers: `Data` for local branches,
and `EntryData` for external branches. Fix the omission and sort
`EntryData` to ensure stable output fdata profiles.
Test Plan: updated pre-aggregated-perf.test
- try_emplace(Key) is shorter than insert(std::make_pair(Key, 0)).
- try_emplace performs value initialization without value parameters.
- We overwrite values on successful insertion anyway.