Erasing/replacing an op, which is also the current insertion point,
invalidates the insertion point. Explicitly set the insertion point, so
that `copy` does not crash after the One-Shot Dialect Conversion
refactoring. (`ConversionPatternRewriter` will start behaving more like
a "normal" rewriter.)
Fold icmp between a chain of geps and its base pointer. Previously only
a single gep was supported.
This will be extended to handle the case of two gep chains with a common
base in a followup.
This helps to avoid regressions after #137297.
Just like 1d-tensors, reshapes of 0d-tensors (aka scalars) are always
no-folds as they only have one possible layout. This PR adds logic to
the `fold` implementation to optimize these away as is currently
implemented for 1d tensors.
This reverts commit 1108cf6419.
Caused a regression for a weird but interesting case (STT_SECTION symbol
as group signature). We no longer define `sec`
```
.section sec,"ax"
.section .foo,"axG",@progbits,sec
nop
```
Fix#146581
linalg promotion attempts to compute a constant upper bound for the
allocated buffer size. Only when failed to compute an upperbound it
fallbacks to the original subview size, which may be dynamic.
Adding a promotion option to use the original subview size by default,
thus minimizing the allocation size.
Fixes#144268.
Insn is passed to decodeInstruction which is a template function based
on the type of Insn. By using uint64_t we ensure only one version of
decodeInstruction is created. This reduces the file size of
RISCVDisassembler.cpp.o by ~25% in my local build.
This commit fixes using inline assembly with v128 results. Previously
this failed with an internal assertion about a failure to legalize a
`CopyFromReg` where the source register was typed `v8f16`. It looks like
the type used for the destination register was whatever was listed first
in the `def V128 : WebAssemblyRegClass` listing, so the types were
shuffled around to have a default-supported type.
A small test was added as well which failed to generate previously and
should now pass in generation. This test passed on LLVM 18 additionally
and regressed by accident in #93228 which was first included in LLVM 19.
After #145996 CLANG_RESOURCE_DIR can be an absolute path so we need to
handle it correctly in the driver.
llvm::sys::path::append does not append absolute paths in the way
that I expected (or consistent with other similar APIs such as C++17
std::filesystem::path::append or Python os.path.join); instead, it
effectively discards the leading / and appends the resulting relative path
(e.g. append(P, "/bar") with P = "/foo" sets P to "/foo/bar").
Many tests start failing if I try to align llvm::sys::path::append with
the other APIs because of callers that expect the existing behavior,
so for now let's add a special case here for absolute resource paths,
and document the behavior in Path.h.
Reviewers: MaskRay
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/146449
Currently AffineExpr Add has ability to optimize `"s1 + (s1 // c * -c)"
to "s1 % c"`,
but can not optimize `"(s0 + s1) + (s1 // c * -c)"`.
This patch provide an opportunity to do this simplification, let it can
be simplified to `"s0 + s1 % c"`.
A decorator to skip or XFAIL a test takes effect when the function
that's passed in returns a reason string. The wrappers around
hw_breakpoints_supported were doing that incorrectly by inverting
(calling `not`) on the result, turning it into a boolean, which means
the test is always skipped.
This will enable removal of a hack from the wasm backend
in a future change.
This feels unnecessarily clunky. I would assume something was
automatically parsing this and propagating it in the C++ case,
but I can't seem to find it. In particular it feels wrong that
I need to parse out the individual values, given they are listed
in the options.td file. We should also be parsing and forwarding
every flag that corresponds to something else in TargetOptions,
which requires auditing.
In OpenCL Extended Instruction Set Specification, nancode can be signed
integer or vector of signed integers values.
This PR has no change to amdgcn--amdhsa.bc and nvptx64--nvidiacl.bc
because the newly added clc functions are not used in OpenCL library.
We had two classes named `PipeTest`: one in `PipeTestUtilities.h` and
one in `PipeTest.cpp`. The latter was unintentionally using the wrong
class (from the header) which didn't initialize the HostInfo subsystem.
This resulted in a crash due to a nullptr dereference (`g_fields`) when
`PipePosix::CreateWithUniqueName` called `HostInfoBase::GetProcessTempDir`.
Fixes https://github.com/llvm/llvm-project/issues/50142, a miss of
further vectorization, where we can only achieve zext (xor (any_true),
-1).
Now in test case simd-setcc-reductions, it's converted to all_true.
Also fixes https://github.com/llvm/llvm-project/issues/145177, which is
all_true (setcc x, 0, eq) -> not any_true
any_true (setcc x, 0, ne) -> any_true
all_true (setcc x, 0, ne) -> all_true
---------
Co-authored-by: badumbatish <--show-origin>
[WebAssembly] [Backend] Wasm optimize illegal bitmask for #131980.
Currently, the case for illegal bitmask (v32i8 or v64i8) is that at the
SelectionDag level, two (four) vectors of v128 will be concatenated
together, then they'll all be SETCC by the same pseudo illegal
instruction, which requires expansion later on.
I opt for SETCC-ing them seperately, bitcast and zext them and then add
them up together in the end.
---------
Co-authored-by: badumbatish <--show-origin>
After 901e1390c9 (#127770), the DAG
combine would transform `fma(x, 0.0, 1.0)` into `1.0` if
`-fp-contract=fast` was enabled, in addition to when 'x' is marked
nnan/ninf.
It's only valid in the latter case, not the former, so delete the extra
condition.
This change continues rewriting and cleanup around DAG ISel for
formal-arguments, return values, and function calls. This causes some
incidental changes, mostly to instruction ordering and register naming
but also a couple improvements caused by using scalar types earlier in
the lowering.
This PR adds a `build()` constructor for `ConstantIntOp` that takes in
an `APInt`.
Creating an `arith` constant value with an `APInt` currently requires a
structure like the following:
```c
b.create<arith::ConstantOp>(IntegerAttr::get(apintValue, 5));
```
In comparison, the`ConstantFloatOp` already has an `APFloat` constructor
which allows for the following:
```c
b.create<arith::ConstantFloatOp>(floatType, apfloatValue);
```
Thus, intuitively, it makes sense that a similar `ConstantIntOp`
constructor is made for `APInts` like so:
```c
b.create<arith::ConstantIntOp>(intType, apintValue);
```
Depends on https://github.com/llvm/llvm-project/pull/144636
VPReductionPHIRecipe doesn't rely on the underlying phi any longer,
allow empty underlying values when cloning. NFC at the moment but will
enable follow-up patches.
Patched and tested the `AAInvariantLoadPointer` attributor from #141800,
which identifies pointers whose loads are eligible to be marked as
`!invariant.load`.
The bug in the attributor was due to `AAMemoryBehavior` always
identifying pointers obtained from `alloca`s as having no writes. I'm
not entirely sure why `AAMemoryBehavior` behaves this way, but it seems
to be beceause it identifies the scope of an `alloca` to be limited to
only that instruction (and, certainly, no memory writes occur within the
`alloca` instructin). This patch just adds a check to disallow all loads
from `alloca` pointers from being marked `!invariant.load` (since any
well-defined program will have to write to stack pointers at some
point).
The issue is triggered by
ee070d0816
that checks `TensorLikeType` when downstream projects use the pattern
without registering bufferization::BufferizationDialect. The
registration is needed because the interface implementation for builtin
types locate at `BufferizationDialect::initialize()`. However, we do not
need to fix it by the registration. The proper fix is using the linalg
method, i.e., hasPureTensorSemantics.
No additional tests are added because the functionality is well tested
in
[transpose-matmul.mlir](https://github.com/llvm/llvm-project/blob/main/mlir/test/Dialect/Linalg/transpose-matmul.mlir).
To reproduce the issue, it requires a different setup, e.g., writing a
new C++ pass, which seems not worth it.
Signed-off-by: hanhanW <hanhan0912@gmail.com>
Optimizations:
- Only build the skeleton for each identifier once, rather than once for
each declaration of that identifier.
- Only compute the contexts in which identifiers are declared for
identifiers that have the same skeleton as another identifier in the
translation unit.
- Only compare pairs of declarations that are declared in related
contexts, rather than comparing all pairs of declarations with the same
skeleton.
Also simplify by removing the caching of enclosing `DeclContext` sets,
because with the above changes we don't even compute the enclosing
`DeclContext` sets in common cases. Instead, we terminate the traversal
to enclosing `DeclContext`s immediately if we've already found another
declaration in that context with the same identifier. (This optimization
is not currently applied to the `forallBases` traversal, but could be
applied there too if needed.)
This also fixes two bugs that together caused the check to fail to find
some of the issues it was looking for:
- The old check skipped comparisons of declarations from different
contexts unless both declarations were type template parameters. This
caused the checker to not warn on some instances of the CVE it is
intended to detect.
- The old check skipped comparisons of declarations in all base classes
other than the first one found by the traversal. This appears to be an
oversight, incorrectly returning `false` rather than `true` from the
`forallBases` callback, which terminates traversal.
This also fixes an issue where the check would have false positives for
template parameters and function parameters in some cases, because those
parameters sometimes have a parent `DeclContext` that is the parent of
the parameterized entity, or sometimes is the translation unit. In
either case, this would cause warnings about declarations that are never
visible together in any scope.
This decreases the runtime of this check, especially in the common case
where there are few or no skeletons with two or more different
identifiers. Running this check over LLVM, clang, and clang-tidy, the
wall time for the check as reported by clang-tidy's internal profiler is
reduced from 5202.86s to 3900.90s.
This patch adds a new recipe to combine multiple recipes into an
'expression' recipe, which should be considered as single entity for
cost-modeling and transforms. The recipe needs to be 'decomposed', i.e.
replaced by its individual recipes before execute.
This subsumes VPExtendedReductionRecipe and
VPMulAccumulateReductionRecipe and should make it easier to extend to
include more types of bundled patterns, like e.g. extends folded into
loads or various arithmetic instructions, if supported by the target.
It allows avoiding re-creating the original recipes when converting to
concrete recipes, together with removing the need to record various
information. The current version of the patch still retains the original
printing matching VPExtendedReductionRecipe and
VPMulAccumulateReductionRecipe, but this specialized print could be
replaced with printing the bundled recipes directly.
PR: https://github.com/llvm/llvm-project/pull/144281
There are a handful of passes in PassRegistry.def with outdated or
missing pass options. These strings describing pass options are used for
the printPassNames() function only, which is likely why they have gotten
out-of-date without being caught. This MR simply changes the few passes
where the option string is out-of-date, fixing the output of
-print-passes. This does not affect functionality of the pipeline
parser, and is hard to verify in a unit test, so no tests were added.