This covers more options for CSE. It also ensures that two operations
that have same operands but different regions to begin with, but same
regions after `simplifyRegions`, don't get both added to the list of
`knownValues`.
Fixes#59135
Differential Revision: https://reviews.llvm.org/D139490
The goal is to make the naming of the future `_extended` ops more
consistent. With unsigned addition, the carry value/flag and overflow
bit are the same, but this is not true when it comes to signed addition.
Also rename the second result from `carry` to `overflow`.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D139569
At the moment, they are a part of EmptyOp::getCanonicalizationPatterns. When
extract_slice(tensor.empty) is rewritten as a new tensor.empty, it could
happen that we end up with two tensor.empty ops, since the original
tensor.empty can have two users. After bufferization such cases result in two
allocations.
Differential Revision: https://reviews.llvm.org/D139308
The only way to do this with the current hoisting strategy is by
lowering Affine to Scf first, but that prevents further passes on
Affine.
Differential Revision: https://reviews.llvm.org/D137600
Implementation assumed a i32 accumulator. Fixed the implementation to
work with an i32 accumulator.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D139365
This adds a tile_size parameter, when it is used the tiles are
cyclically distributed onto the threads of the scf.foreach_thread op.
Differential Revision: https://reviews.llvm.org/D139474
When running in parallel, nesting more than once caused
statistics to be dropped.
Fix by also preparing "async" pass managers before merging,
as they may also have "async" pass managers within.
Add test checking reported statistics have expected values
with and without threading enabled.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D139459
The new function is a wrapper around the regular `getStridesAndOffset`
that offers a more compact way (as in writing less code) of getting the
relevant information.
This method is intended to be used only when it is known that the
LogicalResult of the regular `getStridesAndOffset` must be "succeeded".
This warpper will assert on that.
Differential Revision: https://reviews.llvm.org/D139529
D138330 updated the deprecated `getMemorySpaceAsInt` uses to `getMemorySpace`. There are few uses that were missed.
Differential Revision: https://reviews.llvm.org/D139526
This covers `SDot`, `SUDot`, and `UDot`. The `*AccSat` version will be
added in a follow-up revision.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D139242
It introduces a pattern that swaps `linalg.generic + tensor.pack` to
`tensor.pack + linalg.generic`. It requires all the iteration types
being parallel; the indexing map of output operand is identiy. They can
all be relaxed in the future.
The user can decide whether the propagation should be applied or not by
passing a control function.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D138882
This revision adapts the pattern in LinAlg to work on DPS interface, and
adds it to canonicalization patterns of tensor dialect. The
InsertSliceOp is skipped in the pattern because it has its own logic
about folding tensor.cast ops.
Reviewed By: pifon2a
Differential Revision: https://reviews.llvm.org/D139375
The only way to do this with the current hoisting strategy is by
lowering Affine to Scf first, but that prevents further passes on
Affine.
Differential Revision: https://reviews.llvm.org/D137600
This patch adds `replaceAllUsesExcept` to the rewriter class.
The implementation is copy-pasted from Value + calling
`updateRootInPlace` to notify the listeners about the
corresponding IR changes.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D139382
Since tosa.pad is lowered strictly to artih and tensor ops, move
ConvertPad from TosaToLinalg to TosaToTensor, benefitting non-Linalg
Tosa targets. TensorToLinalg exists, and is trivial, so nothing is lost.
Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>
Differential Revision: https://reviews.llvm.org/D139091
The pass was not checking for uninitialized states due to dead code.
This patch also makes LLVMFuncOp correctly return a null body when it is
external.
Fixes#58807
Depends on D139388
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D139389
This change cleans up the conversion pass re the "dim"-vs-"lvl" and "sizes"-vs-"shape" distinctions of the runtime. A quick synopsis includes:
* Adds new `SparseTensorStorageBase::getDimSize` method, with `sparseDimSize` wrapper in SparseTensorRuntime.h, and `genDimSizeCall` generator in SparseTensorConversion.cpp
* Changes `genLvlSizeCall` to perform no logic, just generate the function call.
* Adds `createOrFold{Dim,Lvl}Call` functions to handle the logic of replacing `gen{Dim,Lvl}SizeCall` with constants whenever possible. The `createOrFoldDimCall` function replaces the old `sizeFromPtrAtDim`.
* Adds `{get,fill}DimSizes` functions for iterating `createOrFoldDimCall` across the whole type. These functions replace the old `sizesFromPtr`.
* Adds `{get,fill}DimShape` functions for lowering a `ShapedType` into constants. These functions replace the old `sizesFromType`.
* Changes the `DimOp` rewrite to do the right thing.
* Changes the `ExpandOp` rewrite to compute the proper expansion size.
Depends On D138365
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D139165
Along the way, make the default pattern fail instead of crashing
when an elementwise op is not supported yet.
Reviewed By: kuhar
Differential Revision: https://reviews.llvm.org/D139280
This patch abstracts sparse tensor memory scheme into a SparseTensorDescriptor class. Previously, the field accesses are performed in a relatively error-prone way, this patch hides the hairy details behind a SparseTensorDescriptor class to allow users access sparse tensor fields in a more cohesive way.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D138627
We can compute the offsets and sizes for the slice of input because the
iteration domain is defined over outer loops. If the dimension is tiled,
the i-th index is the product of offset_i and inner_tile_i.
Different from tiling a pad op, we do not have to deal with reading zero
data from input. Because the tiling sizes are indicated to packed outer
dimensions. We will read either the entire tile or partial tile for each
packed tile. The scf.if and tensor.generate ops are not needed in this
context.
Co-authored-by: Lorenzo Chelini <l.chelini@icloud.com>
Reviewed By: rengolin, mravishankar
Differential Revision: https://reviews.llvm.org/D138631
This patch removes the implementation of TypedAttr and ElementsAttr
from DenseArrayAttr and, in doing so, removes the need store a shaped
type. The attribute now stores a size (number of elements), an MLIR type
as a discriminator, and a raw byte array.
The intent of DenseArrayAttr was not to be a drop-in replacement for DenseElementsAttr. It was meant to be a simple container of integers or floats that map to C++ types. The ElementsAttr implementation on DenseArrayAttr had many holes in it, and fixing those holes would require evolving DenseArrayAttr in a way that is incompatible with its original purpose.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D137606
Rounding of tosa.resize did not handle rounding to the nearest pixel correctly.
Rather than dividing the scale by 2 we should double the partial pixel to
guarantee we include a check on the lowest bit.
Reviewed By: NatashaKnk
Differential Revision: https://reviews.llvm.org/D139162
Calculating the position of the region trailing objects isn't free,
given that it's the last trailing object, and inlining the size check
removes the need for users to explicitly add size checks for
micro-optimization.
Add support for loading, computing, and storing `gpu.subgroup` WMMA ops
in transpose mode as well. Update the GPU to NVVM lowerings to support
`transpose` mode and update integration tests as well.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D139021
Incrementing `counter` variable is inside the if statement. If the code does not enter there, the while loop will iterate infinitely. This revision moves the codes outside of if statement.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D139005