The `memref.reinterpret_cast` operation used to accept input like:
```mlir
%out = memref.reinterpret_cast %in to offset: [%offset], sizes: [10],
    strides: [1]
    : memref<?xf32> to memref<10xf32>
```
A problem arises during lowering: the true offset of %out is %offset,
but its type indicates an offset of 0. Permitting this inconsistency
can result in incorrect outcomes, as certain passes might erroneously
extract the offset from the type of %out.
This patch fixes the issue by enforcing that the result type is
consistent with the input parameters.
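With this enforcement, the example above must carry the dynamic offset
in its result type, roughly like so (illustrative):
```mlir
%out = memref.reinterpret_cast %in to offset: [%offset], sizes: [10],
    strides: [1]
    : memref<?xf32> to memref<10xf32, strided<[1], offset: ?>>
```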
This commit marks the type converter in `populate...` functions as
`const`. This is useful for debugging.
Patterns already take a `const` type converter. However, some
`populate...` functions do not only add new patterns, but also add
additional type conversion rules. That makes it difficult to find the
place where a type conversion was added in the code base. With this
change, all `populate...` functions that only populate patterns now have
a `const` type converter. Programmers can then conclude from the
function signature that these functions do not register any new type
conversion rules.
Also includes some minor cleanups around the 1:N dialect conversion
infrastructure, which did not always pass the type converter as a
`const` object internally.
`vector.transfer_*` folding and forwarding currently do not take into
account reshaping view-like memref ops (expand and collapse shape),
leading to potentially invalid store folding or value forwarding. This
patch adds tracking for those (and other) view-like ops. It is still
possible to design operations that alias memrefs without being views
(e.g. a memref in the iter_args of an `scf.for`), so these patterns may
still need revisiting in the future.
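For illustration, here is a sketch of the hazard (not taken from the
patch's tests): the write through the collapsed view aliases the first
write, so forwarding %v0 to the final read would be invalid.
```mlir
func.func @no_forward_through_view(%v0: vector<4xf32>,
                                   %v1: vector<16xf32>) -> vector<4xf32> {
  %c0 = arith.constant 0 : index
  %pad = arith.constant 0.0 : f32
  %buffer = memref.alloc() : memref<4x4xf32>
  %view = memref.collapse_shape %buffer [[0, 1]]
      : memref<4x4xf32> into memref<16xf32>
  vector.transfer_write %v0, %buffer[%c0, %c0]
      : vector<4xf32>, memref<4x4xf32>
  // This write aliases the one above through the collapse_shape.
  vector.transfer_write %v1, %view[%c0] : vector<16xf32>, memref<16xf32>
  %r = vector.transfer_read %buffer[%c0, %c0], %pad
      : memref<4x4xf32>, vector<4xf32>
  return %r : vector<4xf32>
}
```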
There are some spurious library dependencies that can be removed.
I'm trying to bundle MLIR/LLVM library dependencies for our own
libraries. We're using a CMake function to recursively collect
MLIR/LLVM-related dependencies. However, we identified certain library
dependencies as redundant and safe to remove.
Updates the return type of `getNumDynamicDims` and `getNumScalableDims`
from `int64_t` to `size_t`. This is for consistency with other
helpers/methods that return "size" and to reduce the number of
`static_cast`s in various places.
Add a convenience builder that infers the result type of
`memref.reinterpret_cast`.
Note: It is not possible to remove the result type from all builder
overloads because this op currently also allows certain
operand/attribute + result type combinations that do not match. The op
verifier should probably be made stricter, but that's a larger change
that requires additional `memref.cast` ops in some places that build
`reinterpret_cast` ops.
It is possible to have a subview with a fully static size and a type
that matches the source type, but a dynamic offset that may be
different. However, currently the memref dialect folds:
```mlir
func.func @subview_of_static_full_size(
    %arg0: memref<16x4xf32, strided<[4, 1], offset: ?>>, %idx: index)
    -> memref<16x4xf32, strided<[4, 1], offset: ?>> {
  %0 = memref.subview %arg0[%idx, 0][16, 4][1, 1]
      : memref<16x4xf32, strided<[4, 1], offset: ?>>
      to memref<16x4xf32, strided<[4, 1], offset: ?>>
  return %0 : memref<16x4xf32, strided<[4, 1], offset: ?>>
}
```
To:
```mlir
func.func @subview_of_static_full_size(
    %arg0: memref<16x4xf32, strided<[4, 1], offset: ?>>, %arg1: index)
    -> memref<16x4xf32, strided<[4, 1], offset: ?>> {
  return %arg0 : memref<16x4xf32, strided<[4, 1], offset: ?>>
}
```
This drops the dynamic offset from the `subview` op.
This change removes dependencies declared as either 'LINK_LIBS' or
'LINK_COMPONENTS' across several MLIR libraries. The removed
dependencies appear to be incorrect and may have been required in older
versions of the project.
These dependencies cause many high-level dialects to have a transitive
dependence on the LLVM dialect and the LLVM 'Core' library
('llvm/lib/IR').
Note that if using the 'Ninja' CMake generator, one can inspect the
dependencies (including all transitive libraries) of any given MLIR
target by using the command `ninja -C <build dir> -t browse` and
navigating to the library of interest in a web browser.
- Add support for memory_space_cast to strided metadata expansion and
narrow type emulation
- Add support for expand_shape to narrow type emulation (like
collapse_shape, it is a no-op after linearization, as sketched below)
and to expand-strided-metadata (mirroring the collapse_shape pattern)
- Add support for memref.dealloc to narrow type emulation (it is a
trivial rewrite) and for memref.copy (which is unsupported when it is
used for a layout change but is a trivial rewrite otherwise)
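For example, once the emulation has linearized everything into a flat
byte buffer, an `expand_shape` carries no work of its own; only the
index arithmetic of its users changes (illustrative types):
```mlir
func.func @expand_is_noop_after_linearization(%m: memref<32xi4>)
    -> memref<4x8xi4> {
  // After narrow type emulation, the data lives in a flat byte buffer,
  // so this reshape is a no-op on the underlying storage.
  %e = memref.expand_shape %m [[0, 1]] output_shape [4, 8]
      : memref<32xi4> into memref<4x8xi4>
  return %e : memref<4x8xi4>
}
```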
`PassManager::run` loads the dependent dialects for each pass into the
current context prior to invoking the individual passes. If the
dependent dialect is already loaded into the context, this should be a
no-op. However, if there are extensions registered in the
`DialectRegistry`, the dependent dialects are unconditionally registered
into the context.
This poses a problem for dynamic pass pipelines, because they will
likely be executing while the context is in an immutable state (due to
the parent pass pipeline being run).
To solve this, we'll update the extension registration API on
`DialectRegistry` to require a type ID for each extension that is
registered. Then, instead of unconditionally registering dialects into a
context if extensions are present, we'll check against the extension
type IDs already present in the context's internal `DialectRegistry`.
The context will only be marked as dirty if there are net-new extension
types present in the `DialectRegistry` populated by
`PassManager::getDependentDialects`.
Note: this PR removes the `addExtension` overload that utilizes
`std::function` as the parameter. This is because `std::function` is
copyable and potentially allocates memory for the contained function, so
we can't use the function pointer as the unique type ID for the
extension.
Downstream changes required:
- Existing `DialectExtension` subclasses will need a type ID to be
registered for each subclass. More details on how to register a type ID
can be found here:
8b68e06731/mlir/include/mlir/Support/TypeID.h (L30)
- Existing uses of the `std::function` overload of `addExtension` will
need to be refactored into dedicated `DialectExtension` classes with
associated type IDs. The attached `std::function` can either be inlined
into or called directly from `DialectExtension::apply`.
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
The meaning of `elementPtrs` has changed over time, and the name is now
outdated, which may be confusing. This PR updates it to a name
representative of its current usage.
The `memref.subview` result type inference
(`SubViewOp::inferResultType`) sometimes used to produce a dynamic
offset when a static offset is possible.
When a dynamic value (stride, size, etc.) is multiplied with zero, the
result is always a "static 0". Based on this, the result type inference
implementation can be improved to produce more static type information
in memref types.
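For example (an illustrative case, not necessarily from the tests), an
all-zero-offset subview of a source with dynamic strides can now keep a
static offset of 0, because `0 * stride` is statically 0:
```mlir
func.func @static_offset(%src: memref<?x?xf32, strided<[?, ?]>>)
    -> memref<2x2xf32, strided<[?, ?]>> {
  // 0 * (dynamic stride) folds to a static 0, so the inferred type can
  // keep offset 0 instead of decaying to offset: ?.
  %0 = memref.subview %src[0, 0] [2, 2] [1, 1]
      : memref<?x?xf32, strided<[?, ?]>>
      to memref<2x2xf32, strided<[?, ?]>>
  return %0 : memref<2x2xf32, strided<[?, ?]>>
}
```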
This patch adds more precise side effects to the current ops with memory
effects, allowing us to determine which OpOperand/OpResult/BlockArgument
the operation reads or writes, rather than just recording the reading
and writing of values. This makes it convenient to use the precise side
effects for analysis and optimization.
Related discussions:
https://discourse.llvm.org/t/rfc-add-operandindex-to-sideeffect-instance/79243
This patch adds patterns to fold memref aliases for
expand_shape/collapse_shape ops feeding into vector.load/vector.store
and vector.maskedload/vector.maskedstore.
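Roughly, the new patterns rewrite loads/stores through a reshape into
accesses on the source memref with linearized indices (a hypothetical
example, not one of the exact test cases):
```mlir
func.func @load_via_expand(%src: memref<64xf32>, %i: index, %j: index)
    -> vector<4xf32> {
  %e = memref.expand_shape %src [[0, 1]] output_shape [8, 8]
      : memref<64xf32> into memref<8x8xf32>
  // Folds to a load straight from %src with a linearized index:
  //   %k = affine.apply affine_map<(d0, d1) -> (d0 * 8 + d1)>(%i, %j)
  //   %v = vector.load %src[%k] : memref<64xf32>, vector<4xf32>
  %v = vector.load %e[%i, %j] : memref<8x8xf32>, vector<4xf32>
  return %v : vector<4xf32>
}
```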
In some cases (see https://github.com/iree-org/iree/issues/16285),
`memref.subview` ops can't be folded into transfer ops and sub-byte type
emulation fails. This issue has been blocking a few things, including
the enablement of vector flattening transformations
(https://github.com/iree-org/iree/pull/16456). This PR extends the
existing sub-byte type emulation support of `memref.subview` to handle
multi-dimensional subviews with dynamic offsets and addresses the issues
for some of the `memref.subview` cases that can't be folded.
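An example of a case that is now supported (an illustrative sketch, not
one of the exact test cases):
```mlir
// A multi-dimensional subview with a dynamic offset over a sub-byte
// (i4) memref; the emulation linearizes this into byte-aligned
// accesses.
func.func @subview_i4(%src: memref<32x8xi4>, %off: index)
    -> memref<16x8xi4, strided<[8, 1], offset: ?>> {
  %s = memref.subview %src[%off, 0] [16, 8] [1, 1]
      : memref<32x8xi4> to memref<16x8xi4, strided<[8, 1], offset: ?>>
  return %s : memref<16x8xi4, strided<[8, 1], offset: ?>>
}
```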
Co-authored-by: Diego Caballero <diegocaballero@google.com>
Implement folding and rewrite logic to eliminate no-op tensor and memref
operations. This handles two specific cases:
1. tensor.insert_slice operations where the size of the inserted slice
is known to be 0.
2. memref.copy operations where either the source or target memref is
known to be empty.
Both cases are sketched below.
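For illustration (hypothetical examples, not the exact test cases):
```mlir
// Case 1: the inserted slice has zero elements; folds away to %dest.
func.func @zero_insert(%src: tensor<0x0xf32>,
                       %dest: tensor<4x4xf32>) -> tensor<4x4xf32> {
  %r = tensor.insert_slice %src into %dest[0, 0] [0, 0] [1, 1]
      : tensor<0x0xf32> into tensor<4x4xf32>
  return %r : tensor<4x4xf32>
}

// Case 2: both memrefs are empty; the copy can be erased.
func.func @empty_copy(%a: memref<0x4xf32>, %b: memref<0x4xf32>) {
  memref.copy %a, %b : memref<0x4xf32> to memref<0x4xf32>
  return
}
```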
Co-authored-by: Spenser Bauman <sabauma@fastmail>
This allows `TransferOptimization` to eliminate and forward stores that
are to trivial aliases (rather than just to identical memref values);
see the sketch after the list below. A trivial alias is (currently)
defined as:
1. A `memref.cast`
2. A `memref.subview` with a zero offset and unit strides
3. A chain of 1 and 2
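For example (an illustrative sketch), the stored value can now be
forwarded to a read through an identity subview of the same buffer:
```mlir
func.func @forward_through_alias(%buf: memref<4x4xf32>,
                                 %v: vector<4xf32>) -> vector<4xf32> {
  %c0 = arith.constant 0 : index
  %pad = arith.constant 0.0 : f32
  // Trivial alias: zero offsets, full sizes, unit strides.
  %alias = memref.subview %buf[0, 0] [4, 4] [1, 1]
      : memref<4x4xf32> to memref<4x4xf32, strided<[4, 1]>>
  vector.transfer_write %v, %buf[%c0, %c0]
      : vector<4xf32>, memref<4x4xf32>
  // Reads through the trivial alias, so %v can be forwarded.
  %r = vector.transfer_read %alias[%c0, %c0], %pad
      : memref<4x4xf32, strided<[4, 1]>>, vector<4xf32>
  return %r : vector<4xf32>
}
```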
Previously, this was only populated later, in the create method. This
resolves some invalid builder paths. It may also mean that type
inference functions no longer have to consider whether property
conversion has happened (but I haven't verified that yet).
This also makes the Attributes corresponding to Properties optional in
the set-from-attributes method. Today that is in effect what happens
with Property value initialization, and folks use it to define custom
C++ types whose default initialization is what they want; this is the
behavior users get if they use properties directly. Propagating
Attributes without allowing partial setting would require iterating over
the dictionary attribute while considering the properties of the op type
that will be created. This could also have been an additional generated
method or optional behavior on the set method, but doing it consistently
seems better. In terms of what's lost, it doesn't seem like anything
compared to the pure Property path, where a Property is default-value
initialized and then partially overwritten (this doesn't seem to buy
anything else verification-wise).
Default-valued Properties (as specified on the ODS side rather than the
C++ side) triggered an error because the containing class was not yet
complete but referenced a nested class, so we couldn't have a default
initializer for them in the parent class. Added an additional forwarding
builder to avoid needing to update call sites. This could be split out
into a separate change.
Inlined a templated function in a unit test that was only used once.
Moved initialization earlier where noticed.
This commit extends the SROA interfaces to ensure the interface
implementations can communicate newly created allocators to the
algorithm. This ensures that the SROA implementation no longer requires
re-walking the IR to find new allocators.
This commit fixes Mem2Reg's multi-slot allocator handling and extends
the test dialect to test this.
Additionally, this modifies Mem2Reg's API to always attempt a full
promotion on all the passed-in "allocators". This ensures that the pass
does not require unnecessary walks over the regions and improves caching
benefits.
I'm planning to remove StringRef::equals in favor of
StringRef::operator==.
- StringRef::operator==/!= outnumber StringRef::equals by a factor of
10 under mlir/ in terms of their usage.
- The elimination of StringRef::equals brings StringRef closer to
std::string_view, which has operator== but not equals.
- S == "foo" is more readable than S.equals("foo"), especially for
!Long.Expression.equals("str") vs Long.Expression != "str".
Torch-mlir integration is currently blocked on `memref.expand_shape`
verifier errors of the form
```
'memref.expand_shape' op invalid output shape provided at pos 1
```
The verifier code generating these errors was introduced in
https://github.com/llvm/llvm-project/pull/91245. I have commented there
why I believe it's incorrect. This PR has my suggested fix.
Unfortunately, this does not seem to be directly testable on `memref`
IR, because `static_output_shape` is not directly exposed in the custom
assembly format.
This commit changes the `MemorySlotInterface` back to using `OpBuilder`
instead of a rewriter. This was originally introduced in
https://reviews.llvm.org/D150432 but it was shown that patterns are a
bad idea for both Mem2Reg and SROA.
Mem2Reg suffers from the usage of a rewriter due to being forced to
create new basic blocks. This is an issue, as it leads to the
invalidation of the dominance information, which can be expensive to
recompute.
This is a new take on #89111. Now that #90040 is merged, this has become
trivial to implement. The added test shows the kind of benefit that we
get from this: now dim-of-expand-shape naturally folds without us
needing to implement an ad-hoc folding rewrite.
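For example (illustrative), `tensor.dim` of the expanded dimension now
folds directly to the corresponding SSA operand:
```mlir
func.func @dim_of_expand(%t: tensor<?xf32>, %sz: index) -> index {
  %c0 = arith.constant 0 : index
  %e = tensor.expand_shape %t [[0, 1]] output_shape [%sz, 4]
      : tensor<?xf32> into tensor<?x4xf32>
  // Folds to %sz: the output shape is available as an SSA value.
  %d = tensor.dim %e, %c0 : tensor<?x4xf32>
  return %d : index
}
```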
This patch generalizes tensor.expand_shape and memref.expand_shape to
consume the output shape as a list of SSA values. This enables us to
implement generic reshape operations with dynamic shapes using
collapse_shape/expand_shape pairs.
The output_shape input to expand_shape follows the static/dynamic
representation that's also used in `tensor.extract_slice`.
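For instance, a fully dynamic reshape can be expressed as a collapse
followed by an expand whose target extents are SSA values (a sketch; the
caller must guarantee that %d0 * %d1 matches the flattened size):
```mlir
func.func @generic_reshape(%t: tensor<?x?xf32>, %d0: index, %d1: index)
    -> tensor<?x?xf32> {
  %flat = tensor.collapse_shape %t [[0, 1]]
      : tensor<?x?xf32> into tensor<?xf32>
  %r = tensor.expand_shape %flat [[0, 1]] output_shape [%d0, %d1]
      : tensor<?xf32> into tensor<?x?xf32>
  return %r : tensor<?x?xf32>
}
```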
Differential Revision: https://reviews.llvm.org/D140821
---------
Signed-off-by: Gaurav Shukla <gaurav.shukla@amd.com>
Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
This PR adds support for `memref.collapse_shape` to sub-byte type
emulation. The `memref.collapse_shape` becomes a no-op given that we are
flattening the memref as part of the emulation (i.e., we are collapsing
all the dimensions).
This PR adds a new pattern to the set of patterns used to resolve the
offset, sizes, and strides of a memref. Similar to
`ExtractStridedMetadataOpSubviewFolder`, the new pattern resolves
strided_metadata(collapse_shape) directly, without introducing a
reshape_cast op.
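For illustration (a hypothetical example), the metadata below can now be
computed from %m directly, with no intermediate op materialized for the
collapse:
```mlir
func.func @resolve_metadata(%m: memref<4x8xf32>)
    -> (memref<f32>, index, index, index) {
  %collapsed = memref.collapse_shape %m [[0, 1]]
      : memref<4x8xf32> into memref<32xf32>
  // Resolved directly: size = 4 * 8 = 32, stride = 1, offset = 0.
  %base, %offset, %size, %stride =
      memref.extract_strided_metadata %collapsed
      : memref<32xf32> -> memref<f32>, index, index, index
  return %base, %offset, %size, %stride
      : memref<f32>, index, index, index
}
```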
This commit implements runtime verification for LinalgStructuredOps
using the existing `RuntimeVerifiableOpInterface`. The verification
checks that the runtime sizes of the operands match the runtime sizes
inferred by composing the loop ranges with the op's indexing maps.
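For a matmul, for instance, the inserted checks enforce invariants along
these lines (a sketch of the invariant, not the exact generated IR):
```mlir
func.func @checked_matmul(%A: memref<?x?xf32>, %B: memref<?x?xf32>,
                          %C: memref<?x?xf32>) {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  // The shared K dimension of the two inputs must agree at runtime.
  %k0 = memref.dim %A, %c1 : memref<?x?xf32>
  %k1 = memref.dim %B, %c0 : memref<?x?xf32>
  %ok = arith.cmpi eq, %k0, %k1 : index
  cf.assert %ok, "mismatched K dimension"
  linalg.matmul ins(%A, %B : memref<?x?xf32>, memref<?x?xf32>)
      outs(%C : memref<?x?xf32>)
  return
}
```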
This commit enhances the LLVM dialect's Mem2Reg interfaces to support
partial stores to memory slots. To achieve this support, the `getStored`
interface method has to be extended with a parameter of the reaching
definition, which is now necessary to produce the resulting value after
this store.