This patch adds the `convertInstruction` and `getSupportedInstructions`
to `LLVMImportInterface`, allowing any non-LLVM dialect to specify how
to import LLVM IR instructions and overriding the default import of LLVM instructions.
Before the change `test-loop-fusion` and `affine-super-vectorizer-test`
options were in their own category. This was because they used the
standard llvm command line parsing with `llvm::cl::opt`. This PR moves
them over to the mlir `Pass::Option` class.
Before the change
```
$ mlir-opt --help
...
General options:
...
Compiler passes to run
Passes:
...
Pass Pipelines:
...
Generic Options:
....
affine-super-vectorizer-test options:
--backward-slicing
...
--vectorize-affine-loop-nest
test-loop-fusion options:
--test-loop-fusion-dependence-check
...
--test-loop-fusion-transformation
```
After the change
```
$ mlir-opt --help
...
General options:
...
Compiler passes to run
Passes:
...
--affine-super-vectorizer-test
--backward-slicing
...
--vectorize-affine-loop-nest
...
--test-loop-fusion options:
--test-loop-fusion-dependence-check
...
--test-loop-fusion-transformation
...
Pass Pipelines:
...
Generic Options:
...
```
---------
Signed-off-by: philass <plassen@groq.com>
This commit changes the API of `ValueBoundsConstraintSet`: the stop
condition is now passed to the constructor instead of `processWorklist`.
That makes it easier to add items to the worklist multiple times and
process them in a consistent manner. The current
`ValueBoundsConstraintSet` is passed as a reference to the stop
function, so that the stop function can be defined before the the
`ValueBoundsConstraintSet` is constructed.
This change is in preparation of adding support for branches.
This commit extends the data layout subsystem with accessors for the
endianness. The implementation follows the structure implemented for
alloca, global, and program memory spaces.
Before this change: `notifyOperationReplaced` was triggered when calling
`RewriteBase::replaceOp`.
After this change: `notifyOperationReplaced` is triggered when
`RewriterBase::replaceAllOpUsesWith` or `RewriterBase::replaceOp` is
called.
Until now, every `notifyOperationReplaced` was always sent together with
a `notifyOperationErased`, which made that `notifyOperationErased`
callback irrelevant. More importantly, when a user called
`RewriterBase::replaceAllOpUsesWith`+`RewriterBase::eraseOp` instead of
`RewriterBase::replaceOp`, no `notifyOperationReplaced` callback was
sent, even though the two notations are semantically equivalent. As an
example, this can be a problem when applying patterns with the transform
dialect because the `TrackingListener` will only see the
`notifyOperationErased` callback and the payload op is dropped from the
mappings.
Note: It is still possible to write semantically equivalent code that
does not trigger a `notifyOperationReplaced` (e.g., when op results are
replaced one-by-one), but this commit already improves the situation a
lot.
Adds support for scalable vectors to patterns defined in
VectorLineralize.cpp.
Linearization is disable in 2 notable cases:
* vectors with more than 1 scalable dimension (we cannot represent
vscale^2),
* vectors initialised with arith.constant that's not a vector splat
(such arith.constant Ops cannot be flattened).
This adds a new API built with the `ValueBoundsConstraintSet` to compute
the bounds of possibly scalable quantities. It uses knowledge of the
range of vscale (which is defined by the target architecture), to solve
for the bound as either a constant or an expression in terms of vscale.
The result is an `AffineMap` that will always take at most one
parameter, vscale, and returns a single result, which is the bound of
`value`.
The API is defined as follows:
```c++
FailureOr<ConstantOrScalableBound>
vector::ScalableValueBoundsConstraintSet::computeScalableBound(
Value value, std::optional<int64_t> dim,
unsigned vscaleMin, unsigned vscaleMax,
presburger::BoundType boundType,
bool closedUB = true,
StopConditionFn stopCondition = nullptr);
```
Note: `ConstantOrScalableBound` is a thin wrapper over the `AffineMap`
with a utility for converting the bound to a single quantity (i.e. a
size and scalable flag).
We believe this API could prove useful downstream in IREE (which uses a
similar analysis to hoist allocas, which currently fails for scalable
vectors).
When importing from LLVM IR the data layout of all pointer types
contains an index bitwidth that should be used for index computations.
This revision adds a getter to the DataLayout that provides access to
the already stored bitwidth. The function returns an optional since only
pointer-like types have an index bitwidth. Querying the bitwidth of a
non-pointer type returns std::nullopt.
The new function works for the built-in Index type and, using a type
interface, for the LLVMPointerType.
Transform interfaces are implemented, direction or via extensions, in
libraries belonging to multiple other dialects. Those dialects don't
need to depend on the non-interface part of the transform dialect, which
includes the growing number of ops and transitive dependency footprint.
Split out the interfaces into a separate library. This in turn requires
flipping the dependency from the interface on the dialect that has crept
in because both co-existed in one library. The interface shouldn't
depend on the transform dialect either.
As a consequence of splitting, the capability of the interpreter to
automatically walk the payload IR to identify payload ops of a certain
kind based on the type used for the entry point symbol argument is
disabled. This is a good move by itself as it simplifies the interpreter
logic. This functionality can be trivially replaced by a
`transform.structured.match` operation.
isAccessIndexInvariant had outdated code and didn't handle IR with
multiple
affine.apply ops, which is inconvenient when used as a utility. This is
addressed by switching to use the proper API on AffineValueMap. Add
mlir::affine::isInvariantAccess exposed for outside use and tested via
the test pass. Add a method on AffineValueMap. Add test cases to
exercise simplification and composition for invariant access analysis.
A TODO/FIXME has been added but this issue existed before.
We need to generate `.has_value` for `OptionalParseResult`, also ensure
that `auto result` doesn't conflict with `result` which is the variable
name for `OperationState`.
This patch adds a the `LowerVectorToArmNeonPattern` patterns to the
ArmNeon.
This pattern inspects `vector.contract` ops that can be 1-1 mapped to an
`arm.neon.smmla` intrinsic. The contract ops must be separated into
tiles who's inputs must fit that of a single smmla op (`2x8xi32` inputs
and `2x2xi32` output). The `vector.contract` inputs must be sign
extended from narrow types (<=i8) to be converted. If all conditions are
met, an smmla op is inserted with additional `vector.shape_casts` to
handle linearizing the input and output dimension.
This commit adds listener support to the dialect conversion. Similarly
to the greedy pattern rewrite driver, an optional listener can be
specified in the configuration object.
Listeners are notified only if the dialect conversion succeeds. In case
of a failure, where some IR changes are first performed and then rolled
back, no notifications are sent.
Due to the fact that some kinds of rewrite are reflected in the IR
immediately and some in a delayed fashion, there are certain limitations
when attaching a listener; these are documented in `ConversionConfig`.
To summarize, users are always notified about all rewrites that
happened, but the notifications are sent all at once at the very end,
and not interleaved with the actual IR changes.
This change is in preparation improvements to
`transform.apply_conversion_patterns`, which currently invalidates all
handles. In the future, it can use a listener to update handles
accordingly, similar to `transform.apply_patterns`.
* `replaceOp` replaces all uses of the original op and erases the old
op.
* `replaceAllUsesWith` replaces all uses of the original op/value/block.
It does not erase any IR.
This commit renames `replaceOpWithIf` to `replaceUsesWithIf`.
`replaceOpWithIf` was a misnomer because the function never erases the
original op. Similarly, `replaceOpWithinBlock` is renamed to
`replaceUsesWithinBlock`. (No "operation replaced" is sent because the
op is not erased.)
Also improve comments.
This commit adds a new `ConversionConfig` struct that allows users to
customize the dialect conversion. This configuration is similar to
`GreedyRewriteConfig` for the greedy pattern rewrite driver.
A few existing options are moved to this objects, simplifying the
dialect conversion API.
This is a re-upload of #82250. The Windows build breakage was fixed in #83768.
This reverts commit 60fbd60501.
This commit fixes a bug in a dialect conversion. Currently, when a block
is replaced via a signature conversion, the block is erased during the
"commit" phase. This is problematic because the block arguments may
still be referenced internal data structures of the dialect conversion
(`mapping`). Blocks should be treated same as ops: they should be erased
during the "cleanup" phase.
Note: The test case fails without this fix when running with ASAN, but
may pass when running without ASAN.
From https://reviews.llvm.org/D153245
This adds support for native PDL (and PDLL) C++ constraints to return
results.
This is useful for situations where a pattern checks for certain
constraints of multiple interdependent attributes and computes a new
attribute value based on them. Currently, for such an example it is
required to escape to C++ during matching to perform the check and after
a successful match again escape to native C++ to perform the computation
during the rewriting part of the pattern. With this work we can do the
computation in C++ during matching and use the result in the rewriting
part of the pattern. Effectively this enables a choice in the trade-off
of memory consumption during matching vs recomputation of values.
This is an example of a situation where this is useful: We have two
operations with certain attributes that have interdependent constraints.
For instance `attr_foo: one_of [0, 2, 4, 8], attr_bar: one_of [0, 2, 4,
8]` and `attr_foo == attr_bar`. The pattern should only match if all
conditions are true. The new operation should be created with a new
attribute which is computed from the two matched attributes e.g.
`attr_baz = attr_foo * attr_bar`. For the check we already escape to
native C++ and have all values at hand so it makes sense to directly
compute the new attribute value as well:
```
Constraint checkAndCompute(attr0: Attr, attr1: Attr) -> Attr;
Pattern example with benefit(1) {
let foo = op<test.foo>() {attr = attr_foo : Attr};
let bar = op<test.bar>(foo) {attr = attr_bar : Attr};
let attr_baz = checkAndCompute(attr_foo, attr_bar);
rewrite bar with {
let baz = op<test.baz> {attr=attr_baz};
replace bar with baz;
};
}
```
To achieve this the following notable changes were necessary:
PDLL:
- Remove check in PDLL parser that prevented native constraints from
returning results
PDL:
- Change PDL definition of pdl.apply_native_constraint to allow variadic
results
PDL_interp:
- Change PDL_interp definition of pdl_interp.apply_constraint to allow
variadic results
PDLToPDLInterp Pass:
The input to the pass is an arbitrary number of PDL patterns. The pass
collects the predicates that are required to match all of the pdl
patterns and establishes an ordering that allows creation of a single
efficient matcher function to match all of them. Values that are matched
and possibly used in the rewriting part of a pattern are represented as
positions. This allows fusion and thus reusing a single position for
multiple matching patterns. Accordingly, we introduce
ConstraintPosition, which records the type and index of the result of
the constraint. The problem is for the corresponding value to be used in
the rewriting part of a pattern it has to be an input to the
pdl_interp.record_match operation, which is generated early during the
pass such that its surrounding block can be referred to by branching
operations. In consequence the value has to be materialized after the
original pdl.apply_native_constraint has been deleted but before we get
the chance to generate the corresponding pdl_interp.apply_constraint
operation. We solve this by emitting a placeholder value when a
ConstraintPosition is evaluated. These placeholder values (due to fusion
there may be multiple for one constraint result) are replaced later when
the actual pdl_interp.apply_constraint operation is created.
Changes since the phabricator review:
- Addressed all comments
- In particular, removed registerConstraintFunctionWithResults and
instead changed registerConstraintFunction so that contraint functions
always have results (empty by default)
- Thus we don't need to reuse `rewriteFunctions` to store constraint
functions with results anymore, and can instead use
`constraintFunctions`
- Perform a stable sort of ConstraintQuestion, so that
ConstraintQuestion appear before other ConstraintQuestion that use their
results.
- Don't create placeholders for pdl_interp::ApplyConstraintOp. Instead
generate the `pdl_interp::ApplyConstraintOp` before generating the
successor block.
- Fixed a test failure in the pdl python bindings
Original code by @martin-luecke
Co-authored-by: martin-luecke <martinpaul.luecke@amd.com>
`isContiguousAccess` is an important affine analysis utility but is only
tested very indirectly via passes like vectorization and is not exposed.
Expose it and add a test pass for it that'll make it easier/feasible to
write test cases. This is especially needed since the utility can be
significantly enhanced in power, and we need a test pass to exercise it
directly.
This pass can in the future be used to test the utility of invariant
accesses as well.
The dialect conversion maintains sets of "ignored" and "replaced" ops.
This change simplifies the two sets, such that all nested ops are
included. (This was previously not the case and sometimes only the
parent op was included.)
This change allows for more aggressive assertions to prevent incorrect
rewriter API usage. E.g., accessing ops/blocks/regions within an erased
op.
A concrete example: I have seen conversion patterns in downstream
projects where an op is replaced with a new op, and the region of the
old op is afterwards inlined into the newly created op. This is invalid
rewriter API usage: ops that were replaced/erased should not be
accessed. Nested ops will be considered "ignored", even if they are
moved to a different region after the region's parent op was erased
(which is illegal API usage). Instead, create a new op, inline the
regions, then replace the old op with the new op.
Currently n-d transfer write distribution can be inconsistent with
distribution of reductions if a value has multiple users, one of which
is a transfer_write with a non-standard distribution map, and the other
of which is a vector.reduction.
We may want to consider removing the distribution map functionality in
the future for this reason.
This commit adds a new `ConversionConfig` struct that allows users to
customize the dialect conversion. This configuration is similar to
`GreedyRewriteConfig` for the greedy pattern rewrite driver.
A few existing options are moved to this objects, simplifying the
dialect conversion API.
The ArmSME compilation pipeline has evolved significantly and is now
sufficiently complex enough that it warrants a proper lowering pipeline
that encapsulates the various passes and orderings. Currently the
pipeline is loosely defined in our integration tests, but these have
diverged and are not using the same passes or ordering everywhere.
This patch introduces a test-lower-to-arm-sme pipeline mirroring
test-lower-to-llvm that provides some sanity when running e2e examples
and can be used a reference for targeting ArmSME in MLIR.
All the integration tests are updated to use this pipeline. The
intention is to productize the pipeline once it becomes more mature.
This PR adds an optional bitwidth parameter to the vector xfer op
flattening transformation so that the flattening doesn't happen if the
trailing dimension of the read/writen vector is larger than this
bitwidth (i.e., we are already able to fill at least one vector register
with that size).
The dialect conversion rolls back in-place op modifications upon
failure. Rolling back modifications of attributes is already supported,
but there was no support for properties until now.
Rename listener callback names:
* `notifyOperationRemoved` -> `notifyOperationErased`
* `notifyBlockRemoved` -> `notifyBlockErased`
The current callback names are misnomers. The callbacks are triggered
when an operation/block is erased, not when it is removed (unlinked).
E.g.:
```c++
/// Notify the listener that the specified operation is about to be erased.
/// At this point, the operation has zero uses.
///
/// Note: This notification is not triggered when unlinking an operation.
virtual void notifyOperationErased(Operation *op) {}
```
This change is in preparation of adding listener support to the dialect
conversion. The dialect conversion internally unlinks IR before erasing
it at a later point of time. There is an important difference between
"remove" and "erase". Lister callback names should be accurate to avoid
confusion.
This is a new ODS feature that allows dialects to define a list of
key/value pair representing an attribute type and a name.
This will generate helper classes on the dialect to be able to
manage discardable attributes on operations in a type safe way.
For example the `test` dialect can define:
```
let discardableAttrs = (ins
"mlir::IntegerAttr":$discardable_attr_key,
);
```
And the following will be generated in the TestDialect class:
```
/// Helper to manage the discardable attribute `discardable_attr_key`.
class DiscardableAttrKeyAttrHelper {
::mlir::StringAttr name;
public:
static constexpr ::llvm::StringLiteral getNameStr() {
return "test.discardable_attr_key";
}
constexpr ::mlir::StringAttr getName() {
return name;
}
DiscardableAttrKeyAttrHelper(::mlir::MLIRContext *ctx)
: name(::mlir::StringAttr::get(ctx, getNameStr())) {}
mlir::IntegerAttr getAttr(::mlir::Operation *op) {
return op->getAttrOfType<mlir::IntegerAttr>(name);
}
void setAttr(::mlir::Operation *op, mlir::IntegerAttr val) {
op->setAttr(name, val);
}
bool isAttrPresent(::mlir::Operation *op) {
return op->hasAttrOfType<mlir::IntegerAttr>(name);
}
void removeAttr(::mlir::Operation *op) {
assert(op->hasAttrOfType<mlir::IntegerAttr>(name));
op->removeAttr(name);
}
};
DiscardableAttrKeyAttrHelper getDiscardableAttrKeyAttrHelper() {
return discardableAttrKeyAttrName;
}
```
User code having an instance of the TestDialect can then manipulate this
attribute on operation using:
```
auto helper = testDialect.getDiscardableAttrKeyAttrHelper();
helper.setAttr(op, value);
helper.isAttrPresent(op);
...
```
This op is the inverse of all-gather. It is useful to have an explicit
concise representation instead of having a blob of slicing logic.
Add lowering for the op that slices from the tensor based on the
in-group process index.
Make resharding generate an all-slice instead of inserting the slicing
logic directly.
Add a new rewrite class for "operation movements". This rewrite class
can roll back `moveOpBefore` and `moveOpAfter`.
`RewriterBase::moveOpBefore` and `RewriterBase::moveOpAfter` is no
longer virtual. (The dialect conversion can gather all required
information for rollbacks from listener notifications.)
Common backends (LLVM, SPIR-V) only supports 1D vectors, LLVM conversion
handles ND vectors (N >= 2) as `array<array<... vector>>` and SPIR-V
conversion doesn't handle them at all at the moment. Sometimes it's
preferable to treat multidim vectors as linearized 1D. Add pass to do
this. Only constants and simple elementwise ops are supported for now.
@krzysz00 I've extracted yours result type conversion code from
LegalizeToF32 and moved it to common place.
Also, add ConversionPattern class operating on traits.
Collection of changes with the goal of being able to convert `encoding`
to `memorySpace` during bufferization
- new API for encoder to allow implementation to select destination
memory space
- update existing bufferization implementations to support the new
interface
Similar to `OpBuilder::clone`, operation/block insertion notifications
should be sent when cloning the contents of a region. E.g., this is to
ensure that the newly created operations are put on the worklist of the
greedy pattern rewriter driver.
Also move `cloneRegionBefore` from `RewriterBase` to `OpBuilder`. It
only creates new IR, so it should be part of the builder API (like
`clone(Operation &)`). The function does not have to be virtual. Now
that notifications are properly sent, the override in the dialect
conversion is no longer needed.
* Split out `MeshDialect.h` form `MeshOps.h` that defines the dialect
class. Reduces include clutter if you care only about the dialect and
not the ops.
* Expose functions `getMesh` and `collectiveProcessGroupSize`. There
functions are useful for outside users of the dialect.
* Remove unused code.
* Remove examples and tests of mesh.shard attribute in tensor encoding.
Per the decision that Spmdization would be performed on sharding
annotations and there will be no tensors with sharding specified in the
type.
For more info see this RFC comment:
https://discourse.llvm.org/t/rfc-sharding-framework-design-for-device-mesh/73533/81
When a block is split with `RewriterBase::splitBlock`, a
`notifyBlockInserted` notification, followed by
`notifyOperationInserted` notifications (for moving over the operations
into the new block) should be sent. This commit adds those
notifications.
When a block is inlined into another block, the nested operations are
moved into another block and the `notifyOperationInserted` callback
should be triggered. This commit adds the missing notifications for:
* `RewriterBase::inlineBlockBefore`
* `RewriterBase::mergeBlocks`
This commit adds a new method to the rewriter API: `moveBlockBefore`.
This op is utilized by `inlineRegionBefore` and covered by dialect
conversion test cases.
Also fixes a bug in `moveOpBefore`, where the previous op location was
not passed correctly. Adds a test case to
`test-strict-pattern-driver.mlir`.