This adds Python abstractions for the different handle types of the
transform dialect
The abstractions allow for straightforward chaining of transforms by
calling their member functions.
As an initial PR for this infrastructure, only a single transform is
included: `transform.structured.match`.
With a future `tile` transform abstraction an example of the usage is:
```Python
def script(module: OpHandle):
module.match_ops(MatchInterfaceEnum.TilingInterface).tile(tile_sizes=[32,32])
```
to generate the following IR:
```mlir
%0 = transform.structured.match interface{TilingInterface} in %arg0
%tiled_op, %loops = transform.structured.tile_using_for %0 [32, 32]
```
These abstractions are intended to enhance the usability and flexibility
of the transform dialect by providing an accessible interface that
allows for easy assembly of complex transformation chains.
The `conv_2d_ngchw_fgchw` Op implements 2d grouped convolution with
dimensions ordered as given in the name. However, the current
implementation orders weights as `gfchw` instead of `fgchw`. This was
already pointed out in an old phabricator revision which never landed:
https://reviews.llvm.org/D150064
This patch
1) Adds a new op `conv_2d_ngchw_gfchw`
2) Fixes the dimension ordering of the old op `conv_2d_ngchw_fgchw`
3) Adds tests with non-dynamic dimensions so that it's easier to
understand.
The existing initialization sequence always enables multi-threading at
MLIRContext construction time, making it impractical to provide a
customized thread pool.
Here, this is changed to always create the context with threading
disabled, process all site-specific init hooks (which can set thread
pools) and ultimately enable multi-threading unless if site-configured
to not do so.
This should preserve the existing user-visible initialization behavior
while also letting downstreams ensure that contexts are always created
with a shared thread pool. This was tested with IREE, which has such a
concept. Using site-specific thread tuning produced up to 2x single
compilation job behavior and customization of batch compilation (i.e. as
part of a build system) to utilize half the memory and run the entire
test suite ~2x faster. Given this, I believe that the additional
configurability can well pay for itself for implementations that use it.
We may also want to present user-level Python APIs for controlling
threading configuration in the future.
This PR adds "value casting", i.e., a mechanism to wrap `ir.Value` in a
proxy class that overloads dunders such as `__add__`, `__sub__`, and
`__mul__` for fun and great profit.
This is thematically similar to
bfb1ba7526
and
9566ee2806.
The example in the test demonstrates the value of the feature (no pun
intended):
```python
@register_value_caster(F16Type.static_typeid)
@register_value_caster(F32Type.static_typeid)
@register_value_caster(F64Type.static_typeid)
@register_value_caster(IntegerType.static_typeid)
class ArithValue(Value):
__add__ = partialmethod(_binary_op, op="add")
__sub__ = partialmethod(_binary_op, op="sub")
__mul__ = partialmethod(_binary_op, op="mul")
a = arith.constant(value=FloatAttr.get(f16_t, 42.42))
b = a + a
# CHECK: ArithValue(%0 = arith.addf %cst, %cst : f16)
print(b)
a = arith.constant(value=FloatAttr.get(f32_t, 42.42))
b = a - a
# CHECK: ArithValue(%1 = arith.subf %cst_0, %cst_0 : f32)
print(b)
a = arith.constant(value=FloatAttr.get(f64_t, 42.42))
b = a * a
# CHECK: ArithValue(%2 = arith.mulf %cst_1, %cst_1 : f64)
print(b)
```
**EDIT**: this now goes through the bindings and thus supports automatic
casting of `OpResult` (including as an element of `OpResultList`),
`BlockArgument` (including as an element of `BlockArgumentList`), as
well as `Value`.
The _mlirRegisterEverything symbol may not be built by some customers. The
code here was intended to support this, but didn't properly initialize the
init_module variable.
This would break JAX with:
NameError: free variable 'init_module' referenced before assignment in enclosing scope
Added missing register_translations in python to replicate the same in
the C-API
Cleaned up the current calls to register passes where the other calls
are already embedded in the mlirRegisterAllPasses.
found here,
https://discourse.llvm.org/t/opencl-example/74187
Linalg currently has these named ops:
* `matmul`
* `matvec`
* `vecmat`
* `batch_matmul`
* `batch_matvec`
But it does not have:
* `batch_vecmat`
This PRs adds that for consistency, and I have a short-term need for it
( https://github.com/openxla/iree/issues/15158 ), so not having this
would cause some contortion on my end.
Currently, `linalg.transpose` and `linalg.broadcast` can't be emitted
through either the C API or the python bindings (which of course go
through the C API). See
https://discourse.llvm.org/t/how-to-build-linalg-transposeop-in-mlir-pybind/73989/10.
The reason is even though they're named ops, there is no opdsl
`@linalg_structured_op` for them and thus while they can be instantiated
they cannot be passed to
[`mlirLinalgFillBuiltinNamedOpRegion`](a7cccb9cbb/mlir/lib/CAPI/Dialect/Linalg.cpp (L18)).
I believe the issue is they both take a `IndexAttrDef` but
`IndexAttrDef` cannot represent dynamic rank. Note, if I'm mistaken and
there is a way to write the `@linalg_structured_op` let me know.
The solution here simply implements the `regionBuilder` interface which
is then picked up by
[`LinalgDialect::addNamedOpBuilders`](7557530f42/mlir/lib/Dialect/Linalg/IR/LinalgDialect.cpp (L116)).
Extension classes are added "by hand" that mirror the API of the
`@linalg_structured_op`s. Note, the extension classes are added to to
`dialects/linalg/__init__.py` instead of
`dialects/linalg/opdsl/ops/core_named_ops.py` in order that they're not
confused for opdsl generators/emitters.
This PR replaces the mixin `OpView` extension mechanism with the
standard inheritance mechanism.
Why? Firstly, mixins are not very pythonic (inheritance is usually used
for this), a little convoluted, and too "tight" (can only be used in the
immediately adjacent `_ext.py`). Secondly, it (mixins) are now blocking
are correct implementation of "value builders" (see
[here](https://github.com/llvm/llvm-project/pull/68764)) where the
problem becomes how to choose the correct base class that the value
builder should call.
This PR looks big/complicated but appearances are deceiving; 4 things
were needed to make this work:
1. Drop `skipDefaultBuilders` in
`OpPythonBindingGen::emitDefaultOpBuilders`
2. Former mixin extension classes are converted to inherit from the
generated `OpView` instead of being "mixins"
a. extension classes that simply were calling into an already generated
`super().__init__` continue to do so
b. (almost all) extension classes that were calling `self.build_generic`
because of a lack of default builder being generated can now also just
call `super().__init__`
3. To handle the [lone single
use-case](https://sourcegraph.com/search?q=context%3Aglobal+select_opview_mixin&patternType=standard&sm=1&groupBy=repo)
of `select_opview_mixin`, namely
[linalg](https://github.com/llvm/llvm-project/blob/main/mlir/python/mlir/dialects/_linalg_ops_ext.py#L38),
only a small change was necessary in `opdsl/lang/emitter.py` (thanks to
the emission/generation of default builders/`__init__`s)
4. since the `extend_opview_class` decorator is removed, we need a way
to register extension classes as the desired `OpView` that `op.opview`
conjures into existence; so we do the standard thing and just enable
replacing the existing registered `OpView` i.e.,
`register_operation(_Dialect, replace=True)`.
Note, the upgrade path for the common case is to change an extension to
inherit from the generated builder and decorate it with
`register_operation(_Dialect, replace=True)`. In the slightly more
complicated case where `super().__init(self.build_generic(...))` is
called in the extension's `__init__`, this needs to be updated to call
`__init__` in `OpView`, i.e., the grandparent (see updated docs).
Note, also `<DIALECT>_ext.py` files/modules will no longer be automatically loaded.
Note, the PR has 3 base commits that look funny but this was done for
the purpose of tracking the line history of moving the
`<DIALECT>_ops_ext.py` class into `<DIALECT>.py` and updating (commit
labeled "fix").
The reason I want this is that I am writing my own Python bindings and
would like to use the insertion point from
`PyThreadContextEntry::getDefaultInsertionPoint()` to call C++ functions
that take an `OpBuilder` (I don't need to expose it in Python but it
also seems appropriate). AFAICT, there is currently no way to translate
a `PyInsertionPoint` into an `OpBuilder` because the operation is
inaccessible.
This PR creates the necessary files to support bindings for operations
in the affine dialect.
This is the first of many PRs which will progressively introduce
affine.load, affine.for, etc operations. I would like to
acknowledge the work by Nelli's author @makslevental :
https://github.com/makslevental/nelli/blob/main/nelli/mlir/affine/affine.py
which jump-starts the work.
This PR adds the additional generation of what I'm calling "value
builders" (a term I'm not married to) that look like this:
```python
def empty(sizes, element_type, *, loc=None, ip=None):
return get_result_or_results(tensor.EmptyOp(sizes=sizes, element_type=element_type, loc=loc, ip=ip))
```
which instantiates a `tensor.EmptyOp` and then immediately grabs the
result (`OpResult`) and then returns that *instead of a handle to the
op*.
What's the point of adding these when `EmptyOp.result` already exists?
My claim/feeling/intuition is that eDSL users are more comfortable with
a value centric programming model (i.e., passing values as operands) as
opposed to an operator instantiation programming model. Thus this change
enables (or at least goes towards) the bindings supporting such a user
and use case. For example,
```python
i32 = IntegerType.get_signless(32)
...
ten1 = tensor.empty((10, 10), i32)
ten2 = tensor.empty((10, 10), i32)
ten3 = arith.addi(ten1, ten2)
```
Note, in order to present a "pythonic" API and enable "pythonic" eDSLs,
the generated identifiers (op names and operand names) are snake case
instead of camel case and thus `llvm::convertToSnakeFromCamelCase`
needed a small fix. Thus this PR is stacked on top of
https://github.com/llvm/llvm-project/pull/68375.
In addition, as a kind of victory lap, this PR adds a "rangefor" that
looks and acts exactly like python's `range` but emits `scf.for`.
TOSA defines the filter channel ordering for 2D convolution operation
`tosa.conv2d` as `[OC, KH, KW, IC]`. The LinAlg dialect supports `[F, H,
W, C]` and `[H, W, C, F]` orderings via the `linalg.conv_2d_nhwc_fhwc`
and `linalg.conv_2d_nhwc_hwcf` operations respectively. Where `F == OC`,
`KH == H`, `KW == W` and `C == IC`.
Currently `tosa.conv2d` is lowered to `linalg.conv2d_nhwc_hwcf` meaning
we need to insert a transposition operation to permute the filter
channels before they can be passed as weights to the linalg op, that is
`[F, H, W, C]` -> `[H, W, C, F]`. An analogous transformation needs to
be applied to the quantized operation that lowers to
`linalg.conv_2d_nhwc_hwcf_q`.
This commit updates the TOSA->LinAlg lowering so that `tosa.conv2d` is
lowered to `linalg.conv2d_nhwc_fhwc` removing the need for the
introduction of a transposition operation and making the mapping 1-1. It
also adds a `linalg.conv_2d_nhwc_fhwc_q` quantized operation to the
LinAlg dialect so the same direct 1-1 mapping can be applied to the
quantized variant.
This commit does not add any new lit tests but repurposes the current
TosaToLinalgNamed tests by removing the checks for transpositions and
updating the targeted LinAlg operations from `linalg.conv2d_nhwc_hwcf`
to linalg.conv2d_nhwc_fhwc`.
Signed-off-by: Jack Frankland <jack.frankland@arm.com>
This patch updates `transform.loop.peel` so that this Op returns two
rather than one handle:
* one for the peeled loop, and
* one for the remainder loop.
Also, following this change this Op will fail if peeling fails. This is
consistent with other similar Ops that also fail if no transformation
takes place.
Relands #67482 with an extra fix for transform_loop_ext.py
Rename and restructure tiling-related transform ops from the structured
extension to be more homogeneous. In particular, all ops now follow a
consistent naming scheme:
- `transform.structured.tile_using_for`;
- `transform.structured.tile_using_forall`;
- `transform.structured.tile_reduction_using_for`;
- `transform.structured.tile_reduction_using_forall`.
This drops the "_op" naming artifact from `tile_to_forall_op` that
shouldn't have been included in the first place, consistently specifies
the name of the control flow op to be produced for loops (instead of
`tile_reduction_using_scf` since `scf.forall` also belongs to `scf`),
and opts for the `using` connector to avoid ambiguity.
The loops produced by tiling are now systematically placed as *trailing*
results of the transform op. While this required changing 3 out of 4 ops
(except for `tile_using_for`), this is the only choice that makes sense
when producing multiple `scf.for` ops that can be associated with a
variadic number of handles. This choice is also most consistent with
*other* transform ops from the structured extension, in particular with
fusion ops, that produce the structured op as the leading result and the
loop as the trailing result.
This PR adds a new transform op that replaces `memref.alloca`s with
`memref.get_global`s to newly inserted `memref.global`s. This is useful,
for example, for allocations that should reside in the shared memory of
a GPU, which have to be declared as globals.
This PR renames the vectorization transform ops as follows:
* `structured.masked_vectorize` => `structured.vectorize`. This reflects
the fact that since [recently](https://reviews.llvm.org/D157774) the op
can also handle the unmasked case.
* `structured.vectorize` =>
`structured.vectorize_children_and_applies_patterns`. This reflects the
fact that the op does not just vectorize the given payload op but all
vectorizable children contained in it, and applies patterns before and
after for preparation and clean-up.
This rename was discussed first
[here](https://reviews.llvm.org/D157774).
The PR also adapts and cleans ups the tablegen description of the
`VectorizeChildrenAndApplyPatternsOp` (formerly `VectorizeOp`).
Don't generate enums from the main VectorOps.td file as that
transitively includes enums from Arith.
---------
Co-authored-by: Nicolas Vasilache <ntv@google.com>
- make sure that the type of induction variable should be determined by the type of the lower bound type.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D159534
The Linalg named ops specification went out of sync with the OpDSL
description, presumably due to direct manual modifications of the yaml
file.
Additionally, the unsigned division has been generating the signed
scalar instruction, which is now fixed.
This revision pipes the fastmath attribute support through the
vector.reduction op. This seemingly simple first step already requires
quite some genuflexions, file and builder reorganization. In the
process, retire the boolean reassoc flag deep in the LLVM dialect
builders and just use the fastmath attribute.
During conversions, templated builders for predicated intrinsics are
partially cleaned up. In the future, to finalize the cleanups, one
should consider adding fastmath to the VPIntrinsic ops.
* Moves several orphaned methods from Operation/OpView -> _OperationBase
so that both hierarchies share them (whether unknown or known to ODS).
* Adds typing information for missing `MLIRError` exception.
* Adds `DiagnosticInfo` typing.
* Adds `DenseResourceElementsAttr` typing that was missing.
This commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs as this should be entirely handled by the
ownership-based-buffer-deallocation pass going forward. This means the
`allow-return-allocs` pass option will default to true now,
`create-deallocs` defaults to false and they, as well as the escape
attribute indicating whether a memref escapes the current region, will
be removed. A new `allow-return-allocs-from-loops` option is added as a
temporary workaround for some bufferization limitations.