Commit Graph

9773 Commits

Author SHA1 Message Date
Jan Leyonberg
17db9efe92 [OpenMP][MLIR] Add omp.distribute op to the OMP dialect (#67720)
This patch adds the omp.distribute operation to the OMP dialect. The
purpose is to be able to represent the distribute construct in OpenMP
with the associated clauses. The effect of the operation is to
distribute the loop iterations of the loop(s) contained inside the
region across multiple teams.
2024-01-24 10:51:47 -05:00
Mirko Brkušanin
7fdf608cef [AMDGPU] Add GFX12 WMMA and SWMMAC instructions (#77795)
Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com>
Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com>
2024-01-24 13:43:07 +01:00
Uday Bondhugula
b4785cebfb [MLIR] NFC. Clean up stale TODO comments and style deviations in affine utils (#79079)
NFC. Clean up stale TODO comments and style deviations in affine utils
and affine fusion utils.
2024-01-24 11:41:44 +05:30
Krzysztof Drewniak
750e90e440 [mlir][ArithToAMDGPU] Add option for saturating truncation to fp8 (#74153)
Many machine-learning applications (and most software written at AMD)
expect the operation that truncates floats to 8-bit floats to be
saturating. That is, they expect `truncf 256.0 : f32 to f8E4M3FNUZ` to
yield `240.0`, not `NaN`, and similarly for negative numbers. However,
the underlying hardware instruction that can be used for this truncation
implements overflow-to-NaN semantics.

To enable handling this usecase, we add the saturate-fp8-truncf option
to ArithToAMDGPU (off by default), which causes the requisite clamping
code to be emitted. Said clamping code ensures that Inf and NaN are
passed through exactly (and thus truncate to NaN).

Per review feedback, this commit refactors
createScalarOrSplatConstant() to the Arith dialect utilities and uses
it in this code. It also fixes naming of existing patterns and
switches from vector.extractelement/insertelement to
vector.extract/insert.
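
As a rough illustration of the clamping described above (not the code the pass emits), the following Python sketch shows the value-level behaviour: finite inputs are clamped to the largest finite f8E4M3FNUZ magnitude, while Inf and NaN pass through unchanged so that the subsequent truncation still turns them into NaN. The function name and the `max_finite` parameter are hypothetical; the value `240.0` is taken from the example in this message.

```python
import math

# Largest finite f8E4M3FNUZ value, as quoted in the commit message.
F8E4M3FNUZ_MAX = 240.0


def clamp_before_fp8_truncf(x: float, max_finite: float = F8E4M3FNUZ_MAX) -> float:
    """Clamp a finite value into [-max_finite, max_finite] before truncation.

    Inf and NaN are returned unchanged, mirroring the "passed through
    exactly" behaviour described above. The truncation to fp8 itself is
    not modelled here.
    """
    if math.isnan(x) or math.isinf(x):
        return x
    return max(-max_finite, min(max_finite, x))
```

With this sketch, `clamp_before_fp8_truncf(256.0)` yields `240.0`, matching the saturating behaviour the option is meant to provide.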
2024-01-23 16:52:21 -06:00
Krzysztof Drewniak
80fcc9247a [mlir][AMDGPU] Actually update the default ABI version, add comments (#79185)
Much confusion occurred earlier today when updating the fallback
`int abi;` in addControlVariables() didn't do anything. This was because
that value is the fallback used if the ABI version fails to parse ...
and it should always parse, because it has a default value that comes
from multiple different places.

This commit updates all the places said default variable can come from,
namely:
1. The ROCDL target attribute definition
2. The ROCDL target attribute's builders
3. The rocdl-attach-target pass's default option values.

With this, the printf test is passing.
2024-01-23 12:16:18 -06:00
Valentin Clement (バレンタイン クレメン)
3eb4178b9c [mlir][openacc] Update acc.loop to be a proper loop like operation (#67355)
The initial design of the `acc.loop` was to be an operation that
encapsulates a loop like operation. This was an early design and we now
want to change it so the `acc.loop` operation becomes a real loop-like
operation by implementing the LoopLikeInterface.

Differential Revision: https://reviews.llvm.org/D159229

This patch is just moved from Phabricator to github
2024-01-22 10:31:29 -08:00
Andrzej Warzynski
75b0c913a5 [mlir][nfc] Update comments
1. Updates and clarifies a few comments related to hooks for
   vector.{insert|extract}_strided_slice.

2. For consistency with vector.insert_strided_slice, removes a TODO from
   vector.extract_strided_slice Op def. It's self-explanatory that
   adding support for non-unit strides is a "TODO".
2024-01-22 14:25:27 +00:00
Cullen Rhodes
9f7fff7f13 [mlir][ArmSME] Add arith-to-arm-sme conversion pass (#78197)
Existing 'arith::ConstantOp' conversion and tests are moved from
VectorToArmSME. There's currently only a single op that's converted at
the moment, but this will grow in the future as things like in-tile add
are implemented. Also, 'createLoopOverTileSlices' is moved to ArmSME
utils since it's relevant for both conversions.
2024-01-22 09:23:11 +00:00
Abhinav271828
68a5261d26 [MLIR][Presburger] Implement function to evaluate the number of terms in a generating function. (#78078)
We implement `computeNumTerms()`, which counts the number of terms in a
generating function by substituting the unit vector in it.
This is the main function in Barvinok's algorithm – the number of points
in a polytope is given by the number of terms in the generating function
corresponding to it.
We also modify the GeneratingFunction class to have `const` getters and
improve the simplification of QuasiPolynomials.
2024-01-22 14:22:01 +05:30
Durgadoss R
aa4547fcc8 [MLIR][NVVM] Update cp.async.bulk Ops to use intrinsics (#78900)
This patch updates the cp.async.bulk.{commit/wait}_group Ops to use NVVM
intrinsics.
* Doc updated for the commit_group Op.
* Tests are added to verify the lowering to the intrinsics.

While we are there, fix the FileCheck directive on the
'nvvm.setmaxregister' test.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2024-01-22 08:39:30 +01:00
Guray Ozen
12c241b365 [MLIR][NVVM] Explicit Data Type for Output in wgmma.mma_async (#78713)
The current implementation of `nvvm.wgmma.mma_async` Op deduces the data
type of the output matrix from the data type of struct member, which can be
non-intuitive, especially in cases where types like `2xf16` are packed
into `i32`.

This PR addresses this issue by improving the Op to include an explicit
data type for the output matrix.

The modified Op now includes an explicit data type for Matrix-D (<f16>),
and looks as follows:

```
%result = llvm.mlir.undef : !llvm.struct<(struct<(i32, i32, ...
nvvm.wgmma.mma_async
    %descA, %descB, %result,
    #nvvm.shape<m = 64, n = 32, k = 16>,
    D [<f16>, #nvvm.wgmma_scale_out<zero>],
    A [<f16>, #nvvm.wgmma_scale_in<neg>, <col>],
    B [<f16>, #nvvm.wgmma_scale_in<neg>, <col>]
```
2024-01-22 08:37:20 +01:00
Matthias Springer
62bf7710ff [mlir][IR] Add notifyBlockRemoved callback to listener (#78306)
There is already a "block inserted" notification (in
`OpBuilder::Listener`), so there should also be a "block removed"
notification.

The purpose of this change is to make the listener API more mature.
There is currently a gap between what kind of IR changes can be made and
what IR changes can be listened to. At the moment, the only way to
inform listeners about "block removal" is to send a manual
`notifyOperationModified` for the parent op (e.g., by wrapping the
`eraseBlock(b)` method call in `updateRootInPlace(b->getParentOp())`).
This tells the listener that *something* has changed, but it is somewhat
of an API abuse.
2024-01-21 10:06:53 +01:00
Bharathi Ramana Joshi
d70bfeb4e1 [MLIR][Presburger] Implement IntegerRelation::setId (#77872)
2024-01-20 15:19:10 +05:30
Jeff Niu
15b089cb02 [mlir] Make printAlias hooks public (NFC) (#78833)
These are very useful when writing custom parsers and printers for
aggregate types or attributes that might want to print aliases.
2024-01-19 23:23:41 -08:00
Mehdi Amini
e611a4cf80 Revert "[mlir][amdgpu] Shared memory access optimization pass" (#78822)
Reverts llvm/llvm-project#75627 ; it broke the bot:
https://lab.llvm.org/buildbot/#/builders/61/builds/53218
2024-01-19 16:41:43 -08:00
erman-gurses
b7360fbe8c [mlir][amdgpu] Shared memory access optimization pass (#75627)
Implements a transformation to optimize accesses to shared memory.

Reference: https://reviews.llvm.org/D127457

_This change adds a transformation and pass to the NvGPU dialect that
attempts to optimize reads/writes from a memref representing GPU shared
memory in order to avoid bank conflicts. Given a value representing a
shared memory memref, it traverses all reads/writes within the parent op
and, subject to suitable conditions, rewrites all last dimension index
values such that element locations in the final (col) dimension are
given by newColIdx = col % vecSize + perm[row](col / vecSize, row)
where perm is a permutation function indexed by row and vecSize
is the vector access size in elements (currently assumes 128bit
vectorized accesses, but this can be made a parameter). This specific
transformation can help optimize typical distributed & vectorized
accesses common to loading matrix multiplication operands to/from shared
memory._
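
To make the index formula above concrete, here is a small Python sketch. It is not the pass's implementation: the XOR-based `perm`, the choice to have it return an offset already scaled to elements, and the `vec_size`/`num_vecs` defaults (8 elements per vector, as for 128-bit accesses of 16-bit elements, and 8 vectors per row) are all assumptions made for illustration.

```python
def swizzle_col(row: int, col: int, vec_size: int = 8, num_vecs: int = 8) -> int:
    """Return the permuted last-dimension index for element (row, col).

    Assumes a row holds vec_size * num_vecs elements and that num_vecs is
    a power of two, so the XOR below stays within [0, num_vecs).
    """
    vec_idx = col // vec_size
    # Hypothetical perm[row]: XOR the vector index with the row, then
    # scale the result back from vectors to elements.
    permuted_vec_offset = (vec_idx ^ (row % num_vecs)) * vec_size
    # newColIdx = col % vecSize + perm[row](col / vecSize, row)
    return col % vec_size + permuted_vec_offset
```

Under these assumptions each row is a permutation of its own columns, and elements in the same column of consecutive rows land in different vector slots, which is the property that helps avoid bank conflicts.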
2024-01-19 15:44:45 -08:00
Quinn Dawkins
42b160356f [mlir][transform] Add an op for replacing values with function calls (#78398)
Adds `transform.func.cast_and_call` that takes a set of inputs and
outputs and replaces the uses of those outputs with a call to a function
at a specified insertion point.

The idea with this operation is to allow users to author independent IR
outside of a to-be-compiled module, and then match and replace a slice
of the program with a call to the external function.

Additionally adds a mechanism for populating a type converter with a set
of conversion materialization functions that allow insertion of
casts on the inputs/outputs to and from the types of the function
signature.
2024-01-19 13:21:52 -05:00
Matthias Springer
4fc128f817 [mlir][bufferization][NFC] Clean up code (#78594)
Clean up code and remove dead code.
2024-01-19 10:20:41 +01:00
Valentin Clement (バレンタイン クレメン)
b06bc7c6a0 [mlir][flang][openacc] Device type support on acc routine op (#78375)
This patch adds support for device_type on the acc.routine operation.
device_type can be specified on seq, worker, vector, gang and bind
information.

The support follows the same design as the one for compute
operations, data operations and the loop operation.
2024-01-18 09:04:11 -08:00
Fehr Mathieu
914cfa4138 [mlir][irdl] Add irdl.base op (#76400)
The `irdl.base` op represents an attribute constraint that checks that
the base of a type or attribute is the expected one (e.g. `IntegerType`).

Example:

```mlir
irdl.dialect @cmath {
  irdl.type @complex {
    %0 = irdl.base "!builtin.integer"
    irdl.parameters(%0)
  }

  irdl.type @complex_wrapper {
    %0 = irdl.base @complex
    irdl.parameters(%0)
  }
}
```

The above program defines a `cmath.complex` type that expects a single
parameter, which is a type with base name `builtin.integer`, which is
the name of an `IntegerType` type.
It also defines a `cmath.complex_wrapper` type that expects a single
parameter, which is a type of base type `cmath.complex`.
2024-01-18 16:31:40 +00:00
Krzysztof Drewniak
05e85e4fc5 [mlir][Math] Add pass to legalize math functions to f32-or-higher (#78361)
Since most of the operations in the `math` dialect don't have
low-precision implementations, add the -math-legalize-to-f32 pass that
goes through and brackets low-precision math functions (like `math.sin
%0 : f16`) with `arith.extf` and `arith.truncf`. This preserves the
original semantics of the math operation but allows lowering to proceed.

Versions of this lowering are already implicitly present in some passes,
like ConvertGPUToROCDL. However, because those are implicit rewrites,
they hide the floating-point extension and truncation, preventing anyone
from writing passes that operate on those implicit extf/truncf pairs.

Exposing this legalization explicitly is needed to allow lowering 8-bit
floats on AMD GPUs, as the implementation of extf and truncf on that
platform requires the complex logic found in ArithToAMDGPU, which runs
before the GPU to ROCDL lowering.
2024-01-18 09:37:43 -06:00
Sergio Afonso
2747193058 [Flang][MLIR][OpenMP] Remove the early outlining interface (#78450)
After the removal of the OpenMP early outlining MLIR pass in #67319, the
`EarlyOutliningInterface` stopped doing any useful work. It used to be
necessary to tie the name of the function from which a target region was
outlined to that new function, so it would be used when translating to
LLVM IR in place of the outlined function's name.

This is not necessary anymore, so this patch removes all references to
this interface and uses of the `omp.outline_parent_name` discardable
attribute in tests.
2024-01-18 15:33:43 +00:00
Quinn Dawkins
5caab8bbc0 [mlir][transform] Add transform.get_operand op (#78397)
Similar to `transform.get_result`, except it returns a handle to the
operand indicated by a positional specification, same as is defined for
the linalg match ops.

Additionally updates `get_result` to take the same positional specification.
This makes the use case of wanting to get all of the results of an
operation easier by no longer requiring the user to reconstruct the list
of results one-by-one.
2024-01-18 09:33:14 -05:00
Mehdi Amini
74cf9bcf71 Apply clang-tidy fixes for performance-unnecessary-value-param in Utils.cpp (NFC)
2024-01-17 08:51:41 -08:00
Sergio Afonso
8fb685fb7e [MLIR][LLVM] Add explicit target_cpu attribute to llvm.func (#78287)
This patch adds the target_cpu attribute to llvm.func MLIR operations
and updates the translation to/from LLVM IR to match "target-cpu"
function attributes.
2024-01-17 14:55:02 +00:00
Matthias Springer
5fcf907b34 [mlir][IR] Rename "update root" to "modify op" in rewriter API (#78260)
This commit renames 4 pattern rewriter API functions:
* `updateRootInPlace` -> `modifyOpInPlace`
* `startRootUpdate` -> `startOpModification`
* `finalizeRootUpdate` -> `finalizeOpModification`
* `cancelRootUpdate` -> `cancelOpModification`

The term "root" is a misnomer. The root is the op that a rewrite pattern
matches against
(https://mlir.llvm.org/docs/PatternRewriter/#root-operation-name-optional).
A rewriter must be notified of all in-place op modifications, not just
in-place modifications of the root
(https://mlir.llvm.org/docs/PatternRewriter/#pattern-rewriter). The old
function names were confusing and have contributed to various broken
rewrite patterns.

Note: The new function names use the term "modify" instead of "update"
for consistency with the `RewriterBase::Listener` terminology
(`notifyOperationModified`).
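
For downstream code that still uses the old names, the four renames listed above can be applied mechanically. The following Python sketch is one hypothetical way to do that on a source string; the `RENAMES` table comes from this message, while the `migrate` helper and its whole-identifier matching are assumptions, not tooling shipped with this change.

```python
import re

# Old name -> new name, as listed in the commit message.
RENAMES = {
    "updateRootInPlace": "modifyOpInPlace",
    "startRootUpdate": "startOpModification",
    "finalizeRootUpdate": "finalizeOpModification",
    "cancelRootUpdate": "cancelOpModification",
}

# Match only whole identifiers so that longer names containing an old
# name as a substring are left alone.
_PATTERN = re.compile(r"\b(" + "|".join(map(re.escape, RENAMES)) + r")\b")


def migrate(source: str) -> str:
    """Return `source` with the old rewriter API names replaced."""
    return _PATTERN.sub(lambda match: RENAMES[match.group(1)], source)
```

A purely textual rewrite like this can also touch comments and strings, so the result should be reviewed before committing.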
2024-01-17 11:08:59 +01:00
Alex Zinenko
baa39b789b [mlir] fix wording in transform dialect docs
The wording "fails silently" has been sometimes used to indicate that a
silenceable failure was emitted by the operation. The meaning is exactly
the opposite: silenceable failure is _not_ silent unless silenced.
2024-01-17 09:20:02 +00:00
Aviad Cohen
d89a0a6594 [mlir][Tosa]: Add folder to ReciprocalOp of splat constant inputs (#78137)
2024-01-17 09:05:07 +02:00
Jacques Pienaar
8934b10642 [mlir][arith] Add overflow flags support to arith ops (#78376)
Add overflow flags support to the following ops:
* `arith.addi`
* `arith.subi`
* `arith.muli`

Example of new syntax:
```
%res = arith.addi %arg1, %arg2 overflow<nsw> : i64
```
Similar to existing LLVM dialect syntax
```
%res = llvm.add %arg1, %arg2 overflow<nsw> : i64
```

Tablegen canonicalization patterns updated to always drop flags, proper
support with tests will be added later.

Updated the LLVMIR translation as part of this commit, as it is currently
written in a way that would otherwise crash when new attributes are added
to arith ops.

Also lower `arith` overflow flags to corresponding SPIR-V op decorations

Discussion

https://discourse.llvm.org/t/rfc-integer-overflow-flags-support-in-arith-dialect/76025

This effectively rolls forward #77211, #77700 and #77714 while adding a
test to ensure the Python usage is not broken. More follow up needed but
unrelated to the core change here. The changes here are minimal and just
correspond to "textual namespacing" ODS side, no C++ or Python changes
were needed.

---------

Co-authored-by: Ivan Butygin <ivan.butygin@gmail.com>, Yi Wu <yi.wu2@arm.com>
2024-01-17 06:12:23 +03:00
Tobias Gysi
bd26ce47c8 [mlir][llvm] Fix loop annotation parser (#78266)
This revision moves the ArrayRef field of the LoopAnnotation attribute
to the end of the struct to enable printing and parsing of the
attribute. Previously, the parsing could fail in the presence of a start
or end loc.
2024-01-16 16:35:52 +01:00
Matthias Springer
8f2d83da26 [mlir][bufferization] Add BufferizableOpInterface::hasTensorSemantics (#75273)
Add a new interface method to `BufferizableOpInterface`:
`hasTensorSemantics`. This method returns "true" if the op has tensor
semantics and should be bufferized.

Until now, we assumed that an op has tensor semantics if it has tensor
operands and/or tensor op results. However, there are ops like
`ml_program.global` that do not have any results/operands but must still
be bufferized (#75103). The new interface method can return "true" for
such ops.

This change also decouples `bufferization::bufferizeOp` a bit from the
func dialect.
2024-01-16 10:07:34 +01:00
Jacques Pienaar
f6ff7574a6 [mlir] Attribute add printStripped (#78008)
Enable printing without dialect wrapping.

This closely matches the `AsmPrinter::printStrippedAttrOrType`
implementation, except for the templating component.
2024-01-15 20:56:35 -08:00
Fabian Mora
5b4f2b906b [mlir][gpu] Add an offloading handler attribute to gpu.module (#78047)
This patch adds an optional offloading handler attribute to
the `gpu.module` op. This attribute will be used during the
`gpu-module-to-binary` pass to override the offloading handler used in
the `gpu.binary` op.
2024-01-15 16:58:10 -05:00
Boian Petkantchin
5df2c00af3 [mlir][mesh] Remove rank attribute and rename dim_sizes to shape in ClusterOp (#77838)
Remove the somewhat redundant rank attribute.
Before this change
```
mesh.cluster @mesh(rank = 3, dim_sizes = 2x3)
```
After
```
mesh.cluster @mesh(shape = 2x3x?)
```

The rank is instead determined by the provided shape. With this change,
`getDimSizes()` can no longer be wrongly assumed to have size equal to
the cluster rank.
Now `getShape().size()` will always equal `getRank()`.
2024-01-15 07:39:09 -08:00
Durgadoss R
dc01b597ba [MLIR][NVVM] Add support for aligned variants of cluster barriers (#78142)
This patch adds:
* Support for the 'aligned' variants of the cluster barrier Ops, by
extending the existing Op with an 'aligned' attribute.
* Docs for these Ops.
* Test cases to verify the lowering to the corresponding intrinsics.

Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
2024-01-15 14:52:30 +01:00
Guray Ozen
8dd0d95c7c [mlir][nvgpu] Add nvgpu.tma.async.store (#77811)
PR adds `nvgpu.tma.async.store` Op for asynchronous stores using the
Tensor Memory Access (TMA) unit.

It also implements Op lowering to NVVM dialect. The Op currently
performs asynchronous stores of a tile memory region from shared to
global memory for a single CTA.
2024-01-15 11:44:51 +01:00
Fabian Mora
a1eaed7a21 [mlir][gpu] Fix GPU YieldOP format and traits (#78006)
This patch adds assembly format to `gpu::YieldOp`. It also adds the
return like trait, to make it compatible with `RegionBranchOpInterface`.
2024-01-14 21:19:20 -05:00
Abhinav271828
850f713e80 [MLIR][Presburger] Helper functions to compute the constant term of a generating function (#77819)
We implement two functions that are needed to compute the constant term
of a GF.
One finds a vector not orthogonal to all the non-null vectors in a given
set.
One computes the coefficient of any term in an arbitrary rational
function (quotient of two polynomials).
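
As a simplified, univariate illustration of the second helper (not the Presburger library's implementation), the coefficient of `s**k` in a quotient of two polynomials can be obtained by power-series division. The sketch below assumes the denominator has a non-zero constant term; the function name and the list-of-coefficients representation are hypothetical.

```python
from fractions import Fraction


def series_coefficient(num, den, k):
    """Coefficient of s**k in the power series of num(s) / den(s).

    `num` and `den` are coefficient lists, lowest degree first.
    Requires den[0] != 0 so that the quotient has a power series at 0.
    """
    if not den or den[0] == 0:
        raise ValueError("denominator needs a non-zero constant term")
    coeffs = []
    for n in range(k + 1):
        # From num = den * quotient:
        #   num[n] = sum_i den[i] * c[n - i]
        # so c[n] = (num[n] - sum_{i >= 1} den[i] * c[n - i]) / den[0].
        acc = Fraction(num[n]) if n < len(num) else Fraction(0)
        for i in range(1, min(n, len(den) - 1) + 1):
            acc -= Fraction(den[i]) * coeffs[n - i]
        coeffs.append(acc / Fraction(den[0]))
    return coeffs[k]
```

For example, `series_coefficient([1], [1, -1], 5)` is the coefficient of `s**5` in `1 / (1 - s)`.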
2024-01-13 21:30:06 +05:30
Bharathi Ramana Joshi
66786a79d6 [MLIR][Presburger] Implement Matrix::moveColumns (#68362)
2024-01-13 18:51:26 +05:30
Benjamin Maxwell
5417a5fed6 [mlir][ArmSME] Add rudimentary support for tile spills to the stack (#76086)
This adds very basic (and inelegant) support for something like spilling
and reloading tiles, if you use more SME tiles than physically exist.

This is purely implemented to prevent the compiler from aborting if a
function uses too many tiles (i.e. due to bad unrolling), but is
expected to perform very poorly.

Currently, this works in two stages:

During tile allocation, if we run out of tiles, then instead of giving up we
switch to allocating 'in-memory' tile IDs. These are tile IDs that start
at 16 (which is higher than any real tile ID). A warning will also be
emitted for each (root) tile op assigned an in-memory tile ID:

```
warning: failed to allocate SME virtual tile to operation, all tile operations will go through memory, expect degraded performance
```
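
The fallback described above can be sketched in a few lines of Python. This is only a model of the ID assignment: the real allocator also has to account for tile element types, which is ignored here, and the `TileAllocator` class and `num_physical_tiles` parameter are hypothetical. The starting value 16 for in-memory tile IDs comes from this message.

```python
# In-memory tile IDs start at 16, which is higher than any real tile ID.
FIRST_IN_MEMORY_TILE_ID = 16


class TileAllocator:
    """Hand out physical tile IDs first, then fall back to in-memory IDs."""

    def __init__(self, num_physical_tiles: int):
        self.free_tiles = list(range(num_physical_tiles))
        self.next_in_memory_id = FIRST_IN_MEMORY_TILE_ID

    def allocate(self) -> int:
        if self.free_tiles:
            return self.free_tiles.pop(0)
        # Out of physical tiles: do not fail, assign an in-memory tile ID.
        tile_id = self.next_in_memory_id
        self.next_in_memory_id += 1
        return tile_id


def is_in_memory(tile_id: int) -> bool:
    """True if the ID denotes a tile that lives in memory."""
    return tile_id >= FIRST_IN_MEMORY_TILE_ID
```

Any op whose ID satisfies `is_in_memory` is the kind that would receive the warning above and the swap-based lowering.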

Everything after this works like normal until `-convert-arm-sme-to-llvm`

Here the in-memory tile op:

```mlir
arm_sme.tile_op { tile_id = <IN MEMORY TILE> }
```

Is lowered to:

```mlir
// At function entry:
%alloca = memref.alloca ... : memref<?x?xty>

// Around the op:
// Swap the contents of %alloca and tile 0.
scf.for %slice_idx {
  %current_slice = "arm_sme.intr.read.horiz" ... <{tile_id = 0 : i32}>
  "arm_sme.intr.ld1h.horiz"(%alloca, %slice_idx)  <{tile_id = 0 : i32}>
  vector.store %current_slice, %alloca[%slice_idx, %c0]
}
// Execute op using tile 0.
arm_sme.tile_op { tile_id = 0 }
// Swap the contents of %alloca and tile 0.
// This restores tile 0 to its original state.
scf.for %slice_idx {
  %current_slice = "arm_sme.intr.read.horiz" ... <{tile_id = 0 : i32}>
  "arm_sme.intr.ld1h.horiz"(%alloca, %slice_idx)  <{tile_id = 0 : i32}>
  vector.store %current_slice, %alloca[%slice_idx, %c0]
}
```

This is inserted during the lowering to LLVM as spilling/reloading
registers is a very low-level concept, that can't really be modeled
correctly at a high level in MLIR.

Note: This is always doing the worst case full-tile swap. This could be
optimized to only spill/load data the tile op will use, which could be
just a slice. It's also not making any use of liveness, which could
allow reusing tiles. But this is not seen as important, as correct code
should only use the available number of tiles.
2024-01-12 14:51:47 +00:00
Guray Ozen
ae5d63924a [mlir][nvvm] Introduce cp.async.bulk.wait_group (#77917)
2024-01-12 14:16:38 +01:00
Oleksandr "Alex" Zinenko
2798b72ae7 [mlir] introduce debug transform dialect extension (#77595)
Introduce a new extension for simple print-debugging of the transform
dialect scripts. The initial version of this extension consists of two
ops that are printing the payload objects associated with transform
dialect values. Similar ops were already available in the test extension
and several downstream projects, and were extensively used for testing.
2024-01-12 13:24:02 +01:00
Matthias Springer
0a8e3dd432 [mlir][Interfaces] DestinationStyleOpInterface: Rename hasTensor/BufferSemantics (#77574)
Rename interface functions as follows:
* `hasTensorSemantics` -> `hasPureTensorSemantics`
* `hasBufferSemantics` -> `hasPureBufferSemantics`

These two functions return "true" if the op has tensor/buffer operands
but not buffer/tensor operands.

Also drop the "ranked" part from the interface, i.e., do not distinguish
between ranked/unranked types.

The new function names describe the functions more accurately. They also
align their semantics with the notion of "tensor semantics" in the
bufferization framework. (An op is supposed to be bufferized if it has
tensor operands, and we don't care if it also has memref operands.)

This change is in preparation of #75273, which adds
`BufferizableOpInterface::hasTensorSemantics`. By renaming the functions
in the `DestinationStyleOpInterface`, we can avoid name clashes between
the two interfaces.
2024-01-12 10:02:54 +01:00
MaheshRavishankar
aa2a96a24a [mlir][TilingInterface] Move TilingInterface tests to use transform dialect ops. (#77204)
In the process a couple of test transform dialect ops are added just
for testing. These operations are not intended for use as fully
fleshed-out transformation ops, but are rather operations added for testing.

A separate operation is added to `LinalgTransformOps.td` to convert a
`TilingInterface` operation to loops using the
`generateScalarImplementation` method implemented by the
operation. Eventually this and other operations related to tiling
using the `TilingInterface` need to move to a better place (i.e. out
of `Linalg` dialect)
2024-01-11 21:31:03 -08:00
Ivan Butygin
5f59b720a8 Revert "[mlir][arith] Add overflow flags support to arith ops (#77211)"
Temporarily reverting as it broke python bindings

This reverts commit a7262d2d9b.
2024-01-12 00:05:22 +01:00
Ivan Butygin
5afc4f3a5f Revert "[mlir][arith][nfc] Fix typos (#77700)"
Temporarily reverting as it broke python bindings

This reverts commit 9ed30012fb.
2024-01-12 00:05:21 +01:00
Valentin Clement (バレンタイン クレメン)
40f5f90507 [mlir][openacc][flang] Simplify gang, vector and worker representation (#77667)
The IR representation for gang, vector and worker has grown with the
support for device_type. This patch simplifies the IR representation for
gang, vector and worker information on the acc.loop operation.

When only the keyword is present without any values, the information
is printed in the same place as when there are values. The device_type
is omitted if there are no values and it is equal to None. Otherwise the
full information is displayed: first the keyword-only device_type
information and then the values with their device_type.
2024-01-11 13:02:06 -08:00
Felix Schneider
061b777c82 [mlir][affine] Add dependency on UBDialect for PoisonAttr (#77691)
The folder for `AffineApplyOp` will try creating a `PoisonAttr`
under certain circumstances. However, this will result in a crash if the
`UBDialect` isn't loaded.

This patch adds a dependency of `AffineDialect` on `UBDialect`.
2024-01-11 19:52:15 +01:00
Mats Petersson
21e1bf2d00 Add more ZA modes (#77361)
Add more ZA modes

Adds the arm_shared_za and arm_preserves_za attributes to the existing
arm_new_za attribute. The functionality already exists in LLVM, so just
"linking the pieces together".

For more details see:
https://arm-software.github.io/acle/main/acle.html#sme-attributes-relating-to-za
2024-01-11 18:49:52 +00:00
Matthias Springer
21aacb0b4c [mlir] Improve GreedyPatternRewriteDriver and pass documentation (#77614)
Clarify what kind of IR modifications are allowed. Also improve the
documentation of the greedy rewrite driver entry points.

Addressing comments in #76219.
2024-01-11 11:24:28 +01:00