Below is the original commit description. Furthermore, it applies a
[fix](33a26b9ca2)
for CMakeLists.txt.
The issue occurs during a downstream pass which does dialect conversion,
where both
[`FuncOpConversion`](cde67b6663/mlir/lib/Conversion/FuncToLLVM/FuncToLLVM.cpp (L480))
and
[`SubviewFolder`](cde67b6663/mlir/lib/Dialect/MemRef/Transforms/ExpandStridedMetadata.cpp (L187))
are run together. The original starting IR is:
```mlir
module {
func.func @foo(%arg0: memref<100x100xf32>, %arg1: index, %arg2: index, %arg3: index, %arg4: index) -> memref<?x?xf32, strided<[100, 1], offset: ?>> {
%subview = memref.subview %arg0[%arg1, %arg2] [%arg3, %arg4] [1, 1] : memref<100x100xf32> to memref<?x?xf32, strided<[100, 1], offset: ?>>
return %subview : memref<?x?xf32, strided<[100, 1], offset: ?>>
}
}
```
After `FuncOpConversion` runs, the IR looks like:
```mlir
"builtin.module"() ({
"llvm.func"() <{CConv = #llvm.cconv<ccc>, function_type = !llvm.func<struct<(ptr, ptr, i64, array<2 x i64>, array<2 x i64>)> (ptr, ptr, i64, i64, i64, i64, i64, i64, i64, i64, i64)>, linkage = #llvm.linkage<external>, sym_name = "foo", visibility_ = 0 : i64}> ({
^bb0(%arg0: !llvm.ptr, %arg1: !llvm.ptr, %arg2: i64, %arg3: i64, %arg4: i64, %arg5: i64, %arg6: i64, %arg7: i64, %arg8: i64, %arg9: i64, %arg10: i64):
%0 = "memref.subview"(<<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>, <<UNKNOWN SSA VALUE>>) <{operandSegmentSizes = array<i32: 1, 2, 2, 0>, static_offsets = array<i64: -9223372036854775808, -9223372036854775808>, static_sizes = array<i64: -9223372036854775808, -9223372036854775808>, static_strides = array<i64: 1, 1>}> : (memref<100x100xf32>, index, index, index, index) -> memref<?x?xf32, strided<[100, 1], offset: ?>>
"func.return"(%0) : (memref<?x?xf32, strided<[100, 1], offset: ?>>) -> ()
}) : () -> ()
"func.func"() <{function_type = (memref<100x100xf32>, index, index, index, index) -> memref<?x?xf32, strided<[100, 1], offset: ?>>, sym_name = "foo"}> ({
}) : () -> ()
}) {llvm.data_layout = "", llvm.target_triple = ""} : () -> ()
```
The `<<UNKNOWN SSA VALUE>>` placeholders here are block arguments of a separate
unlinked block, which is disconnected from the rest of the IR (so not
only is the IR verifier-invalid, it can't even be parsed). This IR is
created by signature conversion in the dialect conversion infra.
Now `SubviewFolder` is applied, and the utility function here is called
on one of these disconnected block arguments, causing a crash.
The TestMemRefToLLVMWithTransforms pass is introduced to exercise the
bug, and it can be reused by other contributors in the future.
Co-authored-by: Rahul Kayaith <rkayaith@gmail.com>
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
The class "ClauseVal" actually represents a definition of an enumeration
value, and in itself it is not bound to any clause. Rename it to EnumVal
and add a comment clarifying how it's translated into an actual enum
definition in the generated source code.
There is no change in functionality.
This fixes Tosa VariableOp to align with spec 1.0
- add var_shape attribute to store shape of variable type
- change type attribute to store element type of variable type
- add a builder so previous construction calls still work
- fix up level check of rank to be on variable type instead of initial
value which is optional
- add level check of size for variable type
- add lit tests for variable ops without initial values
- add lit test for variable op with fixed rank but unknown dimension
- add invalid lit test for variable op with unranked type
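As a sketch of the kind of case the new lit tests cover, here is a hypothetical variable with a fixed rank but one unknown dimension, declared without an initial value (the custom assembly form below is assumed, not copied from the tests):
```mlir
// Hypothetical example; the exact printed syntax of tosa.variable is assumed.
// Rank is fixed (2), one dimension is unknown, and no initial value is given.
tosa.variable @stored_data : tensor<2x?xf32>
```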
Signed-off-by: Tai Ly <tai.ly@arm.com>
Adds a few canonicalizers, folders, and rewrite patterns to tensor ops:
* tensor.insert folder: insert into a constant is replaced with a new
constant
* tensor.extract folder: extract from a parent tensor that was inserted
at the same indices is folded into the inserted value
* rewrite pattern added that replaces an extract of a collapse shape
with an extract of the source tensor (requires static source dimensions)
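As a minimal sketch of the extract-of-insert fold (names and shapes are hypothetical):
```mlir
// %inserted holds %val at [%i, %j], so extracting at the same indices
// folds to %val directly; the extract (and, if otherwise unused, the insert)
// goes away.
%inserted = tensor.insert %val into %dest[%i, %j] : tensor<4x4xf32>
%elem = tensor.extract %inserted[%i, %j] : tensor<4x4xf32>
```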
Signed-off-by: Asra Ali <asraa@google.com>
The main idea behind the change is to allow expand-of-collapse folds for
reshapes like `?x?xk` -> `?` (k>1). The rationale here is that the
expand op must have a coherent index/affine expression specified in its
`output_shape` argument (see example below), and if it doesn't, the IR
has already been invalidated at an earlier stage:
```
%c32 = arith.constant 32 : index
%div = arith.divsi %<some_index>, %c32 : index
%collapsed = tensor.collapse_shape %41#1 [[0], [1, 2], [3, 4]]
: tensor<9x?x32x?x32xf32> into tensor<9x?x?xf32>
%affine = affine.apply affine_map<()[s0] -> (s0 * 32)> ()[%div]
%expanded = tensor.expand_shape %collapsed [[0], [1, 2], [3]] output_shape [9, %div, 32, %affine]
: tensor<9x?x?xf32> into tensor<9x?x32x?xf32>
```
On the above assumption, adjust the routine in
`getReassociationIndicesForCollapse()` to allow dynamic reshapes beyond
just `?x..?x1x1x..x1` -> `?`. Dynamic subshapes introduce two kinds of
issues:
1. n>2 consecutive dynamic dimensions in the source shape cannot be
collapsed together into 1<k<n neighboring dynamic dimensions in the
target shape, since there'd be more than one suitable reassociation
(example: `?x?x10x? into ?x?`)
2. When figuring out static subshape reassociations based on products,
there are cases where a static dimension is collapsed with a dynamic
one, and should therefore be skipped when comparing products of source &
target dimensions (e.g. `?x2x3x4 into ?x12`)
To address 1, we should detect such sequences in the target shape before
assigning multiple dynamic dimensions into the same index set. For 2, we
take note that a static target dimension was preceded by a dynamic one
and allow an "offset" subshape of source static dimensions, as long as
there's an exact sequence for the target size later in the source shape.
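As an illustration of case 2, here is a sketch of a collapse (shapes taken from the `?x2x3x4 into ?x12` example above) whose reassociation can now be inferred from the shapes alone:
```mlir
// The dynamic leading source dimension maps onto the dynamic target dimension,
// and the static suffix 2x3x4 matches the static target size 12, yielding the
// reassociation [[0], [1, 2, 3]].
%collapsed = tensor.collapse_shape %src [[0], [1, 2, 3]]
    : tensor<?x2x3x4xf32> into tensor<?x12xf32>
```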
This PR aims to address all reshapes that can be determined based purely
on shapes (and the original reassociation maps, as done in
`ComposeExpandOfCollapseOp::findCollapsingReassociation()`). It doesn't
seem possible to fold all qualifying dynamic shape patterns in a
deterministic way without looking into affine expressions
simultaneously. That would be difficult to maintain in a single general
utility, so a path forward would be to provide dialect-specific
implementations for Linalg/Tensor.
Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
---------
Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
Co-authored-by: Ian Wood <ianwood2024@u.northwestern.edu>
This patch wraps `populateLowerContractionToSMMLAPatternPatterns` into a
new TD Op `apply_patterns.arm_neon.vector_contract_to_i8mm`.
It also removes the "test-lower-to-arm-neon" pass.
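A sketch of how the new TD op might be invoked from a transform script (the surrounding named-sequence boilerplate is assumed, not part of this patch):
```mlir
module attributes {transform.with_named_sequence} {
  transform.named_sequence @__transform_main(%root: !transform.any_op {transform.readonly}) {
    %func = transform.structured.match ops{["func.func"]} in %root
        : (!transform.any_op) -> !transform.any_op
    // Apply the Neon I8MM lowering patterns to every matched function.
    transform.apply_patterns to %func {
      transform.apply_patterns.arm_neon.vector_contract_to_i8mm
    } : !transform.any_op
    transform.yield
  }
}
```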
Adds a check to make sure that the linalg op is safe to erase by
ensuring that the `linalg.yield` is yielding one of the linalg op's
block args. This check already exists for linalg ops with pure tensor
semantics.
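For contrast, a hedged sketch of a buffer-semantics op that is not safe to erase, because the yielded value is computed in the body rather than forwarded from a block argument (the map and shapes are made up):
```mlir
#map = affine_map<(d0) -> (d0)>
// The body writes %sum into %out, so erasing this op would drop a real store.
linalg.generic {indexing_maps = [#map, #map], iterator_types = ["parallel"]}
    ins(%in : memref<8xf32>) outs(%out : memref<8xf32>) {
^bb0(%a: f32, %b: f32):
  %sum = arith.addf %a, %b : f32
  linalg.yield %sum : f32
}
```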
Closes https://github.com/llvm/llvm-project/issues/129414
---------
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
This PR introduces the initial implementation of a blocking pass for
XeGPU programs. The pass leverages unroll patterns from both the XeGPU
and Vector dialects.
---------
Co-authored-by: Adam Siemieniuk <adam.siemieniuk@intel.com>
Example:
```mlir
%0 = vector.extract %source[0, 0] : i8 from vector<1x2xi8>
%1 = vector.extract %source[0, 1] : i8 from vector<1x2xi8>
%2 = vector.from_elements %0, %1 : vector<2xi8>
```
becomes
```mlir
%2 = vector.shape_cast %source : vector<1x2xi8> to vector<2xi8>
```
It was decided that we should split canonicalization tests into new
files (see the
[discussion](https://github.com/llvm/llvm-project/pull/135096#pullrequestreview-2760245596)).
In view of this, I added the new tests to a new file specifically for
canonicalization of from_elements. To be consistent in the location of
the tests, I moved existing tests `extract_scalar_from_from_element`,
`extract_1d_from_from_elements`, `extract_2d_from_from_elements` and
`from_elements_to_splat` from `canonicalize.mlir` to
`canonicalize/vector-from-elements.mlir`. In addition to moving them, I changed
the LIT variables to all be upper-case for consistency.
With
ffb9bbfd07,
memref.assume_alignment op returns a result value. The revision updates
the tests to reflect the change:
- Update all the lit tests to use the result of memref.assume_alignment,
if it is present.
- Capture the result of the op in lit tests.
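A minimal sketch of what the updated usage looks like (alignment value, shapes, and the follow-up op are assumed):
```mlir
// The op now yields a memref result; subsequent ops should consume that result.
%aligned = memref.assume_alignment %src, 64 : memref<4x4xf32>
%val = memref.load %aligned[%c0, %c0] : memref<4x4xf32>
```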
---------
Signed-off-by: hanhanW <hanhan0912@gmail.com>
`vector<t>` types are not compatible with the LLVM type system – with
the current approach employed within `LLVMTypeConverter`, they must be
explicitly converted into `vector<1xt>` when lowering. Employ this rule
within the conversion patterns for intrinsics that are handled directly
within `MathToLLVM`: `math.ctlz`, `.cttz`, `.absi`, `.expm1`, `.log1p`,
`.rsqrt`, `.isnan`, `.isfinite`.
This change does not cover/test patterns that are based on the
`VectorConvertToLLVMPattern` template from `LLVMCommon/VectorPattern.h`.
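For context, a sketch of a 0-D vector case the updated patterns are meant to handle (function name and types are hypothetical):
```mlir
// The vector<i32> operand is treated as vector<1xi32> when the LLVM intrinsic
// is emitted, and the result is cast back to vector<i32> afterwards.
func.func @absi_0d(%arg0: vector<i32>) -> vector<i32> {
  %0 = math.absi %arg0 : vector<i32>
  return %0 : vector<i32>
}
```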
---------
Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
COND_IF's simplified form - where redundant operand notations are
omitted - is not conformant to the specification. According to the
specification, all operands passed into an operation must be explicitly
declared in each operation's structure. Add an optional check to verify that
the given form complies with the specification.
Previously, the dialects registered were fixed per LSP binary. This works
as long as all the dialects of interest from the different projects
across which one uses the LSP are disjoint. This change expands support to
cases where dialects overlap in name but are used in separate projects. The
alternative is multiple binaries and switching the LSP used in the editor per
project (which adds some extra complexity for hosted instances).
This handles a simple (and, I believe, common) case where the right set of
dialects can be determined from the file path while keeping a single binary;
the cost of doing this dynamically based on path would be either keeping
different registries to return or repopulating the dialect & extension maps.
Now that `Property` is a `PropConstraint`, hook it up to the same
constraint-uniquing machinery that other types of constraints use. This
will primarily save on code size for types, like enums, that have
inherent constraints which are shared across many operations.
This patch adds support for the -mprefer-vector-width= command line
option. The parsing of this option is equivalent to Clang's, and it is
implemented by setting the "prefer-vector-width" function attribute.
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
Moves all the remaining Linalg vectorization tests from:
* `mlir/tests/Dialect/Linalg/*`
to:
* `mlir/tests/Dialect/Linalg/vectorization/*`
To maintain consistency within tests, `vectorize-convolution.mlir`
was updated to use:
* `transform.structured.vectorize_children_and_apply_patterns`
instead of:
* `-test-linalg-transform-patterns=test-linalg-to-vector-patterns`
This change required minor updates to some `CHECK` lines, reflecting
only reordering of ops due to an additional pattern being applied.
Closes #141025
We already have `hasOneUse`. Like `llvm::Value`, we provide helper methods to
query the number of uses of a Value. Also add unit tests for Value, because
those were missing.
---------
Co-authored-by: Michael Maitland <michaelmaitland@meta.com>
Fixes the final reduction steps which were taken from an implementation
of scan, not reduction, causing lanes earlier in the wave to have
incorrect results due to masking.
This now aligns more closely with the Triton implementation:
https://github.com/triton-lang/triton/pull/5019
# Hypothetical example
To provide an explanation of the issue with the current implementation,
let's take the simple example of attempting to perform a sum over 64
lanes where the initial values are as follows (first lane has value 1,
and all other lanes have value 0):
```
[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```
When performing a sum reduction over these 64 lanes, in the current
implementation we perform 6 dpp instructions which in sequential order
do the following:
1) sum over clusters of 2 contiguous lanes
2) sum over clusters of 4 contiguous lanes
3) sum over clusters of 8 contiguous lanes
4) sum over an entire row
5) broadcast the result of the last lane in each row to the next row, and
each lane sums its current value with the incoming value.
6) broadcast the result of the 32nd lane to the last two rows, and each lane
sums its current value with the incoming value.
After step 4) the result for the example above looks like this:
```
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```
After step 5) the result looks like this:
```
[2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```
After step 6) the result looks like this:
```
[4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```
Note that the correct value here is always 1, yet after the
`dpp.broadcast` ops some lanes have incorrect values. The reason is that
for these incorrect lanes, like lanes 0-15 in step 5, the
`dpp.broadcast` op doesn't provide them incoming values from other
lanes. Instead these lanes are provided either their own values, or 0
(depending on whether `bound_ctrl` is true or false) as values to sum
over. Either way, these values are stale and these lanes shouldn't be
used in general.
So what this means:
- For a subgroup reduce over 32 lanes (like Step 5), the correct result
is stored in lanes 16 to 31
- For a subgroup reduce over 64 lanes (like Step 6), the correct result
is stored in lanes 32 to 63.
However, in the current implementation we do not specifically read the
value from one of the correct lanes when returning the final value. In
some workloads it seems that, without doing so, the stale value from
the first lane is returned instead.
# Actual failing test
For a specific example of how the current implementation causes issues,
take a look at the IR below which represents an additive reduction over
a dynamic dimension.
```
!matA = tensor<1x?xf16>
!matB = tensor<1xf16>
#map = affine_map<(d0, d1) -> (d0, d1)>
#map1 = affine_map<(d0, d1) -> (d0)>
func.func @only_producer_fusion_multiple_result(%arg0: !matA) -> !matB {
%cst_1 = arith.constant 0.000000e+00 : f16
%c2_i64 = arith.constant 2 : i64
%0 = tensor.empty() : !matB
%2 = linalg.fill ins(%cst_1 : f16) outs(%0 : !matB) -> !matB
%4 = linalg.generic {indexing_maps = [#map, #map1], iterator_types = ["parallel", "reduction"]} ins(%arg0 : !matA) outs(%2 : !matB) {
^bb0(%in: f16, %out: f16):
%7 = arith.addf %in, %out : f16
linalg.yield %7 : f16
} -> !matB
return %4 : !matB
}
```
When provided an input of type `tensor<1x2xf16>` and values `{0, 1}` to
perform the reduction over, the value returned is consistently 4. By the
same analysis done above, this shows that the returned value is coming
from one of these stale lanes and needs to be read instead from one of
the lanes storing the correct result.
Signed-off-by: Muzammiluddin Syed <muzasyed@amd.com>
- Introduced `gpu.subgroup_mma_extract` operation to extract values from
`!gpu.mma_matrix` by invocation and indices.
- Introduced `gpu.subgroup_mma_insert` operation to insert values into
`!gpu.mma_matrix` by invocation and indices.
- Updated the conversion patterns to SPIR-V for both extract and insert
operations.
- Added test cases to validate the new operations in the GPU to SPIR-V
conversion.
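A purely hypothetical sketch of how the two ops could compose; the assembly format, element types, and fragment kind below are assumed rather than taken from the actual op definitions:
```mlir
// Extract the element owned by this invocation at position %i, scale it, and
// insert it back into the accumulator matrix.
%c = gpu.subgroup_mma_extract %acc[%i] : !gpu.mma_matrix<16x16xf32, "COp"> -> f32
%scaled = arith.mulf %c, %alpha : f32
%acc2 = gpu.subgroup_mma_insert %scaled, %acc[%i]
    : f32, !gpu.mma_matrix<16x16xf32, "COp"> -> !gpu.mma_matrix<16x16xf32, "COp">
```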
RFC:
https://discourse.llvm.org/t/rfc-add-gpu-operations-to-permute-data-in-2-loaded-mma-matrix/86148?u=hsiangkai
This PR fixes a crash that currently happens given the following input:
```fortran
subroutine caller()
real :: x
integer :: i
!$omp target
x = i
call callee(x,x)
!$omp end target
endsubroutine caller
subroutine callee(x1,x2)
real :: x1, x2
endsubroutine callee
```
The crash happens because the following sequence of events takes place in
the `OMPIRBuilder`:
1. ....
2. An outlined function for the target region is created. At first the
outlined function still refers to the SSA values from the original
function of the target region.
3. The builder then iterates over the users of SSA values used in the
target region to replace them with the corresponding function arguments
of the outlined function.
4. If the same instruction references the SSA value more than once (say
m times), all uses of that SSA value in the instruction are replaced at once,
deleting all m uses of the value.
5. The next m-1 iterations will still iterate over the same instruction,
dropping the last m-1 actual users of the value.
Hence, we collect all users first before modifying them.
The PR continues the work started in #141019 by adding the `BufferizationState` class also to the `getBufferType` and `resolveConflicts` interface methods, together with the additional support functions that are used throughout the bufferization infrastructure.
This patch moves scalable vectorization tests into an existing generic
vectorization test file:
* vectorization-scalable.mlir --> merged into vectorization.mlir
Rationale:
* Most tests in vectorization-scalable.mlir are variants of existing
tests in vectorization.mlir. Keeping them together improves
maintainability.
* Consolidating tests makes it easier to spot gaps in coverage for
regular vectorization.
* In the Vector dialect, we don't separate tests for scalable vectors;
this change aligns Linalg with that convention.
Notable changes beyond moving tests:
* Updated one of the two matrix-vector multiplication tests to use
`linalg.matvec` instead of `linalg.generic`. CHECK lines remain
unchanged.
* Simplified the lone `linalg.index` test by removing an unnecessary
`tensor.extract`. Also removed canonicalization patterns from the
TD sequence for consistency with other tests.
This patch contributes to the implementation of #141025 — please refer
to that ticket for full context.
asin, acos, atan, and atan2 were being lowered to libm calls instead of
llvm intrinsics. Add the conversion patterns to handle these intrinsics
and update tests to expect this.
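A minimal sketch of the intended lowering for one of these ops (the exact emitted form may differ):
```mlir
// Previously lowered to a libm call; with the new pattern it becomes,
// roughly, the LLVM dialect intrinsic:
//   %0 = math.asin %arg0 : f32
%0 = llvm.intr.asin(%arg0) : (f32) -> f32
```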
This patch updates `CombineContractBroadcastMask` to inherit from
`MaskableOpRewritePattern`, enabling it to handle masked
`vector.contract` operations. The pattern rewrites:
```mlir
%a_bc = vector.broadcast %a
%res = vector.contract %a_bc, %b, ...
```
into:
```mlir
// Move the broadcast into vector.contract (by updating the indexing
// maps)
%res = vector.contract %a, %b, ...
```
The main challenge is supporting cases where the pattern drops a leading
unit dimension. For example:
```mlir
func.func @contract_broadcast_unit_dim_reduction_masked(
%arg0 : vector<8x4xi32>,
%arg1 : vector<8x4xi32>,
%arg2 : vector<8x8xi32>,
%mask: vector<1x8x8x4xi1>) -> vector<8x8xi32> {
%0 = vector.broadcast %arg0 : vector<8x4xi32> to vector<1x8x4xi32>
%1 = vector.broadcast %arg1 : vector<8x4xi32> to vector<1x8x4xi32>
%result = vector.mask %mask {
vector.contract {
indexing_maps = [#map0, #map1, #map2],
iterator_types = ["reduction", "parallel", "parallel", "reduction"],
kind = #vector.kind<add>
} %0, %1, %arg2 : vector<1x8x4xi32>, vector<1x8x4xi32> into vector<8x8xi32>
} : vector<1x8x8x4xi1> -> vector<8x8xi32>
return %result : vector<8x8xi32>
}
```
Here, the leading unit dimension is dropped. To handle this, the mask is
cast to the correct shape using a `vector.shape_cast`:
```mlir
func.func @contract_broadcast_unit_dim_reduction_masked(
%arg0: vector<8x4xi32>,
%arg1: vector<8x4xi32>,
%arg2: vector<8x8xi32>,
%arg3: vector<1x8x8x4xi1>) -> vector<8x8xi32> {
%mask_sc = vector.shape_cast %arg3 : vector<1x8x8x4xi1> to vector<8x8x4xi1>
%res = vector.mask %mask_sc {
vector.contract {
indexing_maps = [#map, #map1, #map2],
iterator_types = ["parallel", "parallel", "reduction"],
kind = #vector.kind<add>
    } %arg0, %arg1, %arg2 : vector<8x4xi32>, vector<8x4xi32> into vector<8x8xi32>
} : vector<8x8x4xi1> -> vector<8x8xi32>
return %res : vector<8x8xi32>
}
```
While this isn't ideal - since it introduces a `vector.shape_cast` that
must be cleaned up later - it reflects the best we can do once the input
reaches `CombineContractBroadcastMask`. A more robust solution may
involve simplifying the input earlier. I am leaving that as a TODO for
myself to explore further. Posting this now to unblock downstream
work.
LIMITATIONS
Currently, this pattern assumes:
* Only leading dimensions are dropped in the mask.
* All dropped dimensions must be unit-sized.
Integer attributes supplied to `emitc.call_opaque` as arguments were
treated as indices into the operand list. This should be the case only
for the normal arguments, not for the template arguments, which can't
refer to SSA values. This commit updates the handling of template
arguments in mlir-to-cpp by removing special handling of integer
attributes.
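A hedged sketch of the distinction (callee name and values are made up for illustration):
```mlir
// The index attribute in `args` refers to operand 0 of the call, while the
// integer in `template_args` is a literal template argument emitted verbatim.
emitc.call_opaque "my_kernel" (%arg0) {args = [0 : index], template_args = [42 : i32]} : (i32) -> ()
```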
Along with the changes to rescale op attributes, commit 7208649 dropped
the builtin casts between signed and signless types. However, explicitly
unsigned types are still legal input and output values from the TOSA IR
perspective.
The change adds back the casts when the unsigned<->signless semantics
are explicit in the underlying tensor types. This prevents the
conversion routine from trying to generate illegal `arith` casts that
are constrained to signless types. Whether the `arith` casts themselves
are signed or unsigned should still depend on the rescale's `*_unsigned`
attribute values.
---------
Signed-off-by: Artem Gindinson <gindinson@roofline.ai>
This patch adds an Op for the TMA prefetch
(non-tensor) variant. llvm-lit tests are added
to verify the lowering to the intrinsics.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
Reverts #140730 - that turned out not to be an NFC as we originally
thought. See the attached test for an example. Many thanks to @Garra1980
for reporting!
Note, without this change, the newly added test would be incorrectly
converted to:
```mlir
func.func @view_memref_as_rank0(%arg0: index, %arg1: memref<2xi8>) {
%0 = llvm.mlir.poison : !llvm.struct<(ptr, ptr, i64)>
return
}
```
Fixes #108161
This PR adds a decorateType method for MatrixType, ensuring that
`spirv.matrix` with offset in `spirv.struct` can be handled correctly.
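For illustration, a sketch of the kind of type this change makes handleable (the member offset value is assumed):
```mlir
// A matrix member inside a struct carrying an explicit Offset decoration.
!s = !spirv.struct<(!spirv.matrix<4 x vector<4xf32>> [0])>
```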
Signed-off-by: MingZhu Yan <yanmingzhu@iscas.ac.cn>