ArrayRef has a constructor that accepts std::nullopt. This
constructor dates back to the days when we still had llvm::Optional.
Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.
This patch replaces {} with std::nullopt.
This revision adds a pass working on FunctionOpInterface to connect recently introduced AffineMin/Max simplification patterns.
Additionally fixes some minor issues that have surfaced upon larger scale testing.
This patch enforces a restriction in the Vector dialect: the non-indexed
operands of `vector.insert` and `vector.extract` must no longer be 0-D
vectors. In other words, rank-0 vector types like `vector<f32>` are
disallowed as the source or result.
EXAMPLES
--------
The following are now **illegal** (note the use of `vector<f32>`):
```mlir
%0 = vector.insert %v, %dst[0, 0] : vector<f32> into vector<2x2xf32>
%1 = vector.extract %src[0, 0] : vector<f32> from vector<2x2xf32>
```
Instead, use scalars as the source and result types:
```mlir
%0 = vector.insert %v, %dst[0, 0] : f32 into vector<2x2xf32>
%1 = vector.extract %src[0, 0] : f32 from vector<2x2xf32>
```
Note, this change serves three goals. These are summarised below.
## 1. REDUCED AMBIGUITY
By enforcing scalar-only semantics when the result (`vector.extract`)
or source (`vector.insert`) are rank-0, we eliminate ambiguity
in interpretation. Prior to this patch, both `f32` and `vector<f32>`
were accepted.
## 2. MATCH IMPLEMENTATION TO DOCUMENTATION
The current behaviour contradicts the documented intent. For example,
`vector.extract` states:
> Degenerates to an element type if n-k is zero.
This patch enforces that intent in code.
## 3. ENSURE SYMMETRY BETWEEN INSERT AND EXTRACT
With the stricter semantics in place, it’s natural and consistent to
make `vector.insert` behave symmetrically to `vector.extract`, i.e.,
degenerate the source type to a scalar when n = 0.
NOTES FOR REVIEWERS
-------------------
1. Main change is in "VectorOps.cpp", where stricter type checks are
implemented.
2. Test updates in "invalid.mlir" and "ops.mlir" are minor cleanups to
remove now-illegal examples.
2. Lowering changes in "VectorToSCF.cpp" are the main trade-off: we now
require an additional `vector.extract` when a preceding
`vector.transfer_read` generates a rank-0 vector.
RELATED RFC
-----------
*
https://discourse.llvm.org/t/rfc-should-we-restrict-the-usage-of-0-d-vectors-in-the-vector-dialect
This commit adds extra state to `ConversionPatternRewriterImpl` to store
all modified / newly-created operations and moved / newly-created blocks
in separate lists on a per-pattern basis.
This is in preparation of the One-Shot Dialect Conversion refactoring:
the new driver will no longer maintain a list of all IR rewrites, so
information about newly-created operations (which is needed to trigger
recursive legalization) must be retained in a different data structure.
This commit is also expected to improve the performance of the existing
driver. The previous implementation iterated over all new IR
modifications and then filtered them by type. It also required an
additional pointer indirection (through `std::unique_ptr<IRRewrite>`) to
retrieve the operation/block pointer.
This thread through proper error handling / reporting capabilities to
avoid hitting llvm_unreachable while parsing linalg ops.
Fixes#132755Fixes#132740Fixes#129185
For consumer fusion cases of this form
```
%0:2 = scf.forall .. shared_outs(%arg0 = ..., %arg0 = ...) {
tensor.parallel_insert_slice ... into %arg0
tensor.parallel_insert_slice ... into %arg1
}
%1 = linalg.generic ... ins(%0#0, %0#1)
```
the current consumer fusion that handles one slice at a time cannot fuse
the consumer into the loop, since fusing along one slice will create and
SSA violation on the other use from the `scf.forall`. The solution is to
allow consumer fusion to allow considering multiple slices at once. This
PR changes the `TilingInterface` methods related to consumer fusion,
i.e.
- `getTiledImplementationFromOperandTile`
- `getIterationDomainFromOperandTile`
to allow fusion while considering multiple operands. It is upto the
`TilingInterface` implementation to return an error if a list of tiles
of the operands cannot result in a consistent implementation of the
tiled operation.
The Linalg operation implementation of `TilingInterface` has been
modified to account for these changes and allow cases where operand
tiles that can result in a consistent tiling implementation are handled.
---------
Signed-off-by: MaheshRavishankar <mahesh.ravishankar@gmail.com>
ArrayRef has a constructor that accepts std::nullopt. This
constructor dates back to the days when we still had llvm::Optional.
Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.
This patch migrates away from std::nullopt in favor of ArrayRef<T>()
where we use perfect forwarding. Note that {} would be ambiguous for
perfect forwarding to work.
Refactors the `@hoist_vector_transfer_pairs` test function in
`hoisting.mlir` into smaller, focused test functions - each covering a
specific `vector.transfer_read`/`vector.transfer_write` pair.
This makes it easier to identify which edge cases are tested, spot
duplication, and write more targeted and readable check lines, with less
surrounding noise.
This refactor also helped identify some issues with the original
`@hoist_vector_transfer_pairs` test:
* Input variables `%val` and `%cmp` were unused.
* There were no check lines for reads from `memref5`.
**Note for reviewers (current and future):**
This PR is split into small, incremental, and self-contained commits. It
should be easier to follow the changes by reviewing those commits
individually, rather than reading the full squashed diff. However, this
will be merged as a single commit to avoid adding unnecessary history
noise in-tree.
https://github.com/llvm/llvm-project/pull/144657 added #include
"mlir/Dialect/Linalg/TransformOps/LinalgTransformOps.h" to
mlir/test/lib/Dialect/Linalg/TestLinalgTransforms.cpp, though it is not
in use
This include introduces a dependency for LinalgTransforms on
LinalgTransformOps, which is unspecified in the module dependencies, and
would produce a cyclic dependency if it were specified.
The include is unused in WinogradConv2D.cpp, so this change removes it.
As a follow-up to PR #135841 (see discussion for background), this patch
removes the `ConvertIllegalShapeCastOpsToTransposes` pattern from the SME
legalization pass. This change unblocks folding for ShapeCastOp involving
scalable vectors.
Originally, the `ConvertIllegalShapeCastOpsToTransposes` pattern was introduced
to rewrite certain `vector.shape_cast` ops that could not be lowered otherwise.
Based on local end-to-end testing, this workaround is no longer required, and
the pattern can now be safely removed.
This patch also removes a special case from `ShapeCastOp::fold`, simplifying
the fold logic.
As a side effect of removing `ConvertIllegalShapeCastOpsToTransposes`, we lose
the mechanism that enabled lowering of certain ops like:
```mlir
%res = vector.transfer_read %mem[%a, %b] (...) :
memref<?x?xf32>, vector<[4]x1xf32>
```
Previously, such cases were handled by:
* Rewriting a nearby `vector.shape_cast` to a `vector.transpose` (via
`ConvertIllegalShapeCastOpsToTransposes`)
* Then lowering the result with `LiftIllegalVectorTransposeToMemory`.
This patch introduces a new dedicated pattern,
`LowerColumnTransferReadToLoops`, that directly handles illegal
`vector.transfer_read` ops involving leading scalable dimensions.
Removes the Debug... prefix on the ops in tablegen, in line with pretty
much all other Transform-dialect extension ops. This means that the ops
in Python look like
`debug.EmitParamAsRemarkOp`/`debug.emit_param_as_remark` instead of
`debug.DebugEmitParamAsRemarkOp`/`debug.debug_emit_param_as_remark`.
When comparing the type of the initializer in a `memref::GlobalOp`
against its result only consider the element type and the shape. Other
attributes such as memory space should be ignored since comparing these
between tensors and memrefs doesn't make sense and constructing a memref
in a specific memory space with a tensor that has no such attribute
should be valid.
Signed-off-by: Jack Frankland <jack.frankland@arm.com>
This patch adds a mask operand to elect.sync explicitly.
When provided, this overrides the default value of 0xffffffff.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
We only support fixed set of minimum filtering algorithm for Winograd
Conv2D decomposition. Instead of letting users specify any integer,
define a fixed set of enumeration values for the parameters of minimum
filtering algorithm.
Please see the following program.
```
module test_0
INTEGER :: sp = 1
!$omp declare target link(sp)
end module test_0
program main
use test_0
integer :: new_len
!$omp target map(tofrom:new_len) map(tofrom:sp)
new_len = sp
!$omp end target
print *, new_len
print *, sp
end program
```
When compiled with
`flang -g -O0 -fopenmp --offload-arch=gfx1100`
will fail the compilation with the following error:
`dbg attachment points at wrong subprogram for function`
The reason is that with the `link` clause on `!$omp declare target`, an
extra load instruction is inserted. But the debug location was not
updated before insertion which caused an invalid location to be attached
to the instruction.
CodeExtractor can currently erroneously insert an alloca into a
different function than it inserts its users into, in cases where code
is being extracted out of a function that has already been outlined. Add
an assertion that the two blocks being inserted into are actually in the
same function.
Add a check to findAllocaInsertPoint in OpenMP to LLVMIR translation to
prevent the aforementioned scenario from happening.
OpenMPIRBuilder relies on a callback mechanism to fix-up a module later
on during the finaliser step. In some cases this results in the module
being invalid prior to the finalise step running. Remove calls to
verifyModule wrapped in LLVM_DEBUG from CodeExtractor, as the presence
of those results in the compiler crashing with -mllvm -debug due to
premature module verification where it would not crash without -debug.
Call ompBuilder->finalize() the end of mlir::translateModuleToLLVMIR, in
order to make sure the module has actually been finalized prior to
trying to verify it.
Resolves https://github.com/llvm/llvm-project/issues/138102.
---------
Signed-off-by: Kajetan Puchalski <kajetan.puchalski@arm.com>
When no offload targets are specified flang will avoid offloading for
"target" constructs, but not "target data" constructs. This patch makes
the behavior consistent across all offload-related operations.
While ignoring "target" may produce semantically incorrect code, it may
still be a useful debugging tool.
--
This reinstates commits 6ba1955 and 349f8d6, reverted due to compilation
failures in the gfortran test suite. These build problems were caused by
an unrelated issue (https://github.com/llvm/llvm-project/issues/145558)
which is now fixed.
Ref: https://github.com/llvm/llvm-project/pull/144534
This patch adds a transform of `transfer_read` operation to change the
vector type to one that can be mapped to an LLVM type. This is done by
collapsing trailing dimensions so we obtain a vector type with a single
trailing scalable dimension.
The vector type allows element types that implement the
`VectorElementTypeInterface`. `vector.splat` should allow any element
type that is supported by the vector type.
[MLIR] Fix circular dependency introduced in In
https://github.com/llvm/llvm-project/pull/144897. This PR is to break
the dependency. by moving StateStack to IR folder
This commit resolves a circular dependency issue between mlir/Support
and mlir/IR:
- Move StateStack.h and StateStack.cpp from Support to IR folder
- Update CMakeLists.txt files to reflect the new locations
- Update Bazel BUILD file to maintain correct dependencies
- Update includes in affected files (flang, Target/LLVMIR)
The circular dependency was caused by StateStack.h depending on
IR/Visitors.h
while other IR files depended on Support. Moving StateStack to IR
eliminates
this cycle while maintaining proper separation of concerns.
Given a TARGET DATA construct with USE_DEVICE_PTR(x) and IF(FALSE), the
compiler will crash if `x` was used in the body. The cause of the crash
is that the MLIR->LLVM codegen tries to look up the translated value of
x, but one had not been mapped.
Given an IF clause, the translation will generate an if-then-else
construct, with the "else" block corresponding to the false condition,
i.e. the host device playing the role of the target device. In that
block, still process the USE_DEVICE_ADDR/USE_DEVICE_PTR clauses, which
will cause the translation mappings to be created.
Fixes https://github.com/llvm/llvm-project/issues/145558
Goal: Enable using C++ classes to AOT compile models for MLGO.
This commit introduces a transformation pass that converts standalone
`emitc.func` operations into `emitc.class `structures to support
class-based C++ code generation for MLGO.
Transformation details:
- Wrap `emitc.func @func_name` into `emitc.class @Myfunc_nameClass`
- Converts function arguments to class fields with preserved attributes
- Transforms function body into an `execute()` method with no arguments
- Replaces argument references with `get_field` operations
Before: emitc.func @Model(%arg0, %arg1, %arg2) with direct argument
access
After: emitc.class with fields and execute() method using get_field
operations
This enables generating C++ classes that can be instantiated and
executed as self-contained model objects for AOT compilation workflows.
Updates the linalg::vectorize function to return a
`FailureOr<VectorizationResult>` containing the values to replace the
original operation, instead of directly replacing the original
operation. This aligns better with the style of transforms used with the
TilingInterface, and gives more control to users over the lowering,
since it allows for additional transformation of the IR before
replacement.
There was already a `VectorizationResult` defined, which was used for
the internal vectorize implementation using `CustomVectorizationHook`s,
so the old struct is renamed to `VectorizationHookResult`.
Note for integration: The replacement of the original operation is now
the responsibility of the caller, so wherever `linalg::vectorize` is
used, the caller must also do
`rewriter.replaceOp(vectorizeResults->replacements)`.
---------
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
The previous code was effectively that a symbol is dead if was not
nested in sequence of SymbolTables. But one can have operations that one
cannot delete/DCE that refers to symbols which one could delete which
resulted in symbol-dce deleting symbols that are still referenced and
the resulting IR being invalid. This changes it so that all operations
inside non SymbolTable op are considered to find nested SymbolTable ops.
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
The OpenACC data clause operations have been updated to support the
OpenACC 3.4 data clause modifiers. This includes ensuring verifiers
check that only supported ones are used on relevant operations.
In order to support a seamless update from encoding the modifiers in the
data clause to this attribute, the following considerations were made:
- Ensure that modifier builders which do not take modifier are still
available.
- All data clause enum values are left in place until a complete
transition is made to the new modifiers.
ArrayRef has a constructor that accepts std::nullopt. This
constructor dates back to the days when we still had llvm::Optional.
Since the use of std::nullopt outside the context of std::optional is
kind of abuse and not intuitive to new comers, I would like to move
away from the constructor and eventually remove it.
This patch migrates away from TypeRagne(std::nullopt) and
ValueRange(std::nullopt).
This patch fixes a bug discovered in the
`affine::makeComposedFoldedAffineApply` function when `composeAffineMin
== true`. The bug happened because the simplification assumed the
symbols appearing in the `affine.apply` op corresponded to symbols in
the `affine.min` op, and that's not always the case. For example:
```mlir
#map = affine_map<()[s0, s1] -> (s1)>
#map1 = affine_map<()[s0, s1] -> (s0 ceildiv s1)>
module {
func.func @min_max_full_simplify() -> index {
%0 = test.value_with_bounds {max = 64 : index, min = 32 : index}
%1 = test.value_with_bounds {max = 64 : index, min = 32 : index}
%2 = affine.min #map()[%0, %1]
%3 = affine.apply #map1()[%2, %0]
return %3 : index
}
}
```
This patch also introduces the test `make_composed_folded_affine_apply`
transform operation to test this simplification. It also adds tests
ensuring we get correct behavior.
---------
Co-authored-by: Nicolas Vasilache <nico.vasilache@amd.com>