Currently, `simplifyMul()` asserts that either `lhs` or `rhs` is
symbolic or constant. This method is called by the overloaded `*`
operator for `AffineExpr`s which leads to a crash when building a
multiplication expression where neither operand is symbolic or constant.
This patch returns a `nullptr` from `simplifyMul()` to signal that the
expression could not be simplified instead.
Fix https://github.com/llvm/llvm-project/issues/75770
Support WalkResult for AffineExpr walk and support interrupting walks
along the lines of Operation::walk. This allows interrupted walks when a
condition is met. Also, switch from std::function to llvm::function_ref
for the walk function.
Loop peeling is not beneficial if the step size already divides "ub -
lb". There are currently some simple checks to prevent peeling in such
cases when lb, ub, step are constants. This commit adds support for IR
that is the result of loop peeling in the general case; i.e., lb, ub,
step do not necessarily have to be constants.
This change adds a new affine_map simplification rule for semi-affine
maps that appear during loop peeling and are guaranteed to evaluate to a
constant zero. Affine maps such as:
```
(1) affine_map<()[ub, step] -> ((ub - ub mod step) mod step)
(2) affine_map<()[ub, lb, step] -> ((ub - (ub - lb) mod step - lb) mod step)
(3) ^ may contain additional summands
```
Other affine maps with modulo expressions are not supported by the new
simplification rule.
This fixes#71469.
Move out trivial affine expression simplification out of AffineOps
library. Expose it from libIR. Users of such methods shouldn't have to
rely on the AffineOps dialect. For eg., with this change, the method can
be used now from lib/Analysis/ (FlatLinearConstraints) as well as
AffineOps dialect canonicalization.
This way those one won't need to depend on AffineOps for some
simplification of affine expressions.
This change refactors some of the utilities used to unroll larger vector
computations into smaller vector computations. In fact, the indexing
computations used here are rather generic and are useful in other dialects or
downstream projects. Therefore, a utility for iterating over all possible tile
offsets for a particular pair of static (shape, tiled shape) is introduced in
IndexingUtils and replaces the existing computations in the vector unrolling
transformations. This builds off of the refactoring of IndexingUtils introduced
in 203fad476b.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D150000
The new class hierarchy is as follows:
* `IntegerRelation` (no change)
* `IntegerPolyhedron` (no change)
* `FlatLinearConstraints`: provides an AffineExpr-based API
* `FlatLinearValueConstraints`: stores an additional mapping of non-local vars to SSA values
* `FlatAffineValueConstraints`: provides additional helper functions for Affine dialect ops
* `FlatAffineRelation` (no change)
`FlatConstraints` and `FlatValueConstraints` are moved from `MLIRAffineAnalysis` to `MLIRAnalysis` and can be used without depending on the Affine dialect.
This change is in preparation of D145681, which adds an MLIR interface that depends on `FlatConstraints` (and cannot depend on the Affine dialect or any other dialect).
Differential Revision: https://reviews.llvm.org/D146201
The largest known divisor for expressions like (32 * d0 + 32, 128)
ceildiv 8 wasn't being computed tightly; a conservative value of 1 was
being returned. Address this. This leads to a broad improvement for
several affine analyses and rewrites that depend on knowing whether
something is a multiple of a specific constant or such largest known
constant.
Differential Revision: https://reviews.llvm.org/D140185
Set proper offset to the second element of the index pair when either
lhs or rhs of a local expression is a dimensional identifier, so that
we do not have same index values for more than one local expression.
Reviewed By: springerm, hanchung
Differential Revision: https://reviews.llvm.org/D137389
Set proper offset to the second element of the index pair, so that
we do not have same index values for more than one local expression.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D137062
llvm::sort is beneficial even when we use the iterator-based overload,
since it can optionally shuffle the elements (to detect
non-determinism). However llvm::sort is not usable everywhere, for
example, in compiler-rt.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D130406
Add rule based matching for detecting and transforming "expr - q * (expr floordiv q)"
to "expr mod q", where q is a symbolic exxpression, in simplifyAdd function.
Reviewed By: bondhugula, dcaballe
Differential Revision: https://reviews.llvm.org/D112985
For the semi affine expressions, whenever rhs of a floordiv, ceildiv, mod
or product expression is a symbolic expression, we introduce a local variable
representing the result, and store the floordiv/ceildiv, mod or product
affine expression in LocalExprs. In this way the expression is flattened,
and trivial addition and subtraction related simplifications are performed.
Also rule based matching for detecting and transforming "expr - q * (expr floordiv q)"
to "expr mod q", where q is a symbolic exxpression, in simplifyAdd function.
Differential Revision: https://reviews.llvm.org/D112808
Fix AffineExpr `getLargestKnownDivisor` for ceil/floor div cases.
In these cases, nothing can be inferred on the divisor of the
result.
Add test case for `mod` as well.
Differential Revision: https://reviews.llvm.org/D112523
It is the case that, for all positive a and b such that b divides a
(e mod (a * b)) mod b = e mod b. For example, ((d0 mod 35) mod 5) can
be simplified to (d0 mod 5), but ((d0 mod 35) mod 4) cannot be simplified
further (x = 36 is a counterexample).
This change enables more complex simplifications. For example,
((d0 * 72 + d1) mod 144) mod 9 can now simplify to (d0 * 72 + d1) mod 9
and thus to d1 mod 9. Expressions with chained modulus operators are
reasonably common in tensor applications, and this change _should_
improve code generation for such expressions.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D109930
Simplify affine.min ops, enabling various other canonicalizations inside the peeled loop body.
affine.min ops such as:
```
map = affine_map<(d0)[s0, s1] -> (s0, -d0 + s1)>
%r = affine.min #affine.min #map(%iv)[%step, %ub]
```
are rewritten them into (in the case the peeled loop):
```
%r = %step
```
To determine how an affine.min op should be rewritten and to prove its correctness, FlatAffineConstraints is utilized.
Differential Revision: https://reviews.llvm.org/D107222
The AffineMap in the MemRef inferred by SubViewOp may have uncompressed symbols which result in type mismatch on otherwise unused symbols. Make the computation of the AffineMap compress those unused symbols which results in better canonical types.
Additionally, improve the error message to report which inferred type was expected.
Differential Revision: https://reviews.llvm.org/D96551
In prehistorical times, AffineApplyOp was allowed to produce multiple values.
This allowed the creation of intricate SSA use-def chains.
AffineApplyNormalizer was originally introduced as a means of reusing the AffineMap::compose method to write SSA use-def chains.
Unfortunately, symbols that were produced by an AffineApplyOp needed to be promoted to dims and reordered for the mathematical composition to be valid.
Since then, single result AffineApplyOp became the law of the land but the original assumptions were not revisited.
This revision revisits these assumptions and retires AffineApplyNormalizer.
Differential Revision: https://reviews.llvm.org/D94920
This greatly simplifies a large portion of the underlying infrastructure, allows for lookups of singleton classes to be much more efficient and always thread-safe(no locking). As a result of this, the dialect symbol registry has been removed as it is no longer necessary.
For users broken by this change, an alert was sent out(https://llvm.discourse.group/t/removing-kinds-from-attributes-and-types) that helps prevent a majority of the breakage surface area. All that should be necessary, if the advice in that alert was followed, is removing the kind passed to the ::get methods.
Differential Revision: https://reviews.llvm.org/D86121
This allows for bucketing the different possible storage types, with each bucket having its own allocator/mutex/instance map. This greatly reduces the amount of lock contention when multi-threading is enabled. On some non-trivial .mlir modules (>300K operations), this led to a compile time decrease of a single conversion pass by around half a second(>25%).
Differential Revision: https://reviews.llvm.org/D82596
This revision adds a folding pattern to replace affine.min ops by the actual min value, when it can be determined statically from the strides and bounds of enclosing scf loop .
This matches the type of expressions that Linalg produces during tiling and simplifies boundary checks. For now Linalg depends both on Affine and SCF but they do not depend on each other, so the pattern is added there.
In the future this will move to a more appropriate place when it is determined.
The canonicalization of AffineMinOp operations in the context of enclosing scf.for and scf.parallel proceeds by:
1. building an affine map where uses of the induction variable of a loop
are replaced by `%lb + %step * floordiv(%iv - %lb, %step)` expressions.
2. checking if any of the results of this affine map divides all the other
results (in which case it is also guaranteed to be the min).
3. replacing the AffineMinOp by the result of (2).
The algorithm is functional in simple parametric tiling cases by using semi-affine maps. However simplifications of such semi-affine maps are not yet available and the canonicalization does not succeed yet.
Differential Revision: https://reviews.llvm.org/D82009
Simplify semi-affine expression for the operations like ceildiv,
floordiv and modulo by any given symbol by checking divisibilty by that
symbol.
Some properties used in simplification are:
1) Commutative property of the floordiv and ceildiv:
((expr1 floordiv expr2) floordiv expr3 ) = ((expr1 floordiv expr3) floordiv expr2)
((expr1 ceildiv expr2) ceildiv expr3 ) = ((expr1 ceildiv expr3) ceildiv expr2)
While simplification if operations are different no simplification is
possible as there is no property that simplify expressions like these:
((expr1 ceildiv expr2) floordiv expr3) or ((expr1 floordiv expr2)
ceildiv expr3).
2) If both expr1 and expr2 are divisible by the expr3 then:
(expr1 % expr2) / expr3 = ((expr1 / expr3) % (expr2 / expr3))
where / is divide symbol.
3) If expr1 is divisible by expr2 then expr1 % expr2 = 0.
Signed-off-by: Yash Jain <yash.jain@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D84920
This commit adds functionality needed for implementation of convolutions with
linalg.generic op. Since linalg.generic right now expects indexing maps to be
just permutations, offset indexing needed in convolutions is not possible.
Therefore in this commit we address the issue by adding support for symbols inside
indexing maps which enables more advanced indexing. The upcoming commit will
solve the problem of computing loop bounds from such maps.
Differential Revision: https://reviews.llvm.org/D83158
Summary:
This revision adds a tool that generates the ODS and C++ implementation for "named" Linalg ops according to the [RFC discussion](https://llvm.discourse.group/t/rfc-declarative-named-ops-in-the-linalg-dialect/745).
While the mechanisms and language aspects are by no means set in stone, this revision allows connecting the pieces end-to-end from a mathematical-like specification.
Some implementation details and short-term decisions taken for the purpose of bootstrapping and that are not set in stone include:
1. using a "[Tensor Comprehension](https://arxiv.org/abs/1802.04730)-inspired" syntax
2. implicit and eager discovery of dims and symbols when parsing
3. using EDSC ops to specify the computation (e.g. std_addf, std_mul_f, ...)
A followup revision will connect this tool to tablegen mechanisms and allow the emission of named Linalg ops that automatically lower to various loop forms and run end to end.
For the following "Tensor Comprehension-inspired" string:
```
def batch_matmul(A: f32(Batch, M, K), B: f32(K, N)) -> (C: f32(Batch, M, N)) {
C(b, m, n) = std_addf<k>(std_mulf(A(b, m, k), B(k, n)));
}
```
With -gen-ods-decl=1, this emits (modulo formatting):
```
def batch_matmulOp : LinalgNamedStructured_Op<"batch_matmul", [
NInputs<2>,
NOutputs<1>,
NamedStructuredOpTraits]> {
let arguments = (ins Variadic<LinalgOperand>:$views);
let results = (outs Variadic<AnyRankedTensor>:$output_tensors);
let extraClassDeclaration = [{
llvm::Optional<SmallVector<StringRef, 8>> referenceIterators();
llvm::Optional<SmallVector<AffineMap, 8>> referenceIndexingMaps();
void regionBuilder(ArrayRef<BlockArgument> args);
}];
let hasFolder = 1;
}
```
With -gen-ods-impl, this emits (modulo formatting):
```
llvm::Optional<SmallVector<StringRef, 8>> batch_matmul::referenceIterators() {
return SmallVector<StringRef, 8>{ getParallelIteratorTypeName(),
getParallelIteratorTypeName(),
getParallelIteratorTypeName(),
getReductionIteratorTypeName() };
}
llvm::Optional<SmallVector<AffineMap, 8>> batch_matmul::referenceIndexingMaps()
{
MLIRContext *context = getContext();
AffineExpr d0, d1, d2, d3;
bindDims(context, d0, d1, d2, d3);
return SmallVector<AffineMap, 8>{
AffineMap::get(4, 0, {d0, d1, d3}),
AffineMap::get(4, 0, {d3, d2}),
AffineMap::get(4, 0, {d0, d1, d2}) };
}
void batch_matmul::regionBuilder(ArrayRef<BlockArgument> args) {
using namespace edsc;
using namespace intrinsics;
ValueHandle _0(args[0]), _1(args[1]), _2(args[2]);
ValueHandle _4 = std_mulf(_0, _1);
ValueHandle _5 = std_addf(_2, _4);
(linalg_yield(ValueRange{ _5 }));
}
```
Differential Revision: https://reviews.llvm.org/D77067
Summary:
Looks like a refactor that was never completed.
This change removes some unused and ambiguous definitions.
Reviewed By: bondhugula, nicolasvasilache, rriddle
Differential Revision: https://reviews.llvm.org/D75586
Add one more simplification for floordiv and mod affine expressions.
Examples:
(2*d0 + 1) floordiv 2 is simplified to d0
(8*d0 + 4*d1 + d2) floordiv 4 simplified to 4*d0 + d1 + d2 floordiv 4.
etc.
Similarly, (4*d1 + 1) mod 2 is simplified to 1,
(2*d0 + 8*d1) mod 8 simplified to 2*d0 mod 8.
Change getLargestKnownDivisor to return int64_t to be consistent and
to avoid casting at call sites (since the return value is used in expressions
of int64_t/index type).
Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>
Closestensorflow/mlir#202
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/202 from bondhugula:affine b13fcb2f1c00a39ca5434613a02408e085a80e77
PiperOrigin-RevId: 284866710
- fix store to load forwarding for a certain set of cases (where
forwarding shouldn't have happened); use AffineValueMap difference
based MemRefAccess equality checking; utility logic is also greatly
simplified
- add missing equality/inequality operators for AffineExpr ==/!= ints
- add == != operators on MemRefAccess
Closestensorflow/mlir#136
COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/136 from bondhugula:store-load-forwarding d79fd1add8bcfbd9fa71d841a6a9905340dcd792
PiperOrigin-RevId: 270457011
This CL refactors tiling to enable tiling of views that are not just specified by a simple permutation. This allows the tiling of convolutions for which a new example is added.
PiperOrigin-RevId: 256346028