Fixes an issue where the `SimpleAffineExprFlattener` would simplify
`lhs % rhs` to just `-(lhs floordiv rhs)` instead of
`lhs - (lhs floordiv rhs)`
if `lhs` happened to be equal to `lhs floordiv rhs`.
The reported failure case was
`(d0, d1) -> (((d1 - (d1 + 2)) floordiv 8) % 8)`
from https://github.com/llvm/llvm-project/issues/114654.
Note that many paths that simplify AffineMaps (e.g. the AffineApplyOp
folder and canonicalization) would not observe this bug because of
of slightly different paths taken by the code. Slightly different
grouping of the terms could also result in avoiding the bug.
Resolves https://github.com/llvm/llvm-project/issues/114654.
my prove:
we can simple `(n * s) ceildiv a ceildiv s` to `n ceildiv a`
because `(n * s) ceildiv a ceildiv b` <=> `(n * s) ceildiv s ceildiv a`
<=> `n ceildiv a`
let's prove the `s floordiv a floor b` <=> `s floordiv b floor a`
let `s = ka +m (m < a)` so `s floordiv a` <=> `s / a - m / a`
similarly, it can be proven that:
`s floordiv a floordiv b` <=> `s / (a * b) - m / (a * b) - n / (b) constrain (n < b)`
<=> `s / (a * b) - (m + a*n) / (a*b)`
because `a* b - (m + a*n)` <=> `a*b - a*n - m` > `a - m` > `0`
so `s floordiv a floordiv b` <=> `[s / (a*b)]` <=> `s floordiv b floordiv a`
but if `s floordiv b` mutiply a factor above didn't always hold true.
Fixes https://github.com/llvm/llvm-project/issues/107508
Currently, this is rewritten to d0 mod -c. However, we do not support
modulo with a negative RHS in our lowering passes, so this triggers
undefined behavior.
It would be better to not have these ad hoc simplifications at all, but
I guess that ship has sailed.
5221634 (Do not trigger UB during AffineExpr parsing) noticed that
divideCeilSigned and divideFloorSigned would overflow when Numerator =
INT_MIN, and Denominator = -1. This observation has already been made by
DynamicAPInt, and it has code to check this. To avoid checks in multiple
callers, centralize this query in MathExtras, and change
divideCeilSigned/divideFloorSigned to assert on overflow.
Currently, we only fold if the RHS is a positive constant. There doesn't
seem to be a good reason to do that. The comment claims that division by
negative values is undefined, but I suspect that was just copied over
from the `mod` simplifier.
Currently, parsing expressions that are undefined will trigger UB during
compilation (e.g. `9223372036854775807 * 2`). This change instead
leaves the expressions as they were.
This change is an NFC for compilations that did not previously involve
UB.
This patch adds support for computing bounds for semi-affine mod
expression to FlatLinearConstraints. This is then enabled within the
ScalableValueBoundsConstraintSet to allow computing the bounds of
scalable remainder loops.
E.g. computing the bound of something like:
```
// `1000 mod s0` is a semi-affine.
#remainder_start_index = affine_map<()[s0] -> (-(1000 mod s0) + 1000)>
#remaining_iterations = affine_map<(d0) -> (-d0 + 1000)>
%0 = affine.apply #remainder_start_index()[%c8_vscale]
scf.for %i = %0 to %c1000 step %c8_vscale {
%remaining_iterations = affine.apply #remaining_iterations(%i)
// The upper bound for the remainder loop iterations should be:
// %c8_vscale - 1 (expressed as an affine map,
// affine_map<()[s0] -> (s0 * 8 - 1)>, where s0 is vscale)
%bound = "test.reify_bound"(%remaining_iterations) <{scalable, ...}>
}
```
There are caveats to this implementation. To be able to add a bound for
a `mod` we need to assume the rhs is positive (> 0). This may not be
known when adding the bounds for the `mod` expression. So to handle this
a constraint is added for `rhs > 0`, this may later be found not to hold
(in which case the constraints set becomes empty/invalid).
This is not a problem for computing scalable bounds where it's safe to
assume `s0` is vscale (or some positive multiple of it). But this may
need to be considered when enabling this feature elsewhere (to ensure
correctness).
Currently, `simplifyMul()` asserts that either `lhs` or `rhs` is
symbolic or constant. This method is called by the overloaded `*`
operator for `AffineExpr`s which leads to a crash when building a
multiplication expression where neither operand is symbolic or constant.
This patch returns a `nullptr` from `simplifyMul()` to signal that the
expression could not be simplified instead.
Fix https://github.com/llvm/llvm-project/issues/75770
Support WalkResult for AffineExpr walk and support interrupting walks
along the lines of Operation::walk. This allows interrupted walks when a
condition is met. Also, switch from std::function to llvm::function_ref
for the walk function.
Loop peeling is not beneficial if the step size already divides "ub -
lb". There are currently some simple checks to prevent peeling in such
cases when lb, ub, step are constants. This commit adds support for IR
that is the result of loop peeling in the general case; i.e., lb, ub,
step do not necessarily have to be constants.
This change adds a new affine_map simplification rule for semi-affine
maps that appear during loop peeling and are guaranteed to evaluate to a
constant zero. Affine maps such as:
```
(1) affine_map<()[ub, step] -> ((ub - ub mod step) mod step)
(2) affine_map<()[ub, lb, step] -> ((ub - (ub - lb) mod step - lb) mod step)
(3) ^ may contain additional summands
```
Other affine maps with modulo expressions are not supported by the new
simplification rule.
This fixes#71469.
Move out trivial affine expression simplification out of AffineOps
library. Expose it from libIR. Users of such methods shouldn't have to
rely on the AffineOps dialect. For eg., with this change, the method can
be used now from lib/Analysis/ (FlatLinearConstraints) as well as
AffineOps dialect canonicalization.
This way those one won't need to depend on AffineOps for some
simplification of affine expressions.
This change refactors some of the utilities used to unroll larger vector
computations into smaller vector computations. In fact, the indexing
computations used here are rather generic and are useful in other dialects or
downstream projects. Therefore, a utility for iterating over all possible tile
offsets for a particular pair of static (shape, tiled shape) is introduced in
IndexingUtils and replaces the existing computations in the vector unrolling
transformations. This builds off of the refactoring of IndexingUtils introduced
in 203fad476b.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D150000
The new class hierarchy is as follows:
* `IntegerRelation` (no change)
* `IntegerPolyhedron` (no change)
* `FlatLinearConstraints`: provides an AffineExpr-based API
* `FlatLinearValueConstraints`: stores an additional mapping of non-local vars to SSA values
* `FlatAffineValueConstraints`: provides additional helper functions for Affine dialect ops
* `FlatAffineRelation` (no change)
`FlatConstraints` and `FlatValueConstraints` are moved from `MLIRAffineAnalysis` to `MLIRAnalysis` and can be used without depending on the Affine dialect.
This change is in preparation of D145681, which adds an MLIR interface that depends on `FlatConstraints` (and cannot depend on the Affine dialect or any other dialect).
Differential Revision: https://reviews.llvm.org/D146201
The largest known divisor for expressions like (32 * d0 + 32, 128)
ceildiv 8 wasn't being computed tightly; a conservative value of 1 was
being returned. Address this. This leads to a broad improvement for
several affine analyses and rewrites that depend on knowing whether
something is a multiple of a specific constant or such largest known
constant.
Differential Revision: https://reviews.llvm.org/D140185
Set proper offset to the second element of the index pair when either
lhs or rhs of a local expression is a dimensional identifier, so that
we do not have same index values for more than one local expression.
Reviewed By: springerm, hanchung
Differential Revision: https://reviews.llvm.org/D137389
Set proper offset to the second element of the index pair, so that
we do not have same index values for more than one local expression.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D137062
llvm::sort is beneficial even when we use the iterator-based overload,
since it can optionally shuffle the elements (to detect
non-determinism). However llvm::sort is not usable everywhere, for
example, in compiler-rt.
Reviewed By: nhaehnle
Differential Revision: https://reviews.llvm.org/D130406
Add rule based matching for detecting and transforming "expr - q * (expr floordiv q)"
to "expr mod q", where q is a symbolic exxpression, in simplifyAdd function.
Reviewed By: bondhugula, dcaballe
Differential Revision: https://reviews.llvm.org/D112985
For the semi affine expressions, whenever rhs of a floordiv, ceildiv, mod
or product expression is a symbolic expression, we introduce a local variable
representing the result, and store the floordiv/ceildiv, mod or product
affine expression in LocalExprs. In this way the expression is flattened,
and trivial addition and subtraction related simplifications are performed.
Also rule based matching for detecting and transforming "expr - q * (expr floordiv q)"
to "expr mod q", where q is a symbolic exxpression, in simplifyAdd function.
Differential Revision: https://reviews.llvm.org/D112808
Fix AffineExpr `getLargestKnownDivisor` for ceil/floor div cases.
In these cases, nothing can be inferred on the divisor of the
result.
Add test case for `mod` as well.
Differential Revision: https://reviews.llvm.org/D112523
It is the case that, for all positive a and b such that b divides a
(e mod (a * b)) mod b = e mod b. For example, ((d0 mod 35) mod 5) can
be simplified to (d0 mod 5), but ((d0 mod 35) mod 4) cannot be simplified
further (x = 36 is a counterexample).
This change enables more complex simplifications. For example,
((d0 * 72 + d1) mod 144) mod 9 can now simplify to (d0 * 72 + d1) mod 9
and thus to d1 mod 9. Expressions with chained modulus operators are
reasonably common in tensor applications, and this change _should_
improve code generation for such expressions.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D109930
Simplify affine.min ops, enabling various other canonicalizations inside the peeled loop body.
affine.min ops such as:
```
map = affine_map<(d0)[s0, s1] -> (s0, -d0 + s1)>
%r = affine.min #affine.min #map(%iv)[%step, %ub]
```
are rewritten them into (in the case the peeled loop):
```
%r = %step
```
To determine how an affine.min op should be rewritten and to prove its correctness, FlatAffineConstraints is utilized.
Differential Revision: https://reviews.llvm.org/D107222
The AffineMap in the MemRef inferred by SubViewOp may have uncompressed symbols which result in type mismatch on otherwise unused symbols. Make the computation of the AffineMap compress those unused symbols which results in better canonical types.
Additionally, improve the error message to report which inferred type was expected.
Differential Revision: https://reviews.llvm.org/D96551
In prehistorical times, AffineApplyOp was allowed to produce multiple values.
This allowed the creation of intricate SSA use-def chains.
AffineApplyNormalizer was originally introduced as a means of reusing the AffineMap::compose method to write SSA use-def chains.
Unfortunately, symbols that were produced by an AffineApplyOp needed to be promoted to dims and reordered for the mathematical composition to be valid.
Since then, single result AffineApplyOp became the law of the land but the original assumptions were not revisited.
This revision revisits these assumptions and retires AffineApplyNormalizer.
Differential Revision: https://reviews.llvm.org/D94920
This greatly simplifies a large portion of the underlying infrastructure, allows for lookups of singleton classes to be much more efficient and always thread-safe(no locking). As a result of this, the dialect symbol registry has been removed as it is no longer necessary.
For users broken by this change, an alert was sent out(https://llvm.discourse.group/t/removing-kinds-from-attributes-and-types) that helps prevent a majority of the breakage surface area. All that should be necessary, if the advice in that alert was followed, is removing the kind passed to the ::get methods.
Differential Revision: https://reviews.llvm.org/D86121
This allows for bucketing the different possible storage types, with each bucket having its own allocator/mutex/instance map. This greatly reduces the amount of lock contention when multi-threading is enabled. On some non-trivial .mlir modules (>300K operations), this led to a compile time decrease of a single conversion pass by around half a second(>25%).
Differential Revision: https://reviews.llvm.org/D82596
This revision adds a folding pattern to replace affine.min ops by the actual min value, when it can be determined statically from the strides and bounds of enclosing scf loop .
This matches the type of expressions that Linalg produces during tiling and simplifies boundary checks. For now Linalg depends both on Affine and SCF but they do not depend on each other, so the pattern is added there.
In the future this will move to a more appropriate place when it is determined.
The canonicalization of AffineMinOp operations in the context of enclosing scf.for and scf.parallel proceeds by:
1. building an affine map where uses of the induction variable of a loop
are replaced by `%lb + %step * floordiv(%iv - %lb, %step)` expressions.
2. checking if any of the results of this affine map divides all the other
results (in which case it is also guaranteed to be the min).
3. replacing the AffineMinOp by the result of (2).
The algorithm is functional in simple parametric tiling cases by using semi-affine maps. However simplifications of such semi-affine maps are not yet available and the canonicalization does not succeed yet.
Differential Revision: https://reviews.llvm.org/D82009
Simplify semi-affine expression for the operations like ceildiv,
floordiv and modulo by any given symbol by checking divisibilty by that
symbol.
Some properties used in simplification are:
1) Commutative property of the floordiv and ceildiv:
((expr1 floordiv expr2) floordiv expr3 ) = ((expr1 floordiv expr3) floordiv expr2)
((expr1 ceildiv expr2) ceildiv expr3 ) = ((expr1 ceildiv expr3) ceildiv expr2)
While simplification if operations are different no simplification is
possible as there is no property that simplify expressions like these:
((expr1 ceildiv expr2) floordiv expr3) or ((expr1 floordiv expr2)
ceildiv expr3).
2) If both expr1 and expr2 are divisible by the expr3 then:
(expr1 % expr2) / expr3 = ((expr1 / expr3) % (expr2 / expr3))
where / is divide symbol.
3) If expr1 is divisible by expr2 then expr1 % expr2 = 0.
Signed-off-by: Yash Jain <yash.jain@polymagelabs.com>
Differential Revision: https://reviews.llvm.org/D84920
This commit adds functionality needed for implementation of convolutions with
linalg.generic op. Since linalg.generic right now expects indexing maps to be
just permutations, offset indexing needed in convolutions is not possible.
Therefore in this commit we address the issue by adding support for symbols inside
indexing maps which enables more advanced indexing. The upcoming commit will
solve the problem of computing loop bounds from such maps.
Differential Revision: https://reviews.llvm.org/D83158
Summary:
This revision adds a tool that generates the ODS and C++ implementation for "named" Linalg ops according to the [RFC discussion](https://llvm.discourse.group/t/rfc-declarative-named-ops-in-the-linalg-dialect/745).
While the mechanisms and language aspects are by no means set in stone, this revision allows connecting the pieces end-to-end from a mathematical-like specification.
Some implementation details and short-term decisions taken for the purpose of bootstrapping and that are not set in stone include:
1. using a "[Tensor Comprehension](https://arxiv.org/abs/1802.04730)-inspired" syntax
2. implicit and eager discovery of dims and symbols when parsing
3. using EDSC ops to specify the computation (e.g. std_addf, std_mul_f, ...)
A followup revision will connect this tool to tablegen mechanisms and allow the emission of named Linalg ops that automatically lower to various loop forms and run end to end.
For the following "Tensor Comprehension-inspired" string:
```
def batch_matmul(A: f32(Batch, M, K), B: f32(K, N)) -> (C: f32(Batch, M, N)) {
C(b, m, n) = std_addf<k>(std_mulf(A(b, m, k), B(k, n)));
}
```
With -gen-ods-decl=1, this emits (modulo formatting):
```
def batch_matmulOp : LinalgNamedStructured_Op<"batch_matmul", [
NInputs<2>,
NOutputs<1>,
NamedStructuredOpTraits]> {
let arguments = (ins Variadic<LinalgOperand>:$views);
let results = (outs Variadic<AnyRankedTensor>:$output_tensors);
let extraClassDeclaration = [{
llvm::Optional<SmallVector<StringRef, 8>> referenceIterators();
llvm::Optional<SmallVector<AffineMap, 8>> referenceIndexingMaps();
void regionBuilder(ArrayRef<BlockArgument> args);
}];
let hasFolder = 1;
}
```
With -gen-ods-impl, this emits (modulo formatting):
```
llvm::Optional<SmallVector<StringRef, 8>> batch_matmul::referenceIterators() {
return SmallVector<StringRef, 8>{ getParallelIteratorTypeName(),
getParallelIteratorTypeName(),
getParallelIteratorTypeName(),
getReductionIteratorTypeName() };
}
llvm::Optional<SmallVector<AffineMap, 8>> batch_matmul::referenceIndexingMaps()
{
MLIRContext *context = getContext();
AffineExpr d0, d1, d2, d3;
bindDims(context, d0, d1, d2, d3);
return SmallVector<AffineMap, 8>{
AffineMap::get(4, 0, {d0, d1, d3}),
AffineMap::get(4, 0, {d3, d2}),
AffineMap::get(4, 0, {d0, d1, d2}) };
}
void batch_matmul::regionBuilder(ArrayRef<BlockArgument> args) {
using namespace edsc;
using namespace intrinsics;
ValueHandle _0(args[0]), _1(args[1]), _2(args[2]);
ValueHandle _4 = std_mulf(_0, _1);
ValueHandle _5 = std_addf(_2, _4);
(linalg_yield(ValueRange{ _5 }));
}
```
Differential Revision: https://reviews.llvm.org/D77067