Disallows initialization of scalable vectors with an attribute of
arbitrary values, e.g.:
```mlir
%c = arith.constant dense<[0, 1]> : vector<[2] x i32>
```
Initialization using vector splats remains allowed (i.e. when all the
init values are identical):
```mlir
%c = arith.constant dense<[1, 1]> : vector<[2] x i32>
```
Note: This is a re-upload of #86178
Disallows initialization of scalable vectors with an attribute of
arbitrary values, e.g.:
```mlir
%c = arith.constant dense<[0, 1]> : vector<[2] x i32>
```
Initialization using vector splats remains allowed (i.e. when all the
init values are identical):
```mlir
%c = arith.constant dense<[1, 1]> : vector<[2] x i32>
```
This commit adds a `ValueBoundsOpInterface` implementation for
`arith.select`. The implementation is almost identical to `scf.if`
(#85895), but there is one special case: if the condition is a shaped
value, the selection is applied element-wise and the result shape can be
inferred from either operand.
Note: This is a re-upload of #86383.
The C++ standard does not specify an evaluation order for addition/...
operands. E.g., in `a() + b()`, the compiler is free to evaluate `a` or
`b` first.
This lead to different `mlir-opt` outputs in #85895. (FileCheck passed
when compiled with LLVM but failed when compiled with gcc.)
In scalable code it is very common to have constant multiples of vscale,
e.g. `4 * vscale`. This updates `arith.muli` to pretty print the result
name in cases like this, so `4 * vscale` would be `%c4_vscale`.
This makes reading IR dumps of scalable code a little nicer.
This commit adds a `ValueBoundsOpInterface` implementation for
`arith.select`. The implementation is almost identical to `scf.if`
(#85895), but there is one special case: if the condition is a shaped
value, the selection is applied element-wise and the result shape can be
inferred from either operand.
This commit changes the API of `ValueBoundsConstraintSet`: the stop
condition is now passed to the constructor instead of `processWorklist`.
That makes it easier to add items to the worklist multiple times and
process them in a consistent manner. The current
`ValueBoundsConstraintSet` is passed as a reference to the stop
function, so that the stop function can be defined before the the
`ValueBoundsConstraintSet` is constructed.
This change is in preparation of adding support for branches.
`CeilDivUIOp` seemed to have been added by mistake to the list of
dynamically
illegal operations in `arith-unsigned-when-equivalent`. The only illegal
operations
should be the signed operations that can be converted to their unsigned
counterpart.
Add rounding mode attribute to `arith`. This attribute can be used in
different FP `arith` operations to control rounding mode. Rounding modes
correspond to IEEE 754-specified rounding modes. Use in `arith.truncf` folding.
As this is not supported in dialects other than LLVM, conversion should fail for
now in case this attribute is present.
---------
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
This commit adds the `BufferViewFlowOpInterface` to the bufferization
dialect. This interface can be implemented by ops that operate on
buffers to indicate that a buffer op result and/or region entry block
argument may be the same buffer as a buffer operand (or a view thereof).
This interface is queried by the `BufferViewFlowAnalysis`.
The new interface has two interface methods:
* `populateDependencies`: Implementations use the provided callback to
declare dependencies between operands and op results/region entry block
arguments. E.g., for `%r = arith.select %c, %m1, %m2 : memref<5xf32>`,
the interface implementation should declare two dependencies: %m1 -> %r
and %m2 -> %r.
* `mayBeTerminalBuffer`: An SSA value is a terminal buffer if the buffer
view flow analysis stops at the specified value. E.g., because the value
is a newly allocated buffer or because no further information is
available about the origin of the buffer.
Ops that implement the `RegionBranchOpInterface` or `BranchOpInterface`
do not have to implement the `BufferViewFlowOpInterface`. The buffer
dependencies can be inferred from those two interfaces.
This commit makes the `BufferViewFlowAnalysis` more accurate. For
unknown ops, it conservatively used to declare all combinations of
operands and op results/region entry block arguments as dependencies
(false positives). This is no longer the case. While the analysis is
still a "maybe" analysis with false positives (e.g., when analyzing ops
such as `arith.select` or `scf.if` where the taken branch is not known
at compile time), results and region entry block arguments of unknown
ops are now marked as terminal buffers.
This commit addresses a TODO in `BufferViewFlowAnalysis.cpp`:
```
// TODO: We should have an op interface instead of a hard-coded list of
// interfaces/ops.
```
It is no longer needed to hard-code ops.
This PR adds promised interface declarations for all interfaces declared
in `InitAllDialects.h`.
Promised interfaces allow a dialect to declare that it will have an
implementation of a particular interface, crashing the program if one
isn't provided when the interface is used.
Because `arith.select` does not propagate poison of the second or third
operand depending on the condition, some canonicalization patterns are
currently incorrect. This patch removes these incorrect patterns, and
adds a new pattern to fix the case of `i1` select with constants.
Patterns that are removed:
* select(predA, select(predB, x, y), y) => select(and(predA, predB), x,
y)
* select(predA, select(predB, y, x), y) => select(and(predA,
not(predB)), x, y)
* select(predA, x, select(predB, x, y)) => select(or(predA, predB), x,
y)
* select(predA, x, select(predB, y, x)) => select(or(predA, not(predB)),
x, y)
* arith.select %arg, %x, %y : i1 => and(%arg, %x) or and(!%arg, %y)
Pattern that is added:
* select(pred, false, true) => not(pred) for i1
The first two patterns are incorrect when `predB` is poison and `predA`
is false, as a non-poison `y` gets compiled to `poison`. The next two
patterns are incorrect when `predB` is poison and `predA` is true, as a
non-poison `x` gets compiled to `poison`. The last pattern is incorrect
as it propagates poison from all operands afer compilation.
This lowering was not correctly handling the case where saturation of
the mantissa results in an increase of the exponent value. The new code
borrows, with credit, the idea from
e1502c0cdb/c10/util/BFloat16.h (L60-L79)
and adds comments to explain the magic trick going on here and why it's
correct. Hat tip to its original author, whom I believe to be
@Maratyszcza.
A testcase was also requiring a tie to be broken upwards in a case where
"to nearest-even" required going downward. The fact that it used to pass
suggests that there was another bug in the old code.
This op is the inverse of all-gather. It is useful to have an explicit
concise representation instead of having a blob of slicing logic.
Add lowering for the op that slices from the tensor based on the
in-group process index.
Make resharding generate an all-slice instead of inserting the slicing
logic directly.
Collection of changes with the goal of being able to convert `encoding`
to `memorySpace` during bufferization
- new API for encoder to allow implementation to select destination
memory space
- update existing bufferization implementations to support the new
interface
* Use APFloat conversion function to avoid losing information by
converting to `double`. This would be the case with large types like
`f80` or `f128`.
* Check for potential information loss. This is intended for small
floating point types that may have values not present in larger ones
(e.g., f8m2e5fnuz and f16).
* Support folding vector constants.
Specifying an enum case of an enum attr currently requires the use of
either `NativeCodeCall` or a `ConstantAttr` specifying the full C++ name
of the enum case. The disadvantages of both are less readable code due
to including C++ expressions and very few checks of any kind, creating
C++ code that does not compile instead.
This PR adds `ConstantEnumCase`, a kind of `ConstantAttr` which
automatically derives the correct value representation from a given enum
and the string representation of an enum case. It supports both
`EnumAttrInfo`s (enums wrapping `IntegerAttr`) and `EnumAttr` (proper
dialect attributes). It even supports bit-enums, allowing one to list
multiple enum cases and have them be combined. If an enum case is not
found, an assertion is triggered with a proper error message.
Besides the tests, it was also used to simplify DRR patterns in the
arith dialect.
Many machine-learning applications (and most software written at AMD)
expect the operation that truncates floats to 8-bit floats to be
saturatinng. That is, they expect `truncf 256.0 : f32 to f8E4M3FNUZ` to
yield `240.0`, not `NaN`, and similarly for negative numbers. However,
the underlying hardware instruction that can be used for this truncation
implements overflow-to-NaN semantics.
To enable handling this usecase, we add the saturate-fp8-truncf option
to ArithToAMDGPU (off by default), which causes the requisite clamping
code to be emitted. Said clamping code ensures that Inf and NaN are
passed through exactly (and thus trancate to NaN).
Per review feedback, this commit efactors
createScalarOrSplatConstant() to the Arith dialect utilities and uses
it in this code. It also fixes naming of existing patterns and
switches from vector.extractelement/insertelement to
vector.extract/insert.
Add overflow flags support to the following ops:
* `arith.addi`
* `arith.subi`
* `arith.muli`
Example of new syntax:
```
%res = arith.addi %arg1, %arg2 overflow<nsw> : i64
```
Similar to existing LLVM dialect syntax
```
%res = llvm.add %arg1, %arg2 overflow<nsw> : i64
```
Tablegen canonicalization patterns updated to always drop flags, proper
support with tests will be added later.
Updated LLVMIR translation as part of this commit as it currenly written
in a way that it will crash when new attributes added to arith ops
otherwise.
Also lower `arith` overflow flags to corresponding SPIR-V op decorations
Discussion
https://discourse.llvm.org/t/rfc-integer-overflow-flags-support-in-arith-dialect/76025
This effectively rolls forward #77211, #77700 and #77714 while adding a
test to ensure the Python usage is not broken. More follow up needed but
unrelated to the core change here. The changes here are minimal and just
correspond to "textual namespacing" ODS side, no C++ or Python changes
were needed.
---------
---------
Co-authored-by: Ivan Butygin <ivan.butygin@gmail.com>, Yi Wu <yi.wu2@arm.com>
Add overflow flags support to the following ops:
* `arith.addi`
* `arith.subi`
* `arith.muli`
Example of new syntax:
```
%res = arith.addi %arg1, %arg2 overflow<nsw> : i64
```
Similar to existing LLVM dialect syntax
```
%res = llvm.add %arg1, %arg2 overflow<nsw> : i64
```
Tablegen canonicalization patterns updated to always drop flags, proper
support with tests will be added later.
Updated LLVMIR translation as part of this commit as it currenly written
in a way that it will crash when new attributes added to arith ops
otherwise.
Discussion
https://discourse.llvm.org/t/rfc-integer-overflow-flags-support-in-arith-dialect/76025
---------
Co-authored-by: Yi Wu <yi.wu2@arm.com>
This PR adds promised interface declarations for
`ConvertToLLVMPatternInterface` in all the dialects that support the
`ConvertToLLVM` dialect extension.
Promised interfaces allow a dialect to declare that it will have an
implementation of a particular interface, crashing the program if one
isn't provided when the interface is used.
The maxnum/minnum semantics can be found at
https://llvm.org/docs/LangRef.html#llvm-minnum-intrinsic.
The revision also updates function names in lit tests to match op name.
Take arith.maxnumf as example:
```
func.func @maxnumf(%lhs: f32, %rhs: f32) -> f32 {
%result = arith.maxnumf %lhs, %rhs : f32
return %result : f32
}
```
will be expanded to
```
func.func @maxnumf(%lhs: f32, %rhs: f32) -> f32 {
%0 = arith.cmpf ugt, %lhs, %rhs : f32
%1 = arith.select %0, %lhs, %rhs : f32
%2 = arith.cmpf uno, %lhs, %lhs : f32
%3 = arith.select %2, %rhs, %1 : f32
return %3 : f32
}
```
Case 1: Both LHS and RHS are not NaN; LHS > RHS
In this case, `%1` is LHS. `%3` and `%1` have the same value, so `%3` is
LHS.
Case 2: LHS is NaN and RHS is not NaN
In this case, `%2` is true, so `%3` is always RHS.
Case 3: LHS is not NaN and RHS is NaN
In this case, `%0` is true and `%1` is LHS. `%2` is false, so `%3` and
`%1` have the same value, which is LHS.
Case 4: Both LHS and RHS are NaN:
`%1` and RHS are all NaN, so the result is still NaN.
This commit adds extra assertions to `OperationFolder` and `OpBuilder`
to ensure that the types of the folded SSA values match with the result
types of the op. There used to be checks that discard the folded results
if the types do not match. This commit makes these checks stricter and
turns them into assertions.
Discarding folded results with the wrong type (without failing
explicitly) can hide bugs in op folders. Two such bugs became apparent
in MLIR (and some more in downstream projects) and are fixed with this
change.
Note: The existing type checks were introduced in
https://reviews.llvm.org/D95991.
Migration guide: If you see failing assertions (`folder produced value
of incorrect type`; make sure to run with assertions enabled!), run with
`-debug` or dump the operation right before the failing assertion. This
will point you to the op that has the broken folder. A common mistake is
a mismatch between static/dynamic dimensions (e.g., input has a static
dimension but folded result has a dynamic dimension).
* Declare arguments/results with `let` statements.
* Rename `transp` to `permutation`.
* Change type of `transp` from `I64ArrayAttr` to `DenseI64ArrayAttr`
(provides direct access to `ArrayRef<int64_t>` instead of `ArrayAttr`).
-- In order to compute maximum, we should always initialise the
result with the largest negative value possible for the concerned
element type, instead of the smallest.
-- This commit essentially adds a fix to this issue.
Signed-off-by: Abhishek Varma <abhishek@nod-labs.com>
This cherry-picks the changes in
llvm-project/5bf701a6687a46fd898621f5077959ff202d716b and extends the
pattern to handle vector types.
To reuse `getBoolAttribute` method, it moves the static method above the
include of generated file.
This adds the following canonicalization patterns:
- Inverting condition:
- `select(not(pred), a, b) => select(pred, b, a)`
- Merging consecutive selects with the same predicate
- `select(pred, select(pred, a, b), c) => select(pred, a, c)`
- `select(pred, a, select(pred, b, c)) => select(pred, a, c)`
- Merging consecutive selects with a common value:
- `select(predA, select(predB, a, b), b) => select(and(predA, predB), a,
b)`
- `select(predA, select(predB, b, a), b) => select(and(predA,
not(predB)), a, b)`
- `select(predA, a, select(predB, a, b)) => select(or(predA, predB), a,
b)`
- `select(predA, a, select(predB, b, a)) => select(or(predA,
not(predB)), a, b)`
Note that the `Pass` suffix is added in tablegen, and as a side effect the
options are renamed from `ArithExpandOpsOptions` to `ArithExpandOpsPassOptions`.
Inserting clones requires a lot of assumptions to hold on the input IR, e.g., all writes to a buffer need to dominate all reads. This is not guaranteed by one-shot bufferization and isn't easy to verify, thus it could quickly lead to incorrect results that are hard to debug. This commit changes the mechanism of how an ownership indicator is materialized when there is not already a unique ownership present. Additionally, we don't create copies of returned memrefs anymore when we don't have ownership. Instead, we insert assert operations to make sure we have ownership at runtime, or otherwise report to the user that correctness could not be guaranteed.
This patch is part of a larger initiative aimed at fixing floating-point
`max` and `min` operations in MLIR:
https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.
Here we introduce new operations for floating-point numbers: `minnum`
and `maxnum`.
These operations have different semantics than `minumumf` and `maximumf`
ops.
They follow the eponymous LLVM intrinsics semantics, which differs
in the handling positive and negative zeros and NaNs.
This patch addresses the 1.3 task from the RFC.
Add a method to the BufferDeallocationOpInterface that allows operations to implement the interface and provide custom logic to compute the ownership indicators of values it defines. As a demonstrating example, this new method is implemented by the `arith.select` operation.