This commit fixes a case of incorrect dialect conversion API usage
during `FuncOpConversion`. `replaceAllUsesExcept` (same as
`replaceAllUsesWith`) is currently not supported in a dialect
conversion. `replaceUsesOfBlockArgument` should be used instead. It
sometimes works anyway (like in this case), but that's just because of
the way we insert materializations.
This commit is in preparation for merging the 1:1 and 1:N dialect
conversion drivers. (At that point, the current use of
`replaceAllUsesExcept` will no longer work.)
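For illustration, a minimal sketch of the conversion-safe usage; the names
`blockArg` and `replacement` are hypothetical, not the actual
`FuncOpConversion` code:
```cpp
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

// Sketch only: replace all uses of an entry block argument from within a
// conversion pattern.
static void replaceEntryArg(ConversionPatternRewriter &rewriter,
                            BlockArgument blockArg, Value replacement) {
  // Unlike replaceAllUsesExcept/replaceAllUsesWith, this goes through the
  // conversion driver, which can insert materializations as needed.
  rewriter.replaceUsesOfBlockArgument(blockArg, replacement);
}
```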
This fixes the following failure when doing a clean build (in
particular, with no .ninja* files lying around) of
lib/libMLIRLinalgToStandard.a only:
```
In file included from llvm/include/llvm/IR/Module.h:22,
from mlir/include/mlir/Dialect/LLVMIR/LLVMDialect.h:37,
from mlir/lib/Conversion/LinalgToStandard/LinalgToStandard.cpp:13:
llvm/include/llvm/IR/Attributes.h:90:14: fatal error: llvm/IR/Attributes.inc: No such file or directory
```
This fixes the following failure when doing a clean build (in
particular, with no .ninja* files lying around) of
lib/libMLIRMathToLibm.a only:
```
In file included from llvm/include/llvm/IR/Module.h:22,
from mlir/include/mlir/Dialect/LLVMIR/LLVMDialect.h:37,
from mlir/lib/Conversion/MathToLibm/MathToLibm.cpp:13
llvm/include/llvm/IR/Attributes.h:90:14: fatal error: llvm/IR/Attributes.inc: No such file or directory
```
This fixes the following failure when doing a clean build (in
particular, with no .ninja* files lying around) of
lib/libMLIRControlFlowToSCF.a only:
```
In file included from llvm/include/llvm/IR/Module.h:22,
from mlir/include/mlir/Dialect/LLVMIR/LLVMDialect.h:37,
from mlir/lib/Conversion/ControlFlowToSCF/ControlFlowToSCF.cpp:19
llvm/include/llvm/IR/Attributes.h:90:14: fatal error: llvm/IR/Attributes.inc: No such file or directory
```
This PR fixes an integer overflow when computing `(intmax+1)` for `i64`
during the `tosa-to-linalg` pass for `tosa.cast`.
Found this issue while debugging a numerical mismatch for `deeplabv3`
model from `torchvision` represented in `tosa` dialect using the
`TorchToTosa` pipeline in `torch-mlir` repository. `torch.aten.to.dtype`
is converted to `tosa.cast` that casts `f32` to `i64` type. Technically
by the specification, `tosa.cast` doesn't handle casting `f32` to `i64`.
So it's possible to add a verifier to error out for such tosa ops
instead of producing incorrect code. However, I chose to fix the
overflow issue to still be able to represent the `deeplabv3` model with
`tosa` ops in the above-mentioned pipeline. Open to suggestions if
adding the verifier is more appropriate instead.
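For reference, a standalone sketch of the overflow (not the actual pass
code): the bound must be formed in floating point, because `intmax + 1` is
not representable in `i64`:
```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>

int main() {
  // int64_t bad = INT64_MAX + 1; // Signed overflow: undefined behavior.
  // 2^63 == INT64_MAX + 1 exactly, and is representable as a double.
  double bound = std::ldexp(1.0, 63);
  std::printf("%.1f\n", bound); // 9223372036854775808.0
  return 0;
}
```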
This PR fixes a crash in `VectorToGPU` when the operand of `extOp` is a
function argument, which cannot be retrieved using `getDefiningOp`.
Fixes #107967.
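A minimal sketch of the kind of guard involved (illustrative names, not the
exact `VectorToGPU` code):
```cpp
#include "mlir/IR/Value.h"

using namespace mlir;

// A Value that is a block (e.g. function) argument has no defining op, so
// getDefiningOp() returns null and must be checked before dereferencing.
static bool isProducedByOp(Value value) {
  if (Operation *defOp = value.getDefiningOp())
    return true; // Produced by an operation; defOp is safe to inspect.
  return false;  // Function/block argument; bail out instead of crashing.
}
```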
This commit simplifies the result type of materialization functions.
Previously: `std::optional<Value>`
Now: `Value`
The previous implementation allowed 3 possible return values:
- Non-null value: The materialization function produced a valid
materialization.
- `std::nullopt`: The materialization function failed, but another
materialization can be attempted.
- `Value()`: The materialization failed and so should the dialect
conversion. (Previously: Dialect conversion can roll back.)
This commit removes the last variant. It is not particularly useful
because the dialect conversion will fail anyway if all other
materialization functions produced `std::nullopt`.
Furthermore, in contrast to type conversions, at least one
materialization callback is expected to succeed. In case of a failing
type conversion, the current dialect conversion can roll back and try a
different pattern. This also used to be the case for materializations,
but that functionality was removed with #107109: failed materializations
can no longer trigger a rollback. (They can just make the entire dialect
conversion fail without rollback.) With this in mind, it is even less
useful to have an additional error state for materialization functions.
This commit is in preparation for merging the 1:1 and 1:N type
converters. Target materializations will have to return multiple values
instead of a single one. With this commit, we can keep the API simple:
`SmallVector<Value>` instead of `std::optional<SmallVector<Value>>`.
Note for LLVM integration: All 1:1 materializations should return
`Value` instead of `std::optional<Value>`. Instead of `std::nullopt`
return `Value()`.
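A sketch of an updated callback under the new API; the `converter` variable
and the trivial success condition are illustrative:
```cpp
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

// The materialization callback now returns Value directly.
static void registerMaterializations(TypeConverter &converter) {
  converter.addSourceMaterialization(
      [](OpBuilder &builder, Type resultType, ValueRange inputs,
         Location loc) -> Value {
        // Succeed only in the trivial case; a null Value (was: std::nullopt)
        // lets the next registered callback be attempted.
        if (inputs.size() == 1 && inputs.front().getType() == resultType)
          return inputs.front();
        return Value();
      });
}
```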
This PR updates the IntN-to-bool cast to treat any non-zero value as
TRUE. This makes the cast more resilient to non-canonical (i.e. not
exactly 1) TRUE values.
Signed-off-by: Dmitriy Smirnov <dmitriy.smirnov@arm.com>
This patch simplifies the representation of OpenMP loop wrapper
operations by introducing the `NoTerminator` trait and updating the
`LoopWrapperInterface` verifier accordingly.
Since loop wrappers are already limited to having exactly one region
containing exactly one block, and this block can only hold a single
`omp.loop_nest` or loop wrapper and an `omp.terminator` that does not
return any values, it makes sense to simplify the representation of loop
wrappers by removing the terminator.
There is an extensive list of Lit tests that needed updating to remove
the `omp.terminator`s, which adds some noise to this patch, but the
actual changes are limited to the definition of the `omp.wsloop`, `omp.simd`,
`omp.distribute` and `omp.taskloop` loop wrapper ops, Flang lowering for
those, `LoopWrapperInterface::verifyImpl()`, SCF to OpenMP conversion
and OpenMP dialect documentation.
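As a hedged sketch of what the verifier can now check (illustrative, not the
actual `LoopWrapperInterface::verifyImpl()` code):
```cpp
#include "mlir/IR/Operation.h"
#include "llvm/ADT/STLExtras.h"

using namespace mlir;

// With NoTerminator, the wrapper's single block holds exactly one op (the
// nested loop or another wrapper) and no omp.terminator.
static LogicalResult verifyWrapperBlock(Operation *op) {
  Block &block = op->getRegion(0).front(); // Wrappers have exactly one block.
  if (!llvm::hasSingleElement(block))
    return op->emitOpError("expected exactly one op in the wrapper block");
  return success();
}
```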
This patch fixes the sm90 cluster test by:
* Fixing a typo in LowerGpuOpsToNVVMOps where one of the ClusterDim op
conversion patterns should actually be for the
ClusterDimBlocks op. This addresses the compilation error for this test.
* Changing the grid size from (2,2,1) to (4,4,1). This passes the
scf-if check against the threshold of 3 below and actually
generates the required prints from the GPU.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
This fixes all the places in MLIR that hit the new assertion added in
#106524, in preparation for enabling it by default. That is, cases where
the value passed to the APInt constructor is not an N-bit
signed/unsigned integer, where N is the bit width and signedness is
determined by the isSigned flag.
The fixes either set the correct value for isSigned, or set the
implicitTrunc flag to retain the old behavior. I've left TODOs for the
latter case in some places, where I think that it may be worthwhile to
stop doing implicit truncation in the future.
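For illustration, toy examples of both kinds of fixes (not taken from the
patch; I'm assuming `implicitTrunc` is the trailing constructor parameter
added in #106524):
```cpp
#include "llvm/ADT/APInt.h"

using llvm::APInt;

void apintExamples() {
  // Would trip the new assertion once enabled: 255 is not a valid 8-bit
  // *signed* value.
  // APInt bad(8, 255, /*isSigned=*/true);

  // Fix 1: pass the signedness that matches the value.
  APInt a(8, 255, /*isSigned=*/false);

  // Fix 2: explicitly retain the old truncating behavior.
  APInt b(8, 512, /*isSigned=*/false, /*implicitTrunc=*/true);
}
```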
Note that the assertion is currently still disabled by default, so this
patch is mostly NFC.
This is just the MLIR changes split off from
https://github.com/llvm/llvm-project/pull/80309.
In the TOSA validation pass, change the type of the profile option to
ListOption.
TOSA profiles are now composable rather than hierarchical. Each
profile is an independent set, i.e. a target can implement multiple
profiles.
Set the profile option to none by default, and limit validation to the
requested profiles.
The profiles can be specified via the command line, e.g.
$ mlir-opt ... --tosa-validate="profile=bi,mi" which tells the validation
pass that BI and MI are enabled.
Default to Global address space for memrefs that do not have an explicit address space set in the IR.
---------
Co-authored-by: Victor Perez <victor.perez@intel.com>
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
Co-authored-by: Victor Perez <victor.perez@codeplay.com>
Relaxes the vector.transfer_write lowering to allow out-of-bounds writes.
This aligns the lowering with the current hardware specification, which
does not update bytes at out-of-bounds locations during block stores.
This PR updates the math.powf lowering to produce a NaN result for a
negative base with a fractional exponent, which matches the actual
behaviour of the C/C++ implementation.
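A quick standalone check of the C/C++ behaviour being matched:
```cpp
#include <cmath>
#include <cstdio>

int main() {
  // A negative base with a non-integer exponent is a domain error: NaN.
  float r = std::pow(-8.0f, 0.5f);
  std::printf("%f\n", r); // nan
  return 0;
}
```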
Found by inspecting AMDGPU assembly: the arithmetic ops created
there were definitely making their way into the target ISA. An
`LLVM::BitcastOp` seems equivalent, and evaporates as expected in the
target asm.
Along the way, I renamed the helper function `mfmaConcatIfNeeded`
to `convertMFMAVectorOperand` to better convey its
contract, so I don't need to think about whether a bitcast is a
legitimate "concat" :-)
---------
Signed-off-by: Benoit Jacob <jacob.benoit.1@gmail.com>
This commit marks the type converter in `populate...` functions as
`const`. This is useful for debugging.
Patterns already take a `const` type converter. However, some
`populate...` functions not only add new patterns but also register
additional type conversion rules. That makes it difficult to find the
place in the code base where a type conversion was added. With this
change, all `populate...` functions that only populate patterns now take
a `const` type converter. Programmers can then conclude from the
function signature that these functions do not register any new type
conversion rules.
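The resulting signature shape looks like this (the function name is
illustrative):
```cpp
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

// A populate function that only adds patterns takes the converter as const,
// so it cannot register additional type conversion rules.
void populateFooToBarConversionPatterns(const TypeConverter &typeConverter,
                                        RewritePatternSet &patterns);
```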
Also some minor cleanups around the 1:N dialect conversion
infrastructure, which did not always pass the type converter as a
`const` object internally.
The `memref.alloca` lowering computed the allocation size incorrectly
when there were 0 dimensions.
Previously:
```
memref.alloca() : memref<10x0x2xf32>
--> llvm.alloca 20xf32
```
Now:
```
memref.alloca() : memref<10x0x2xf32>
--> llvm.alloca 0xf32
```
From the `llvm.alloca` documentation:
```
Allocating zero bytes is legal, but the returned pointer may not be unique.
```
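The underlying arithmetic, as a sketch (illustrative, not the exact
lowering code):
```cpp
#include "llvm/ADT/ArrayRef.h"
#include <cstdint>

// The element count is the product over *all* dimensions, so a zero-sized
// dimension must make the whole allocation zero-sized.
static int64_t getNumElements(llvm::ArrayRef<int64_t> shape) {
  int64_t numElements = 1;
  for (int64_t dim : shape) // e.g. {10, 0, 2}
    numElements *= dim;     // 10 * 0 * 2 == 0, not 20
  return numElements;
}
```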
This PR adds the `f8E8M0FNU` type to MLIR.
`f8E8M0FNU` type is proposed in [OpenCompute MX
Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf).
It defines an 8-bit floating point number with bit layout S0E8M0. Unlike
IEEE-754 types, there are no infinities, denormals, zeros, or negative
values.
```c
f8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111
Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```
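A minimal C++ usage sketch; the class name `Float8E8M0FNUType` is an
assumption based on the naming of the existing MLIR `Float8...Type` classes:
```cpp
#include "mlir/IR/BuiltinTypes.h"
#include "mlir/IR/MLIRContext.h"

using namespace mlir;

// Assumed class name, by analogy with Float8E4M3FNType and friends.
FloatType getF8E8M0(MLIRContext *ctx) {
  return Float8E8M0FNUType::get(ctx);
}
```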
Related PRs:
- [PR-107127](https://github.com/llvm/llvm-project/pull/107127)
[APFloat] Add APFloat support for E8M0 type
- [PR-105573](https://github.com/llvm/llvm-project/pull/105573) [MLIR]
Add f6E3M2FN type - was used as a template for this PR
- [PR-107999](https://github.com/llvm/llvm-project/pull/107999) [MLIR]
Add f6E2M3FN type
- [PR-108877](https://github.com/llvm/llvm-project/pull/108877) [MLIR]
Add f4E2M1FN type
Support broadcasting of the depthwise conv2d bias in the tosa->linalg
named ops lowering in the case that the bias is a rank-1 tensor with
exactly one element. In this case TOSA specifies that the value should
first be broadcast across the bias dimension and then across the result
tensor.
Add `lit` tests for depthwise conv2d with scalar bias and for conv3d,
which was already supported but missing coverage.
Signed-off-by: Jack Frankland <jack.frankland@arm.com>
There are some spurious libraries which can be removed.
I'm trying to bundle MLIR/LLVM library dependencies for our own
libraries. We're using a CMake function to recursively collect
MLIR/LLVM related dependencies. However, we identified certain library
dependencies as redundant and safe to remove.
This change contains the following:
- Adds lowering of the printf op to the spirv.CL.printf op in the
GPUToSPIRV pass.
- Fixes Constant decoration parsing for spirv GlobalVariable.
- Makes a minor modification to the spirv.CL.printf op assembly format.
---------
Co-authored-by: Jakub Kuderski <kubakuderski@gmail.com>
This PR adds LLVM [operand
bundle](https://llvm.org/docs/LangRef.html#operand-bundles) support to
the MLIR LLVM dialect. It affects the three operations related to making
function calls: `llvm.call`, `llvm.invoke`, and `llvm.call_intrinsic`.
This PR adds two new parameters to each of these operations. The first
parameter is a variadic operand `op_bundle_operands` that contains the
SSA values for operand bundles. The second parameter is a property
`op_bundle_tags` which holds an array of strings that represent the tags
of each operand bundle.
This PR adds the `f4E2M1FN` type to MLIR.
`f4E2M1FN` type is proposed in [OpenCompute MX
Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf).
It defines a 4-bit floating point number with bit layout S1E2M1. Unlike
IEEE-754 types, there are no infinity or NaN values.
```c
f4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs
Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5
```
Related PRs:
- [PR-95392](https://github.com/llvm/llvm-project/pull/95392) [APFloat]
Add APFloat support for FP4 data type
- [PR-105573](https://github.com/llvm/llvm-project/pull/105573) [MLIR]
Add f6E3M2FN type - was used as a template for this PR
- [PR-107999](https://github.com/llvm/llvm-project/pull/107999) [MLIR]
Add f6E2M3FN type
Don't call raw_string_ostream::flush(), which is essentially a no-op.
As specified in the docs, raw_string_ostream is always unbuffered.
(See 65b13610a5 for further reference.)
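For illustration:
```cpp
#include "llvm/Support/raw_ostream.h"
#include <string>

// raw_string_ostream is always unbuffered, so writes land in the backing
// string immediately; no flush() is needed before reading `buf`.
static std::string formatValue(int v) {
  std::string buf;
  llvm::raw_string_ostream os(buf);
  os << "value: " << v;
  return buf; // Already complete without os.flush().
}
```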