LLVM already supports `DW_TAG_LLVM_annotation` entries for subprograms,
but this hasn't been surfaced to the LLVM dialect.
I'm doing the minimal amount of work to support string-based
annotations, which is useful for attaching metadata to
functions, which is useful for debuggers to offer features beyond basic
DWARF.
As LLVM already supports this, this patch is not controversial.
This PR adds `f8E8M0FNU` type to MLIR.
`f8E8M0FNU` type is proposed in [OpenCompute MX
Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf).
It defines a 8-bit floating point number with bit layout S0E8M0. Unlike
IEEE-754 types, there are no infinity, denormals, zeros or negative
values.
```c
f8E8M0FNU
- Exponent bias: 127
- Maximum stored exponent value: 254 (binary 1111'1110)
- Maximum unbiased exponent value: 254 - 127 = 127
- Minimum stored exponent value: 0 (binary 0000'0000)
- Minimum unbiased exponent value: 0 − 127 = -127
- Doesn't have zero
- Doesn't have infinity
- NaN is encoded as binary 1111'1111
Additional details:
- Zeros cannot be represented
- Negative values cannot be represented
- Mantissa is always 1
```
Related PRs:
- [PR-107127](https://github.com/llvm/llvm-project/pull/107127)
[APFloat] Add APFloat support for E8M0 type
- [PR-105573](https://github.com/llvm/llvm-project/pull/105573) [MLIR]
Add f6E3M2FN type - was used as a template for this PR
- [PR-107999](https://github.com/llvm/llvm-project/pull/107999) [MLIR]
Add f6E2M3FN type
- [PR-108877](https://github.com/llvm/llvm-project/pull/108877) [MLIR]
Add f4E2M1FN type
This PR adds `f4E2M1FN` type to mlir.
`f4E2M1FN` type is proposed in [OpenCompute MX
Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf).
It defines a 4-bit floating point number with bit layout S1E2M1. Unlike
IEEE-754 types, there are no infinity or NaN values.
```c
f4E2M1FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs
Additional details:
- Zeros (+/-): S.00.0
- Max normal number: S.11.1 = ±2^(2) x (1 + 0.5) = ±6.0
- Min normal number: S.01.0 = ±2^(0) = ±1.0
- Min subnormal number: S.00.1 = ±2^(0) x 0.5 = ±0.5
```
Related PRs:
- [PR-95392](https://github.com/llvm/llvm-project/pull/95392) [APFloat]
Add APFloat support for FP4 data type
- [PR-105573](https://github.com/llvm/llvm-project/pull/105573) [MLIR]
Add f6E3M2FN type - was used as a template for this PR
- [PR-107999](https://github.com/llvm/llvm-project/pull/107999) [MLIR]
Add f6E2M3FN type
When using the `enable_ir_printing` API from Python, it invokes IR
printing with default args, printing the IR before each pass and
printing IR after pass only if there have been changes. This PR attempts
to align the `enable_ir_printing` API with the documentation
This PR adds `f6E2M3FN` type to mlir.
`f6E2M3FN` type is proposed in [OpenCompute MX
Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf).
It defines a 6-bit floating point number with bit layout S1E2M3. Unlike
IEEE-754 types, there are no infinity or NaN values.
```c
f6E2M3FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs
Additional details:
- Zeros (+/-): S.00.000
- Max normal number: S.11.111 = ±2^(2) x (1 + 0.875) = ±7.5
- Min normal number: S.01.000 = ±2^(0) = ±1.0
- Max subnormal number: S.00.111 = ±2^(0) x 0.875 = ±0.875
- Min subnormal number: S.00.001 = ±2^(0) x 0.125 = ±0.125
```
Related PRs:
- [PR-94735](https://github.com/llvm/llvm-project/pull/94735) [APFloat]
Add APFloat support for FP6 data types
- [PR-105573](https://github.com/llvm/llvm-project/pull/105573) [MLIR]
Add f6E3M2FN type - was used as a template for this PR
This PR adds `f6E3M2FN` type to mlir.
`f6E3M2FN` type is proposed in [OpenCompute MX
Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf).
It defines a 6-bit floating point number with bit layout S1E3M2. Unlike
IEEE-754 types, there are no infinity or NaN values.
```c
f6E3M2FN
- Exponent bias: 3
- Maximum stored exponent value: 7 (binary 111)
- Maximum unbiased exponent value: 7 - 3 = 4
- Minimum stored exponent value: 1 (binary 001)
- Minimum unbiased exponent value: 1 − 3 = −2
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs
Additional details:
- Zeros (+/-): S.000.00
- Max normal number: S.111.11 = ±2^(4) x (1 + 0.75) = ±28
- Min normal number: S.001.00 = ±2^(-2) = ±0.25
- Max subnormal number: S.000.11 = ±2^(-2) x 0.75 = ±0.1875
- Min subnormal number: S.000.01 = ±2^(-2) x 0.25 = ±0.0625
```
Related PRs:
- [PR-94735](https://github.com/llvm/llvm-project/pull/94735) [APFloat]
Add APFloat support for FP6 data types
- [PR-97118](https://github.com/llvm/llvm-project/pull/97118) [MLIR] Add
f8E4M3 type - was used as a template for this PR
This reverts commit fa93be4, restoring
commit d884b77, with fixes that ensure the CAPI declarations are
exported properly.
This commit implements LLVM_DIRecursiveTypeAttrInterface for the
DISubprogramAttr to ensure cyclic subprograms can be imported properly.
In the process multiple shortcuts around the recently introduced
DIImportedEntityAttr can be removed.
This commit implements LLVM_DIRecursiveTypeAttrInterface for the
DISubprogramAttr to ensure cyclic subprograms can be imported properly.
In the process multiple shortcuts around the recently introduced
DIImportedEntityAttr can be removed.
This patch adds the `#gpu.kernel_metadata` and `#gpu.kernel_table`
attributes. The `#gpu.kernel_metadata` attribute allows storing metadata
related to a compiled kernel, for example, the number of scalar
registers used by the kernel. The attribute only has 2 required
parameters, the name and function type. It also has 2 optional
parameters, the arguments attributes and generic dictionary for storing
all other metadata.
The `#gpu.kernel_table` stores a table of `#gpu.kernel_metadata`,
mapping the name of the kernel to the metadata.
Finally, the function `ROCDL::getAMDHSAKernelsELFMetadata` was added to
collect ELF metadata from a binary, and to test the class methods in
both attributes.
Example:
```mlir
gpu.binary @binary [#gpu.object<#rocdl.target<chip = "gfx900">, kernels = #gpu.kernel_table<[
#gpu.kernel_metadata<"kernel0", (i32) -> (), metadata = {sgpr_count = 255}>,
#gpu.kernel_metadata<"kernel1", (i32, f32) -> (), arg_attrs = [{llvm.read_only}, {}]>
]> , bin = "BLOB">]
```
The motivation behind these attributes is to provide useful information
for things like tunning.
---------
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
The `DIImporedEntity` can be used to represent imported entities like
C++'s namespace with using directive or fortran's moudule with use
statement.
This PR adds `DIImportedEntityAttr` and 2-way translation from
`DIImportedEntity` to `DIImportedEntityAttr` and vice versa.
When an entity is imported in a function, the `retainedNodes` field of
the `DISubprogram` contains all the imported nodes. See the C++ code and
the LLVM IR below.
```
void test() {
using namespace n1;
...
}
!2 = !DINamespace(name: "n1", scope: null)
!16 = distinct !DISubprogram(name: "test", ..., retainedNodes: !19) !19 = !{!20}
!20 = !DIImportedEntity(tag: DW_TAG_imported_module, scope: !16, entity: !2 ...)
```
This PR makes sure that the translation from mlir to `retainedNodes`
field happens correctly both ways.
To side step the cyclic dependency between `DISubprogramAttr` and `DIImportedEntityAttr`,
we have decided to not have `scope` field in the `DIImportedEntityAttr` and it is inferred
from the entity which hold the list of `DIImportedEntityAttr`. A `retainedNodes` field has been
added in the `DISubprogramAttr` which contains the list of `DIImportedEntityAttr` for that
function.
This PR currently does not handle entities imported in a global scope
but that should be easy to handle in a subsequent PR.
This PR adds `f8E3M4` type to mlir.
`f8E3M4` type follows IEEE 754 convention
```c
f8E3M4 (IEEE 754)
- Exponent bias: 3
- Maximum stored exponent value: 6 (binary 110)
- Maximum unbiased exponent value: 6 - 3 = 3
- Minimum stored exponent value: 1 (binary 001)
- Minimum unbiased exponent value: 1 − 3 = −2
- Precision specifies the total number of bits used for the significand (mantissa),
including implicit leading integer bit = 4 + 1 = 5
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs
Additional details:
- Max exp (unbiased): 3
- Min exp (unbiased): -2
- Infinities (+/-): S.111.0000
- Zeros (+/-): S.000.0000
- NaNs: S.111.{0,1}⁴ except S.111.0000
- Max normal number: S.110.1111 = +/-2^(6-3) x (1 + 15/16) = +/-2^3 x 31 x 2^(-4) = +/-15.5
- Min normal number: S.001.0000 = +/-2^(1-3) x (1 + 0) = +/-2^(-2)
- Max subnormal number: S.000.1111 = +/-2^(-2) x 15/16 = +/-2^(-2) x 15 x 2^(-4) = +/-15 x 2^(-6)
- Min subnormal number: S.000.0001 = +/-2^(-2) x 1/16 = +/-2^(-2) x 2^(-4) = +/-2^(-6)
```
Related PRs:
- [PR-99698](https://github.com/llvm/llvm-project/pull/99698) [APFloat]
Add support for f8E3M4 IEEE 754 type
- [PR-97118](https://github.com/llvm/llvm-project/pull/97118) [MLIR] Add
f8E4M3 IEEE 754 type
This PR adds `f8E4M3` type to mlir.
`f8E4M3` type follows IEEE 754 convention
```c
f8E4M3 (IEEE 754)
- Exponent bias: 7
- Maximum stored exponent value: 14 (binary 1110)
- Maximum unbiased exponent value: 14 - 7 = 7
- Minimum stored exponent value: 1 (binary 0001)
- Minimum unbiased exponent value: 1 − 7 = −6
- Precision specifies the total number of bits used for the significand (mantisa),
including implicit leading integer bit = 3 + 1 = 4
- Follows IEEE 754 conventions for representation of special values
- Has Positive and Negative zero
- Has Positive and Negative infinity
- Has NaNs
Additional details:
- Max exp (unbiased): 7
- Min exp (unbiased): -6
- Infinities (+/-): S.1111.000
- Zeros (+/-): S.0000.000
- NaNs: S.1111.{001, 010, 011, 100, 101, 110, 111}
- Max normal number: S.1110.111 = +/-2^(7) x (1 + 0.875) = +/-240
- Min normal number: S.0001.000 = +/-2^(-6)
- Max subnormal number: S.0000.111 = +/-2^(-6) x 0.875 = +/-2^(-9) x 7
- Min subnormal number: S.0000.001 = +/-2^(-6) x 0.125 = +/-2^(-9)
```
Related PRs:
- [PR-97179](https://github.com/llvm/llvm-project/pull/97179) [APFloat]
Add support for f8E4M3 IEEE 754 type
This exposes most of the `RewriterBase` methods to the C API.
This allows to manipulate both the `IRRewriter` and the
`PatternRewriter`. The
`IRRewriter` can be created from the C API, while the `PatternRewriter`
cannot.
The missing operations are the ones taking `Block::iterator` and
`Region::iterator` as
parameters, as they are not exposed by the C API yet AFAIK.
The Python bindings for these methods and classes are not implemented.
Expose `elideLargeResourceString` to the c api.
This was done in the same way as `elideLargeElementsAttrs` is exposed.
The docs were grabbed from the `elideLargeResourceString` method and
forwarded here.
The MLIR C and Python Bindings expose various methods from
`mlir::OpPrintingFlags` . This PR adds a binding for the `skipRegions`
method, which allows to skip the printing of Regions when printing Ops.
It also exposes this option as parameter in the python `get_asm` and
`print` methods
Following a rather direct approach to expose PDL usage from C and then
Python. This doesn't yes plumb through adding support for custom
matchers through this interface, so constrained to basics initially.
This also exposes greedy rewrite driver. Only way currently to define
patterns is via PDL (just to keep small). The creation of the PDL
pattern module could be improved to avoid folks potentially accessing
the module used to construct it post construction. No ergonomic work
done yet.
---------
Signed-off-by: Jacques Pienaar <jpienaar@google.com>
This PR handle translation of DIStringType. Mostly mechanical changes to
translate DIStringType to/from DIStringTypeAttr. The 'stringLength'
field is 'DIVariable' in DIStringType. As there was no `DIVariableAttr`
previously, it has been added to ease the translation.
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
The PR implements MLIR Python Bindings for a few simple edit operations
on Block arguments, namely, `add_argument`, `erase_argument`, and
`erase_arguments`.
The fortran arrays use 'dataLocation', 'rank', 'allocated' and
'associated' fields of the DICompositeType. These were not available in
'DICompositeTypeAttr'. This PR adds the missing fields.
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
This field is present in LLVM, but was missing from the MLIR wrapper
type. This addition allows MLIR languages to add proper DWARF info for
GPU programs.
Being able to add custom dialects is one of the big missing pieces of
the C API. This change should make it achievable via IRDL. Hopefully
this should open custom dialect definition to non-C++ users of MLIR.
1. Explicit value means the non-zero value in a sparse tensor. If
explicitVal is set, then all the non-zero values in the tensor have the
same explicit value. The default value Attribute() indicates that it is
not set.
2. Implicit value means the "zero" value in a sparse tensor. If
implicitVal is set, then the "zero" value in the tensor is equal to the
implicit value. For now, we only support `0` as the implicit value but
it could be extended in the future. The default value Attribute()
indicates that the implicit value is `0` (same type as the tensor
element type).
Example:
```
#CSR = #sparse_tensor.encoding<{
map = (d0, d1) -> (d0 : dense, d1 : compressed),
posWidth = 64,
crdWidth = 64,
explicitVal = 1 : i64,
implicitVal = 0 : i64
}>
```
Note: this PR tests that implicitVal could be set to other values as
well. The following PR will add verifier and reject any value that's not
zero for implicitVal.
This commit adds `walk` method to PyOperationBase that uses a python
object as a callback, e.g. `op.walk(callback)`. Currently callback must
return a walk result explicitly.
We(SiFive) have implemented walk method with python in our internal
python tool for a while. However the overhead of python is expensive and
it didn't scale well for large MLIR files. Just replacing walk with this
version reduced the entire execution time of the tool by 30~40% and
there are a few configs that the tool takes several hours to finish so
this commit significantly improves tool performance.
Transform interfaces are implemented, direction or via extensions, in
libraries belonging to multiple other dialects. Those dialects don't
need to depend on the non-interface part of the transform dialect, which
includes the growing number of ops and transitive dependency footprint.
Split out the interfaces into a separate library. This in turn requires
flipping the dependency from the interface on the dialect that has crept
in because both co-existed in one library. The interface shouldn't
depend on the transform dialect either.
As a consequence of splitting, the capability of the interpreter to
automatically walk the payload IR to identify payload ops of a certain
kind based on the type used for the entry point symbol argument is
disabled. This is a good move by itself as it simplifies the interpreter
logic. This functionality can be trivially replaced by a
`transform.structured.match` operation.
This commit extends the DIDerivedTypeAttr with the `extraData` field.
For now, the type of it is limited to be a `DINodeAttr`, as extending
the debug metadata handling to support arbitrary metadata nodes does not
seem to be necessary so far.
Following the discussion from [this
thread](https://discourse.llvm.org/t/handling-cyclic-dependencies-in-debug-info/67526/11),
this PR adds support for recursive DITypes.
This PR adds:
1. DIRecursiveTypeAttrInterface: An interface that DITypeAttrs can
implement to indicate that it supports recursion. See full description
in code.
2. Importer & exporter support (The only DITypeAttr that implements the
interface is DICompositeTypeAttr, so the exporter is only implemented
for composites too. There will be two methods that each llvm DI type
that supports mutation needs to implement since there's nothing
general).
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
The base class llvm::ThreadPoolInterface will be renamed
llvm::ThreadPool in a subsequent commit.
This is a breaking change: clients who use to create a ThreadPool must
now create a DefaultThreadPool instead.
Expose the API for constructing and inspecting StructTypes from the LLVM
dialect. Separate constructor methods are used instead of overloads for
better readability, similarly to IntegerType.