10814 Commits

Author SHA1 Message Date
Chao Chen
2162723636 [MLIR][XeGPU] Updates XeGPU TensorDescAttr and Refine Gather/Scatter definition. (#109144)
This PR makes the following refinements to the XeGPU dialect:
1. Separated the old `TensorDescAttr` into two independent attributes: `BlockTensorDescAttr` and `ScatterTensorDescAttr`.
2. Renamed the `MemoryScopeAttr` to `MemorySpaceAttr` and updated the enumeration value for shared memory following the OpenCL standard.
3. Introduced a `transpose` UnitAttr to `StoreScatterOp` and `LoadGatherOp`.
4. Added a memory space check for the `CreateNdDesc` and `CreateDesc` ops, as well as valid and invalid test cases for them.
2024-09-23 09:00:26 -05:00
Matteo Franciolini
2f664f2bdf [mlir][mesh] Fix empty split_axes sharding annotation (#108236)
The `split_axes` attribute is defined as "array attribute of array
attributes". Following the definition, empty `split_axes` values should
not be allowed, since that would break the definition and would lead to
invalid IR. In such a scenario, passes leveraging the mesh dialect can
observe:
* crashes in sharding-propagation;
* creation of null MeshShardingAttrs in spmdization;
* IR that does not round-trip.

The patch prevents `split_axes` from becoming empty by modifying
`removeTrailingEmptySubArray` such that a minimum size of one is
guaranteed when constructing the attribute, and adds a test that would
crash without the change.
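
The trimming rule described above can be sketched as follows. This is a
stand-in on plain `std::vector`s, not the MLIR implementation: the
`SplitAxes` alias and the function name `trimTrailingEmpty` are
hypothetical, and the sketch assumes the input already holds at least one
sub-array.

```cpp
#include <vector>

// Hypothetical stand-in for the "array of arrays" held by split_axes.
using SplitAxes = std::vector<std::vector<int>>;

// Drop trailing empty sub-arrays, but never shrink below one entry, so a
// non-empty input can never become an empty outer array.
SplitAxes trimTrailingEmpty(SplitAxes axes) {
  while (axes.size() > 1 && axes.back().empty())
    axes.pop_back();
  return axes;
}
```

With this rule, an input consisting only of empty sub-arrays collapses to a
single empty sub-array rather than to an empty outer array.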
2024-09-21 08:43:12 -07:00
Daniel Hernandez-Juarez
b014265d99 [mlir][AMDGPU] New gfx12 barrier instructions and update lowering LDSBarrierOp (#109273)
Add the new gfx12 barrier instructions s.barrier.signal, s.barrier.wait
and s.wait.dscnt, and update the lowering of LDSBarrierOp accordingly.

CC: @krzysz00 @manupak @giuseros
2024-09-20 17:41:36 -05:00
Daniil Fukalov
65bc259a97 [NFC] Add explicit #include llvm-config.h where its macros are used, last part. (#107615)
(this is the part related to bolt, lld and mlir)

Without these explicit includes, removing other headers that implicitly
include llvm-config.h may have non-trivial side effects. For example,
`clangd` may report even `llvm-config.h` as "not used" when it defines
a macro that is explicitly used with #ifdef. The problem is amplified
by different build configs, which use different sets of macros.
2024-09-20 19:59:39 +02:00
Umang Yadav
d0a7cb709e [ROCDL] Pass amd_code_object_version when serializing ROCDL gpu module (#108874)
This PR adds the ability to pass a non-default value to the
`.amdhsa_code_object_version` metadata when serializing ROCDL GPU
modules.

It also fixes typos in two places.

---------

Co-authored-by: Fabian Mora <fmora.dev@gmail.com>
2024-09-20 09:53:09 -05:00
David Spickett
737c414e1d Revert "[clang][flang][mlir] Support -frecord-command-line option (#102975)"
This reverts commit b3533a156d.

It caused test failures in shared library builds:
https://lab.llvm.org/buildbot/#/builders/80/builds/3854
2024-09-20 11:30:50 +00:00
Chuanqi Xu
e8a7390624 [mlir] [LLVM IR] Introduce VaArgOp (#109260)
I find there is no LLVMOp corresponding to LLVM's [va_arg
instruction](https://llvm.org/docs/LangRef.html#va-arg-instruction) so I
tried to add one. This is helpful for clangir
(https://github.com/llvm/clangir/pull/865).

I am new to MLIR and not sure who the appropriate reviewers are. Thanks
in advance for reviewing and triaging.
2024-09-20 13:19:50 +08:00
Tarun Prabhu
b3533a156d [clang][flang][mlir] Support -frecord-command-line option (#102975)
Add support for the -frecord-command-line option that will produce the
llvm.commandline metadata which will eventually be saved in the object
file. This behavior is also supported in clang. Some refactoring of the
code in flang to handle these command line options was carried out. The
corresponding -grecord-command-line option which saves the command line
in the debug information has not yet been enabled for flang.
2024-09-19 18:28:50 -06:00
Andrzej Warzyński
1335a11176 [mlir][vector][nfc] Clean-up VectorOps.{h|cpp} (#109316) 2024-09-19 21:45:01 +01:00
Adam Siemieniuk
02d34d800b [mlir][vector][xegpu] Vector to XeGPU conversion pass (#107419)
Add pass for Vector to XeGPU dialect conversion and initial conversion
patterns for vector.transfer_read|write operations.
2024-09-19 15:16:23 -05:00
Ivan Butygin
96ac627238 [mlir][vector][nfc] Update vector load/store doc wrt unit strides. (#109267)
Follow up to https://github.com/llvm/llvm-project/pull/108998.

Non-contiguous strides are allowed now for 1-element vector load/stores.
2024-09-19 14:52:35 +03:00
Jianjian Guan
87dc3e89e7 [mlir][LLVMIR] Add more vector predication intrinsic ops (#107663)
This revision adds vector predication smax, smin, umax and umin
intrinsic ops.
2024-09-19 10:33:36 +08:00
Andrea Faulds
a800ffac41 [mlir][gpu] Disjoint patterns for lowering clustered subgroup reduce (#109158)
Making the existing populateGpuLowerSubgroupReduceToShufflePatterns()
function also cover the new "clustered" subgroup reductions is proving
to be inconvenient, because certain backends may have more specific
lowerings that only cover the non-clustered type, and this creates pass
ordering constraints. This commit removes coverage of clustered
reductions from this function in favour of a new separate function,
which makes controlling the lowering much more straightforward.
2024-09-18 15:55:53 -04:00
Bimo
f8eceb45d0 [MLIR] [Python] align python ir printing with mlir-print-ir-after-all (#107522)
When using the `enable_ir_printing` API from Python, it invokes IR
printing with default args, printing the IR before each pass and
printing the IR after a pass only if there have been changes. This PR
attempts to align the `enable_ir_printing` API with the documentation.
Andrea Faulds
fd26f8444a [mlir][gpu] Rename two misspelled pattern population functions (#109015) 2024-09-17 15:26:14 -04:00
Sergey Kozub
73d83f20c9 [MLIR] Add f6E2M3FN type (#107999)
This PR adds `f6E2M3FN` type to mlir.

`f6E2M3FN` type is proposed in [OpenCompute MX
Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf).
It defines a 6-bit floating point number with bit layout S1E2M3. Unlike
IEEE-754 types, there are no infinity or NaN values.

```c
f6E2M3FN
- Exponent bias: 1
- Maximum stored exponent value: 3 (binary 11)
- Maximum unbiased exponent value: 3 - 1 = 2
- Minimum stored exponent value: 1 (binary 01)
- Minimum unbiased exponent value: 1 − 1 = 0
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.00.000
- Max normal number: S.11.111 = ±2^(2) x (1 + 0.875) = ±7.5
- Min normal number: S.01.000 = ±2^(0) = ±1.0
- Max subnormal number: S.00.111 = ±2^(0) x 0.875 = ±0.875
- Min subnormal number: S.00.001 = ±2^(0) x 0.125 = ±0.125
```
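
As an illustration of the layout above, here is a minimal decoding sketch.
The function name `decodeF6E2M3FN` is hypothetical and this is not the
APFloat or MLIR implementation; it only applies the S1E2M3 layout and
bias of 1 stated in the table.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Decode a 6-bit f6E2M3FN pattern (layout S1E2M3, exponent bias 1).
// The format has no infinities or NaNs, so every pattern is finite.
double decodeF6E2M3FN(std::uint8_t bits) {
  assert(bits < 64 && "only the low 6 bits are meaningful");
  const int sign = (bits >> 5) & 0x1;
  const int exponent = (bits >> 3) & 0x3;
  const int mantissa = bits & 0x7;
  const int bias = 1;
  double magnitude;
  if (exponent == 0) {
    // Subnormal: no implicit leading one, exponent fixed at 1 - bias.
    magnitude = std::ldexp(mantissa / 8.0, 1 - bias);
  } else {
    // Normal: implicit leading one.
    magnitude = std::ldexp(1.0 + mantissa / 8.0, exponent - bias);
  }
  return sign ? -magnitude : magnitude;
}
```

Under these assumptions, the pattern S.11.111 with a clear sign bit decodes
to 7.5 and S.00.001 decodes to 0.125, matching the table.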

Related PRs:
- [PR-94735](https://github.com/llvm/llvm-project/pull/94735) [APFloat]
Add APFloat support for FP6 data types
- [PR-105573](https://github.com/llvm/llvm-project/pull/105573) [MLIR]
Add f6E3M2FN type - was used as a template for this PR
2024-09-16 21:09:27 +02:00
Arteen Abrishami
00f239e48a [MLIR][TOSA] Add --tosa-reduce-transposes pass (#108260)
----------
Motivation:
----------

Some legalization pathways introduce redundant tosa.TRANSPOSE
operations that result in avoidable data movement. For example,
PyTorch -> TOSA contains a lot of unnecessary transposes due
to conversions between NCHW and NHWC.

We wish to remove all the ones that we can, since in general
it is possible to remove the overwhelming majority.

------------
Changes Made:
------------

- Add the --tosa-reduce-transposes pass
- Add TosaElementwiseOperator trait.

-------------------
High-Level Overview:
-------------------

The pass works through the transpose operators in the program. It begins
at some transpose operator with an associated permutations tensor. It
traverses upwards through the dependencies of this transpose and
verifies that we encounter only operators with the
TosaElementwiseOperator trait and terminate in either constants,
reshapes, or transposes.

We then evaluate whether there are any additional restrictions (the
transposes it terminates in must invert the one we began at, and the
reshapes must be ones into which we can fold the transpose), and then we
hoist the transpose through the intervening operators, folding it at the
constants, reshapes, and transposes.

Finally, we ensure that we do not need both the transposed form (the
form that had the transpose hoisted through it) and the untransposed
form (which it was prior), by analyzing the usages of those dependent
operators of a given transpose we are attempting to hoist and replace.

If they are such that it would require both forms to be necessary, then
we do not replace the hoisted transpose, causing the new chain to be
dead. Otherwise, we do and the old chain (untransposed form) becomes
dead. Only one chain will ever then be live, resulting in no
duplication.

We then perform a simple one-pass DCE, so no canonicalization is
necessary.
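
The "must invert the one we began at" condition can be illustrated on
plain permutation vectors. This sketch is not the pass's code: the name
`composeToIdentity` is hypothetical, and it assumes that `perms[i]` names
the input dimension that becomes output dimension `i`.

```cpp
#include <cstddef>
#include <vector>

// Returns true if transposing by `first` and then by `second` restores the
// original dimension order. After both transposes, output dimension i comes
// from original dimension first[second[i]].
bool composeToIdentity(const std::vector<int>& first,
                       const std::vector<int>& second) {
  if (first.size() != second.size())
    return false;
  const std::size_t rank = first.size();
  for (std::size_t i = 0; i < rank; ++i) {
    const int mid = second[i];
    if (mid < 0 || static_cast<std::size_t>(mid) >= rank)
      return false;  // not a valid index into `first`
    if (first[mid] != static_cast<int>(i))
      return false;
  }
  return true;
}
```

For example, the NCHW to NHWC permutation {0, 2, 3, 1} followed by the NHWC
to NCHW permutation {0, 3, 1, 2} composes to the identity, which is the
kind of pair that can be folded away.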

--------------
Impact of Pass:
--------------

Patching the dense_resource artifacts (from PyTorch) with dense
attributes to permit constant folding, we receive the following results.

Note that data movement represents total transpose data movement,
calculated by noting which dimensions moved during the transpose.

///////////
MobilenetV3:
///////////

BEFORE total data movement: 11798776 B (11.25 MiB)
AFTER total data movement: 2998016 B (2.86 MiB)
74.6% of data movement removed.

BEFORE transposes: 82
AFTER transposes: 20
75.6% of transposes removed.

////////
ResNet18:
////////

BEFORE total data movement: 20596556 B (19.64 MiB)
AFTER total data movement: 1003520 B (0.96 MiB)
95.2% of data movement removed.

BEFORE transposes: 56
AFTER transposes: 5
91.1% of transposes removed.

////////
ResNet50:
////////

BEFORE total data movement: 83236172 B (79.3 MiB)
AFTER total data movement: 3010560 B (2.87 MiB)
96.4% of data movement removed.

BEFORE transposes: 120
AFTER transposes: 7
94.2% of transposes removed.

/////////
ResNet101:
/////////

BEFORE total data movement: 124336460 B (118.58 MiB)
AFTER total data movement: 3010560 B (2.87 MiB)
97.6% of data movement removed.

BEFORE transposes: 239
AFTER transposes: 7
97.1% of transposes removed.

/////////
ResNet152:
/////////

BEFORE total data movement: 175052108 B (166.94 MiB)
AFTER total data movement: 3010560 B (2.87 MiB)
98.3% of data movement removed.

BEFORE transposes: 358
AFTER transposes: 7
98.0% of transposes removed.

////////
Overview:
////////

We see that we remove up to 98% of transposes and eliminate
up to 98.3% of redundant transpose data movement.

In the context of ResNet50, with 120 inferences per second,
we reduce dynamic transpose data bandwidth from 9.29 GiB/s
to 344.4 MiB/s.

-----------
Future Work:
-----------

(1) Evaluate tradeoffs with permitting ConstOp to be duplicated across
    hoisted transposes with different permutation tensors.

(2) Expand the class of foldable upstream ReshapeOp we permit beyond
    N -> 1x1x...x1xNx1x...x1x1.

(3) Enhance the pass to permit folding arbitrary transpose pairs,
    beyond those that form the identity.

(4) Add support for more instructions besides TosaElementwiseOperator as
    the intervening ones (for example, the reduce_* operators).

(5) Support hoisting transposes up to an input parameter.

Signed-off-by: Arteen Abrishami <arteen.abrishami@arm.com>
2024-09-13 19:16:55 -07:00
Krzysztof Drewniak
a953982cb7 [mlir][GPU] Plumb range information through the NVVM lowerings (#107659)
Update the GPU to NVVM lowerings to correctly propagate range
information on IDs and dimension queries, either from
known_{block,grid}_size attributes or from `upperBound` annotations on
the operations themselves.
2024-09-13 12:07:51 -05:00
Sergio Afonso
6568062ff1 [MLIR][OpenMP] Improve assemblyFormat handling for clause-based ops (#108023)
This patch modifies the representation of `OpenMP_Clause` to allow
definitions to incorporate both required and optional arguments while
still allowing operations including them and overriding the
`assemblyFormat` to take advantage of automatically-populated format
strings.

The proposed approach is to split the `assemblyFormat` clause property
into `reqAssemblyFormat` and `optAssemblyFormat`, and remove the
`isRequired` template and associated `required` property. The
`OpenMP_Op` class, in turn, populates the new `clausesReqAssemblyFormat`
and `clausesOptAssemblyFormat` properties in addition to
`clausesAssemblyFormat`. These properties can be used by clause-based
OpenMP operation definitions to reconstruct parts of the
clause-inherited format string in a more flexible way when overriding
it.

Clause definitions are updated to follow this new approach and some
operation definitions overriding the `assemblyFormat` are simplified by
taking advantage of the improved flexibility, reducing code duplication.
The `verify-openmp-ops` tablegen pass is updated for the new
`OpenMP_Clause` representation.

Some MLIR and Flang unit tests had to be updated due to changes to the
default printing order of clauses on updated operations.
2024-09-13 12:57:41 +01:00
Krzysztof Drewniak
9596e83b2a [mlir][AMDGPU] Enable emulating vector buffer_atomic_fadd on gfx11 (#108312)
* Fix a bug introduced by the Chipset refactoring in #107720 where
atomics emulation for adds was mistakenly applied to gfx11+
* Add the case needed for gfx11+ atomic emulation, namely that gfx11
doesn't support atomically adding a v2f16 or v2bf16, thus requiring
MLIR-level legalization for buffer intrinsics that attempt to do such an
addition
* Add tests, including tests for gfx11 atomic emulation

Co-authored-by: Manupa Karunaratne <manupa.karunaratne@amd.com>
2024-09-12 09:47:52 -05:00
Krzysztof Drewniak
90a0be9482 [mlir][LLVM] Refactor how range() annotations are handled for ROCDL intrinsics (#107658)
This commit introduces a ConstantRange attribute to match the
ConstantRange attribute type present in LLVM IR.

It then refactors the LLVM_IntrOpBase so that the basic part of the
intrinsic builder code can be re-used without needing to copy it or
get rid of important context. This, along with adding code for
handling an optional `range` attribute to that same base, allows us to
make the support for range() annotations generic without adding
another bit to IntrOpBase.

This commit then updates the lowering of index intrinsic operations to
use the new ConstantRange attribute and fixes a bug (where we'd be
subtracting 1 from upper bounds instead of adding it on operations
like gpu.block_dim) along the way.

The point of these changes is to enable these range annotations to be
used for the corresponding NVVM operations in a future commit.
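
The off-by-one mentioned above can be illustrated with half-open
intervals, the convention LLVM's ConstantRange uses. This is only a
sketch: the `HalfOpenRange` struct and both helpers are hypothetical, and
it assumes that an upper bound on an ID is exclusive while an upper bound
on a dimension is inclusive.

```cpp
#include <cassert>
#include <cstdint>

// Half-open interval [lower, upper).
struct HalfOpenRange {
  std::uint32_t lower;
  std::uint32_t upper;
};

// An ID (e.g. a thread or block index) satisfies 0 <= id < bound.
HalfOpenRange rangeForId(std::uint32_t bound) { return {0, bound}; }

// A dimension (e.g. a block size) satisfies 1 <= dim <= bound, so the
// exclusive upper end is bound + 1, not bound - 1.
HalfOpenRange rangeForDim(std::uint32_t bound) {
  assert(bound >= 1 && bound < UINT32_MAX);
  return {1, bound + 1};
}

bool contains(const HalfOpenRange& r, std::uint32_t v) {
  return r.lower <= v && v < r.upper;
}
```

Under these assumptions, a block dimension bounded by 256 may actually be
256, so the range has to end at 257 to include it.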
2024-09-12 09:46:42 -05:00
MaheshRavishankar
d5f0969c96 [mlir][TilingInterface] Avoid looking at operands for getting slices to continue tile + fuse. (#107882)
Current implementation of `scf::tileConsumerAndFuseProducerUsingSCF`
looks at operands of tiled/tiled+fused operations to see if they are
produced by `extract_slice` operations to populate the worklist used to
continue fusion. This implicit assumption does not always work. Instead
make the implementations of `getTiledImplementation` return the slices
to use to continue fusion.

This is a breaking change:

- To continue to get the same behavior of
`scf::tileConsumerAndFuseProducerUsingSCF`, change all out-of-tree
implementations of `TilingInterface::getTiledImplementation` to return
the slices to continue fusion on. All in-tree implementations have been
adapted to this.
- This change touches parts that required a simplification to the
`ControlFn` in `scf::SCFTileAndFuseOptions`. It now returns a
`std::optional<scf::SCFTileAndFuseOptions::ControlFnResult>` object that
should be `std::nullopt` if fusion is not to be performed.

Signed-off-by: MaheshRavishankar <mahesh.revishankar@gmail.com>
2024-09-11 22:15:43 -07:00
Jie Fu
b7167c7844 [mlir] Fix incorrect comparison due to -Wtautological-constant-out-of-range-compare (NFC)
/llvm-project/mlir/include/mlir/Analysis/Presburger/Utils.h:320:26:
error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always true [-Werror,-Wtautological-constant-out-of-range-compare]
  preIndent = (preIndent != std::string::npos) ? preIndent + 1 : 0;
               ~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~
/llvm-project/mlir/include/mlir/Analysis/Presburger/Utils.h:335:28:
error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always true [-Werror,-Wtautological-constant-out-of-range-compare]
    preIndent = (preIndent != std::string::npos) ? preIndent + 1 : 0;
                 ~~~~~~~~~ ^  ~~~~~~~~~~~~~~~~~
2 errors generated.
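
For context, the warning fires because `std::string::npos` is a
`std::size_t` constant; on targets where `std::size_t` is wider than
`unsigned int`, a value stored in an `unsigned int` can never equal it.
One common way to avoid the problem is to keep the search result in a
`std::size_t`. The helper name below is hypothetical and this is not
necessarily how the patch fixed it:

```cpp
#include <cstddef>
#include <string>

// Returns the index just past the last newline in `text`, or 0 if there
// is no newline. The position stays in a std::size_t so that comparing it
// with std::string::npos is meaningful.
std::size_t indentAfterLastNewline(const std::string& text) {
  const std::size_t pos = text.find_last_of('\n');
  return pos != std::string::npos ? pos + 1 : 0;
}
```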
2024-09-12 11:57:29 +08:00
Amy Wang
2740273505 [MLIR][Presburger] Make printing aligned to assist in debugging (#107648)
Hello Arjun! Please allow me to contribute this patch, as it helps me
debug significantly! When the 1's and 0's don't line up while debugging
Farkas' lemma for numerous polyhedra using the simplex lexmin solver, it
is truly straining on the eyes. Hopefully this patch can help others!

The unfortunate part is the lack of a test case, as I'm not sure how to
add a test case for debug dumps. :) However, you can add this test to
SimplexTest.cpp to witness the nice printing!

```c++
TEST(SimplexTest, DumpTest) {
  int COLUMNS = 2;
  int ROWS = 2;
  LexSimplex simplex(COLUMNS * 2);
  IntMatrix m1(ROWS, COLUMNS * 2 + 1);
  // Adding LHS columns.
  for (int i = 0; i < ROWS; i++) {
    // an arbitrary formula to test all kinds of integers
    for (int j = 0; j < COLUMNS; j++)
      m1(i, j) = i + (2 << (i % 3)) * (-1 * ((i + j) % 2));
  }
  // Adding RHS columns.
  for (int i = 0; i < ROWS; i++) {
    for (int j = 0; j < COLUMNS; j++)
      m1(i, j + COLUMNS) = j - (3 << (j % 4)) * (-1 * ((i + j * 2) % 2));
  }
  for (int i = 0; i < m1.getNumRows(); i++) {
    ArrayRef<DynamicAPInt> curRow = m1.getRow(i);
    simplex.addInequality(curRow);
  }
  IntegerRelation rel = parseRelationFromSet(
      "(x, y, z)[] : (z - x - 17 * y == 0, x - 11 * z >= 1)", 2);
  simplex.dump();
  m1.dump();
  rel.dump();
}
```

```
rows = 2, columns = 7
var: c3, c4, c5, c6
con: r0 [>=0], r1 [>=0]
r0: -1, r1: -2
c0: denom, c1: const, c2: 2147483647, c3: 0, c4: 1, c5: 2, c6: 3
  1  0  1  0 -2  0  1
  1  0 -8 -3  1  3  7

  0 -2  0  1  0
 -3  1  3  7  0
Domain: 2, Range: 1, Symbols: 0, Locals: 0
2 constraints
 -1  -17  1   0   = 0
  1   0  -11 -1  >= 0

```
2024-09-11 23:22:54 -04:00
Longsheng Mou
1a431bcea7 [mlir][Tosa] Fix attr type of out_shape for tosa.transpose_conv2d (#108041)
This patch fixes attr type of out_shape, which is i64 dense array
attribute with exactly 4 elements.

- Fix description of DenseArrayMaxCt
- Add DenseArrayMinCt and move it to CommonAttrConstraints.td
- Change type of out_shape to Tosa_IntArrayAttr4

Fixes #107804.
2024-09-12 09:10:16 +08:00
Krzysztof Drewniak
aa60a3e4d0 [mlir][AMDGPU] Support vector<2xf16> inputs to buffer atomic fadd (#108286)
Extend the lowering of atomic.fadd to support the v2f16 variant
available on some AMDGPU chips.

Re-lands #108238 (and addresses review comments from there)

Co-authored-by: Giuseppe Rossini <giuseppe.rossini@amd.com>
2024-09-11 17:51:07 -05:00
Matteo Franciolini
aabb0121ee [mlir][bufferization] Fix OpFilter::denyDialect (#108249)
The implementation would crash with unloaded dialects.
2024-09-11 12:03:49 -07:00
Krzysztof Drewniak
cb031267bd Revert "[mlir][AMDGPU] Support vector<2xf16> inputs to buffer atomic fadd (#108238)" (#108256)
This reverts commit 0d48d4d835.

Mistakenly landed without approval
2024-09-11 12:28:15 -05:00
Krzysztof Drewniak
0d48d4d835 [mlir][AMDGPU] Support vector<2xf16> inputs to buffer atomic fadd (#108238)
Extend the lowering of atomic.fadd to support the v2f16 variant
available on some AMDGPU chips.

Co-authored-by: Giuseppe Rossini <giuseppe.rossini@amd.com>
2024-09-11 12:12:17 -05:00
Arteen Abrishami
a54efdbdc4 [MLIR][TOSA] add additional verification to TOSA (#108133)
----------
Motivation:
----------

Spec conformance. Allows assumptions to be made in TOSA code.

------------
Changes Made:
------------

Add full permutation tensor verification to tosa.TRANSPOSE. Previously,
it was not verified that permuted values were between 0 and (rank - 1).

Update tosa.TRANSPOSE perms data type to be strictly i32.

Verify input/output shapes for tosa.TRANSPOSE.

Add verifier to tosa.CONST, with consideration for quantization.

Fix TOSA conformance of tensor type to disallow dimensions with size 0
for ranked tensors, per spec.
This is not the same as rank 0 tensors. Here is an example of a
disallowed tensor: tensor<3x0xi32>. Naturally, this means that the
number of elements in a TOSA tensor will always be greater than 0.
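
A check in the spirit of the tosa.TRANSPOSE permutation verification
might look like the sketch below. It is not the verifier's code: the name
`isValidPermutation` is hypothetical, and treating "full" verification as
range plus uniqueness is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if `perms` is a permutation of 0 .. rank-1, where rank is
// the number of entries: every value is in range and appears exactly once.
bool isValidPermutation(const std::vector<std::int32_t>& perms) {
  const std::size_t rank = perms.size();
  std::vector<bool> seen(rank, false);
  for (std::int32_t p : perms) {
    if (p < 0 || static_cast<std::size_t>(p) >= rank)
      return false;  // outside 0 .. rank-1
    if (seen[static_cast<std::size_t>(p)])
      return false;  // duplicate entry
    seen[static_cast<std::size_t>(p)] = true;
  }
  return true;
}
```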

Signed-off-by: Arteen Abrishami <arteen.abrishami@arm.com>
2024-09-11 17:18:09 +01:00
Alex Rice
135bd31975 [mlir] [tblgen-to-irdl] Refactor tblgen-to-irdl script and support more types (#105505)
Refactors the tblgen-to-irdl script slightly and adds support for
- Various integer types
- Various Float types
- Confined types
- Complex types (with fixed element type)

Also doesn't add the operand and result ops if they are empty.

I could potentially split this into smaller PRs if that'd be helpful
(refactor + integer/float/complex, confined type, optional
operand/result).

@math-fehr
2024-09-11 14:02:44 +01:00
Amy Wang
334873fe2d [MLIR][Python] Python binding support for IntegerSet attribute (#107640)
Support IntegerSet attribute python binding.
2024-09-11 07:37:35 -04:00
Sergio Afonso
2f3d061918 [MLIR][OpenMP] Automate operand structure definition (#99508)
This patch adds the "gen-openmp-clause-ops" `mlir-tblgen` generator to
produce the structure definitions previously in OpenMPClauseOperands.h
automatically from the information contained in OpenMPOps.td and
OpenMPClauses.td.

The original header is maintained to enable the definition of similar
structures that are not directly related to any single `OpenMP_Clause`
or `OpenMP_Op` tablegen definition.
2024-09-11 12:16:34 +01:00
Kunwar Grover
c9aa55da62 [mlir][Linalg] Add speculation for LinalgStructuredOps (#108032)
This patch adds speculation behavior for linalg structured ops, allowing
them to be hoisted out of loops using LICM.
2024-09-11 09:30:05 +01:00
Henrich Lauko
d1cad2290c Reland [MLIR] Make resolveCallable customizable in CallOpInterface (#107989)
Relands #100361 with fixed dependencies.
2024-09-10 15:33:13 +02:00
Sven van Haastregt
bda9474f57 Add missing newlines at EOF; NFC 2024-09-10 13:55:31 +01:00
Sergey Kozub
918222ba43 [MLIR] Add f6E3M2FN type (#105573)
This PR adds `f6E3M2FN` type to mlir.

`f6E3M2FN` type is proposed in [OpenCompute MX
Specification](https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf).
It defines a 6-bit floating point number with bit layout S1E3M2. Unlike
IEEE-754 types, there are no infinity or NaN values.

```c
f6E3M2FN
- Exponent bias: 3
- Maximum stored exponent value: 7 (binary 111)
- Maximum unbiased exponent value: 7 - 3 = 4
- Minimum stored exponent value: 1 (binary 001)
- Minimum unbiased exponent value: 1 − 3 = −2
- Has Positive and Negative zero
- Doesn't have infinity
- Doesn't have NaNs

Additional details:
- Zeros (+/-): S.000.00
- Max normal number: S.111.11 = ±2^(4) x (1 + 0.75) = ±28
- Min normal number: S.001.00 = ±2^(-2) = ±0.25
- Max subnormal number: S.000.11 = ±2^(-2) x 0.75 = ±0.1875
- Min subnormal number: S.000.01 = ±2^(-2) x 0.25 = ±0.0625
```
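
The four limits in the table follow directly from the exponent width,
mantissa width and bias. A small sketch that recomputes them, assuming a
format with no infinities or NaNs (as stated for this type), so that the
all-ones exponent encodes ordinary normal values; the struct and function
names are hypothetical:

```cpp
#include <cassert>
#include <cmath>

struct MiniFloatLimits {
  double maxNormal;
  double minNormal;
  double maxSubnormal;
  double minSubnormal;
};

// Limits of a small float format whose every bit pattern is finite.
MiniFloatLimits finiteOnlyLimits(int expBits, int manBits, int bias) {
  assert(expBits > 0 && manBits > 0);
  const int maxStoredExp = (1 << expBits) - 1;        // all-ones exponent
  const double ulp = std::ldexp(1.0, -manBits);       // 2^-manBits
  const double minNormal = std::ldexp(1.0, 1 - bias); // stored exponent 1
  return {std::ldexp(2.0 - ulp, maxStoredExp - bias), minNormal,
          minNormal * (1.0 - ulp), minNormal * ulp};
}
```

Calling `finiteOnlyLimits(3, 2, 3)` reproduces the values listed above for
f6E3M2FN: 28, 0.25, 0.1875 and 0.0625.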

Related PRs:
- [PR-94735](https://github.com/llvm/llvm-project/pull/94735) [APFloat]
Add APFloat support for FP6 data types
- [PR-97118](https://github.com/llvm/llvm-project/pull/97118) [MLIR] Add
f8E4M3 type - was used as a template for this PR
2024-09-10 10:41:05 +02:00
Matthias Springer
7574042e2a Revert "[MLIR] Make resolveCallable customizable in CallOpInterface" (#107984)
Reverts llvm/llvm-project#100361

This commit caused some linker errors. (Missing `MLIRCallInterfaces`
dependency.)
2024-09-10 10:24:05 +02:00
Pradeep Kumar
831236e78c [MLIR][NVVM] Add support for nvvm.breakpoint Op (#107193)
This commit adds support for the `nvvm.breakpoint` Op, which lowers to
the PTX brkpt instruction. It also adds the corresponding tests to
`nvvmir.mlir`.
2024-09-10 10:14:25 +02:00
Henrich Lauko
958f59d90f [MLIR] Make resolveCallable customizable in CallOpInterface (#100361)
Allow customization of the `resolveCallable` method in the
`CallOpInterface`. This change allows for operations implementing this
interface to provide their own logic for resolving callables.

- Introduce the `resolveCallable` method, which does not include the
optional symbol table parameter. This method replaces the previously
existing extra class declaration `resolveCallable`.

- Introduce the `resolveCallableInTable` method, which incorporates the
symbol table parameter. This method replaces the previous extra class
declaration `resolveCallable` that used the optional symbol table
parameter.
2024-09-10 10:08:41 +02:00
Jakub Kuderski
763bc9249c [mlir][amdgpu] Align Chipset with TargetParser (#107720)
Update the Chipset struct to follow the `IsaVersion` definition from
llvm's `TargetParser`. This is a follow up to
https://github.com/llvm/llvm-project/pull/106169#discussion_r1733955012.

* Add the stepping version. Note: This may break downstream code that
compares against the minor version directly.
* Use comparisons with full Chipset version where possible.

Note that we can't use the code in `TargetParser` directly because the
chipset utility is outside of `mlir/Target`, which re-exports llvm's
target library.
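
Comparing against the full version rather than a single field can be
sketched as below. `ChipsetVersion` is a hypothetical stand-in, not the
actual MLIR `Chipset` struct; the field names and the example encodings
for gfx90a and gfx940 are assumptions made for illustration.

```cpp
#include <tuple>

// Hypothetical stand-in with major, minor and stepping components.
struct ChipsetVersion {
  unsigned majorVersion;
  unsigned minorVersion;
  unsigned steppingVersion;
};

// Lexicographic ordering over (major, minor, stepping).
bool operator<(const ChipsetVersion& a, const ChipsetVersion& b) {
  return std::tie(a.majorVersion, a.minorVersion, a.steppingVersion) <
         std::tie(b.majorVersion, b.minorVersion, b.steppingVersion);
}

bool operator>=(const ChipsetVersion& a, const ChipsetVersion& b) {
  return !(a < b);
}
```

Ordering on the whole tuple avoids having to know how a particular chip
splits its digits between the minor and stepping fields.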
2024-09-09 11:12:26 -04:00
Amy Wang
6634d44e5e [MLIR][Transform] Allow stateInitializer and stateExporter for applyTransforms (#101186)
This is discussed in RFC:

https://discourse.llvm.org/t/rfc-making-the-constructor-of-the-transformstate-class-protected/80377
2024-09-09 10:57:13 -04:00
Artem Kroviakov
663e9cec9c [Func][GPU] Use SymbolUserOpInterface in func::ConstantOp (#107748)
This PR enables `func::ConstantOp` creation and usage for device
functions inside GPU modules.
The current main returns an error for referencing device functions via
`func::ConstantOp`, because during the `ConstantOp` verification it only
checks symbols in the `ModuleOp` symbol table, which, of course, does not
contain device functions that are defined in `GPUModuleOp`. This PR
proposes a more general solution.

Co-authored-by: Artem Kroviakov <artem.kroviakov@tum.de>
2024-09-09 11:49:16 +02:00
Jerry-Ge
476b1a661f [TOSA] Update input name for Sin and Cos operators (#107606)
Update the dialect input names from input to input1 for Sin/Cos for
consistency.

Signed-off-by: Jerry Ge <jerry.ge@arm.com>
2024-09-09 10:26:39 +01:00
Rahul Joshi
b60c6cbc0b [MLIR][TableGen] Migrate MLIR backends to use const RecordKeeper (#107505)
- Migrate MLIR backends to use a const RecordKeeper reference.
2024-09-07 15:13:19 -07:00
Amr Hesham
a1e06f7674 [mlir][vector] Fix the enum type in vector::CombiningKind (#107681)
Change the enum type of vector::CombiningKind from I32BitEnumAttrCaseBit
to I32EnumAttrCase.

Fixes #107448
2024-09-07 19:59:25 +02:00
anjenner
4af249fe6e Add usub_cond and usub_sat operations to atomicrmw (#105568)
These both perform conditional subtraction, returning the minuend and
zero respectively, if the difference is negative.
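
The value each operation stores can be sketched on plain unsigned
integers. The function names below are hypothetical, and the
compare-exchange loop is only an emulation with `std::atomic`, not how
any backend lowers the instruction:

```cpp
#include <atomic>
#include <cstdint>

// usub_cond: subtract only if the result would not wrap; otherwise keep
// the old value (the minuend).
std::uint32_t usubCond(std::uint32_t oldValue, std::uint32_t operand) {
  return oldValue >= operand ? oldValue - operand : oldValue;
}

// usub_sat: subtract, clamping at zero instead of wrapping.
std::uint32_t usubSat(std::uint32_t oldValue, std::uint32_t operand) {
  return oldValue >= operand ? oldValue - operand : 0u;
}

// Emulation of an atomic read-modify-write with usub_cond semantics.
// Like atomicrmw, it returns the value observed before the update.
std::uint32_t atomicUsubCond(std::atomic<std::uint32_t>& target,
                             std::uint32_t operand) {
  std::uint32_t observed = target.load();
  // On failure, compare_exchange_weak refreshes `observed`, and the new
  // value is recomputed from it on the next iteration.
  while (!target.compare_exchange_weak(observed,
                                       usubCond(observed, operand))) {
  }
  return observed;
}
```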
2024-09-06 16:19:20 +01:00
Kazu Hirata
56b29074fe [mlir] Avoid repeated hash lookups (NFC) (#107518) 2024-09-06 07:41:52 -07:00
Johannes de Fine Licht
6ab5829ab7 [MLIR][LLVM][NFC] Remove dead interface and add namespace qualifiers (#107573)
The `GetResultPtrElementType` interface is dead now that MLIR has fully
moved to opaque pointers, and can be removed.

Add namespace qualifiers to all argument types and return types of
interface methods for when they're used outside of LLVM dialect.
2024-09-06 15:56:02 +02:00
Matthias Springer
c2e53b2d50 [mlir][Transforms][NFC] Dialect conversion: Fix typo and improve docs (#107539) 2024-09-06 10:35:07 +02:00