
1684 Commits

Author SHA1 Message Date
Mehdi Amini
43b2b2ebce Revert "Fix complex log1p accuracy with large abs values." (#88290)
Reverts llvm/llvm-project#88260

The test fails on the GCC7 buildbot.
2024-04-10 18:25:16 +02:00
Johannes Reifferscheid
49ef12a08c Fix complex log1p accuracy with large abs values. (#88260)
This ports https://github.com/openxla/xla/pull/10503 by @pearu. The new
implementation matches mpmath's results for most inputs, see caveats in
the linked pull request. In addition to the filecheck test here, the
accuracy was tested with XLA's complex_unary_op_test and its MLIR
emitters.
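The two numerical hazards behind this fix can be shown with a simplified Python sketch. This is not the algorithm from openxla/xla#10503, and it does not handle every cancellation case that the ported implementation covers. Assuming IEEE doubles, it shows two standard tricks: `math.hypot` avoids overflowing `(1 + x)**2 + y**2` when |z| is large, and `math.log1p` keeps precision when |1 + z| is close to 1. The name `complex_log1p` and the 0.5/2.0 thresholds are illustrative choices.

```python
import math


def complex_log1p(z: complex) -> complex:
    """Sketch of log(1 + z); not the XLA/MLIR implementation."""
    x, y = z.real, z.imag
    xp1 = x + 1.0
    # The imaginary part is the argument of 1 + z.
    imag = math.atan2(y, xp1)
    # hypot scales internally, so it does not overflow when |z| is huge.
    modulus = math.hypot(xp1, y)
    if 0.5 < modulus < 2.0:
        # Near |1 + z| == 1, log(modulus) loses relative precision because
        # modulus is already rounded; use |1 + z|**2 - 1 == x*(2 + x) + y*y
        # with log1p instead.
        real = 0.5 * math.log1p(x * (2.0 + x) + y * y)
    else:
        real = math.log(modulus)
    return complex(real, imag)
```

For example, `complex_log1p(1e-20 + 0j)` keeps a real part of about 1e-20, whereas computing `log(1 + z)` directly rounds `1 + z` to 1 and returns 0.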
2024-04-10 14:55:56 +02:00
Kai Sasaki
51089e360e [mlir][complex] Support fast math flag for complex.tan op (#87919)
See
https://discourse.llvm.org/t/rfc-fastmath-flags-support-in-complex-dialect/71981
2024-04-09 15:22:43 +09:00
Corentin Ferry
50b937331f [mlir] Add missing libm member operations to MathToLibm (#87981)
This PR adds support for lowering the following Math operations to
`libm` calls:
* `math.absf` -> `fabsf, fabs`
* `math.exp` -> `expf, exp`
* `math.exp2` -> `exp2f, exp2`
* `math.fma` -> `fmaf, fma`
* `math.log` -> `logf, log`
* `math.log2` -> `log2f, log2`
* `math.log10` -> `log10f, log10`
* `math.powf` -> `powf, pow`
* `math.sqrt` -> `sqrtf, sqrt`

These operations are direct members of `libm`, and do not seem to
require any special manipulations on their operands.
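The name pairs in the list above follow the C convention: the `f`-suffixed function is the `float` (f32) variant and the unsuffixed one is the `double` (f64) variant. A small Python sketch of that lookup follows. The table and the `libm_callee` helper are illustrative and are not the pass's actual code.

```python
# Base libm names for the ops listed above (illustrative table).
LIBM_BASE_NAMES = {
    "math.absf": "fabs",
    "math.exp": "exp",
    "math.exp2": "exp2",
    "math.fma": "fma",
    "math.log": "log",
    "math.log2": "log2",
    "math.log10": "log10",
    "math.powf": "pow",
    "math.sqrt": "sqrt",
}


def libm_callee(op_name: str, elem_type: str) -> str:
    """Return the libm symbol for an op on f32 or f64 operands."""
    base = LIBM_BASE_NAMES[op_name]
    if elem_type == "f32":
        return base + "f"  # C float variant, e.g. expf
    if elem_type == "f64":
        return base  # C double variant, e.g. exp
    raise ValueError(f"no libm variant for element type {elem_type}")
```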
2024-04-09 00:41:12 +02:00
Kai Sasaki
a522dbbd62 [mlir][complex] Support fast math flag for complex.sign op (#87148)
Support the fastmath flag given to the `complex.sign` op in the
conversion to the standard dialects.

See:
https://discourse.llvm.org/t/rfc-fastmath-flags-support-in-complex-dialect/71981
2024-04-06 15:35:10 +09:00
Diego Caballero
42a6ad7bad [mlir][Vector] Fix n-D vector.extract/insert lowering to LLVM (#87591)
The lowering of n-D vector.extract/insert ops to LLVM is not supported
but if one of these accidentally reaches the vector-to-llvm conversion
patterns, we end up with a kind of puzzling crash. This PR fixes that
crash and gracefully bails out in those cases.
2024-04-05 15:01:20 -07:00
Matthias Springer
a4c470555b [mlir][linalg] Fix builder API usage in RegionBuilderHelper (#87451)
Operations must be created with the supplied builder. Otherwise, the
dialect conversion / greedy pattern rewrite driver can break.

This commit fixes a crash in the dialect conversion:
```
within split at llvm-project/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-invalid.mlir:1 offset :8:8: error: failed to legalize operation 'tosa.add'
  %0 = tosa.add %1, %arg2 : (tensor<10x10xf32>, tensor<*xf32>) -> tensor<*xf32>
       ^
within split at llvm-project/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-invalid.mlir:1 offset :8:8: note: see current operation: %9 = "tosa.add"(%8, %arg2) : (tensor<10x10xf32>, tensor<*xf32>) -> tensor<*xf32>
mlir-opt: llvm-project/mlir/include/mlir/IR/UseDefLists.h:198: mlir::IRObjectWithUseList<mlir::OpOperand>::~IRObjectWithUseList() [OperandType = mlir::OpOperand]: Assertion `use_empty() && "Cannot destroy a value that still has uses!"' failed.
```

This commit is the proper fix for #87297 (which was reverted).
2024-04-04 11:17:59 +09:00
Simon Camphausen
1f268092c7 [mlir][EmitC] Add support for pointer and opaque types to subscript op (#86266)
For pointer types the indices are restricted to one integer-like
operand.
For opaque types no further restrictions are made.
2024-04-03 13:06:14 +02:00
Mitch Phillips
56aeac47ab Revert "[mlir] Reland the dialect conversion hanging use fix (#87297)"
This reverts commit 49a4ec20a8.

Reason: Broke the ASan build bot with a memory leak. See the comments at
https://github.com/llvm/llvm-project/pull/87297
for more information.
2024-04-02 14:46:56 +02:00
Rob Suderman
49a4ec20a8 [mlir] Reland the dialect conversion hanging use fix (#87297)
Dialect conversion can sometimes leave a hanging use of a block argument.
Ensure that argument uses are dropped before removing the block.
2024-04-01 19:22:49 -07:00
Victor Perez
8827ff92b9 [MLIR][Arith] Add rounding mode attribute to truncf (#86152)
Add rounding mode attribute to `arith`. This attribute can be used in
different FP `arith` operations to control rounding mode. Rounding modes
correspond to IEEE 754-specified rounding modes. Use in `arith.truncf` folding.

As this is not supported in dialects other than LLVM, conversion should fail for
now in case this attribute is present.
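To see why folding `arith.truncf` needs the rounding mode, consider truncating an f64 constant to f32: the result depends on the mode. The Python sketch below covers two IEEE 754 modes and assumes finite inputs within the f32 range. The helper names are hypothetical. `struct`'s `'f'` format rounds to nearest even, and the toward-zero variant steps back one ulp whenever that rounding increased the magnitude.

```python
import struct


def truncf_nearest_even(x: float) -> float:
    """Round an f64 value to f32 with round-to-nearest-even."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def truncf_toward_zero(x: float) -> float:
    """Round an f64 value to f32 with round-toward-zero."""
    r = truncf_nearest_even(x)
    if abs(r) > abs(x):
        # Nearest rounding moved away from zero: decrementing the bit
        # pattern shrinks the magnitude by one f32 ulp, sign unchanged.
        bits = struct.unpack("<I", struct.pack("<f", r))[0]
        r = struct.unpack("<f", struct.pack("<I", bits - 1))[0]
    return r
```

For `x = 1.0 + 2.0**-23 - 2.0**-30`, which lies just below the f32 value `1 + 2**-23`, rounding to nearest gives `1 + 2**-23`, while rounding toward zero gives `1.0`.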

---------

Signed-off-by: Victor Perez <victor.perez@codeplay.com>
2024-04-01 11:57:14 +02:00
Mehdi Amini
23941019c0 Revert "[mlir]Fix dialect conversion drop uses" (#87205)
Reverts llvm/llvm-project#86991

Some bots are broken with a leak being detected now.
2024-03-31 23:25:51 +02:00
Rob Suderman
0030fc4ac7 [mlir]Fix dialect conversion drop uses (#86991)
Before deleting the block we need to drop uses to the surrounding args.
If this is not performed dialect conversion failures can result in a
failure to remove args (despite the block having no remaining uses).
2024-03-29 15:04:40 -07:00
Ivan Butygin
f050a098b5 [mlir][spirv] Remove enableFastMathMode flag from SPIR-V conversion (#86578)
Most arith/math ops support the fastmath attribute; use it instead of the
global flag.
2024-03-26 20:06:06 +03:00
Rafael Ubal
26d896f368 Fixes in 'tosa.reshape' lowering and folder (#85798)
- Revamped lowering conversion pattern for `tosa.reshape` to handle previously unsupported combinations of dynamic dimensions in input and output tensors. The lowering strategy continues to rely on pairs `tensor.collapse_shape` + `tensor.expand_shape`, which allow for downstream fusion with surrounding `linalg.generic` ops.

- Fixed bug in canonicalization pattern `ReshapeOp::fold()` in `TosaCanonicalizations.cpp`. The input and result types being equal is not a sufficient condition for folding. If there is more than 1 dynamic dimension in the input and result types, a productive reshape could still occur.

- This work exposed the fact that bufferization does not properly handle a `tensor.collapse_shape` op producing a 0D tensor from a dynamically shaped one, due to a limitation in `memref.collapse_shape`. While the proper way to address this would involve lifting the `memref.collapse_shape` restriction and verifying correct bufferization, this is left as possible future work. For now, this scenario is avoided by casting the `tosa.reshape` input tensor to a static shape if necessary (see `inferReshapeInputType()`).

- An extended set of tests is intended to cover relevant conversion paths. Tests are named using the pattern `test_reshape_<rank>_{up|down|same}_{s2s|s2d|d2s|d2d}_{explicit|auto}[_empty][_identity]`, where:

  - `<rank>` is the input rank (e.g., 3d, 6d)
  - `{up|down|same}` indicates whether the reshape increases, decreases, or retains the input rank.
  - `{s2s|s2d|d2s|d2d}` indicates whether reshape converts a statically shaped input to a statically shaped result (`s2s`), a statically shaped input to a dynamically shaped result (`s2d`), etc.
  - `{explicit|auto}` is used to indicate that all values in the `new_shape` attribute are >=0 (`explicit`) or that a -1 placeholder value is used (`auto`).
  - `empty` is used to indicate that `new_shape` includes a component set to 0.
  - `identity` is used when the input and result shapes are the same.
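The `auto` variants above use a -1 placeholder in `new_shape`. The following sketch shows how such a placeholder can be resolved from the input's element count, for static shapes only. `resolve_new_shape` is a hypothetical helper, not the TOSA lowering code. It simply rejects the ambiguous case where another component is 0.

```python
from math import prod
from typing import List


def resolve_new_shape(new_shape: List[int], num_elements: int) -> List[int]:
    """Replace a single -1 entry in new_shape so the element count matches."""
    if new_shape.count(-1) > 1:
        raise ValueError("at most one -1 placeholder is allowed")
    known = prod(d for d in new_shape if d != -1)  # empty product is 1
    if -1 not in new_shape:
        if known != num_elements:
            raise ValueError("element count mismatch")
        return list(new_shape)
    if known == 0:
        # e.g. new_shape = [0, -1]: any value for -1 fits, so refuse to guess.
        raise ValueError("cannot infer -1 next to a 0 component")
    if num_elements % known != 0:
        raise ValueError("element count not divisible by the known dims")
    return [num_elements // known if d == -1 else d for d in new_shape]
```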
2024-03-26 10:52:55 -04:00
Kai Sasaki
7d2d8e2a72 [mlir][complex] Fastmath flag for the trigonometric ops in complex (#85563)
Support Fastmath flag to convert trigonometric ops in the complex
dialect.

See:
https://discourse.llvm.org/t/rfc-fastmath-flags-support-in-complex-dialect/71981
2024-03-25 10:59:42 +09:00
Matthias Gehre
71db971521 [mlir][emitc] Arith to EmitC: Handle addi, subi and muli (#86120)
It is important to consider that `arith` has wraparound semantics, while
signed overflow is UB in C++.
Unless the operation guarantees that no signed overflow happens, we
perform the arithmetic in an equivalent unsigned type.
`bool` also doesn't wrap around in C++, and is not addressed here.
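The unsigned approach can be checked with a Python model of 32-bit arithmetic. Computing in the unsigned domain (modulo 2**32) and reinterpreting the bits as signed gives the two's-complement wraparound result that `arith` defines. `wrap_i32`, `addi_i32`, and `muli_i32` are illustrative names, and the casts the conversion actually emits may differ.

```python
MASK32 = (1 << 32) - 1


def wrap_i32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed i32."""
    value &= MASK32  # what an unsigned 32-bit type keeps
    return value - (1 << 32) if value >= (1 << 31) else value


def addi_i32(a: int, b: int) -> int:
    # C++ analogue: (int32_t)((uint32_t)a + (uint32_t)b). The unsigned add
    # wraps modulo 2**32; the final conversion back to signed is modular
    # since C++20 (implementation-defined before that).
    return wrap_i32((a & MASK32) + (b & MASK32))


def muli_i32(a: int, b: int) -> int:
    return wrap_i32((a & MASK32) * (b & MASK32))
```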
2024-03-22 15:39:52 +01:00
Finn Plummer
38f8a3cf0d [mlir][spirv] Improve folding of MemRef to SPIRV Lowering (#85433)
Investigate the lowering of MemRef Load/Store ops and implement
additional folding of the created ops.

Aims to improve readability of generated lowered SPIR-V code.

Part of work llvm#70704
2024-03-21 08:49:27 -07:00
Matthias Gehre
0aa6d57e57 [MLIR] Add initial convert-memref-to-emitc pass (#85389)
This converts `memref.alloca`, `memref.load` & `memref.store` to
`emitc.variable`, `emitc.subscript` and `emitc.assign`.
2024-03-21 14:27:37 +01:00
Johannes Reifferscheid
a6a9215b93 Lower shuffle to single-result form if possible. (#84321)
We currently always lower shuffle to the struct-returning variant. I saw
some cases where this survived all the way through PTX, resulting in
increased register usage. The easiest fix is to simply lower to the
single-result version when the predicate is unused.
2024-03-21 10:33:49 +01:00
Sergio Afonso
d84252e064 [MLIR][OpenMP] NFC: Uniformize OpenMP ops names (#85393)
This patch proposes the renaming of certain OpenMP dialect operations with the
goal of improving readability and following a uniform naming convention for
MLIR operations and associated classes. In particular, the following operations
are renamed:

- `omp.map_info` -> `omp.map.info`
- `omp.target_update_data` -> `omp.target_update`
- `omp.ordered_region` -> `omp.ordered.region`
- `omp.cancellationpoint` -> `omp.cancellation_point`
- `omp.bounds` -> `omp.map.bounds`
- `omp.reduction.declare` -> `omp.declare_reduction`

Also, the following MLIR operation classes have been renamed:

- `omp::TaskLoopOp` -> `omp::TaskloopOp`
- `omp::TaskGroupOp` -> `omp::TaskgroupOp`
- `omp::DataBoundsOp` -> `omp::MapBoundsOp`
- `omp::DataOp` -> `omp::TargetDataOp`
- `omp::EnterDataOp` -> `omp::TargetEnterDataOp`
- `omp::ExitDataOp` -> `omp::TargetExitDataOp`
- `omp::UpdateDataOp` -> `omp::TargetUpdateOp`
- `omp::ReductionDeclareOp` -> `omp::DeclareReductionOp`
- `omp::WsLoopOp` -> `omp::WsloopOp`
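Downstream `.mlir` files that use the old operation names need updating. The following Python sketch performs a textual migration over the op renames listed above. It is a hypothetical helper, not something shipped with MLIR, and it does not handle the C++ class renames. It only matches a name when the name is not part of a longer dotted identifier.

```python
import re

OP_RENAMES = {
    "omp.map_info": "omp.map.info",
    "omp.target_update_data": "omp.target_update",
    "omp.ordered_region": "omp.ordered.region",
    "omp.cancellationpoint": "omp.cancellation_point",
    "omp.bounds": "omp.map.bounds",
    "omp.reduction.declare": "omp.declare_reduction",
}

# Longest names first, and no word character or dot on either side, so a
# name embedded in a longer identifier is never rewritten.
_PATTERN = re.compile(
    r"(?<![\w.])("
    + "|".join(re.escape(old) for old in sorted(OP_RENAMES, key=len, reverse=True))
    + r")(?![\w.])"
)


def migrate_omp_op_names(text: str) -> str:
    """Rewrite old OpenMP dialect op names in MLIR source text."""
    return _PATTERN.sub(lambda m: OP_RENAMES[m.group(1)], text)
```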
2024-03-20 11:19:38 +00:00
Finn Plummer
8cbb8ac02c [mlir][spirv] Add folding for SelectOp (#85430)
Add missing constant propagation folder for spirv.Select.

Implement additional folding when both selections are equivalent or the
condition is a constant Scalar/SplatVector.

Allows for constant folding in the IndexToSPIRV pass.

Part of work #70704
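The folding rules described above reduce to a few checks. The Python sketch below uses a toy value model in which a constant condition is either a Python `bool` (scalar) or a tuple of bools (vector). `Const` and `fold_select` are hypothetical and only mirror the rules named in the message.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Const:
    """A constant SSA value: a bool scalar or a tuple of bools (vector)."""
    value: Union[bool, Tuple[bool, ...]]


def fold_select(cond: object, true_val: object, false_val: object) -> Optional[object]:
    """Return the folded value, or None if the select cannot be folded."""
    # Both selections are the same value: the condition does not matter.
    if true_val == false_val:
        return true_val
    if isinstance(cond, Const):
        c = cond.value
        if isinstance(c, bool):
            return true_val if c else false_val
        # Splat vector: every lane holds the same boolean.
        if len(set(c)) == 1:
            return true_val if c[0] else false_val
    return None
```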
2024-03-19 13:27:35 -07:00
Guray Ozen
8819f87998 [MLIR][NVVM] Add barrier.arrive (#85412)
This PR adds the `nvvm.barrier.arrive` op. It is useful for producer-consumer
modeling.
2024-03-19 16:51:32 +01:00
Kai Sasaki
34ba90745f [mlir][complex] Support Fastmath flag in conversion of complex.sqrt to standard (#85019)
When converting complex.sqrt op to standard, we need to keep the fast
math flag given to the op.

See:
https://discourse.llvm.org/t/rfc-fastmath-flags-support-in-complex-dialect/71981
2024-03-14 15:53:28 +09:00
Han-Chung Wang
bb82092de7 [mlir][tensor] Make getMixedPadImpl return static values when possible. (#85016)
If low and high are constants (i.e., not attributes), users still prefer
attributes. Otherwise, there could be failures in type inference. A
failure was introduced by
60e562d11a;
see the drop_known_unit_constant_low_high test for more details.
2024-03-13 08:52:05 -07:00
Marius Brehler
19266ca389 [mlir][EmitC] Add an emitc.conditional operator (#84883)
This adds an `emitc.conditional` operation for the ternary conditional
operator. Furthermore, this adds a conversion from `arith.select` to the
new op.
2024-03-12 11:27:26 +01:00
Krzysztof Drewniak
b05c15259b [mlir][AMDGPU] Improve amdgpu.lds_barrier, add warnings (#77942)
On some architectures (currently gfx90a, gfx94*, and gfx10**), we can
implement an LDS barrier using compiler intrinsics instead of inline
assembly, improving optimization possibilities and decreasing the
fragility of the underlying code.

Other AMDGPU chipsets continue to require inline assembly to implement
this barrier, as, by default, the LLVM backend will insert waits on
global memory (s_waitcnt vmcnt(0)) before barriers in order to ensure
memory watchpoints set by debuggers work correctly.

Use of amdgpu.lds_barrier, on these architectures, imposes a tradeoff
between debuggability and performance. The documentation, as well as the
generated inline assembly, have been updated to explicitly call
attention to this fact.

For chipsets that did not require the inline assembly hack, we move to
the s.waitcnt and s.barrier intrinsics, which have been added to the
ROCDL dialect. The magic constants used as an argument to the waitcnt
intrinsic can be derived from
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
2024-03-11 10:06:49 -05:00
Tina Jung
0ddb122147 [mlir][emitc] Arith to EmitC conversion: constants (#83798)
* Add a conversion from `arith.constant` to `emitc.constant`.
* Drop the translation for `arith.constant`s.
2024-03-08 09:16:10 +01:00
Marius Brehler
c40146c214 [mlir][EmitC] Add Arith to EmitC conversions (#84151)
This adds patterns and a pass to convert the Arith dialect to EmitC. For
now, this covers arithmetic binary ops operating on floating point
types.

It is not checked within the patterns whether the types, such as the
Tensor type, are supported in the respective EmitC operations. If
unsupported types should be converted, the conversion will fail anyway
because no legal EmitC operation can be created. This can clearly be
improved in a follow up, also resulting in better error messages.
Functions for such checks should not solely be used in the conversions
and should also be (re)used in the verifier.
2024-03-07 11:34:11 +01:00
Kai Sasaki
b930b14d5d [mlir][complex] Support fast math flag in converting complex.atan2 op (#82101)
When converting complex.atan2 op to standard, we need to keep the fast
math flag given to the op.

See:
https://discourse.llvm.org/t/rfc-fastmath-flags-support-in-complex-dialect/71981
2024-03-06 13:33:06 +09:00
Artem Tyurin
def16bca81 [mlir][spirv] Retain nontemporal attribute when converting memref load/store (#82119)
Fixes #77156.
2024-03-02 15:49:18 -08:00
Matthias Gehre
8ec28af8ea Reapply "[mlir][PDL] Add support for native constraints with results (#82760)"
with a small stack-use-after-scope fix in getConstraintPredicates()

This reverts commit c80e6edba4.
2024-03-02 20:57:30 +01:00
Matthias Gehre
c80e6edba4 Revert "[mlir][PDL] Add support for native constraints with results (#82760)"
Due to buildbot failure https://lab.llvm.org/buildbot/#/builders/88/builds/72130

This reverts commit dca32a3b59.
2024-03-01 07:44:30 +01:00
Matthias Gehre
dca32a3b59 [mlir][PDL] Add support for native constraints with results (#82760)
From https://reviews.llvm.org/D153245

This adds support for native PDL (and PDLL) C++ constraints to return
results.

This is useful for situations where a pattern checks for certain
constraints of multiple interdependent attributes and computes a new
attribute value based on them. Currently, for such an example it is
required to escape to C++ during matching to perform the check and after
a successful match again escape to native C++ to perform the computation
during the rewriting part of the pattern. With this work we can do the
computation in C++ during matching and use the result in the rewriting
part of the pattern. Effectively this enables a choice in the trade-off
of memory consumption during matching vs recomputation of values.

This is an example of a situation where this is useful: We have two
operations with certain attributes that have interdependent constraints.
For instance `attr_foo: one_of [0, 2, 4, 8], attr_bar: one_of [0, 2, 4,
8]` and `attr_foo == attr_bar`. The pattern should only match if all
conditions are true. The new operation should be created with a new
attribute which is computed from the two matched attributes e.g.
`attr_baz = attr_foo * attr_bar`. For the check we already escape to
native C++ and have all values at hand so it makes sense to directly
compute the new attribute value as well:

```
Constraint checkAndCompute(attr0: Attr, attr1: Attr) -> Attr;

Pattern example with benefit(1) {
    let foo = op<test.foo>() {attr = attr_foo : Attr};
    let bar = op<test.bar>(foo) {attr = attr_bar : Attr};
    let attr_baz = checkAndCompute(attr_foo, attr_bar);
    rewrite bar with {
        let baz = op<test.baz> {attr=attr_baz};
        replace bar with baz;
    };
}
```
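In plain Python terms, the native constraint in the example both checks and computes. The sketch below models its semantics, with a return value of `None` standing for constraint failure. It mirrors the prose description only and is not the PDL C++ registration API.

```python
from typing import Optional

ALLOWED = {0, 2, 4, 8}


def check_and_compute(attr_foo: int, attr_bar: int) -> Optional[int]:
    """Return attr_baz if the constraints hold, else None (no match)."""
    if attr_foo not in ALLOWED or attr_bar not in ALLOWED:
        return None
    if attr_foo != attr_bar:
        return None
    # The check already has both values at hand, so compute the new
    # attribute here instead of recomputing it during the rewrite.
    return attr_foo * attr_bar
```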
To achieve this the following notable changes were necessary:
PDLL:
- Remove check in PDLL parser that prevented native constraints from
returning results

PDL:
- Change PDL definition of pdl.apply_native_constraint to allow variadic
results

PDL_interp:
- Change PDL_interp definition of pdl_interp.apply_constraint to allow
variadic results

PDLToPDLInterp Pass:
The input to the pass is an arbitrary number of PDL patterns. The pass
collects the predicates that are required to match all of the pdl
patterns and establishes an ordering that allows creation of a single
efficient matcher function to match all of them. Values that are matched
and possibly used in the rewriting part of a pattern are represented as
positions. This allows fusion and thus reusing a single position for
multiple matching patterns. Accordingly, we introduce
ConstraintPosition, which records the type and index of the result of
the constraint. The problem is that, for the corresponding value to be used in
the rewriting part of a pattern, it has to be an input to the
pdl_interp.record_match operation, which is generated early during the
pass such that its surrounding block can be referred to by branching
operations. Consequently, the value has to be materialized after the
original pdl.apply_native_constraint has been deleted but before we get
the chance to generate the corresponding pdl_interp.apply_constraint
operation. We solve this by emitting a placeholder value when a
ConstraintPosition is evaluated. These placeholder values (due to fusion
there may be multiple for one constraint result) are replaced later when
the actual pdl_interp.apply_constraint operation is created.

Changes since the phabricator review:
- Addressed all comments
- In particular, removed registerConstraintFunctionWithResults and
instead changed registerConstraintFunction so that constraint functions
always have results (empty by default)
- Thus we don't need to reuse `rewriteFunctions` to store constraint
functions with results anymore, and can instead use
`constraintFunctions`
- Perform a stable sort of ConstraintQuestion, so that
ConstraintQuestion appear before other ConstraintQuestion that use their
results.
- Don't create placeholders for pdl_interp::ApplyConstraintOp. Instead
generate the `pdl_interp::ApplyConstraintOp` before generating the
successor block.
- Fixed a test failure in the pdl python bindings
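The ordering requirement from the list above, that a constraint must come after every constraint whose results it uses, can be sketched as follows. This Python version keeps the original order wherever the dependencies allow. The names are hypothetical, and the actual pass sorts `ConstraintQuestion` objects in C++.

```python
from typing import Dict, List, Set


def order_constraints(names: List[str], uses: Dict[str, Set[str]]) -> List[str]:
    """Order names so each one follows the constraints whose results it uses.

    Among the constraints that are ready, the earliest one in the input
    order is emitted first, so independent constraints keep their order.
    """
    placed = set()
    ordered = []
    remaining = list(names)
    while remaining:
        for name in remaining:
            if uses.get(name, set()) <= placed:
                break
        else:
            raise ValueError("cyclic dependency between constraints")
        remaining.remove(name)
        placed.add(name)
        ordered.append(name)
    return ordered
```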


Original code by @martin-luecke

Co-authored-by: martin-luecke <martinpaul.luecke@amd.com>
2024-03-01 07:29:49 +01:00
Rishabh Bali
915fce0402 [mlir][affine] Enable ConvertAffineToStandard pass to handle affine.delinearize_index Op. (#82189)
This PR enables the `ConvertAffineToStandard` pass to handle the
`affine.delinearize_index` operation.

Fixes #78458
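`affine.delinearize_index` splits a linear index into a multi-index for a given basis, so lowering it amounts to a chain of integer divisions and remainders. The Python sketch below shows that arithmetic for a row-major basis, assuming a non-negative index. `delinearize` is a hypothetical helper and does not reflect the exact `arith` ops the pass emits.

```python
from typing import Tuple


def delinearize(linear_index: int, basis: Tuple[int, ...]) -> Tuple[int, ...]:
    """Split linear_index into coordinates for a row-major basis."""
    coords = []
    # Walk from the innermost (fastest-varying) dimension outwards.
    for size in reversed(basis[1:]):
        coords.append(linear_index % size)
        linear_index //= size
    # The outermost coordinate takes whatever is left.
    coords.append(linear_index)
    return tuple(reversed(coords))
```

For example, with basis `(2, 3, 4)` the index 23 becomes `(1, 2, 3)`, since 23 = 1*12 + 2*4 + 3.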
2024-02-28 18:58:53 +05:30
Krzysztof Drewniak
4cba5957e6 [mlir][ROCDL] Set the LLVM data layout when lowering to ROCDL LLVM (#74501)
In order to ensure operations lower correctly (especially
memref.addrspacecast, which relies on the data layout being set
correctly when dealing with dynamic memrefs) and to prevent compilation
issues later down the line, set the `llvm.data_layout` attribute on GPU
modules when lowering their contents to a ROCDL / AMDGPU target.

If there's a good way to test the embedded string to prevent it from
going out of sync with the LLVM TargetMachine, I'd appreciate hearing
about it. (Or, alternatively, if there's a place I could factor the
string out to.)
2024-02-27 09:59:50 -06:00
Kai Sasaki
288d317fff [mlir][complex] Support Fastmath flag in conversion of complex.div to standard (#82729)
Support Fastmath flag to convert `complex.div` to standard dialects. 

See:
https://discourse.llvm.org/t/rfc-fastmath-flags-support-in-complex-dialect/71981
2024-02-27 18:51:24 +09:00
Benjamin Maxwell
78890904c4 [mlir][math] Propagate scalability in convert-math-to-llvm (#82635)
This also generally increases the coverage of scalable vector types in
the math-to-llvm tests.
2024-02-23 09:48:58 +00:00
Matthias Gehre
c1e9883a81 [TOSA] TosaToLinalg: fix int64_t min/max lowering of clamp (#82641)
tosa.clamp takes `min`/`max` attributes as i64, so ensure that the
lowering to linalg works for the whole range.

Co-authored-by: Tiago Trevisan Jost <tiago.trevisanjost@amd.com>
2024-02-22 21:16:33 +01:00
mlevesquedion
d4fd20258f [mlir] Use arith max or min ops instead of cmp + select (#82178)
I believe the semantics should be the same, but this saves one op and simplifies the code.

For example, the following two instructions:

```
%2 = arith.cmpi sgt, %0, %1 : i32
%3 = arith.select %2, %0, %1 : i32
```

are equivalent to:

```
%2 = arith.maxsi %0, %1 : i32
```
2024-02-21 12:28:05 -08:00
Benjamin Maxwell
a1a6860314 [mlir][VectorOps] Add unrolling for n-D vector.interleave ops (#80967)
This unrolls n-D vector.interleave ops like:

```mlir
vector.interleave %i, %j : vector<6x3xf32>
```

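into a sequence of 1-D operations:
```mlir
%i_0 = vector.extract %i[0] : vector<3xf32> from vector<6x3xf32>
%j_0 = vector.extract %j[0] : vector<3xf32> from vector<6x3xf32>
%res_0 = vector.interleave %i_0, %j_0 : vector<3xf32>
%result_0 = vector.insert %res_0, %result [0] : vector<6xf32> into vector<6x6xf32>
// ... repeated x6
```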

The 1-D operations can then be directly lowered to LLVM.

Depends on: #80966
2024-02-20 14:33:33 +00:00
Kareem Ergawy
118a2a52fd [MLIR][OpenMP] Support llvm conversion for omp.private regions (#81414)
Introduces conversion of `omp.private`'s regions to the LLVM dialect.
This reuses the already existing conversion pattern for
`ReductionDeclareOp` and repurposes it to be used for multi-region ops
as well.
2024-02-16 05:57:41 +01:00
David Truby
be9f8ffd81 [mlir][flang][openmp] Rework wsloop reduction operations (#80019)
This patch reworks the way that wsloop reduction operations function to
better match the expected semantics from the OpenMP specification,
following the rework of parallel reductions.

The new semantics create a private reduction variable as a block
argument which should be used normally for all operations on that
variable in the region; this private variable is then combined with the
others into the shared variable. This way no special omp.reduction
operations are needed inside the region. These block arguments follow
the loop control block arguments.
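The privatize-then-combine semantics can be modeled sequentially in Python. Each worker accumulates into its own private copy initialized to the reduction's identity value, and the private copies are combined into the shared variable at the end. The function and parameter names are illustrative and are not OpenMP dialect API.

```python
from functools import reduce
from typing import Callable, Iterable, Sequence, TypeVar

T = TypeVar("T")


def worksharing_reduce(
    chunks: Sequence[Iterable[T]],
    shared: T,
    identity: T,
    combine: Callable[[T, T], T],
) -> T:
    """Model a reduction over a loop split into per-thread chunks."""
    privates = []
    for chunk in chunks:
        # Private reduction variable: the region updates it like any
        # ordinary variable, with no special reduction operations.
        private = identity
        for value in chunk:
            private = combine(private, value)
        privates.append(private)
    # Combine every private copy into the shared variable.
    return reduce(combine, privates, shared)
```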

---------

Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
2024-02-13 19:13:54 +00:00
Benjamin Maxwell
79ce2c93ae [mlir][VectorOps] Add conversion of 1-D vector.interleave ops to LLVM (#80966)
The 1-D case directly maps to LLVM intrinsics. The n-D case will be
handled by unrolling to 1-D first (in a later patch).

Depends on: #80965
2024-02-13 10:47:33 +00:00
Guray Ozen
0a600c34c8 [mlir][nvgpu] Make phaseParity of mbarrier.try_wait i1 (#81460)
Currently, the `phaseParity` argument of `nvgpu.mbarrier.try_wait.parity` is
an index. This can cause a problem if it is passed any value other than 0
or 1, because the PTX instruction only accepts an even or odd phase. This
PR makes the phaseParity argument i1 to avoid misuse.

Here is the relevant text from the PTX documentation:

```
The .parity variant of the instructions test for the completion of the phase indicated 
by the operand phaseParity, which is the integer parity of either the current phase or 
the immediately preceding phase of the mbarrier object. An even phase has integer 
parity 0 and an odd phase has integer parity of 1. So the valid values of phaseParity 
operand are 0 and 1.
```
See for more information:

https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-test-wait-mbarrier-try-wait
2024-02-13 09:50:34 +01:00
Kai Sasaki
b17348c3b5 [mlir][complex] Prevent underflow in complex.abs (#79786) (#81092) 2024-02-11 07:35:19 +09:00
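The underflow problem is that, for tiny components, `x*x + y*y` rounds to zero before the square root is taken. A common remedy is to scale by the larger magnitude first. The Python sketch below shows that technique, assuming finite inputs. It is not necessarily the exact expansion the commit emits.

```python
import math


def naive_abs(x: float, y: float) -> float:
    # x*x flushes to 0.0 when |x| is tiny (e.g. 1e-200).
    return math.sqrt(x * x + y * y)


def scaled_abs(x: float, y: float) -> float:
    """|x + iy| computed as max * sqrt(1 + (min/max)**2)."""
    big, small = max(abs(x), abs(y)), min(abs(x), abs(y))
    if big == 0.0:
        return 0.0
    ratio = small / big  # in [0, 1], so ratio * ratio cannot overflow
    return big * math.sqrt(1.0 + ratio * ratio)
```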
Kolya Panchenko
9f6c00565a [MLIR][VCIX] Support VCIX intrinsics in LLVMIR dialect (#75875)
The changeset extends LLVMIR intrinsics with VCIX intrinsics.
The VCIX intrinsics allow MLIR users to interact with RISC-V
co-processors that are compatible with the `XSfvcp` extension.

Source:
https://www.sifive.com/document-file/sifive-vector-coprocessor-interface-vcix-software
2024-02-07 15:23:28 -05:00
Thomas Preud'homme
88c830a1a5 [TOSA] Fix avgpool2d accum in wider type (#80849)
Truncate result of avgpool when accumulation is done in a wider type
than the result element type, such as when doing a f16 avgpool2d with a
f32 accumulator type.
2024-02-07 10:03:19 +00:00
Cullen Rhodes
fff86c6111 [mlir][ArmSME] Support 4-way widening outer products (#79288)
This patch introduces support for 4-way widening outer products. This
enables the fusion of 4 'arm_sme.outerproduct' operations that are
chained via the accumulator into single widened operations.

Changes:

- Adds the following operations:
  - smopa_4way, smops_4way
  - umopa_4way, umops_4way
  - sumopa_4way, sumops_4way
  - usmopa_4way, usmops_4way
- Implements conversions for the above ops to intrinsics in ArmSMEToLLVM.
- Extends 'arm-sme-outer-product' pass.

For a detailed description of these operations see the
'arm_sme.smopa_4way' description.
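The fusion relies on the fact that four widening outer products chained through the same accumulator compute, for each element, a 4-element dot product. The Python sketch below compares the two forms for signed operands (`smopa`-style). Python integers do not overflow, so 32-bit accumulator wraparound is not modeled. The function names and list-based tiles are illustrative, not ArmSME API.

```python
def outer_product_acc(acc, lhs, rhs):
    """acc[i][j] + lhs[i] * rhs[j]: one (widened) outer product."""
    return [[acc[i][j] + lhs[i] * rhs[j] for j in range(len(rhs))]
            for i in range(len(lhs))]


def outer_product_4way(acc, lhs4, rhs4):
    """Fused form: lhs4[i] and rhs4[j] each hold 4 narrow values."""
    return [[acc[i][j] + sum(a * b for a, b in zip(lhs4[i], rhs4[j]))
             for j in range(len(rhs4))]
            for i in range(len(lhs4))]


def chained(acc, lhs4, rhs4):
    """Four outer products chained through the accumulator."""
    for k in range(4):
        acc = outer_product_acc(acc, [row[k] for row in lhs4],
                                [col[k] for col in rhs4])
    return acc
```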
2024-02-07 08:17:47 +00:00
Marius Brehler
9a87c5d440 [mlir][EmitC] Add support for external functions (#80547)
This adds a conversion from an externally defined `func.func` (a
`func.func` without a function body) to an `emitc.func` with an `extern`
specifier.
2024-02-05 16:58:10 +01:00