This ports https://github.com/openxla/xla/pull/10503 by @pearu. The new
implementation matches mpmath's results for most inputs; see the caveats in
the linked pull request. In addition to the FileCheck test here, the
accuracy was tested with XLA's complex_unary_op_test and its MLIR
emitters.
This PR adds support for lowering the following Math operations to
`libm` calls:
* `math.absf` -> `fabsf, fabs`
* `math.exp` -> `expf, exp`
* `math.exp2` -> `exp2f, exp2`
* `math.fma` -> `fmaf, fma`
* `math.log` -> `logf, log`
* `math.log2` -> `log2f, log2`
* `math.log10` -> `log10f, log10`
* `math.powf` -> `powf, pow`
* `math.sqrt` -> `sqrtf, sqrt`
These operations map directly to `libm` functions and do not seem to
require any special manipulation of their operands.
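For instance, a minimal sketch of the intended rewrite for an `f32` absolute value (output form approximate; function names as in the list above):
```
// Input IR:
%0 = math.absf %arg0 : f32
// After the conversion: a call to the f32 libm symbol.
%0 = func.call @fabsf(%arg0) : (f32) -> f32
```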
The lowering of n-D vector.extract/insert ops to LLVM is not supported,
but if one of these ops accidentally reaches the vector-to-llvm conversion
patterns, we end up with a rather puzzling crash. This PR fixes that
crash by gracefully bailing out in those cases.
Operations must be created with the supplied builder. Otherwise, the
dialect conversion / greedy pattern rewrite driver can break.
This commit fixes a crash in the dialect conversion:
```
within split at llvm-project/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-invalid.mlir:1 offset :8:8: error: failed to legalize operation 'tosa.add'
%0 = tosa.add %1, %arg2 : (tensor<10x10xf32>, tensor<*xf32>) -> tensor<*xf32>
^
within split at llvm-project/mlir/test/Conversion/TosaToLinalg/tosa-to-linalg-invalid.mlir:1 offset :8:8: note: see current operation: %9 = "tosa.add"(%8, %arg2) : (tensor<10x10xf32>, tensor<*xf32>) -> tensor<*xf32>
mlir-opt: llvm-project/mlir/include/mlir/IR/UseDefLists.h:198: mlir::IRObjectWithUseList<mlir::OpOperand>::~IRObjectWithUseList() [OperandType = mlir::OpOperand]: Assertion `use_empty() && "Cannot destroy a value that still has uses!"' failed.
```
This commit is the proper fix for #87297 (which was reverted).
Add a rounding mode attribute to `arith`. This attribute can be used on
different FP `arith` operations to control the rounding mode. The rounding
modes correspond to IEEE 754-specified rounding modes. The attribute is now
used in `arith.truncf` folding. As the attribute is not supported in dialects
other than LLVM, conversion should fail for now when it is present.
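As an illustration, a sketch of what the attribute could look like on `arith.truncf` (assembly syntax approximate):
```
// Truncate f64 to f32, rounding to nearest even (sketch):
%0 = arith.truncf %arg0 to_nearest_even : f64 to f32
```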
---------
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Before deleting the block, we need to drop uses of the surrounding args.
If this is not done, a dialect conversion failure can result in a
failure to remove the args (despite the block having no remaining uses).
- Revamped the lowering conversion pattern for `tosa.reshape` to handle previously unsupported combinations of dynamic dimensions in input and output tensors. The lowering strategy continues to rely on `tensor.collapse_shape` + `tensor.expand_shape` pairs, which allow for downstream fusion with surrounding `linalg.generic` ops; a sketch of this lowering is shown after this list.
- Fixed bug in canonicalization pattern `ReshapeOp::fold()` in `TosaCanonicalizations.cpp`. The input and result types being equal is not a sufficient condition for folding. If there is more than 1 dynamic dimension in the input and result types, a productive reshape could still occur.
- This work exposed the fact that bufferization does not properly handle a `tensor.collapse_shape` op producing a 0D tensor from a dynamically shaped one, due to a limitation in `memref.collapse_shape`. While the proper way to address this would involve relaxing the `memref.collapse_shape` restriction and verifying correct bufferization, this is left as possible future work. For now, this scenario is avoided by casting the `tosa.reshape` input tensor to a static shape if necessary (see `inferReshapeInputType()`).
- An extended set of tests is intended to cover relevant conversion paths. Tests are named using the pattern `test_reshape_<rank>_{up|down|same}_{s2s|s2d|d2s|d2d}_{explicit|auto}[_empty][_identity]`, where:
- `<rank>` is the input rank (e.g., 3d, 6d)
- `{up|down|same}` indicates whether the reshape increases, decreases, or retains the input rank.
- `{s2s|s2d|d2s|d2d}` indicates whether reshape converts a statically shaped input to a statically shaped result (`s2s`), a statically shaped input to a dynamically shaped result (`s2d`), etc.
- `{explicit|auto}` is used to indicate that all values in the `new_shape` attribute are >=0 (`explicit`) or that a -1 placeholder value is used (`auto`).
- `empty` is used to indicate that `new_shape` includes a component set to 0.
- `identity` is used when the input and result shapes are the same.
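The sketch of the lowering strategy mentioned above, using static shapes for brevity (the reassociation indices are illustrative and depend on the shapes involved):
```
// A rank-3 to rank-2 reshape:
%0 = tosa.reshape %arg0 {new_shape = array<i64: 6, 4>} : (tensor<2x3x4xf32>) -> tensor<6x4xf32>
// lowers to a collapse into rank 1 followed by an expand to the target shape:
%1 = tensor.collapse_shape %arg0 [[0, 1, 2]] : tensor<2x3x4xf32> into tensor<24xf32>
%2 = tensor.expand_shape %1 [[0, 1]] : tensor<24xf32> into tensor<6x4xf32>
```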
It is important to consider that `arith` has wrap-around semantics, while in
C++ signed overflow is UB.
Unless the operation guarantees that no signed overflow happens, we will
perform the arithmetic in an equivalent unsigned type.
`bool` also doesn't wrap around in C++, and is not addressed here.
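A hypothetical sketch of the shape such a conversion could take for an `i32` addition (exact EmitC op spelling assumed):
```
// Perform the addition in an equivalent unsigned type to avoid C++
// signed-overflow UB, then cast back to the signed type:
%0 = emitc.cast %arg0 : i32 to ui32
%1 = emitc.cast %arg1 : i32 to ui32
%2 = emitc.add %0, %1 : (ui32, ui32) -> ui32
%3 = emitc.cast %2 : ui32 to i32
```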
Investigate the lowering of MemRef load/store ops and implement
additional folding of the created ops.
This aims to improve the readability of the generated SPIR-V code.
Part of work llvm#70704
We currently always lower shuffle to the struct-returning variant. I saw
some cases where this survived all the way through to PTX, resulting in
increased register usage. The easiest fix is to simply lower to the
single-result version when the predicate is unused.
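For context, a sketch at the `gpu` dialect level (the second result is the validity predicate):
```
// gpu.shuffle yields the shuffled value plus an i1 validity predicate:
%shfl, %pred = gpu.shuffle xor %val, %offset, %width : f32
// When %pred is unused, the lowering can pick the single-result NVVM
// shuffle instead of the struct-returning variant.
```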
Add missing constant propagation folder for spirv.Select.
Implement additional folding when both select operands are equivalent or the
condition is a constant scalar/splat vector.
Allows for constant folding in the IndexToSPIRV pass.
Part of work #70704
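A sketch of the two folds (types illustrative):
```
// Constant condition: fold to the chosen operand.
%true = spirv.Constant true
%0 = spirv.Select %true, %a, %b : i1, i32   // folds to %a
// Equivalent operands: fold regardless of the condition.
%1 = spirv.Select %cond, %a, %a : i1, i32   // folds to %a
```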
If `low` and `high` are constant values (i.e., not attributes), users still
prefer attributes; otherwise, there can be failures in type inference. Such a
failure is introduced by
60e562d11a,
see the drop_known_unit_constant_low_high test for more details.
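A minimal sketch of the fold on `tensor.pad` (shapes illustrative):
```
%cst = arith.constant 0.0 : f32
%c1 = arith.constant 1 : index
// Constant operands...
%0 = tensor.pad %arg0 low[%c1] high[%c1] {
^bb0(%i: index):
  tensor.yield %cst : f32
} : tensor<8xf32> to tensor<10xf32>
// ...fold into static attributes:
%0 = tensor.pad %arg0 low[1] high[1] {
^bb0(%i: index):
  tensor.yield %cst : f32
} : tensor<8xf32> to tensor<10xf32>
```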
On some architectures (currently gfx90a, gfx94*, and gfx10**), we can
implement an LDS barrier using compiler intrinsics instead of inline
assembly, improving optimization possibilities and decreasing the
fragility of the underlying code.
Other AMDGPU chipsets continue to require inline assembly to implement
this barrier, as, by default, the LLVM backend will insert waits on
global memory (`s_waitcnt vmcnt(0)`) before barriers in order to ensure that
memory watchpoints set by debuggers work correctly.
Use of amdgpu.lds_barrier on these architectures imposes a tradeoff
between debuggability and performance. The documentation, as well as the
generated inline assembly, have been updated to explicitly call
attention to this fact.
For chipsets that did not require the inline assembly hack, we move to
the s.waitcnt and s.barrier intrinsics, which have been added to the
ROCDL dialect. The magic constants used as an argument to the waitcnt
intrinsic can be derived from
llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
This adds patterns and a pass to convert the Arith dialect to EmitC. For
now, this covers arithmetic binary ops operating on floating-point
types.
The patterns do not check whether the types involved, such as tensor
types, are supported by the respective EmitC operations. If unsupported
types are converted, the conversion will fail anyway because no legal
EmitC operation can be created. This can clearly be improved in a
follow-up, also resulting in better error messages.
Functions for such checks should not solely be used in the conversions
and should also be (re)used in the verifier.
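As a sketch, one of the covered rewrites (exact EmitC spelling assumed):
```
// Input:
%0 = arith.addf %arg0, %arg1 : f32
// After arith-to-emitc:
%0 = emitc.add %arg0, %arg1 : (f32, f32) -> f32
```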
From https://reviews.llvm.org/D153245
This adds support for native PDL (and PDLL) C++ constraints to return
results.
This is useful for situations where a pattern checks for certain
constraints of multiple interdependent attributes and computes a new
attribute value based on them. Currently, for such an example it is
required to escape to C++ during matching to perform the check and after
a successful match again escape to native C++ to perform the computation
during the rewriting part of the pattern. With this work we can do the
computation in C++ during matching and use the result in the rewriting
part of the pattern. Effectively this enables a choice in the trade-off
of memory consumption during matching vs recomputation of values.
This is an example of a situation where this is useful: We have two
operations with certain attributes that have interdependent constraints.
For instance `attr_foo: one_of [0, 2, 4, 8], attr_bar: one_of [0, 2, 4,
8]` and `attr_foo == attr_bar`. The pattern should only match if all
conditions are true. The new operation should be created with a new
attribute which is computed from the two matched attributes e.g.
`attr_baz = attr_foo * attr_bar`. For the check we already escape to
native C++ and have all values at hand so it makes sense to directly
compute the new attribute value as well:
```
Constraint checkAndCompute(attr0: Attr, attr1: Attr) -> Attr;

Pattern example with benefit(1) {
  let foo = op<test.foo>() {attr = attr_foo : Attr};
  let bar = op<test.bar>(foo) {attr = attr_bar : Attr};
  let attr_baz = checkAndCompute(attr_foo, attr_bar);
  rewrite bar with {
    let baz = op<test.baz> {attr = attr_baz};
    replace bar with baz;
  };
}
```
To achieve this the following notable changes were necessary:
PDLL:
- Remove check in PDLL parser that prevented native constraints from
returning results
PDL:
- Change PDL definition of pdl.apply_native_constraint to allow variadic
results
PDL_interp:
- Change PDL_interp definition of pdl_interp.apply_constraint to allow
variadic results
PDLToPDLInterp Pass:
The input to the pass is an arbitrary number of PDL patterns. The pass
collects the predicates that are required to match all of the PDL
patterns and establishes an ordering that allows creation of a single
efficient matcher function to match all of them. Values that are matched
and possibly used in the rewriting part of a pattern are represented as
positions. This allows fusion and thus reuse of a single position for
multiple matching patterns. Accordingly, we introduce a
ConstraintPosition, which records the type and index of the result of
the constraint.

The problem is that, for the corresponding value to be used in the
rewriting part of a pattern, it has to be an input to the
pdl_interp.record_match operation, which is generated early during the
pass so that its surrounding block can be referred to by branching
operations. In consequence, the value has to be materialized after the
original pdl.apply_native_constraint has been deleted but before we get
the chance to generate the corresponding pdl_interp.apply_constraint
operation. We solve this by emitting a placeholder value when a
ConstraintPosition is evaluated. These placeholder values (due to fusion
there may be multiple for one constraint result) are replaced later,
when the actual pdl_interp.apply_constraint operation is created.
Changes since the phabricator review:
- Addressed all comments
- In particular, removed registerConstraintFunctionWithResults and
instead changed registerConstraintFunction so that constraint functions
always have results (empty by default)
- Thus we don't need to reuse `rewriteFunctions` to store constraint
functions with results anymore, and can instead use
`constraintFunctions`
- Perform a stable sort of ConstraintQuestions, so that a
ConstraintQuestion appears before other ConstraintQuestions that use its
results.
- Don't create placeholders for pdl_interp::ApplyConstraintOp. Instead
generate the `pdl_interp::ApplyConstraintOp` before generating the
successor block.
- Fixed a test failure in the pdl python bindings
Original code by @martin-luecke
Co-authored-by: martin-luecke <martinpaul.luecke@amd.com>
In order to ensure operations lower correctly (especially
memref.addrspacecast, which relies on the data layout being set
correctly when dealing with dynamic memrefs) and to prevent compilation
issues later down the line, set the `llvm.data_layout` attribute on GPU
modules when lowering their contents to a ROCDL / AMDGPU target.
If there's a good way to test the embedded string to prevent it from
going out of sync with the LLVM TargetMachine, I'd appreciate hearing
about it. (Or, alternatively, if there's a place I could factor the
string out to.)
tosa.clamp takes `min`/`max` attributes as i64, so ensure that the
lowering to linalg works for the whole range.
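A sketch of the case this covers (attribute values illustrative):
```
// i64 bounds wider than the i8 element type must still lower correctly:
%0 = tosa.clamp %arg0 {min_int = -2147483648 : i64, max_int = 2147483647 : i64, min_fp = 0.0 : f32, max_fp = 0.0 : f32} : (tensor<1xi8>) -> tensor<1xi8>
```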
Co-authored-by: Tiago Trevisan Jost <tiago.trevisanjost@amd.com>
I believe the semantics should be the same, but this saves 1 op and simplifies the code.
For example, the following two instructions:
```
%2 = cmp sgt %0, %1
%3 = select %2, %0, %1
```
Are equivalent to:
```
%2 = maxsi %0, %1
```
Introduces conversion of `omp.private`'s regions to the LLVM dialect.
This reuses the already existing conversion pattern for
`ReductionDeclareOp` and repurposes it to be used for multi-region ops
as well.
This patch reworks the way that wsloop reduction operations function to
better match the expected semantics from the OpenMP specification,
following the rework of parallel reductions.
The new semantics create a private reduction variable as a block
argument which should be used normally for all operations on that
variable in the region; this private variable is then combined with the
others into the shared variable. This way no special omp.reduction
operations are needed inside the region. These block arguments follow
the loop control block arguments.
---------
Co-authored-by: Kiran Chandramohan <kiran.chandramohan@arm.com>
Currently, the `phaseParity` argument of `nvgpu.mbarrier.try_wait.parity` is
an `index`. This can cause a problem if it is passed any value other than
0 or 1, because the PTX instruction only accepts an even or odd phase. This
PR makes the `phaseParity` argument `i1` to avoid misuse.
Here is the information from PTX doc:
```
The .parity variant of the instructions test for the completion of the phase indicated
by the operand phaseParity, which is the integer parity of either the current phase or
the immediately preceding phase of the mbarrier object. An even phase has integer
parity 0 and an odd phase has integer parity of 1. So the valid values of phaseParity
operand are 0 and 1.
```
See for more information:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#parallel-synchronization-and-communication-instructions-mbarrier-test-wait-mbarrier-try-wait
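With the `i1` type, a caller holding an integer phase counter can derive the parity from its low bit, e.g. (a sketch; `%phase` is a hypothetical counter):
```
// Truncation to i1 keeps only the least significant bit, i.e. the parity:
%parity = arith.trunci %phase : i32 to i1
```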
Truncate the result of avgpool when accumulation is done in a wider type
than the result element type, such as when doing an f16 avgpool2d with an
f32 accumulator type.
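A sketch of the affected case (attribute values illustrative):
```
// f16 result with a wider f32 accumulator; the lowering now truncates the
// accumulated f32 value back to f16:
%0 = tosa.avg_pool2d %arg0 {acc_type = f32, kernel = array<i64: 2, 2>, pad = array<i64: 0, 0, 0, 0>, stride = array<i64: 1, 1>} : (tensor<1x4x4x1xf16>) -> tensor<1x3x3x1xf16>
```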
This patch introduces support for 4-way widening outer products. This
enables the fusion of four 'arm_sme.outerproduct' operations that are
chained via the accumulator into a single widened operation.
Changes:
- Adds the following operations:
- smopa_4way, smops_4way
- umopa_4way, umops_4way
- sumopa_4way, sumops_4way
- usmopa_4way, usmops_4way
- Implements conversions for the above ops to intrinsics in ArmSMEToLLVM.
- Extends 'arm-sme-outer-product' pass.
For a detailed description of these operations see the
'arm_sme.smopa_4way' description.