`DenseI64ArrayAttr` provides a better API than `I64ArrayAttr`. E.g., accessors returning `ArrayRef<int64_t>` (instead of `ArrayAttr`) are generated.
Differential Revision: https://reviews.llvm.org/D156684
EmptyTensorElimination is a pre-bufferization transformation that replaces "tensor.empty" ops with "tensor.extract_slice" ops. This revision adds support for cases where the input IR contains "tensor.cast" ops.
Differential Revision: https://reviews.llvm.org/D156167
* Move `foldDynamicIndexList` to `DialectUtils` and simplify function.
* Move `OpWithOffsetSizesAndStridesConstantArgumentFolder` to `ViewLikeInterface` and add documentation.
Differential Revision: https://reviews.llvm.org/D156581
LLVM is not set up in a thread-safe way, which seems to be leading to
race conditions when sending stuff to llvm::nulls in opt builds. Try a
thread-local alternative.
Reviewed By: pzread
Differential Revision: https://reviews.llvm.org/D156421
A literal constant is not emitted as a variable but rather printed inline. The
form used is same as the Attribute emission form.
Differential Revision: https://reviews.llvm.org/D150356
Earlier, in the sparse backward dataflow analysis, data from the results
of an op implementing `RegionBranchOpInterface` was considered to flow
into the operands of every op that did not implement the
`RegionBranchTerminatorOpInterface` but was return-like and present
in a region of the former. It was thus also expected that the number of
results of the former be equal to the number of operands in the latter.
This understanding of dataflow is incorrect and thus this expectation is
also not justified. This commit fixes this incorrect understanding.
This commit ensures that these return-like ops are handled just like the
ops implementing the `RegionBranchTerminatorOpInterface`, which means
that, if this op has a region `A` whose successors are regions `B`, `C`,
and `D`, then data flows from the arguments (successor inputs) of `B`,
`C`, and `D` to the corresponding successor operands of this op.
This fix is also propagated to liveness analysis that earlier relied on
this incorrect implementation of the sparse backward dataflow analysis
framework and corrects some incorrect assumptions made in it.
Also cleaned up some unnecessary comments from the test file.
Issue: https://github.com/llvm/llvm-project/issues/64139.
Signed-off-by: Srishti Srivastava <srishtisrivastava.ai@gmail.com>
Reviewed By: jcai19, matthiaskramm, Mogball
Differential Revision: https://reviews.llvm.org/D156376
[mlir] Add support for custom readProperties/writeProperties methods.
Currently, operations that opt-in to adopt properties will see auto-generated readProperties/writeProperties methods to emit and parse bytecode. If a dialects opts in to use `usePropertiesForAttributes`, those definitions will be generated for the current definition of the op without the possibility to handle attribute versioning.
The patch adds the capability for an operation to define its own read/write methods for the encoding of properties so that versioned operations can handle upgrading properties encodings.
In addition to this, the patch adds an example showing versioning on NamedProperties through the dialect version API exposed by the reader.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D155340
[mlir] Expose a mechanism to provide a callback for encoding types and attributes in MLIR bytecode.
Two callbacks are exposed, respectively, to the BytecodeWriterConfig and to the ParserConfig. At bytecode parsing/printing, clients have the ability to specify a callback to be used to optionally read/write the encoding. On failure, fallback path will execute the default parsers and printers for the dialect.
Testing shows how to leverage this functionality to support back-deployment and backward-compatibility usecases when roundtripping to bytecode a client dialect with type/attributes dependencies on upstream.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D153383
[mlir] Expose a mechanism to provide a callback for encoding types and attributes in MLIR bytecode.
Two callbacks are exposed, respectively, to the BytecodeWriterConfig and to the ParserConfig. At bytecode parsing/printing, clients have the ability to specify a callback to be used to optionally read/write the encoding. On failure, fallback path will execute the default parsers and printers for the dialect.
Testing shows how to leverage this functionality to support back-deployment and backward-compatibility usecases when roundtripping to bytecode a client dialect with type/attributes dependencies on upstream.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D153383
Duplicate values in the retained list can just be removed, however, for duplicates in the list of memrefs to deallocate, we also need to check the conditions and if thhey don't match, we need to compute the OR in order to not miss a case leading to a memory leak.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D156157
This patch implements a transform op for the FoldArithExtIntoContractionOp
pattern. The pattern folds arith.extf into vector.contract for the
backends with native support for mixed-mode contractions.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D156484
This allows for users of the lsp transport libraries to process replies
in parallel, without overlapping/clobbering the output.
Differential Revision: https://reviews.llvm.org/D156295
Introduces a generic parameter type intended for transform dialect use
cases that uses/manipulates the underlying attribute (e.g. op
annotation). Includes a disclaimer that mixing parameter based and
attribute based control is discouraged.
Differential Revision: https://reviews.llvm.org/D155980
Adds representation for `acc routine` under new operation named
`acc.routine`. This operation is associated with a function symbol.
It also gets its own compiler generated synthetic symbol name so
that it can be referenced from the associated function. The clauses
associated with the `acc routine` directive are captured in the
`acc.routine` op.
The linking between the `func.func` and its `acc.routine` declaration
is done through the `acc.routine_info` attribute. In practice, a
single `acc routine` is associated with a function. But the spec does
not specifically restrict this - thus the 1:N relationship between
`func.func` and `acc.routine` allowed in the dialect. Additionally, it
makes sense that multiple acc routines could be used for a single
function depending on loop context - to allow flexible parallelization.
Most acc routine clauses are supported including `gang`, `gang(dim:)`,
`vector`, `worker`, `seq`, `nohost`, and `bind`. The only one not
supported is `device_type`. This is because most other clauses also
miss this and the effort to add support for it needs to be coordinated
and consistent.
Reviewed By: clementval, vzakhari
Differential Revision: https://reviews.llvm.org/D156281
This patch adds optional and variadic operands and results
to IRDL. These are added using the `irdl.variadicity` attribute,
which has to be attached to every `irdl.operands` and `irdl.results`
operations.
For instance:
```mlir
irdl.operands(%0, single %1, optional %2, variadic %3)
```
has 4 operand definitions. The first two are single operands,
the second one is optional, and the last one is variadic.
Note that this patch only adds the variadicities to the definition,
but does not consider them when loading a dialect at runtime. This
will be done in the next patch.
Reviewed By: Mogball, unterumarmung
Differential Revision: https://reviews.llvm.org/D153983
This revision removes the metadata op, that to the best of our
knowledge, has no more uses after switching to a purely attribute based
metadata representation:
https://reviews.llvm.org/D155444https://reviews.llvm.org/D155285https://reviews.llvm.org/D155159
These changes got unlocked after landing distinct attribute support:
https://reviews.llvm.org/D153360,
which enables modeling distinct metadata using attributes. As a result,
all metadata kinds are now represented using attributes. Previously,
there has been a mix of attribute and op based representations.
Having attribute only metadata makes it possible to update the metadata
in-parallel, while updating the global metadata operation has been
a sequential process. The LLVM Dialect inliner already benefits from
this change and now creates new alias scopes and domains during
inlining rather than dropping the no alias information:
https://reviews.llvm.org/D155712
Reviewed By: Dinistro
Differential Revision: https://reviews.llvm.org/D156217
This revision significantly simplifies the specification and implementation of mapping loops to GPU ids.
Each type of mapping (block, warpgroup, warp, thread) now comes with 2 mapping modes:
1. a 3-D "grid-like" mode, subject to alignment considerations on threadIdx.x, on which predication
may occur on a per-dimension 3-D sub-rectangle basis.
2. a n-D linearized mode, on which predication may only occur on a linear basis.
In the process, better size and alignment requirement inference are introduced along with improved runtime verification messages.
The `warp_dims` attribute was deemed confusing and is removed from the transform in favor of better size inference.
Differential Revision: https://reviews.llvm.org/D155941
Adds `apply_patterns.gpu.unroll_vectors_subgroup_mma` which allows
specifying a native MMA shape of `m`, `n`, and `k` to unroll to,
greedily unrolling the inner most dimension of contractions and other
vector operations based on expected usage.
Differential Revision: https://reviews.llvm.org/D156079
This is failing on Windows MSVC builds:
llvm\unittests\Support\ThreadPool.cpp(380): error C2440: 'return': cannot convert from 'Vector' to 'std::vector<llvm::BitVector,std::allocator<llvm::BitVector>>'
with
[
Vector=llvm::SmallVector<llvm::BitVector,0>
]
This extends the existing 'arm_sme.tile_store' op to support all tile
sizes and adds a new op 'arm_sme.tile_load', as well as lowerings from
vector -> custom ops and custom ops -> intrinsics. Currently there's no
lowering for i128.
Depends on D154867
Reviewed By: awarzynski, dcaballe
Differential Revision: https://reviews.llvm.org/D155306
The operand_segment_sizes and result_segment_sizes Attributes are now inlined
in the operation as native propertie. We continue to support building an
Attribute on the fly for `getAttr("operand_segment_sizes")` and setting the
property from an attribute with `setAttr("operand_segment_sizes", attr)`.
A new bytecode version is introduced to support backward compatibility and
backdeployments.
Differential Revision: https://reviews.llvm.org/D155919
The operand_segment_sizes and result_segment_sizes Attributes are now inlined
in the operation as native propertie. We continue to support building an
Attribute on the fly for `getAttr("operand_segment_sizes")` and setting the
property from an attribute with `setAttr("operand_segment_sizes", attr)`.
A new bytecode version is introduced to support backward compatibility and
backdeployments.
Differential Revision: https://reviews.llvm.org/D155919
This reverts commit 2e0e00ed84
and reverts commit a6eb40692c
and reverts commit 585cbe3f63.
15 tests are broken on the mlir-nvidia buildbot:
'cuModuleLoadData(&module, data)' failed with 'CUDA_ERROR_INVALID_SOURCE'
'cuModuleGetFunction(&function, module, name)' failed with 'CUDA_ERROR_INVALID_HANDLE'
'cuLaunchKernel(function, gridX, gridY, gridZ, blockX, blockY, blockZ, smem, stream, params, extra)' failed with 'CUDA_ERROR_INVALID_HANDLE'
'cuModuleUnload(module)' failed with 'CUDA_ERROR_INVALID_HANDLE'
Enables printf style debugging of matchers through `transform.print`
within the body of a matcher.
Differential Revision: https://reviews.llvm.org/D156078
This work improves how we compile the generated PTX code using the `ptxas` compiler. Currently, we rely on the driver's jit API to compile the PTX code. However, this approach has some limitations. It doesn't always produce the same binary output as the ptxas compiler, leading to potential inconsistencies in the generated Cubin files.
This work introduces a significant improvement by directly utilizing the ptxas compiler for PTX compilation. By doing so, we can achieve more consistent and reliable results in generating cubin files. Key Benefits:
- Using the Ptxas compiler directly ensures that the cubin files generated during the build process remain consistent with CUDA compilation using `nvcc` or `clang`.
- Another advantage of this work is that it allows developers to experiment with different ptxas compilers without the need to change the compiler. Performance among ptxas compiler versions are vary, therefore, one can easily try different ptxas compilers.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D155563
Currently, `ROCDL::GridDim*Op` is being translated to `__ockl_get_global_size`, however
to match the meaning of `gpu.grid_dim` it should instead be translated to
`__ockl_get_num_groups`. This change would also make it agree with the meaning
of `gridDimx.*` in HIP, see:
https://github.com/ROCm-Developer-Tools/hipamd/blob/develop/include/hip/amd_detail/amd_hip_runtime.h#L257
Difference between the functions:
```
__ockl_get_global_size = blockDim * numBlocks
__ockl_get_num_groups = numBlocks
```
Reviewed By: krzysz00
Differential Revision: https://reviews.llvm.org/D156009
`changed` was not updated correctly when it was already set to "true" before calling `applyPatternsAndFoldGreedily`.
Differential Revision: https://reviews.llvm.org/D155934
Add a transform that eliminates dead operations. This is useful after certain transforms (such as fusion) that create/clone new IR but leave the original IR in place.
Differential Revision: https://reviews.llvm.org/D155954
when operating integer type tensors, tosa elementwise multiplication
requires the element type of result to be a 32-bit integer rather
than the same type as inputs.
Change-Id: Ifd3d7ebd879be5c6b2c8e23aa6d7ef41f39c6d41
Reviewed By: mgehre-amd
Differential Revision: https://reviews.llvm.org/D154988
- Wrote complete documentation for the `Broadcastable` op trait. This is mostly meant as a thorough description of its previous behavior, with the exception of minor feature updates.
- Restricted legality criteria for a `Broadcastable` op in order to simplify current and future lowering passes and increase efficiency of code generated by those passes. New restriction are: 1) A dynamic dimension in an inferred result is not compatible with a static dimension in the actual result. 2) Broadcast semantics are restricted to input operands and not supported between inferred and actual result shapes.
- Implemented TOSA-to-Linalg lowering support for unary, binary, tertiary element-wise ops. This support is complete for all legal cases described in the `Broadcastable` trait documentation.
- Added unit tests for `tosa.abs`, `tosa.add`, and `tosa.select` as examples of unary, binary, and tertiary ops.
Reviewed By: eric-k256
Differential Revision: https://reviews.llvm.org/D153291