This is a minor step towards deprecating bufferization.alloc_tensor().
It replaces the examples with tensor.empty() and adjusts the underlying
rewriting logic to prepare for this upcoming change.
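A minimal sketch of the substitution (shapes chosen arbitrarily for
illustration):

```mlir
func.func @example() -> (tensor<4x8xf32>, tensor<4x8xf32>) {
  // Before: the op slated for deprecation.
  %0 = bufferization.alloc_tensor() : tensor<4x8xf32>
  // After: the equivalent empty tensor.
  %1 = tensor.empty() : tensor<4x8xf32>
  return %0, %1 : tensor<4x8xf32>, tensor<4x8xf32>
}
```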
This commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate
any memrefs as this should be entirely handled by the
ownership-based-buffer-deallocation pass going forward. This means the
`allow-return-allocs` pass option will default to true now,
`create-deallocs` defaults to false and they, as well as the escape
attribute indicating whether a memref escapes the current region, will
be removed. A new `allow-return-allocs-from-loops` option is added as a
temporary workaround for some bufferization limitations.
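As a rough sketch of the new behavior (the function and shapes here are
made up for illustration), a function returning an allocation now
bufferizes without any dealloc being emitted:

```mlir
func.func @returns_alloc() -> tensor<8xf32> {
  %0 = bufferization.alloc_tensor() : tensor<8xf32>
  return %0 : tensor<8xf32>
}
// After -one-shot-bufferize, roughly:
//   %alloc = memref.alloc() : memref<8xf32>
//   return %alloc : memref<8xf32>
// No memref.dealloc is inserted; the ownership-based buffer
// deallocation pass is responsible for adding it later.
```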
This patch adds an NVPTX compilation path that enables JIT compilation
on NVIDIA targets. The following modifications were performed:
1. Adding a format field to the GPU object attribute, allowing the
translation attribute to use the correct runtime function to load the
module (see the sketch after this list). Likewise, a dictionary
attribute was added to carry any extra options.
2. Adding the `createObject` method to `GPUTargetAttrInterface`; this
method returns a GPU object from a binary string.
3. Adding the function `mgpuModuleLoadJIT`, which is only available for
NVIDIA GPUs, as there is no equivalent for AMD.
4. Adding the CMake flag `MLIR_GPU_COMPILATION_TEST_FORMAT` to specify
the format to use during testing.
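A hedged sketch of what an object with an explicit format might look
like (the format keyword, property names, and exact attribute syntax
here are assumptions, not the verbatim design):

```mlir
gpu.binary @kernel_module [
  // `assembly` marks the object as PTX that the runtime JIT-compiles on
  // load; the dictionary carries extra options for the runtime loader.
  #gpu.object<#nvvm.target<chip = "sm_70">,
              properties = {O = 2 : i32},
              assembly = "...ptx...">
]
```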
Currently, the VectorToLLVM patterns are built into a library along
with the corresponding pass, which also pulls in all the
platform-specific vector dialects (like AMXDialect) to apply all the
vector to LLVM conversions.
This causes dependency bloat when writing libraries: for example, the
GPU to LLVM passes, which use the vector to LLVM patterns, don't need
the X86Vector dialect to be present at all.
This commit partitions the library into VectorToLLVM and
VectorToLLVMPass, where the latter pulls in all the other vector
transformations.
Reviewed By: nicolasvasilache, mehdi_amini
Differential Revision: https://reviews.llvm.org/D158287
This is the first commit in a series with the goal to rework the
BufferDeallocation pass. Currently, this pass heavily relies on copies
to perform correct deallocations, which leads to very slow code and
potentially high memory usage. Additionally, there are unsupported cases,
such as returning memrefs, which this series of commits also aims to
support.
This first commit removes the deallocation capabilities of
one-shot-bufferization. One-shot-bufferization should never deallocate any
memrefs as this should be entirely handled by the buffer-deallocation pass
going forward. This means the allow-return-allocs pass option will
default to true now, create-deallocs defaults to false and they, as well
as the escape attribute indicating whether a memref escapes the current region,
will be removed.
The documentation is also updated in this commit to reflect these pass
option changes.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D156662
This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.
This commit addresses Task 1.2 of the mentioned RFC. By renaming these operations, we align their names with LLVM intrinsics that have corresponding semantics.
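For instance, assuming Task 1.2 renames `arith.maxf` to
`arith.maximumf` (matching the `llvm.maximum` intrinsic's
NaN-propagating semantics):

```mlir
func.func @fmax(%a: f32, %b: f32) -> f32 {
  // NaN-propagating maximum, named after llvm.maximum.
  %0 = arith.maximumf %a, %b : f32
  return %0 : f32
}
```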
Currently, a DimLvlMap with an identity affine map is treated as an
empty affine map, but the new syntax treats it as an actual identity
affine map such as (d0) -> (d0). This mismatch could raise an error when
comparing sparse encodings.
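A hedged illustration (the encoding field names follow the syntax in
use at the time and may differ): the following two encodings should
compare equal.

```mlir
// Identity dimToLvl map elided entirely.
#a = #sparse_tensor.encoding<{ lvlTypes = ["dense"] }>
// Identity dimToLvl map spelled out explicitly.
#b = #sparse_tensor.encoding<{
  lvlTypes = ["dense"],
  dimToLvl = affine_map<(d0) -> (d0)>
}>
```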
Deprecate the `gpu-to-cubin` & `gpu-to-hsaco` passes in favor of the
`TargetAttr` workflow. This patch removes remaining upstream uses of the
aforementioned passes, including the option to use them in `mlir-opt`. A
future patch will remove these passes entirely.
The passes can be re-enabled in `mlir-opt` by adding the CMake flag: `-DMLIR_ENABLE_DEPRECATED_GPU_SERIALIZATION=1`.
The revert happened due to a build bot failure that threw 'CUDA_ERROR_UNSUPPORTED_PTX_VERSION'.
The failure's root cause was a pass using "+ptx76" for compilation and an old CUDA driver
on the bot. This commit relands the patch with "+ptx60".
Original GitHub PR: #65768
Original commit message:
Migrate tests referencing `gpu-to-cubin` to the new compilation workflow
using `TargetAttrs`. The `test-lower-to-nvvm` pass pipeline was modified
to use the new compilation workflow to simplify the introduction of
future tests.
The `createLowerGpuOpsToNVVMOpsPass` function was removed, as it didn't
allow for passing all options available in the `ConvertGpuOpsToNVVMOp`
pass.
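A sketch of the relanded target attribute (the chip name here is an
arbitrary example):

```mlir
// Requesting PTX 6.0 keeps the generated assembly loadable by the
// older CUDA driver on the bot.
gpu.module @kernels [#nvvm.target<chip = "sm_70", features = "+ptx60">] {
}
```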
There are two motivations for this change:
1. It considerably simplifies adding support for the realloc operation to the
   new buffer deallocation pass: the realloc is lowered such that no
   deallocation operation is inserted, and the deallocation pass itself can
   insert that dealloc (see the sketch after this list).
2. The lowering is expressed at a higher level and is thus easier to understand,
   and the lowerings of the memref operations it is composed of don't have to
   be duplicated in the MemRefToLLVM lowering (also see the discussion in
   https://reviews.llvm.org/D133424).
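A hedged sketch of the spirit of the new lowering (heavily simplified:
the real pattern also handles shrinking, dynamic sizes, and in-place
reuse; the function and shapes are made up):

```mlir
func.func @realloc_expanded(%old: memref<8xf32>, %size: index) -> memref<?xf32> {
  // Allocate the new buffer and copy over the old contents.
  %new = memref.alloc(%size) : memref<?xf32>
  %prefix = memref.subview %new[0] [8] [1]
      : memref<?xf32> to memref<8xf32, strided<[1]>>
  memref.copy %old, %prefix : memref<8xf32> to memref<8xf32, strided<[1]>>
  // Deliberately no memref.dealloc of %old: inserting it is now the
  // job of the buffer deallocation pass.
  return %new : memref<?xf32>
}
```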
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D159430
sparse_tensor ops cannot be bufferized with One-Shot Bufferize. (They can only be analyzed.) The sparse compiler does the actual lowering to memref. Produce a proper error message instead of crashing.
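For illustration (the encoding and op choice are placeholder examples),
an input like this now produces a diagnostic rather than a crash:

```mlir
#SV = #sparse_tensor.encoding<{ lvlTypes = ["compressed"] }>
func.func @to_sparse(%t: tensor<8xf32>) -> tensor<8xf32, #SV> {
  // One-Shot Bufferize can analyze but not bufferize this op and now
  // emits a proper error when asked to bufferize it.
  %0 = sparse_tensor.convert %t : tensor<8xf32> to tensor<8xf32, #SV>
  return %0 : tensor<8xf32, #SV>
}
```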
This fixes #61311.
Differential Revision: https://reviews.llvm.org/D158728
These methods are needed for use with `Diagnostic::operator<<` etc.
The definitions follow the pattern of `Diagnostic::str` by simply wrapping the underlying `print(raw_ostream)` method. Although there is some overhead for constructing the `std::string`, this seems like the most efficient option overall, since this overhead only occurs on the error path (under the current intended usage). An alternative approach would be to have one method construct a `Twine` directly, and then have the print method pass the twine to the stream; however, that would mean introducing the overhead of twine construction on the common/happy path of simply printing things out.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D157643
These new methods help clean up some code for doing LvlExpr-analysis during DimExpr-inference.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D157647
These checks fired when comparing identical elements, e.g. comp(a, a). Let's fix that. I don't have commit rights.
Danila Kutenin
kutdanila@yandex.ru
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D152152
Replaced the "NEW_SYNTAX" with the more readable "map"
(which we may, or may not keep). Minor improvement in
keyword parsing, migrated a few more examples over.
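A hedged sketch of the direction the `map` syntax is heading (the final
form was still undecided at this point):

```mlir
// CSR expressed with the new `map` keyword.
#CSR = #sparse_tensor.encoding<{
  map = (d0, d1) -> (d0 : dense, d1 : compressed)
}>
```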
Reviewed By: Peiming, yinying-lisa-li
Differential Revision: https://reviews.llvm.org/D158325
Registering SparsificationAndBufferization as a proper TD pass
has the advantage that it can be invoked and tested in isolation. This
change also moves some bufferization-specific setup from the pipeline
file into the pass file, keeping the logic more local.
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D158219
Alphabetical order, less whitespace. Removed some verbose comments
for readability (they are preserved in the history if ever needed).
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D158114
This revision is needed to support bufferization of `cf.br`/`cf.cond_br`. It will also be useful for better analysis of loop ops.
This revision generalizes `getAliasingOpResults` to `getAliasingValues`. An OpOperand can now not only alias with OpResults but also with BlockArguments. In the case of `cf.br` (will be added in a later revision): a `cf.br` operand will alias with the corresponding argument of the destination block.
If an op does not implement the `BufferizableOpInterface`, the analysis is conservative: it previously assumed that an OpOperand may alias with each OpResult; it now assumes that an OpOperand may alias with each OpResult and each BlockArgument of the entry block.
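A minimal sketch of the `cf.br` case (to be added in a later revision):
the branch operand aliases with the destination block's argument, which
`getAliasingValues` can now express.

```mlir
func.func @branch(%t: tensor<8xf32>) -> tensor<8xf32> {
  // The operand %t aliases with ^bb1's block argument %arg.
  cf.br ^bb1(%t : tensor<8xf32>)
^bb1(%arg: tensor<8xf32>):
  return %arg : tensor<8xf32>
}
```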
Differential Revision: https://reviews.llvm.org/D157957
Consistent order of ops and related methods.
Also, renamed SpGEMMGetSizeOp to SpMatGetSizeOp
since this is a general utility for sparse matrices,
not specific to GEMM ops.
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D157922
Renames the old `DimLvlExpr::getContext` to `tryGetContext` to make it clear that the result could be a nullptr. Also adds new delegation methods `DimSpec::tryGetContext` and `LvlSpec::getContext`.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D157163
Improves the conversion from `DimLvlMap` to STEA, in order to correct rank-mismatch issues in the roundtrip tests.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D157162
This commit makes `DimLvlMap::print` consistent with `DimLvlMapParser` with respect to both the forward-decls per se, as well as the all-or-none constraint on LvlVar-bindings in LvlSpecs.
Reviewed By: aartbik
Differential Revision: https://reviews.llvm.org/D156768