This commit implements `LoopLikeOpInterface` on `scf.while`. This
enables LICM (and potentially other transforms) on `scf.while`.
`LoopLikeOpInterface::getLoopBody()` is renamed to `getLoopRegions` and
can now return multiple regions.
Also fix a bug in the default implementation of
`LoopLikeOpInterface::isDefinedOutsideOfLoop()`, which returned "false"
for some values that are defined outside of the loop (in a nested op, in
such a way that the value does not dominate the loop). This interface is
currently only used for LICM and there is no way to trigger this bug, so
no test is added.
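For reference, a minimal sketch of the fixed semantics (not the verbatim interface code): region ancestry, not dominance, decides what counts as "outside the loop".
```
#include "mlir/IR/Operation.h"

using namespace mlir;

// Sketch only: a value is outside the loop iff the loop op is not an
// ancestor of the op owning the region in which the value is defined.
// A value defined in an op nested elsewhere need not dominate the loop,
// yet it is still "outside" for this interface.
bool isDefinedOutsideOfLoop(Operation *loopOp, Value value) {
  return !loopOp->isAncestor(value.getParentRegion()->getParentOp());
}
```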
* Always use the auto-generated `getInitArgs` function. Remove the
hand-written `getInitOperands` duplicate.
* Remove `hasIterOperands` and `getNumIterOperands`. The names were
inconsistent because the "arg" is called `initArgs` in TableGen. Use
`getInitArgs().size()` instead.
* Fix verification around ops with no results.
This change removes the requirement that the row stride be statically known when
converting `vector.transfer_read` and `vector.transfer_write` to distributed
SIMT operations in the `nvgpu` lowering path. It also adds a check to verify
that the last dimension of the source memref is statically known to have stride
1, since this is assumed in the conversion logic. No other change should be
required, since the generated `vector.load` operations only ever span the last
dimension. The routines for checking preconditions on
`vector.transfer_read/write` are moved under the nvgpu utilities.
The change is NFC with respect to the GPU dialect lowering path.
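A minimal sketch of the new precondition, assuming it is checked with `getStridesAndOffset` (the exact call site may differ):
```
#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

// The last dimension of the source memref must have a static unit stride;
// the remaining strides may stay dynamic after this change.
static bool hasStaticUnitInnerStride(MemRefType type) {
  SmallVector<int64_t> strides;
  int64_t offset;
  if (failed(getStridesAndOffset(type, strides, offset)))
    return false; // not a strided memref
  return !strides.empty() && strides.back() == 1;
}
```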
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D155753
This patch is part of a larger initiative aimed at fixing floating-point `max` and `min` operations in MLIR: https://discourse.llvm.org/t/rfc-fix-floating-point-max-and-min-operations-in-mlir/72671.
This commit addresses Task 1.2 of the mentioned RFC. By renaming these operations, we align their names with LLVM intrinsics that have corresponding semantics.
This commit adds `arith.extf` to the list of elementwise ops supported
for subgroup MMA ops, and enables lowering it to SPIR-V.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D156847
`DenseI64ArrayAttr` provides a better API than `I64ArrayAttr`. E.g., accessors returning `ArrayRef<int64_t>` (instead of `ArrayAttr`) are generated.
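For illustration, a small sketch of the difference (hypothetical helper, not code from this change):
```
#include "mlir/IR/Builders.h"

using namespace mlir;

void compareAccessors(MLIRContext *ctx) {
  // I64ArrayAttr: elements come back as generic Attributes that must be
  // unpacked one IntegerAttr at a time.
  ArrayAttr arrayAttr = Builder(ctx).getI64ArrayAttr({1, 2, 3});
  int64_t first = cast<IntegerAttr>(arrayAttr[0]).getInt();

  // DenseI64ArrayAttr: the data is directly available as ArrayRef<int64_t>.
  auto denseAttr = DenseI64ArrayAttr::get(ctx, {1, 2, 3});
  ArrayRef<int64_t> values = denseAttr.asArrayRef();
  (void)first;
  (void)values;
}
```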
Differential Revision: https://reviews.llvm.org/D156684
The old code used to materialize constants as ops, immediately folded them into the resulting affine map and then deleted the constant ops again. Instead, directly fold the attributes into the affine map. Furthermore, all helpers accept `OpFoldResult` instead of `Value` now. This makes the code at call sites more efficient, because it is no longer necessary to materialize a `Value`, just to be able to use these helper functions.
Note: The API has changed (accepts OpFoldResult instead of Value), otherwise this change is NFC.
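A sketch of the resulting call-site pattern (the exact namespace of `makeComposedFoldedAffineApply` depends on the revision; `addOfrs` is a hypothetical helper):
```
#include "mlir/Dialect/Affine/IR/AffineOps.h"

using namespace mlir;

// Attribute operands fold straight into the affine map; no constant op is
// materialized and deleted along the way.
OpFoldResult addOfrs(OpBuilder &builder, Location loc, OpFoldResult a,
                     OpFoldResult b) {
  AffineExpr d0, d1;
  bindDims(builder.getContext(), d0, d1);
  return affine::makeComposedFoldedAffineApply(builder, loc, d0 + d1, {a, b});
}
```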
Differential Revision: https://reviews.llvm.org/D153324
Add an options object to allow control of the slice computation (for
both forward and backward slice). This makes the ABI stable, and also
allows avoiding an assert that makes the slice analysis unusable for
operations with multiple blocks.
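A sketch of the options-based API (field names as introduced here, treated as indicative; `foo.barrier` is a made-up op for illustration):
```
#include "mlir/Analysis/SliceAnalysis.h"

using namespace mlir;

void collectSlice(Operation *op) {
  BackwardSliceOptions options;
  // Avoid the multi-block assert by skipping block arguments.
  options.omitBlockArguments = true;
  // Optional filter controlling which ops the slice may grow through.
  options.filter = [](Operation *candidate) {
    return candidate->getName().getStringRef() != "foo.barrier";
  };
  llvm::SetVector<Operation *> slice;
  getBackwardSlice(op, &slice, options);
}
```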
Reviewed By: hanchung, nicolasvasilache
Differential Revision: https://reviews.llvm.org/D151520
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same names.
This change begins the migration from uses of the methods to the
corresponding free function calls, which have been decided on as the more
consistent style.
Note that there still exist classes that only define the methods directly,
such as AffineExpr, and this change does not currently include work to
support the functional cast/isa calls for them.
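The mechanical rewrite, shown on a toy example (hypothetical function, not from the patch):
```
#include "mlir/IR/BuiltinAttributes.h"

using namespace mlir;

bool isWideInt(Attribute attr) {
  // Before: attr.isa<IntegerAttr>() && attr.cast<IntegerAttr>()...
  // After, using the free functions:
  return isa<IntegerAttr>(attr) &&
         cast<IntegerAttr>(attr).getValue().getBitWidth() > 32;
}
```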
Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
4. Some changes have been deleted for the following reasons:
- Some files had a variable also named cast
- Some files had not included a header file that defines the cast
functions
- Some files are definitions of the classes that have the casting
methods, so the code still refers to the method instead of the
function without adding a prefix or removing the method declaration
at the same time.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
mlir/lib/**/IR/\
mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
mlir/test/lib/Dialect/Test/TestTypes.cpp\
mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
mlir/test/lib/Dialect/Test/TestAttributes.cpp\
mlir/unittests/TableGen/EnumsGenTest.cpp\
mlir/test/python/lib/PythonTestCAPI.cpp\
mlir/include/mlir/IR/
```
Differential Revision: https://reviews.llvm.org/D150123
Pushed a stale commit for the same review in my previous commit.
I am updating mainline with the latest commit, including the
review revisions. Apologies for the redundant commit.
Differential Revision: https://reviews.llvm.org/D147749
This patch removes the historical lhs and rhs masks in vector.contract,
now that vector.mask supports vector.contract and the lhs and rhs masks
are barely supported by the existing vector.contract lowerings and
transformations.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D144430
This pattern is not specific to nvgpu; I intend to use it in SPIR-V codegen. `VectorTransforms` seems like a more generally useful place.
In addition:
- Fix a bug in the second condition (the dimensions were swapped for RHS).
- Add tests.
- Add support for externally provided filter functions, similar to other vector transforms.
- Prefer to transpose before zero/sign-extending inputs.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D145638
Plain `getVectorType()` can be quite confusing and error-prone
given that, well, vector ops always work on vector types, and
they commonly involve both source and result vectors. So this
commit makes various such accessor methods explicit w.r.t.
source or result vectors.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D144159
In file included from /data/llvm-project/mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp:13:
/data/llvm-project/mlir/include/mlir/Conversion/VectorToGPU/VectorToGPU.h:15:1: error: class 'LogicalResult' was previously declared as a struct; this is valid, but may result in linker errors under the Microsoft C++ ABI [-Werror,-Wmismatched-tags]
class LogicalResult;
^
/data/llvm-project/mlir/include/mlir/Support/LogicalResult.h:26:22: note: previous use is here
struct [[nodiscard]] LogicalResult {
^
/data/llvm-project/mlir/include/mlir/Conversion/VectorToGPU/VectorToGPU.h:15:1: note: did you mean struct here?
class LogicalResult;
^~~~~
struct
/data/llvm-project/mlir/lib/Conversion/VectorToGPU/VectorToGPU.cpp:724:5: error: ignoring return value of function declared with 'nodiscard' attribute [-Werror,-Wunused-result]
rewriter.notifyMatchFailure(
^~~~~~~~~~~~~~~~~~~~~~~~~~~
2 errors generated.
This revision performs a bunch of cleanups and tracks free-flowing IR mutations.
APIs are systematized around RewriterBase and relevant debug messages are added.
Deliberate use of OpBuilder::InsertionGuard is added where needed.
Differential Revision: https://reviews.llvm.org/D143738
Unsigned integer types are supported in subgroup mma ops by matching
against arith.extui ops. This allows subgroup_mma_compute ops with
mixed signedness, which later conversions must handle. SPIR-V
cooperative matrix ops support this, while the lowering to WMMA does not.
Differential Revision: https://reviews.llvm.org/D143922
The code for unranked memref descriptors assumed that
sizeof(!llvm.ptr) == sizeof(!llvm.ptr<N>) for all address spaces N.
This is not always true (e.g., the AMDGPU compiler backend has
sizeof(!llvm.ptr) = 64 bits but sizeof(!llvm.ptr<5>) = 32 bits, where
address space 5 is used for stack allocations). While this is merely
an overallocation in the case where a non-0 address space has pointers
smaller than the default, the existing code could cause OOB memory
accesses when sizeof(!llvm.ptr<N>) > sizeof(!llvm.ptr).
So, add an address spaces parameter to computeSizes in order to
partially resolve this class of bugs. Note that the LLVM data layout
in the conversion passes is currently set to "" and not constructed
from the MLIR data layout or some other source, but this could change
in the future.
Depends on D142159
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D141293
Remapping memory spaces is a function often needed in type
conversions, most often when going to LLVM or to/from SPIR-V (a future
commit), and it is possible that such remappings may become more
common in the future as dialects take advantage of the more generic
memory space infrastructure.
Currently, memory space remappings are handled by running a
special-purpose conversion pass before the main conversion that
changes the address space attributes. In this commit, this approach is
replaced by adding a notion of type attribute conversions to
TypeConverter, which is then used to convert memory space attributes.
Then, we use this infrastructure throughout the *ToLLVM conversions.
This has the advantage of loosening the requirements on the inputs to
those passes from "all address spaces must be integers" to "all
memory spaces must be convertible to integer spaces", a looser
requirement that reduces the coupling between portions of MLIR.
On top of that, this change leads to the removal of most of the calls
to getMemorySpaceAsInt(), bringing us closer to removing it.
(A rework of the SPIR-V conversions to use this new system will be in
a follow-up commit.)
As a note, one long-term motivation for this change is that I would
eventually like to add an allocaMemorySpace key to MLIR data layouts
and then call getMemRefAddressSpace(allocaMemorySpace) in the
relevant *ToLLVM conversions in order to ensure all alloca()s, whether
incoming or produced during the LLVM lowering, have the correct address
space for a given target.
I expect that the type attribute conversion system may be useful in
other contexts.
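A sketch of registering such a conversion (the GPU address space case and the integer numbering are assumed examples; real mappings are target-specific):
```
#include "mlir/Dialect/GPU/IR/GPUDialect.h"
#include "mlir/Transforms/DialectConversion.h"

using namespace mlir;

void addMemorySpaceConversions(TypeConverter &converter) {
  converter.addTypeAttributeConversion(
      [](BaseMemRefType type, gpu::AddressSpaceAttr memorySpaceAttr)
          -> Attribute {
        // Map the symbolic GPU memory space to the integer space expected
        // by the LLVM lowering (numbering assumed for illustration).
        unsigned addressSpace =
            static_cast<unsigned>(memorySpaceAttr.getValue());
        return IntegerAttr::get(
            IntegerType::get(memorySpaceAttr.getContext(), 64), addressSpace);
      });
}
```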
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D142159
This adds support for loading from 2-D memrefs, extending D139655,
which handled loading from 1-D memrefs.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D140136
This is now possible with transpose semantics on subgroup MMA
load ops.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D139655
Add support for loading, computing, and storing `gpu.subgroup` WMMA ops
in transpose mode. Update the GPU to NVVM lowerings to support
`transpose` mode and update the integration tests as well.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D139021
This patch mechanically replaces None with std::nullopt where the
compiler would warn if None were deprecated. The intent is to reduce
the amount of manual work required in migrating from Optional to
std::optional.
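The substitution in a nutshell (toy example):
```
#include <optional>

std::optional<int> lookup(bool found) {
  // Previously: return llvm::None;
  return found ? std::optional<int>(42) : std::nullopt;
}
```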
This is part of an effort to migrate from llvm::Optional to
std::optional:
https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Enables transposed gpu.subgroup_mma_load_matrix and updates the lowerings in Vector to GPU and GPU to SPIRV. This is needed to enable matmuls with a transposed B operand to lower to wmma ops.
Taken over from author: stanley-nod <stanley@nod-labs.com>
Reviewed By: ThomasRaoux, antiagainst
Differential Revision: https://reviews.llvm.org/D138770
This patch handles native `mma.sync` sizes and enables issuing `ldmatrix` on
the largest possible tiles for matrixB. It requires handling
`vector.extract_strided_slice` in the vector-to-nvgpu lowering.
Differential Revision: https://reviews.llvm.org/D135749
The ConvertVectorToGpu pass implementation contained a small private
support library for performing various calculations during conversion
between `vector` and `nvgpu.mma.sync` and `nvgpu.ldmatrix` operations.
The support library is moved under `Dialect/NVGPU/Utils` because the
functions have wider utility. Some documentation comments are added or
improved.
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D135303
This is the first step in replacing iterator_type from strings with enums in the Vector and Linalg dialects. This change adds IteratorTypeAttr and uses it in ContractionOp.
To avoid breaking all the tests, the print/parse code converts between string and enum for now.
There is shared code in StructuredOpsUtils.h that expects iterator types to be strings. To break this dependency, this change forks the helper functions `isParallelIterator` and `isReductionIterator` into utils in both dialects and adds `getIteratorTypeNames()` to support backward compatibility with StructuredGenerator.
In later changes, I plan to add a similar enum attribute to Linalg.
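The forked helpers look roughly like this (a sketch; each dialect keeps its own copy, and the enum name follows the Vector dialect attribute):
```
#include "mlir/Dialect/Vector/IR/VectorOps.h"

using namespace mlir;

// Enum-based variants of the previously string-based helpers.
bool isParallelIterator(vector::IteratorType type) {
  return type == vector::IteratorType::parallel;
}
bool isReductionIterator(vector::IteratorType type) {
  return type == vector::IteratorType::reduction;
}
```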
Differential Revision: https://reviews.llvm.org/D133696
The patch introduces the required changes to update the pass declarations and definitions to use the new autogenerated files and allow dropping the old infrastructure.
Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D132838
Adds an optional attribute to support tensor cores on the F32 datatype by lowering to `mma.sync` with TF32 operands. Since TF32 is not a native datatype in LLVM, we add `tf32Enabled` as an attribute to make the IR aware of the `MmaSyncOp` datatype. Additionally, this patch adds placeholders for nvgpu-to-nvgpu transformations targeting the higher-precision tf32x3 mode.
For mma.sync on f32 input using tensor cores there are two possibilities:
(a) tf32 (1 `mma.sync` per warp-level matrix-multiply-accumulate)
(b) tf32x3 (3 `mma.sync` per warp-level matrix-multiply-accumulate)
Typically, tf32 tensor core acceleration comes at a cost of accuracy from missing precision bits. While f32 has 23 precision bits, tf32 has only 10 precision bits. tf32x3 aims to recover the precision bits by splitting each operand into two tf32 values and issuing three `mma.sync` tensor core operations.
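A scalar model of the tf32x3 idea (illustrative arithmetic only, not the actual mma.sync codegen):
```
#include <cstdint>
#include <cstring>

// Round an f32 toward tf32 by truncating the mantissa from 23 to 10 bits
// (rounding mode simplified for illustration).
static float toTf32(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  bits &= 0xFFFFE000u; // drop the low 13 mantissa bits
  float truncated;
  std::memcpy(&truncated, &bits, sizeof(truncated));
  return truncated;
}

// Three multiplies per scalar product; the small*small term is dropped.
static float tf32x3Multiply(float a, float b) {
  float aBig = toTf32(a), aSmall = toTf32(a - aBig);
  float bBig = toTf32(b), bSmall = toTf32(b - bBig);
  return aBig * bBig + aBig * bSmall + aSmall * bBig;
}
```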
Reviewed By: ThomasRaoux
Differential Revision: https://reviews.llvm.org/D130294
This patch removes the `type` field from `Attribute` along with the
`Attribute::getType` accessor.
Going forward, this means that attributes in MLIR will no longer have
types as a first-class concept. This patch lays the groundwork to
incrementally remove or refactor code that relies on generic attributes
being typed. The immediate impact will be on attributes that rely on
`Attribute` containing a type, such as `IntegerAttr`,
`DenseElementsAttr`, and `ml_program::ExternAttr`, which will now need
to define a type parameter on their storage classes. This will save
memory as all other attribute kinds will no longer contain a type.
Moreover, it will not be possible to generically query the type of an
attribute directly. This patch provides an attribute interface
`TypedAttr` that implements only one method, `getType`, which can be
used to generically query the types of attributes that implement the
interface. This interface can be used to retain the concept of a "typed
attribute". The ODS-generated accessor for a `type` parameter
automatically implements this method.
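For example, code that used to call `attr.getType()` generically now queries the interface (hypothetical helper):
```
#include "mlir/IR/BuiltinAttributes.h"

using namespace mlir;

// Returns the attribute's type if it implements TypedAttr, else a null Type.
Type getTypeIfAny(Attribute attr) {
  if (auto typed = dyn_cast<TypedAttr>(attr))
    return typed.getType();
  return Type();
}
```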
Next steps will be to refactor the assembly formats of certain operations
that rely on `parseAttribute(type)` and `printAttributeWithoutType` to
remove special handling of type elision until `type` can be removed from
the dialect parsing hook entirely; and incrementally remove uses of
`TypedAttr`.
Reviewed By: lattner, rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D130092
This one required more changes than ideal due to an overlapping generated name
with different return types. Renamed getIndexingMaps to getIndexingMapsArray to
move it out of the way and to highlight that it (more expensively) returns a
SmallVector, and switched to the prefixed name for the Attribute accessor.
Differential Revision: https://reviews.llvm.org/D129919
For the conversion to nvgpu `mma.sync` and `ldmatrix` pathways, the code
was missing support for the `i4` data type. While fixing this, another
bug was discovered that caused the number of ldmatrix tiles calculated for
certain operand types and configurations to be incorrect. This change
fixes both issues and adds additional tests.
Differential Revision: https://reviews.llvm.org/D128074
This aligns the SCF dialect file layout with the majority of the dialects.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D128049
This change adds a transformation and pass to the NvGPU dialect that
attempts to optimize reads/writes from a memref representing GPU shared
memory in order to avoid bank conflicts. Given a value representing a
shared memory memref, it traverses all reads/writes within the parent op
and, subject to suitable conditions, rewrites all last dimension index
values such that element locations in the final (col) dimension are
given by
`newColIdx = col % vecSize + perm[row](col/vecSize,row)`
where `perm` is a permutation function indexed by `row` and `vecSize`
is the vector access size in elements (currently assuming 128-bit
vectorized accesses, but this could be made a parameter). This specific
transformation can help optimize typical distributed & vectorized accesses
common to loading matrix multiplication operands to/from shared memory.
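As a scalar model of the remapping (the XOR permutation below is an assumption for illustration; the pass defines `perm` internally):
```
#include <cstdint>

// Remap the column index so that consecutive rows land in different banks.
int64_t remapColumn(int64_t row, int64_t col, int64_t vecSize) {
  int64_t group = col / vecSize;  // which vector-sized chunk within the row
  int64_t permuted = group ^ row; // assumed XOR-based perm[row]
  return col % vecSize + permuted * vecSize;
}
```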
Differential Revision: https://reviews.llvm.org/D127457