Treat the integer range of a vector type as the union of the ranges of its
individual elements. With this semantics, most arith ops on vectors work out
of the box; the only special handling needed is for constants and vector
element manipulation ops.
The end goal of these changes is to be able to optimize vectorized index
calculations.
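As an illustrative sketch (values are made up), this semantics makes the range of a vectorized index computation inferable:
```
// The constant's range is the union of its element ranges, i.e. [0, 3],
// so range analysis can conclude the addi result lies in [16, 19].
%offsets = arith.constant dense<[0, 1, 2, 3]> : vector<4xindex>
%base = arith.constant dense<16> : vector<4xindex>
%indices = arith.addi %base, %offsets : vector<4xindex>
```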
- GangPrivate and GangFirstPrivate renamed to just Private and
Firstprivate respectively. This makes compute ops consistent with the
loop op (and also with the acc spec wording for the clause).
- Added getBody to all compute ops
- Verifier for firstprivate ops / recipes is enabled
Currently, the lowering for vector.step is implemented as a folder. This
is not ideal if we want to do transformations on it and defer the
materialization of the constants until much later.
This commit adds a rewrite pattern that can be applied via the
`transform.structured.vectorize_children_and_apply_patterns`
transform dialect operation.
Moreover, the vector.step rewrite is now also used in the
-convert-vector-to-llvm pass, where it handles scalable and non-scalable
types as LLVM expects.
As a consequence of removing the folder-based vector.step lowering,
linalg vectorization will keep vector.step intact.
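A minimal before/after sketch of what the rewrite does in the fixed-width case (the scalable case is handled in the LLVM conversion, as noted above):
```
// Before:
%step = vector.step : vector<4xindex>
// After applying the rewrite pattern:
%step = arith.constant dense<[0, 1, 2, 3]> : vector<4xindex>
```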
This commit fixes the failure in the original PR when building with
shared libs. The problem is that `visitUsedValuesDefinedAbove` is
defined in `MLIRTransformUtils`, but that lib depends on this lib
(`MLIRAnalysis`). To fix this, I dropped the use of
`visitUsedValuesDefinedAbove` and instead used `Region::walk` to traverse
values defined above.
Reapplies PR https://github.com/llvm/llvm-project/pull/113478
Reverts PR https://github.com/llvm/llvm-project/pull/114432
This reverts commit a9a8351.
Unifies parsing and printing for DLTI attributes. Introduces a
`#dlti.attr<key1 = val1, ..., keyN = valN>` syntax for all queryable
DLTI attributes, similar to that of DictionaryAttr, while retaining
support for specifying key-value pairs with `#dlti.dl_entry` (whether to
retain this is TBD).
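As a rough sketch of the difference, using `#dlti.dl_spec` as the queryable attribute (keys and values are illustrative):
```
// Entry-based syntax:
#dlti.dl_spec<
  #dlti.dl_entry<"dlti.endianness", "little">,
  #dlti.dl_entry<index, 32 : i32>>

// Dictionary-like syntax:
#dlti.dl_spec<"dlti.endianness" = "little", index = 32 : i32>
```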
As the new format does away with most of the boilerplate, it is much easier
to parse for humans. This makes an especially big difference for nested
attributes.
Updates the DLTI-using tests and includes fixes for misc error checking/
error messages.
This is intended as a fast pattern rewrite driver for the cases when a
simple walk gets the job done but we would still want to implement it in
terms of rewrite patterns (that can be used with the greedy pattern
rewrite driver downstream).
The new driver is inspired by the discussion in
https://github.com/llvm/llvm-project/pull/112454 and the LLVM Dev
presentation from @matthias-springer earlier this week.
This driver comes with some limitations:
* It does not iterate until a fixpoint and does not revisit ops modified
  in place or newly created ops. In general, it only walks forward (in
  post-order).
* `matchAndRewrite` can only erase the matched op or its descendants.
This is verified under expensive checks.
* It does not perform folding / DCE.
We could probably relax some of these in the future without sacrificing
too much performance.
This change modifies `getBackwardSlice` to track values captured by the
regions of each operation that it traverses. Ignoring values captured
from a parent region may lead to an incomplete program slice. However,
there seems to be logic that depends on not traversing captured values,
so this change preserves the default behavior by hiding the new logic
behind the `omitUsesFromAbove` flag.
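A small sketch of the kind of capture this affects (names and ops are illustrative):
```
func.func @slice_example(%buf: memref<?xf32>, %n: index) -> index {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  // %size is captured from above by the loop region. A backward slice
  // rooted at %r previously ignored such captures and could miss the
  // memref.dim op; the new tracking includes it (subject to the
  // omitUsesFromAbove flag, with the old behavior as the default).
  %size = memref.dim %buf, %c0 : memref<?xf32>
  %r = scf.for %i = %c0 to %n step %c1 iter_args(%acc = %c0) -> (index) {
    %next = arith.addi %acc, %size : index
    scf.yield %next : index
  }
  return %r : index
}
```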
`DIGenericSubrange` is used when the dimensions of the arrays are
unknown at build time (e.g. assumed-rank arrays in Fortran). It has the
same `lowerBound`, `upperBound`, `count` and `stride` fields as
`DISubrange`, and its translation looks quite similar as a result.
---------
Co-authored-by: Tobias Gysi <tobias.gysi@nextsilicon.com>
The `ValueDecomposer` in `DecomposeCallGraphTypes` was a workaround for
missing 1:N support in the dialect conversion. Since #113032, the
dialect conversion infrastructure supports 1:N type conversions and 1:N
target materializations. The `ValueDecomposer` class is no longer
needed. (However, target materializations must still be inserted
manually, until we fully merge the 1:1 and 1:N drivers.)
Note for LLVM integration: Register 1:N target materializations on the
type converter instead of "decompose value conversions" on the
`ValueDecomposer`.
This commit adds support for bufferizing external functions that have no
body. Such functions were previously rejected by One-Shot Bufferize if
they returned a tensor value.
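A sketch of what this enables (the memref layout shown is only illustrative of the default fully dynamic layout):
```
// An external function with a tensor result, previously rejected:
func.func private @ext(tensor<?xf32>) -> tensor<?xf32>
// After One-Shot Bufferize, its signature becomes something like:
func.func private @ext(memref<?xf32, strided<[?], offset: ?>>)
    -> memref<?xf32, strided<[?], offset: ?>>
```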
This commit is in preparation of removing the deprecated
`func-bufferize` pass. That pass can bufferize external functions.
Also update a few comments.
Previously, the pass only supported emulating loads of vector sizes that
are multiples of the (byte-sized) emulated data type. This patch extends
it to support sizes that are not byte multiples. In such cases, the
element values are packed back-to-back in memory to preserve space.
To give a concrete example: an input of type `memref<3x3xi2>` actually
occupies 3 bytes in memory, with the first 18 bits storing the values and
the last 6 bits as padding. The `vector<3xi2>` slice at index `[2, 0]` is
stored in memory from bit 12 to bit 18. To load these elements from
memory, first load byte 2 and byte 3 and convert them to a vector of `i2`
type; then extract bits 4 to 10 (element index 2-5) to form a
`vector<3xi2>`.
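The pre-emulation IR for this example would look roughly like (function and value names are illustrative):
```
func.func @load_unaligned(%src: memref<3x3xi2>) -> vector<3xi2> {
  %c2 = arith.constant 2 : index
  %c0 = arith.constant 0 : index
  // Loads the 3 x i2 slice starting at [2, 0]; after emulation this turns
  // into byte-aligned i8 loads plus the bit extraction described above.
  %v = vector.load %src[%c2, %c0] : memref<3x3xi2>, vector<3xi2>
  return %v : vector<3xi2>
}
```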
A limitation of this patch is that the linearized index of the unaligned
vector has to be known at compile time. Extra code needs to be emitted
to handle it if the condition does not hold.
The following ops are updated:
* `vector::LoadOp`
* `vector::TransferReadOp`
* `vector::MaskedLoadOp`
At the moment, `GenericPadOpVectorizationPattern` implements two
orthogonal transformations:
1. Rewrites `tensor::PadOp` into a sequence of `tensor::EmptyOp`,
`linalg::FillOp` and `tensor::InsertSliceOp`.
2. Vectorizes (where possible) `tensor::InsertSliceOp` (see
`tryVectorizeCopy`).
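To illustrate the first transformation, a rough before/after sketch (`%src` and `%pad_val` are assumed to be defined elsewhere):
```
// Before:
%padded = tensor.pad %src low[0, 0] high[2, 3] {
^bb0(%i: index, %j: index):
  tensor.yield %pad_val : f32
} : tensor<4x5xf32> to tensor<6x8xf32>

// After (roughly):
%empty = tensor.empty() : tensor<6x8xf32>
%filled = linalg.fill ins(%pad_val : f32) outs(%empty : tensor<6x8xf32>) -> tensor<6x8xf32>
%padded = tensor.insert_slice %src into %filled[0, 0] [4, 5] [1, 1]
    : tensor<4x5xf32> into tensor<6x8xf32>
```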
This patch splits `GenericPadOpVectorizationPattern` into two separate
patterns:
1. `GeneralizePadOpPattern` for the first transformation (note that
currently `GenericPadOpVectorizationPattern` inherits from
`GeneralizePadOpPattern`).
2. `InsertSliceVectorizePattern` to vectorize `tensor::InsertSliceOp`.
With this change, we gain the following:
* a clear separation between pre-processing and vectorization
transformations/stages,
* a path to support masked vectorisation for `tensor.insert_slice`
(with a dedicated pattern for vectorization, it is much easier to
specify the input vector sizes used in masking),
* more opportunities to vectorize `tensor.insert_slice`.
Note for downstream users:
--------------------------
If you were using `populatePadOpVectorizationPatterns`, following this
change you will also have to add
`populateInsertSliceVectorizationPatterns`.
Finer implementation details:
-----------------------------
1. The majority of changes in this patch are copy & paste + some edits.
1.1. The only functional change is that the vectorization of
`tensor.insert_slice` is now broadly available (as opposed to being
constrained to the pad vectorization pattern:
`GenericPadOpVectorizationPattern`).
1.2. Following-on from the above, `@pad_and_insert_slice_dest` is
updated. As expected, the input `tensor.insert_slice` Op is no
longer "preserved" and instead gets vectorized successfully.
2. The `linalg.fill` case in `getConstantPadVal` works under the
assumption that only _scalar_ source values can be used. That's
consistent with the definition of the Op, but it's not tested at the
moment. Hence a test case in Linalg/invalid.mlir is added.
3. The behaviour of the two TD vectorization Ops,
`transform.structured.vectorize_children_and_apply_patterns` and
`transform.structured.vectorize` is preserved.
The dialect conversion-based bufferization passes were migrated to
One-Shot Bufferize about two years ago. To clean up the code base, this
commit removes the `scf-bufferize` pass, one of the few remaining parts
of the old infrastructure. Most bufferization passes have already been
removed.
Note for LLVM integration: If you depend on this pass, migrate to
One-Shot Bufferize or copy the pass to your codebase.
Recently, we added an intrinsic for the elect.sync PTX instruction (PR
104780). This patch updates the corresponding Op in NVVM Dialect
to lower to the intrinsic instead of inline-ptx.
The existing test under Conversion/ is migrated to check for the new
pattern. A separate test is added to verify the lowered intrinsic under
the Target/ directory.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>
I'm planning to migrate StringRef to std::string_view
eventually. Since std::string_view does not have slice, this patch
migrates:
slice(0, N) to substr(0, N)
slice(N, StringRef::npos) to substr(N)
The 1:N type converter derives from the 1:1 type converter and extends
it with 1:N target materializations. This commit merges the two type
converters and stores 1:N target materializations in the 1:1 type
converter. This is in preparation of merging the 1:1 and 1:N dialect
conversion infrastructures.
1:1 target materializations (producing a single `Value`) will remain
valid. An additional API is added to the type converter to register 1:N
target materializations (producing a `SmallVector<Value>`). Internally,
all target materializations are stored as 1:N materializations.
The 1:N type converter is removed.
Note for LLVM integration: If you are using the `OneToNTypeConverter`,
simply switch all occurrences to `TypeConverter`.
---------
Co-authored-by: Markus Böck <markus.boeck02@gmail.com>
This PR simply wraps `populatePadOpVectorizationPatterns` into a new
Transform Dialect Op: `apply_patterns.linalg.pad_vectorization`.
This change makes it possible to run (and test) the corresponding
patterns _without_:
`transform.structured.vectorize_children_and_apply_patterns`.
Note that the Op above only supports non-masked vectorisation (i.e. when
the inputs are static), so, effectively, only fixed-width vectorisation
(as opposed to scalable vectorisation). As such, this change is required
to construct vectorization pipelines for tensor.pad targeting scalable
vectors.
To test the new Op and the corresponding patterns, I added
"vectorization-pad-patterns.mlir" - most tests have been extracted from
"vectorization-with-patterns.mlir".
This commit simplifies the result type of materialization functions.
Previously: `std::optional<Value>`
Now: `Value`
The previous implementation allowed 3 possible return values:
- Non-null value: The materialization function produced a valid
materialization.
- `std::nullopt`: The materialization function failed, but another
materialization can be attempted.
- `Value()`: The materialization failed and so should the dialect
conversion. (Previously: Dialect conversion can roll back.)
This commit removes the last variant. It is not particularly useful
because the dialect conversion will fail anyway if all other
materialization functions produced `std::nullopt`.
Furthermore, in contrast to type conversions, at least one
materialization callback is expected to succeed. In case of a failing
type conversion, the current dialect conversion can roll back and try a
different pattern. This also used to be the case for materializations,
but that functionality was removed with #107109: failed materializations
can no longer trigger a rollback. (They can just make the entire dialect
conversion fail without rollback.) With this in mind, it is even less
useful to have an additional error state for materialization functions.
This commit is in preparation of merging the 1:1 and 1:N type
converters. Target materializations will have to return multiple values
instead of a single one. With this commit, we can keep the API simple:
`SmallVector<Value>` instead of `std::optional<SmallVector<Value>>`.
Note for LLVM integration: All 1:1 materializations should return
`Value` instead of `std::optional<Value>`. Instead of `std::nullopt`
return `Value()`.
Fix a number of dependency issues in building the mlir-linalg-ods-yaml-gen
host binary which made a cross-build using the Make generator fail.
Namely:
- do not use binary path for the custom target created when
LLVM_USE_HOST_TOOLS is true;
- use target name instead of name of variable holding the target name
for add_custom_target and set_target_properties in setup_host_tool();
- remove dependency on target defined in different directory in
add_linalg_ods_yaml_gen() since add_custom_target DEPENDS can only be
used on "files and outputs of custom commands created with
add_custom_command() command calls in the same directory";
- remove unneeded dependency on ${MLIR_LINALG_ODS_YAML_GEN_EXE}, the
target dependency will ensure the binary will be built.
Note that we keep using ${MLIR_LINALG_ODS_YAML_GEN_EXE} in the COMMAND
rather than use ${MLIR_LINALG_ODS_YAML_GEN_TARGET} because when
LLVM_NATIVE_TOOL_DIR is used the latter is an empty string.
Testing-wise, all three codepaths in get_host_tool_path() were tested
with both GNU Make and Ninja generators:
- cross-compiling with LLVM_NATIVE_TOOL_DIR checks the if path;
- cross-compiling without LLVM_NATIVE_TOOL_DIR checks the elseif path;
- native build without LLVM_NATIVE_TOOL_DIR checks the else path.
Renames LegalizeData to LegalizeDataValues since this pass fixes up SSA
values. LegalizeData suggested that it fixed data mapping.
This change also adds support to fix up SSA values for data clause
operations. Effectively, compute regions within a data region also use
the SSA values from data operations. SSA values within data regions but
not within compute regions are not updated.
This change is to support the requirement in the OpenACC spec which
notes that a visible data clause is not just one on the current compute
construct but on the lexically containing data construct or visible
declare directive.
This PR adds an OpenMP dialect related pass for FIR/HLFIR which creates
`MapInfoOp` instances for certain privatized symbols. For example, if an
allocatable variable is used in a private clause attached to a
`omp.target` op, then the allocatable variable's descriptor will be
needed on the device (e.g. GPU). This descriptor needs to be separately
mapped onto the device. This pass creates the necessary `omp.map.info`
ops for this.
The hoistRedundantVectorTransfers function does not verify loop bounds
when hoisting vector transfers. This is not safe in general, since it is
possible that the loop has a zero trip count. This PR uses ValueBounds to
verify that the lower bound is less than the upper bound of the loop
before hoisting. Trip count verification is currently behind an option,
`verifyNonZeroTrip`, which is false by default.
Zero trip count loops can arise in GPU code generation, where a loop
bound can be dependent on a thread id. If not all threads execute the
loop body, then hoisting out of the loop can cause these threads to
execute the transfers when they are not supposed to.
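An illustrative sketch of the unsafe case (op and value names are made up):
```
func.func @maybe_zero_trip(%buf: memref<128xf32>, %lb: index, %ub: index) {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %pad = arith.constant 0.0 : f32
  // If %ub <= %lb for some threads (e.g. %ub is derived from a thread id),
  // this loop has a zero trip count; hoisting the transfer_read above the
  // loop would execute it anyway. With verifyNonZeroTrip, the hoist only
  // happens when %lb < %ub can be proven.
  scf.for %i = %lb to %ub step %c1 {
    %v = vector.transfer_read %buf[%c0], %pad {in_bounds = [true]}
        : memref<128xf32>, vector<4xf32>
    vector.print %v : vector<4xf32>
  }
  return
}
```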
---------
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
In the insert_slice bufferization interface implementation, the
destination tensor is not considered read if the full tensor is
overwritten by the slice. This PR adds the same check for
tensor.parallel_insert_slice.
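A sketch of the case this covers (shapes and names are illustrative):
```
func.func @full_overwrite(%dest: tensor<4x8xf32>, %new: tensor<4x8xf32>) -> tensor<4x8xf32> {
  %res = scf.forall (%i) in (1) shared_outs(%out = %dest) -> (tensor<4x8xf32>) {
    scf.forall.in_parallel {
      // The slice covers the whole destination, so %dest does not need to
      // be treated as read (and does not need a copy) during bufferization.
      tensor.parallel_insert_slice %new into %out[0, 0] [4, 8] [1, 1]
          : tensor<4x8xf32> into tensor<4x8xf32>
    }
  }
  return %res : tensor<4x8xf32>
}
```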
Adds two new StaticValueUtils:
- `isAllConstantIntValue` checks whether all elements of an array of
  `OpFoldResult` are equal to a passed `int64_t` value.
- `areConstantIntValues` checks whether an array of `OpFoldResult` is
  element-wise equal to a passed array of `int64_t` values.
fixes https://github.com/llvm/llvm-project/issues/112435
---------
Signed-off-by: Max Dawkins <max.dawkins@gmail.com>
Currently we have `MLIRContext::getRegisteredOperations`, which returns
all registered operations for the given context. With the addition of
`MLIRContext::getRegisteredOperationsByDialect`, we can now retrieve the
same for a given dialect class.
Closes #111591
This patch moves the part of operation verifiers dependent on the
contents of their regions to the corresponding `verifyRegions` method.
This ensures these are only triggered after the operations in the region
have themselves already been verified, avoiding checks based on invalid
nested operations.
The `LoopWrapperInterface` is also updated so that its verifier runs
after operations in the region of ops with this interface have already
been verified.
This commit adds support for the following PTX predefined special
registers:
* warpid
* nwarpid
* smid
* nsmid
* gridid
* lanemask.*
* globaltimer
* envreg*
Lit tests are added under nvvmir.mlir.
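For example, reading a few of these registers would look roughly like this, assuming the new ops follow the existing `nvvm.read.ptx.sreg.*` naming:
```
%warpid  = nvvm.read.ptx.sreg.warpid : i32
%nwarpid = nvvm.read.ptx.sreg.nwarpid : i32
%smid    = nvvm.read.ptx.sreg.smid : i32
```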
`UnsignedWhenEquivalent` doesn't really need any dialect conversion
features, and switching it to normal patterns makes it more composable
with other pattern-based transformations (and probably faster).
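The kind of rewrite this pass performs, sketched on made-up IR:
```
// Before (range analysis proves %a and %b are non-negative):
%q = arith.divsi %a, %b : index
%c = arith.cmpi slt, %a, %b : index
// After:
%q = arith.divui %a, %b : index
%c = arith.cmpi ult, %a, %b : index
```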
Print an error such as the following one before terminating program
execution.
```
mlir/test/Dialect/SparseTensor/convert_dense2sparse.mlir:26:8: remark: location of op
%0 = sparse_tensor.convert %arg0 : tensor<?xi32> to tensor<?xi32, #SparseVector>
^
LLVM ERROR: Failed to infer result type(s):
"sparse_tensor.positions"(...) {} : (index) -> ( ??? )
(stack trace follows)
```
The pack_paddings attribute has been renamed to nofold_flags in
https://github.com/llvm/llvm-project/pull/111036. There are still some
occurrences of `packPadding` left unchanged. This PR renames those for
consistency.
This patch adds operand bundle support for `llvm.intr.assume`.
This patch actually contains two parts:
- `llvm.intr.assume` now accepts operand-bundle-related attributes and
operands. `llvm.intr.assume` does not constrain the operand bundles, but
obviously only a small set of operand bundles is meaningful. I plan to
add some of those (e.g. `aligned` and `separate_storage` are what
interest me, but other people may be interested in other operand bundles
as well) in future patches.
- The definitions of `llvm.call`, `llvm.invoke`, and
`llvm.call_intrinsic` actually define `op_bundle_tags` as an operation
property. It turns out this approach would introduce some unnecessary
burden if applied equally to the intrinsic operations, because properties
are not available through `Operation *`, yet we have to operate on
`Operation *` during the import/export of intrinsics. This PR therefore
changes it from a property to an array attribute.
This patch relands commit d8fadad07c.