Added lowering support for IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses for
OMP TARGET directive and added related tests for these changes.
IS_DEVICE_PTR and HAS_DEVICE_ADDR clauses apply to OMP TARGET directive
OpenMP spec states
`The **is_device_ptr** clause indicates that its list items are device
pointers.`
`The **has_device_addr** clause indicates that its list items already
have device addresses and therefore they may be directly accessed from a
target device.`
Whereas USE_DEVICE_PTR and USE_DEVICE_ADDR clauses apply to OMP TARGET
DATA directive and OpenMP spec for them states
`Each list item in the **use_device_ptr** clause results in a new list
item that is a device pointer that refers to a device address`
`Each list item in a **use_device_addr** clause that is present in the
device data environment is treated as if it is implicitly mapped by a
map clause on the construct with a map-type of alloc`
This MR adds the `lower-vector-multi-reduction` pass to lower the
vector.multi_reduction operation.
While the Transform Dialect includes an operation,
`transform.apply_patterns.vector.lower_multi_reduction`, intended for a
similar purpose, its utility is limited to projects that have adopted
the Transform Dialect. Recognizing that not all projects are equipped to
integrate this dialect, the proposed pass serves as a vital standalone
alternative. It ensures that projects solely dependent on the
traditional pass infrastructure can also benefit from the optimized
lowering of `multi_reduction` operation.
---------
Co-authored-by: Xiaolei Shi <xiaoleis@nvidia.com>
This patch introduces a set of composable structures grouping the MLIR
operands associated to each OpenMP clause. This makes it easier to keep
the MLIR representation for the same clause consistent throughout all
operations that accept it.
The relevant clause operand structures are grouped into per-operation
structures using a mixin pattern and used to define new operation
constructors. These constructors can be used to avoid having to get the
order of a possibly large list of operands right.
Missing clauses are documented as TODOs, as well as operands which are
part of the relevant operation's operand structure but cannot be
attached to the associated operation yet, due to missing op arguments to
its MLIR definition.
A follow-up patch will update Flang lowering to make use of these
structures, simplifying the passing of information from clause
processing to operation-generating functions and also simplifying the
creation of operations through the use of the new operation
constructors.
Fix typo bug in AffineExprVisitor for the WalkResult return case. This
didn't show up immmediately because most walks in the tree didn't
use walk result.
At present, large ElementsAttr is unconditionally printed with a hex
string. This means that in IR large constant values often look like:
dense<"0x000000000004000000080000000004000000080000000..."> :
tensor<10x10xi32>
Hoisting hex printing control to the user level for tooling means that
one can disable the feature and get human-readable values when
necessary:
dense<[16, 32, 48, 500...]> : tensor<10x10xi32>
Note: AsmPrinterOptions::printElementsAttrWithHexIfLarger is not always
possible to be used as it requires that one exposes MLIR's command-line
options in user tooling (including an actual compiler).
Co-authored-by: Harald Rotuna <harald.razvan.rotuna@intel.com>
This patch adds the `convertInstruction` and `getSupportedInstructions`
to `LLVMImportInterface`, allowing any non-LLVM dialect to specify how
to import LLVM IR instructions and overriding the default import of LLVM instructions.
This commit adds support for `scf.if` to `ValueBoundsConstraintSet`.
Example:
```
%0 = scf.if ... -> index {
scf.yield %a : index
} else {
scf.yield %b : index
}
```
The following constraints hold for %0:
* %0 >= min(%a, %b)
* %0 <= max(%a, %b)
Such constraints cannot be added to the constraint set; min/max is not
supported by `IntegerRelation`. However, if we know which one of %a and
%b is larger, we can add constraints for %0. E.g., if %a <= %b:
* %0 >= %a
* %0 <= %b
This commit required a few minor changes to the
`ValueBoundsConstraintSet` infrastructure, so that values can be
compared while we are still in the process of traversing the IR/adding
constraints.
Note: This is a re-upload of #85895, which was reverted. The bug that
caused the failure was fixed in #87859.
Emitting trivial getters that amount to `(*this)->getOperand(1)`
out-of-line or `getProperties().foo` is a pretty significant performance
hit on these basic MLIR APIs for manipulating ops (3-4x). Emit them
inline (without adding additional dependencies to header files).
This commit removes the no longer required bitcast inserting pattern in
LLVM dialect's type consistency pattern. This was previously required to
enable Mem2Reg and SROA to promote accesses that had different types.
Recent changes to both passes added direct support for this feature to
them, so the pattern has no further use.
In scalable code it is very common to have constant multiples of vscale,
e.g. `4 * vscale`. This updates `arith.muli` to pretty print the result
name in cases like this, so `4 * vscale` would be `%c4_vscale`.
This makes reading IR dumps of scalable code a little nicer.
This commit adds support for `scf.if` to `ValueBoundsConstraintSet`.
Example:
```
%0 = scf.if ... -> index {
scf.yield %a : index
} else {
scf.yield %b : index
}
```
The following constraints hold for %0:
* %0 >= min(%a, %b)
* %0 <= max(%a, %b)
Such constraints cannot be added to the constraint set; min/max is not
supported by `IntegerRelation`. However, if we know which one of %a and
%b is larger, we can add constraints for %0. E.g., if %a <= %b:
* %0 >= %a
* %0 <= %b
This commit required a few minor changes to the
`ValueBoundsConstraintSet` infrastructure, so that values can be
compared while we are still in the process of traversing the IR/adding
constraints.
As part of this extension this change also does some general cleanup
1) Make all the methods take `RewriterBase` as arguments instead of
creating their own builders that tend to crash when used within
pattern rewrites
2) Split `coalesePerfectlyNestedLoops` into two separate methods, one
for `scf.for` and other for `affine.for`. The templatization didnt
seem to be buying much there.
Also general clean up of tests.
Add `requiresReplacedValues` and `visitReplacedValues` methods to
`PromotableOpInterface`. These methods allow `PromotableOpInterface` ops
to transforms definitions mutated by a `store`.
This change is necessary to correctly handle the promotion of
`LLVM_DbgDeclareOp`.
---------
Co-authored-by: Théo Degioanni <30992420+Moxinilian@users.noreply.github.com>
Currently, by-ref reductions will allocate the per-thread reduction
variable in the initialization region. Adding a cleanup region allows
that allocation to be undone. This will allow flang to support reduction
of arrays stored on the heap.
This conflation of allocation and initialization in the initialization
should be fixed in the future to better match the OpenMP standard, but
that is beyond the scope of this patch.
This commit changes the API of `ValueBoundsConstraintSet`: the stop
condition is now passed to the constructor instead of `processWorklist`.
That makes it easier to add items to the worklist multiple times and
process them in a consistent manner. The current
`ValueBoundsConstraintSet` is passed as a reference to the stop
function, so that the stop function can be defined before the the
`ValueBoundsConstraintSet` is constructed.
This change is in preparation of adding support for branches.
Revise the printing functionality of TimingManager to accommodate
various output formats. At present, TimingManager is limited to
outputting data solely in plain text format. To overcome this
limitation, I have introduced an abstract class that serves as the
foundation for printing. This approach allows users to implement
additional output formats by extending this abstract class. As part of
this update, I have integrated support for JSON as a new output format,
enhancing the ease of parsing for subsequent processing scripts.
modernize-use-override ClangTidy check.
This warning appears on overridden virtual functions not marked with override or
final keywords or marked with more than one of virtual, override, final.
- Remove trailing type from value attributes as emitc.opaque attributes
are untyped.
- Replace invalid trailing * in opaque type by wrapping it into an
!emitc.ptr.
This commit extends the data layout subsystem with accessors for the
endianness. The implementation follows the structure implemented for
alloca, global, and program memory spaces.
Some support code, e.g. llvm/Support/Endian.h, uses
llvm::support::detail, but the format-related code uses llvm::detail. On
VS2019, when a C++ file includes both headers, a `detail::` from
`namespace llvm { ... }` becomes ambiguous.
44253a9c breaks TensorFlow and
[JAX](https://github.com/google/jax/actions/runs/8507773013/job/23300219405)
build because of this.
Since llvm::X::detail seems like a cleaner solution and is used in other
places as well (e.g. llvm::yaml::detail), we should probably migrate all
llvm::detail usages to llvm::X::detail.
Composite pass allows to run sequence of passes in the loop until fixed
point or maximum number of iterations is reached. The usual candidates
are canonicalize+CSE as canonicalize can open more opportunities for CSE
and vice-versa.
Turn `RewriterBase::replaceAllUsesWith` into a non-templatized
implementation, so that it can be made virtual and be overridden in the
`ConversionPatternRewriter` in a subsequent change.
This change is in preparation of adding dialect conversion support for
`replaceAllUsesWith`.
Before this change: `notifyOperationReplaced` was triggered when calling
`RewriteBase::replaceOp`.
After this change: `notifyOperationReplaced` is triggered when
`RewriterBase::replaceAllOpUsesWith` or `RewriterBase::replaceOp` is
called.
Until now, every `notifyOperationReplaced` was always sent together with
a `notifyOperationErased`, which made that `notifyOperationErased`
callback irrelevant. More importantly, when a user called
`RewriterBase::replaceAllOpUsesWith`+`RewriterBase::eraseOp` instead of
`RewriterBase::replaceOp`, no `notifyOperationReplaced` callback was
sent, even though the two notations are semantically equivalent. As an
example, this can be a problem when applying patterns with the transform
dialect because the `TrackingListener` will only see the
`notifyOperationErased` callback and the payload op is dropped from the
mappings.
Note: It is still possible to write semantically equivalent code that
does not trigger a `notifyOperationReplaced` (e.g., when op results are
replaced one-by-one), but this commit already improves the situation a
lot.
There is no good way to report detailed errors from inside
`Pass::initializeOptions` function as context may not be available at
this point and writing directly to `llvm::errs()` is not composable.
See
https://github.com/llvm/llvm-project/pull/87166#discussion_r1546426763
* Add error handler callback to `Pass::initializeOptions`
* Update `PassOptions::parseFromString` to support custom error stream
instead of using `llvm::errs()` directly.
* Update default `Pass::initializeOptions` implementation to propagate
error string from `parseFromString` to new error handler.
* Update `MapMemRefStorageClassPass` to report error details using new
API.
Introduce 3 new optional attributes to the `transform.print` ops:
* `assume_verified`
* `use_local_scope`
* `skip_regions`
The primary motivation is to allow printing on large inputs that
otherwise take forever to print and verify. For the full context, see
this IREE issue: https://github.com/openxla/iree/issues/16901.
Also add some tests and fix the op description.
Add rounding mode attribute to `arith`. This attribute can be used in
different FP `arith` operations to control rounding mode. Rounding modes
correspond to IEEE 754-specified rounding modes. Use in `arith.truncf` folding.
As this is not supported in dialects other than LLVM, conversion should fail for
now in case this attribute is present.
---------
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Both `Transpose` and `Broadcast` specify the `SameVariadicOperandSize`
trait. However neither has a variadic operand let alone more than one.
This is likely a relic from copying the boilerplate of the `Reduce`
definition.
This speeds up registered op creation by 10-11% by allowing lookup by
TypeID instead of StringRef.
This can break your build/tests at runtime with an error that you're creating
an unregistered operation that you have registered. If so you are likely using
a class inheriting from the "real" operation. See for example in this patch the
case of:
class ConstantIndexOp : public arith::ConstantOp {
If one is using `builder.create<ConstantIndexOp>()` they actually create an
`arith.constant` operation, but the builder will fetch the TypeID for
the `ConstantIndexOp` class which does not correspond to any registered
operation. To fix it the `ConstantIndexOp` class got this addition:
static ::mlir::TypeID resolveTypeID() { return TypeID::get<ConstantOp>(); }
Note that even though the sparse runtime support lib always uses SoA
storage for COO storage (and provides correct codegen by means of views
into this storage), in some rare cases we need the true physical SoA
storage as a coordinate buffer. This PR provides that functionality by
means of a (costly) coordinate buffer call.
Since this is currently only used for testing/debugging by means of the
sparse_tensor.print method, this solution is acceptable. If we ever want
a performing version of this, we should truly support AoS storage of COO
in addition to the SoA used right now.
These non-finite math ops are supported by SPIR-V but not by WGSL.
Assume finite floating point values and expand these ops into `false`.
Previously, this worked by adding fast math flags during conversion from
arith to spirv, but this got removed in
https://github.com/llvm/llvm-project/pull/86578.
Also do some misc cleanups in the surrounding code.
Adds support for scalable vectors to patterns defined in
VectorLineralize.cpp.
Linearization is disable in 2 notable cases:
* vectors with more than 1 scalable dimension (we cannot represent
vscale^2),
* vectors initialised with arith.constant that's not a vector splat
(such arith.constant Ops cannot be flattened).
Adds support for fusing two scf.for loops occurring in the same block.
Uses the rudimentary checks already in place for scf.forall (like the
target loop's operands being dominated by the source loop).
- Fixes a bug in the dominance check whereby it was checked that values
in the target loop themselves dominated the source loop rather than the
ops that define these operands.
- Renames the LoopFuseSibling op to LoopFuseSiblingOp.
- Updates LoopFuseSiblingOp's description.
- Adds tests for using LoopFuseSiblingOp on scf.for loops, including one
which fails without the fix for the dominance check.
- Adds tests checking the different failure modes of the dominance
checker.
- Adds test for case whereby scf.yield is automatically generated when
there are no loop-carried variables.
Currently, the builtins used for implementing `va_list` handling
unconditionally take their arguments as unqualified `ptr`s i.e. pointers
to AS 0. This does not work for targets where the default AS is not 0 or
AS 0 is not a viable AS (for example, a target might choose 0 to
represent the constant address space). This patch changes the builtins'
signature to take generic `anyptr` args, which corrects this issue. It
is noisy due to the number of tests affected. A test for an upstream
target which does not use 0 as its default AS (SPIRV for HIP device
compilations) is added as well.
As discussed in one of the previous TOSA community meetings, we would
like to allow for more integer types in the TOSA dialect to enable more
use cases.
For strict standards conformance, the TosaValidation pass can be used.
Follow up PRs will extend conversions from TOSA where needed.