Delete the backslash. It was there to compile the tablegen file; it looks like a space also works fine.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D155474
This revision fixes `hasTensorSemantics` and `hasBufferSemantics` for vector transfer ops, which may have a vector operand: `VectorType` implements `ShapedType`, but such operands do not affect whether an op has tensor or buffer semantics. Also implement `DestinationStyleOpInterface` on `TransferReadOp` so that `hasTensorSemantics`/`hasBufferSemantics` can be called on it. (The op has no inits, but this makes it symmetric to `TransferWriteOp`.)
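For illustration, a minimal sketch (shapes and value names invented): the first read below has buffer semantics and the second has tensor semantics; the `vector<4xf32>` result and the padding value play no role in that classification.

%0 = vector.transfer_read %buffer[%c0], %pad : memref<8xf32>, vector<4xf32>
%1 = vector.transfer_read %tensor[%c0], %pad : tensor<8xf32>, vector<4xf32>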
Differential Revision: https://reviews.llvm.org/D155469
This work introduces `cp.async.bulk.tensor.shared.cluster.global` to the NVVM dialect; the op executes a load using TMA (the Tensor Memory Accelerator).
Depends on D155056
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D155060
Add a simple transform operation to the NVGPU extension that performs
software pipelining of copies to shared memory. The functionality is
extremely minimalistic in this version and only supports copies from
global to shared memory inside an `scf.for` loop with either
`vector.transfer` or `nvgpu.device_async_copy` operations when
pipelining preconditions are already satisfied in the IR. This is the
minimally useful version that uses the more general loop pipeliner in an
NVGPU-specific way. Further extensions and orthogonalizations will be
necessary.
This required a change to the loop pipeliner itself to properly
propagate errors should the predicate generator fail.
This is loosely inspired by the version in IREE, but makes fewer unsafe assumptions and communicates decisions in a more principled way.
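For illustration, a rough sketch of the kind of loop the transform targets (names, shapes, and the placeholder compute op are invented):

scf.for %i = %c0 to %c128 step %c16 {
  // Asynchronous global -> shared copy; the transform stages such copies
  // across pipeline iterations.
  %token = nvgpu.device_async_copy %global[%i], %shared[%i], 16 : memref<128xf32> to memref<128xf32, 3>
  %group = nvgpu.device_async_create_group %token
  nvgpu.device_async_wait %group
  "test.compute"(%shared) : (memref<128xf32, 3>) -> ()
}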
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D155223
* Move passes to `Transforms` directory.
* Add `Utils.h` (will be utilized in a subsequent change).
Differential Revision: https://reviews.llvm.org/D155427
reshape(reshape(x)) -> reshape(x) can be written directly as a fold instead of a canonicalization, which helps other passes clean up while they work.
This initially broke ReshapeConverterExpand/Collapse, which relied on creating foldable reshapes and a carefully crafted benefit priority among patterns.
I turned this into a single pattern on reshapes, which does expand and/or collapse as needed in one go.
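For example, a sketch with invented shapes (using `tosa.reshape`, per the converter names above):

%a = tosa.reshape %x {new_shape = array<i64: 6, 4>} : (tensor<2x3x4xf32>) -> tensor<6x4xf32>
%b = tosa.reshape %a {new_shape = array<i64: 24>} : (tensor<6x4xf32>) -> tensor<24xf32>

now folds directly to:

%b = tosa.reshape %x {new_shape = array<i64: 24>} : (tensor<2x3x4xf32>) -> tensor<24xf32>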
Differential Revision: https://reviews.llvm.org/D155266
* Rename functions with underscores to camel case.
* Return the "in_bounds" values as C++ bools instead of an `ArrayAttr`.
Differential Revision: https://reviews.llvm.org/D155277
Using MLIR attributes instead of metadata has many advantages:
* No indirection: Attributes can simply refer to each other seamlessly, without the indirection of `SymbolRefAttr`. This also gives us correctness by construction in a lot of places
* Multithreading safe: The Attribute infrastructure gives us thread-safety for free, whereas creating operations and inserting them into a block is not thread-safe. This matters e.g. for the inliner in MLIR, which runs in parallel
* Easier to create: There is no need for a builder or a metadata region
This patch therefore does exactly that. It leverages the new distinct attributes to create distinct access groups in a deterministic and thread-safe manner.
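For illustration, a minimal sketch of the attribute-based encoding (op context invented; `distinct[...]` is the distinct-attribute syntax):

#group = #llvm.access_group<id = distinct[0]<>>
%0 = llvm.load %ptr {access_groups = [#group]} : !llvm.ptr -> i32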
Differential Revision: https://reviews.llvm.org/D155285
Using MLIR attributes instead of metadata has many advantages:
* No indirection: Attributes can simply refer to each other seamlessly, without the indirection of `SymbolRefAttr`. This also gives us correctness by construction in a lot of places
* Multithreading safe: The Attribute infrastructure gives us thread-safety for free, whereas creating operations and inserting them into a block is not thread-safe. This matters e.g. for the inliner in MLIR, which runs in parallel
* Easier to create: There is no need for a builder or a metadata region
This patch therefore does exactly that. It leverages the new distinct attributes to create distinct alias domains and scopes in a deterministic and thread-safe manner.
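A minimal sketch of the resulting encoding (op context invented):

#domain = #llvm.alias_scope_domain<id = distinct[0]<>>
#scope = #llvm.alias_scope<id = distinct[1]<>, domain = #domain>
%0 = llvm.load %ptr {alias_scopes = [#scope]} : !llvm.ptr -> i32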
Differential Revision: https://reviews.llvm.org/D155159
It also unifies the computation of `StridedLayoutAttr`: if the stride is a statically known value, we can just use it.
Differential Revision: https://reviews.llvm.org/D155017
When targeting NVIDIA GPUs, seeing the generated PTX is important, but we currently have no simple way to do it.
This work adds a `dump-ptx` option to the gpu-to-cubin pass. One can use it like `gpu-to-cubin{chip=sm_90 features=+ptx80 dump-ptx}`.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D155166
This patch implements an early outlining transform of omp.target operations in
flang. The pass is needed because optimizations may cross target op region
boundaries, but with the outlining the resulting functions only contain a
single omp.target op plus a func.return, so there should not be any opportunity
to optimize across region boundaries.
The patch also adds an interface to store and retrieve the parent function name of the original target operation, which is needed to create correct kernel function names when lowering to LLVM IR.
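For illustration, a rough sketch of an outlined function (the naming scheme is invented; only the target op and the return remain):

func.func @foo_omp_target_region() {
  omp.target {
    omp.terminator
  }
  return
}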
Reviewed By: kiranchandramohan, domada
Differential Revision: https://reviews.llvm.org/D154879
`SubElementInterface::replaceImmediateSubElementsImpl` specializes tuples so that arguments are forwarded to the type getter. However, pairs are currently not supported, even though an example in the documentation uses a pair as a key type. This patch adds support for pairs as well.
Reviewed By: Mogball
Differential Revision: https://reviews.llvm.org/D155043
Remove Tosa_Tensor1Dto4D and Tosa_TensorUpto4D in the Tosa dialect and add level checks to the TosaValidation pass to validate per the spec.
Signed-off-by: Tai Ly <tai.ly@arm.com>
Change-Id: Icd32137e9f8051f99994cee9f388f20c1a840f4b
Reviewed By: eric-k256
Differential Revision: https://reviews.llvm.org/D154273
Generalize `extractFromI64ArrayAttr` to `extractFromIntegerArrayAttr`, so that arbitrary integer/bool types can be extracted.
Differential Revision: https://reviews.llvm.org/D154974
Support extra concrete class declarations and definitions under NativeTrait that get injected into the class that specifies the trait. Extra declarations and definitions can be passed in as template arguments to NativeOpTrait, NativeAttrTrait, and NativeTypeTrait.
Usage examples of this feature include:
- Creating a wrapper trait for authoring inferReturnTypes with the OpAdaptor, by specifying the necessary op-specific declarations and definitions directly in the trait
- Refactoring the InferTensorType trait
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D154731
The `static_(num_threads|tile_sizes)` attributes of this op are
`DefaultValuedOptionalAttr`s, so the op can be constructed *without* such
an attribute. In other words, the following is a valid op (note the
absence of the `static_num_threads` attribute):
"builtin.module"() ({
"transform.sequence"() <{failure_propagation_mode = 1 : i32, operand_segment_sizes = array<i32: 0, 0>}> ({
^bb0(%arg0: !pdl.operation, %arg1: !transform.op<"linalg.matmul">, %arg2: !transform.op<"linalg.elemwise_binary">):
%0 = "transform.structured.match"(%arg0) <{ops = ["test.dummy"]}> : (!pdl.operation) -> !pdl.operation
%1:2 = "transform.structured.tile_to_forall_op"(%arg1, %0) <{operand_segment_sizes = array<i32: 1, 0, 0, 0, 1>}> : (!transform.op<"linalg.matmul">, !pdl.operation) -> (!transform.op<"scf.forall">, !transform.op<"linalg.matmul">)
"transform.yield"() : () -> ()
}) : () -> ()
}) : () -> ()
However, the custom printing directive converted those to an `ArrayRef`,
which crashes if done on an empty `ArrayAttr`. This patch changes the
signature such that no automatic conversion takes place and extends the
test to check for the existence of the attribute.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D155062
getPtx used to return `const char*`, which is not flexible when one needs to build the string inside the function. This work changes the return type.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D155056
This adds operations for binary multiplicative arithmetic operators to
EmitC. The input and output arguments for the remainder operator are
restricted to index (emitted as `size_t`), integers, and the EmitC opaque
types (as the operator can be overloaded for a custom type). The
multiplication and division operators additionally support floating-point
numbers.
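For illustration, a minimal sketch (assuming the ops are named `emitc.mul`, `emitc.div`, and `emitc.rem`; operand values invented):

%p = emitc.mul %a, %b : (i32, i32) -> i32  // emitted as a * b
%q = emitc.div %c, %d : (f32, f32) -> f32  // emitted as c / d
%r = emitc.rem %a, %b : (i32, i32) -> i32  // emitted as a % b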
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D154846
This patch is a follow-up to the previous patch https://reviews.llvm.org/D151519.
With this patch, a vector.load op with a narrow bitwidth element type (e.g., i4) can be converted to a
supported wider bitwidth (e.g., i8).
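For illustration, a rough before/after sketch (shapes invented; the real pattern also rewrites the memref and linearizes indices):

%v = vector.load %src[%c0] : memref<32xi4>, vector<8xi4>

becomes, roughly,

%w = vector.load %src_i8[%c0] : memref<16xi8>, vector<4xi8>
%v = vector.bitcast %w : vector<4xi8> to vector<8xi4>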
Reviewed By: hanchung
Differential Revision: https://reviews.llvm.org/D154178
To complement the bf16 expansion and truncation patterns added to
ExpandOps, define a pass that replaces, for any arithmetic operation
op,
%y = arith.op %v0, %v1, ... : T
with
%e0 = arith.extf %v0 : T to U
%e1 = arith.extf %v1 : T to U
...
%y.ext = arith.op %e0, %e1, ... : U
%y = arith.truncf %y.ext : U to T
This allows for "emulating" floating-point operations not supported on
a given target (such as bfloat operations or most arithmetic on 8-bit
floats) by extending those types to supported ones, performing the
arithmetic operation, and then truncating back to the original
type (which ensures appropriate rounding behavior).
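As a concrete (illustrative) instance, with T = bf16 and U = f32,

%y = arith.addf %a, %b : bf16

becomes

%a.ext = arith.extf %a : bf16 to f32
%b.ext = arith.extf %b : bf16 to f32
%y.ext = arith.addf %a.ext, %b.ext : f32
%y = arith.truncf %y.ext : f32 to bf16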
The lowering of the extf and truncf ops introduced by this
transformation should be handled by subsequent passes.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D154539
This is the counterpart to the forward dense dataflow analysis and
integrates into the dataflow framework. The implementation follows the
structure of existing dataflow analyses.
Reviewed By: Mogball, phisiart
Differential Revision: https://reviews.llvm.org/D154713
`mbarrier` is a barrier created in shared memory that supports flavors of thread synchronization beyond `__syncthreads`; for more information see:
https://docs.nvidia.com/cuda/parallel-thread-execution/#parallel-synchronization-and-communication-instructions-mbarrier
This work adds initial `mbarrier` ops to the nvgpu dialect.
First, it introduces two types:
* `mbarrier.barrier`, a barrier object in shared memory
* `mbarrier.barrier.token`, a token produced by arrive-on operations
It then introduces the following ops:
* `mbarrier.create` creates an `mbarrier.barrier`
* `mbarrier.init` initializes an `mbarrier.barrier`
* `mbarrier.arrive` performs arrive-on on an `mbarrier.barrier` and returns an `mbarrier.barrier.token`
* `mbarrier.arrive.nocomplete` performs a non-blocking arrive-on and returns an `mbarrier.barrier.token`
* `mbarrier.test_wait` waits on an `mbarrier.barrier` given an `mbarrier.barrier.token`
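For illustration, a rough usage sketch (operand names invented; the exact assembly may differ):

%barrier = nvgpu.mbarrier.create -> !nvgpu.mbarrier.barrier<memorySpace = #gpu.address_space<workgroup>>
nvgpu.mbarrier.init %barrier, %num_threads : !nvgpu.mbarrier.barrier<memorySpace = #gpu.address_space<workgroup>>
%token = nvgpu.mbarrier.arrive %barrier : !nvgpu.mbarrier.barrier<memorySpace = #gpu.address_space<workgroup>> -> !nvgpu.mbarrier.token
%done = nvgpu.mbarrier.test_wait %barrier, %token : !nvgpu.mbarrier.barrier<memorySpace = #gpu.address_space<workgroup>>, !nvgpu.mbarrier.token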
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D154090
Add a new option that allows users to specify a memcpy op: "memref.tensor_store", "memref.copy" or "linalg.copy".
Differential Revision: https://reviews.llvm.org/D154968
This unit attribute indicates to the bufferization that the resulting buffer will not be written to by another op.
Differential Revision: https://reviews.llvm.org/D154967
Return all ops that were generated as part of the bufferization, so that users do not have to match them in the enclosing op.
Differential Revision: https://reviews.llvm.org/D154966
Define `llvm.intr.var.annotation`, `llvm.intr.ptr.annotation` and
`llvm.intr.annotation` in the LLVM dialect as counterparts of the
`llvm.var.annotation`, `llvm.ptr.annotation` and `llvm.annotation` intrinsics.
Signed-off-by: Victor Perez <victor.perez@codeplay.com>
Differential Revision: https://reviews.llvm.org/D154842