This new option is set to `false` by default. It should be set only in Canonicalizer tests to detect faulty canonicalization patterns. I.e., patterns that prevent the canonicalizer from converging. The canonicalizer should always convergence on such small unit tests that we have in `canonicalize.mlir`.
Two faulty canonicalization patterns were detected and fixed with this change.
Differential Revision: https://reviews.llvm.org/D140873
In many cases, the the number of workgroups (the grid size) and the
number of workitems within each group (the block size) that a GPU
kernel will be launched with are known. For example, if gpu.launch is
called with constant block and grid sizes, we know that those are the
only possible sizes that will be used to launch that kernel. In other
cases, a custom code-generation pipeline that eventually produces GPU
kernels may know the launch dimensions of those kernels, or at least
may be able to provide an upper bound on them.
Other GPU programming systems, such as OpenCL, allow capturing such
information to enable compiler optimizations - see
reqd_work_group_size, but MLIR currently has no mechanism for doing so.
This set of attributes is the first step in enabling optimizations
based on the known launch dimensions of kernels. It extends the kernel
outline pass to set these bounds on kernels with constant launch
dimensions and extends integer range inference for GPU index
operations to account for the bounds when they are known.
Subsequent revisions will use this data when lowering GPU operations
to the ROCDL dialect.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D139865
This is part of an effort to migrate from llvm::Optional to
std::optional. This patch changes the way mlir-tblgen generates .inc
files, and modifies tests and documentation appropriately. It is a "no
compromises" patch, and doesn't leave the user with an unpleasant mix of
llvm::Optional and std::optional.
A non-trivial change has been made to ControlFlowInterfaces to split one
constructor into two, relating to a build failure on Windows.
See also: https://discourse.llvm.org/t/deprecating-llvm-optional-x-hasvalue-getvalue-getvalueor/63716
Signed-off-by: Ramkumar Ramachandra <r@artagnon.com>
Differential Revision: https://reviews.llvm.org/D138934
Reland D139447, D139471 With flang actually working
- FunctionOpInterface: make get/setFunctionType interface methods
This patch removes the concept of a `function_type`-named type attribute
as a requirement for implementors of FunctionOpInterface. Instead, this
type should be provided through two interface methods, `getFunctionType`
and `setFunctionTypeAttr` (*Attr because functions may use different
concrete function types), which should be automatically implemented by
ODS for ops that define a `$function_type` attribute.
This also allows FunctionOpInterface to materialize function types if
they don't carry them in an attribute, for example.
Importantly, all the function "helper" still accept an attribute name to
use in parsing and printing functions, for example.
- FunctionOpInterface: arg and result attrs dispatch to interface
This patch removes the `arg_attrs` and `res_attrs` named attributes as a
requirement for FunctionOpInterface and replaces them with interface
methods for the getters, setters, and removers of the relevent
attributes. This allows operations to use their own storage for the
argument and result attributes.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D139736
This patch removes the `arg_attrs` and `res_attrs` named attributes as a
requirement for FunctionOpInterface and replaces them with interface
methods for the getters, setters, and removers of the relevent
attributes. This allows operations to use their own storage for the
argument and result attributes.
Depends on D139471
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D139472
This patch removes the concept of a `function_type`-named type attribute
as a requirement for implementors of FunctionOpInterface. Instead, this
type should be provided through two interface methods, `getFunctionType`
and `setFunctionTypeAttr` (*Attr because functions may use different
concrete function types), which should be automatically implemented by
ODS for ops that define a `$function_type` attribute.
This also allows FunctionOpInterface to materialize function types if
they don't carry them in an attribute, for example.
Importantly, all the function "helper" still accept an attribute name to
use in parsing and printing functions, for example.
Reviewed By: rriddle, lattner
Differential Revision: https://reviews.llvm.org/D139447
`DeviceMappingAttrInterface` is implemented as unifiying mechanism for thread mapping. A code generator could use any attribute that implements this interface to lower `scf.foreach_thread` to device specific code. It is allowed to choose its own mapping and interpretation.
Currently, GPU transform dialect supports only `GPUThreadMapping` and `GPUBlockMapping`; however, other mappings should to be supported as well. This change addresses this issue. It decouples gpu transform dialect from the `GPUThreadMapping` and `GPUBlockMapping`. Now, they can work any other mapping.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D138020
The foldMemRefCast method is defined in memref namespace; the
foldTensorCast method is defined in tensor namespace. This revision
deletes the dup code and use the unified methods.
Reviewed By: dcaballe
Differential Revision: https://reviews.llvm.org/D136379
Introduce `subgroup_reduce` operation, similar to `all_reduce`, but operating on subgroup scope instead of workgroup.
It is intended as low-level building block for more high level abstractions (e.g for workgroup-wide `all_reduce` ops).
Only introduce version taking reduce operation enum for simplicity sake.
Differential Revision: https://reviews.llvm.org/D135323
This allows for incrementally updating the old API usages without
needing to update everything at once. These will be left on Both
for a little bit and then flipped to prefixed when all APIs have been
updated.
Differential Revision: https://reviews.llvm.org/D134386
GPUFuncOpLowering moves the body out of gpu.func op and erases it. An empty gpu.func may fail verification but should not crash it. Verification of an erased op is triggered e.g. with debug printing on.
Reviewed By: akuegel
Differential Revision: https://reviews.llvm.org/D132446
This reland includes changes to the Python bindings.
Switch variadic operand and result segment size attributes to use the
dense i32 array. Dense integer arrays were introduced primarily to
represent index lists. They are a better fit for segment sizes than
dense elements attrs.
Depends on D131801
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D131803
Switch variadic operand and result segment size attributes to use the
dense i32 array. Dense integer arrays were introduced primarily to
represent index lists. They are a better fit for segment sizes than
dense elements attrs.
Depends on D131738
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D131702
Erase gpu.memcpy op when only uses of dest are
the memcpy op in question, its allocation and deallocation
ops.
Reviewed By: bondhugula, csigg
Differential Revision: https://reviews.llvm.org/D124257
Move async copy operations to NVGPU as they only exist on NV target and are
designed to match ptx semantic. This allows us to also add more fine grain
caching hint attribute to the op.
Add hint to bypass L1 and hook it up to NVVM op.
Differential Revision: https://reviews.llvm.org/D125244
MLIR has a common pattern for "arguments" that uses syntax
like `%x : i32 {attrs} loc("sourceloc")` which is implemented
in adhoc ways throughout the codebase. The approach this uses
is verbose (because it is implemented with parallel arrays) and
inconsistent (e.g. lots of things drop source location info).
Solve this by introducing OpAsmParser::Argument and make addRegion
(which sets up BlockArguments for the region) take it. Convert the
world to propagating this down. This means that we correctly
capture and propagate source location information in a lot more
cases (e.g. see the affine.for testcase example), and it also
simplifies much code.
Differential Revision: https://reviews.llvm.org/D124649
The asm parser had a notional distinction between parsing an
operand (like "%foo" or "%4#3") and parsing a region argument
(which isn't supposed to allow a result number like #3).
Unfortunately the implementation has two problems:
1) It didn't actually check for the result number and reject
it. parseRegionArgument and parseOperand were identical.
2) It had a lot of machinery built up around it that paralleled
operand parsing. This also was functionally identical, but
also had some subtle differences (e.g. the parseOptional
stuff had a different result type).
I thought about just removing all of this, but decided that the
missing error checking was important, so I reimplemented it with
a `allowResultNumber` flag on parseOperand. This keeps the
codepaths unified and adds the missing error checks.
Differential Revision: https://reviews.llvm.org/D124470
When Location tracking support for block arguments was added, we
discussed various approaches to threading support for this through
function-like argument parsing. At the time, we added a parallel array
of locations that could hold this. It turns out that that approach was
verbose and error prone, roughly no one adopted it.
This patch takes a different approach, adding an optional source
locator to the UnresolvedOperand class. This fits much more naturally
into the standard structure we use for representing locators, and gives
all the function like dialects locator support for free (e.g. see the
test adding an example for the LLVM dialect).
Differential Revision: https://reviews.llvm.org/D124188
Add async dependencies support for gpu.launch op: this allows specifying
a list of async tokens ("streams") as dependencies for the launch.
Update the GPU kernel outlining pass lowering to propagate async
dependencies from gpu.launch to gpu.launch_func op. Previously, a new
stream was being created and destroyed for a kernel launch. The async
deps support allows the kernel launch to be serialized on an existing
stream.
Differential Revision: https://reviews.llvm.org/D123499
NFC. Drop trailing end of line white space in GPU async ops' printer
whenever the list of async deps is empty.
Reviewed By: mehdi_amini, rriddle
Differential Revision: https://reviews.llvm.org/D123754
Fold away gpu.memcpy op when only uses of dest are
the memcpy op in question, its allocation and deallocation
ops.
Reviewed By: bondhugula
Differential Revision: https://reviews.llvm.org/D121279
I am not sure about the meaning of Type in the name (was it meant be interpreted as Kind?), and given the importance and meaning of Type in the context of MLIR, its probably better to rename it. Given the comment in the source code, the suggestion in the GitHub issue and the final discussions in the review, this patch renames the OperandType to UnresolvedOperand.
Fixes https://github.com/llvm/llvm-project/issues/54446
Differential Revision: https://reviews.llvm.org/D122142
This removes any potential confusion with the `getType` accessors
which correspond to SSA results of an operation, and makes it
clear what the intent is (i.e. to represent the type of the function).
Differential Revision: https://reviews.llvm.org/D121762
In this CL, update the function name of verifier according to the
behavior. If a verifier needs to access the region then it'll be updated
to `verifyRegions`.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D120373
Add verifier for gpu.alloc op to verify if the dimension operand counts
and symbol operand counts are same as their memref counterparts.
Differential Revision: https://reviews.llvm.org/D117427
Add new operations to the gpu dialect to represent device side
asynchronous copies. This also add the lowering of those operations to
nvvm dialect.
Those ops are meant to be low level and map directly to llvm dialects
like nvvm or rocdl.
We can further add higher level of abstraction by building on top of
those operations.
This has been discuss here:
https://discourse.llvm.org/t/modeling-gpu-async-copy-ampere-feature/4924
Differential Revision: https://reviews.llvm.org/D119191