When establishing the correspondence between transform values and
payload operations or parameters, check that the latter are non-null and
report an error otherwise. Null values were previously allowed for
exotic cases of partially successful transformations with the "apply
each" trait, but were dangerous. The "apply each" implementation was
reworked to remove the need for this functionality, so this check can
now be hardened to avoid null pointer dereferences.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D141142
This commit introduces branch weight attributes to the LLVM::CallOp and
LLVM::InvokeOp and adds both import and export of them.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D141122
This patch adds a `SymbolTableAnalysis` that can be used with the
analysis manager. It contains a symbol table collection. This analysis
allows symbol tables to be preserved across passes so that they do not
need to be recomputed. The analysis is expected to remain valid because
most transformations automatically keep symbol tables up to date using
the collection's `insert` and `erase` methods.
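As a rough illustration, a pass could query and preserve the analysis as
sketched below; `getAnalysis` and `markAnalysesPreserved` are the usual
pass APIs, while the `getSymbolTables()` accessor name and the header
containing `SymbolTableAnalysis` are assumptions.

```
// Hypothetical usage sketch. getAnalysis<>() and markAnalysesPreserved<>()
// are standard pass APIs; the getSymbolTables() accessor and the header
// containing SymbolTableAnalysis are assumptions.
#include "mlir/IR/SymbolTable.h"
#include "mlir/Pass/Pass.h"

using namespace mlir;

namespace {
struct ExampleSymbolPass
    : public PassWrapper<ExampleSymbolPass, OperationPass<ModuleOp>> {
  void runOnOperation() override {
    // Retrieve (or lazily compute) the cached symbol table collection.
    SymbolTableCollection &symbolTables =
        getAnalysis<SymbolTableAnalysis>().getSymbolTables();

    // Look up a symbol through the collection instead of rebuilding a
    // SymbolTable from scratch.
    Operation *module = getOperation();
    if (Operation *callee = symbolTables.lookupSymbolIn(
            module, StringAttr::get(&getContext(), "foo")))
      callee->emitRemark("found symbol via cached symbol tables");

    // Tell the pass manager the analysis is still valid so later passes
    // do not need to recompute it.
    markAnalysesPreserved<SymbolTableAnalysis>();
  }
};
} // namespace
```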
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D139666
Add a folder for LogicalNotEqual when rhs is false. This pattern shows
up after lowering to SPIRV.
Differential Revision: https://reviews.llvm.org/D141163
This change introduces new LLVMIR dialect operations to represent
TBAA root, type descriptor and access tag metadata nodes.
For the purpose of importing TBAA metadata from LLVM IR it only
supports the current version of TBAA format described in
https://llvm.org/docs/LangRef.html#tbaa-metadata (i.e. size-aware
representation introduced in D41501 is not supported).
TBAA attribute support is only added for LLVM::LoadOp and LLVM::StoreOp.
Support for intrinsic operations (e.g. LLVM::MemcpyOp) may be added later.
The TBAA attribute is represented as an array of access tags, even though
LLVM IR supports only a single access tag per memory accessing instruction.
I implemented it as an array anticipating similar support in LLVM IR
to combine TBAA graphs with different roots for Flang, one of the options
described in https://docs.google.com/document/d/16kKZVmI585wth01VSaJAqZMZpoX68rcdBmgfj0kNAt0/edit#heading=h.jzzheaz9vqac
It should be easy to restrict the MLIR operations to a single access tag,
if we end up using a different approach for Flang.
Differential Revision: https://reviews.llvm.org/D140768
Conv3D has an existing linalg operation for floating point. Adding a quantized
variant and corresponding lowering from TOSA. Numerical correctness was validated
using the TOSA conformance tests.
Reviewed By: jpienaar
Differential Revision: https://reviews.llvm.org/D140919
In several cases, the splitting may be known to be a noop, i.e., produce
no second part. Thread this information through the transform utilities
to the transform dialect, and differentiate it from the error state.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D141138
Adapt the implementation of TransformEachOpTrait to the existence of
parameter values recently introduced into the transform dialect. In
particular, allow `applyToOne` hooks to return a list containing a mix
of `Operation *` that will be associated with handles and `Attribute`
that will be associated with parameter values by the trait
implementation of the transform interface's `apply` method.
Disentangle the "transposition" of the list of per-payload op partial
results to decrease its overall complexity and detemplatize the code
that doesn't really need templates. This removes the poorly documented
special handling for single-result ops with TransformEachOpTrait that
could have assigned null pointer values to handles.
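A hypothetical sketch of such a mixed-result `applyToOne` hook follows;
the op, its tile-size heuristic, and the exact hook signature are
illustrative assumptions, not code from this change.

```
// Hypothetical applyToOne hook on a transform op using TransformEachOpTrait.
// MyComputeTileSizeOp and its heuristic are made up; the signature follows
// the trait's per-payload hook as described above and may differ in detail.
#include "mlir/Dialect/Linalg/IR/Linalg.h"
#include "mlir/Dialect/Transform/IR/TransformInterfaces.h"

using namespace mlir;

DiagnosedSilenceableFailure
MyComputeTileSizeOp::applyToOne(linalg::LinalgOp target,
                                transform::ApplyToEachResultList &results,
                                transform::TransformState &state) {
  // First result: the payload operation itself, which the trait will
  // associate with an operation handle.
  results.push_back(target.getOperation());

  // Second result: a parameter value, i.e. an Attribute rather than an
  // operation. A made-up heuristic picks a tile size here.
  int64_t tileSize = target.getNumLoops() > 2 ? 32 : 8;
  Builder builder(target.getOperation()->getContext());
  results.push_back(builder.getI64IntegerAttr(tileSize));

  return DiagnosedSilenceableFailure::success();
}
```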
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D140979
It was originally placed in TransformInterfaces for convenience, but it
is really a generic utility. Keeping it there could also create an
include cycle between TransformTypes and TransformInterfaces if
TransformInterfaces ever needs to include TransformTypes, since
TransformTypes uses the failure utility.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D140978
This makes it more consistent with the recently added
TransformParamTypeInterface.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D140977
Introduce a new kind of values into the transform dialect -- parameter
values. These values have a type implementing the new
`TransformParamTypeInterface` and are associated with lists of
attributes rather than lists of payload operations. This mechanism
allows one to wrap numeric calculations, typically heuristics, into
transform operations separate from those actually applying the
transformation. For example, tile size computation can now be separated
from tiling itself rather than hardcoded in the transform dialect. This
further improves the separation of concerns between transform choice and
implementation.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D140976
This commit introduces the function_entry_count metadata field to the
LLVMFuncOp and adds both the corresponding import and export
functionalities.
The import of the function metadata uses the same infrastructure as the
instruction metadata, i.e., it dispatches through a dialect interface.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D141001
This is needed because WGSL does not yet support extended multiplication
ops.
Set up pattern/pass stuff and handle the first op: `UMulExtended`.
`SMulExtended` handling will go to a separate patch.
Issue: https://github.com/llvm/llvm-project/issues/59563
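For background, the arithmetic such an expansion boils down to can be
illustrated in plain C++ by splitting each operand into 16-bit halves;
this shows only the underlying idea, not the actual SPIR-V rewrite
pattern added here.

```
// Plain C++ illustration of emulating an extended (wide) unsigned multiply
// with 32-bit operations only, by splitting each operand into 16-bit halves.
// This mirrors the idea behind expanding UMulExtended; it is not the actual
// SPIR-V rewrite pattern.
#include <cassert>
#include <cstdint>
#include <cstdio>

// Computes the low and high 32 bits of a * b using only 32-bit math.
static void umulExtended(uint32_t a, uint32_t b, uint32_t &lo, uint32_t &hi) {
  uint32_t aLo = a & 0xFFFFu, aHi = a >> 16;
  uint32_t bLo = b & 0xFFFFu, bHi = b >> 16;

  uint32_t p0 = aLo * bLo; // contributes to bits  0..31
  uint32_t p1 = aLo * bHi; // contributes to bits 16..47
  uint32_t p2 = aHi * bLo; // contributes to bits 16..47
  uint32_t p3 = aHi * bHi; // contributes to bits 32..63

  // Carry out of the low 32 bits when accumulating the middle terms.
  uint32_t carry = ((p0 >> 16) + (p1 & 0xFFFFu) + (p2 & 0xFFFFu)) >> 16;

  lo = p0 + (p1 << 16) + (p2 << 16); // wraps modulo 2^32, as intended
  hi = p3 + (p1 >> 16) + (p2 >> 16) + carry;
}

int main() {
  uint32_t a = 0xDEADBEEFu, b = 0x12345678u;
  uint32_t lo, hi;
  umulExtended(a, b, lo, hi);

  // Verify against a 64-bit reference multiplication.
  uint64_t ref = static_cast<uint64_t>(a) * b;
  assert(lo == static_cast<uint32_t>(ref));
  assert(hi == static_cast<uint32_t>(ref >> 32));
  std::printf("hi=0x%08X lo=0x%08X\n", hi, lo);
  return 0;
}
```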
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D140995
The comment is stale now that kDynamic is defined as the minimum int64_t
value instead of -1.
Confirmed that the implementation in `parseDimensionListRanked` uses kDynamic.
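For reference, a small sketch of the intended usage pattern after this
change; the helper below is hypothetical, only the `ShapedType` API is
real.

```
// Sketch: with kDynamic now being the int64_t minimum rather than -1,
// code should use the ShapedType helpers instead of comparing to -1.
#include "mlir/IR/BuiltinTypes.h"

using namespace mlir;

static int64_t countDynamicDims(ShapedType type) {
  int64_t numDynamic = 0;
  for (int64_t dim : type.getShape())
    if (ShapedType::isDynamic(dim)) // correct check; `dim == -1` is not
      ++numDynamic;
  return numDynamic;
}
```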
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D140994
Return failure if the import of a global variable fails and add a
test case to check the emitted error message. Additionally, convert
the globals in iteration order and do not process them recursively
when translating a constant expression that references them. Finally,
use the module location rather than the unknown location.
Reviewed By: Dinistro
Differential Revision: https://reviews.llvm.org/D140966
This new option is set to `false` by default. It should be set only in Canonicalizer tests to detect faulty canonicalization patterns, i.e., patterns that prevent the canonicalizer from converging. The canonicalizer should always converge on small unit tests such as the ones we have in `canonicalize.mlir`.
Two faulty canonicalization patterns were detected and fixed with this change.
Differential Revision: https://reviews.llvm.org/D140873
This commit adds support for importing the magic globals "global_ctors"
and "global_dtors" from LLVM IR to the LLVM IR dialect. The import
fails when these globals have a non-null data pointer, as this cannot
currently be represented in the corresponding MLIR operations.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D140877
Move code from SCF to Affine: Add a new helper function `simplifyConstrainedMinMaxOp` to Affine/Analysis/Utils.h. `canonicalizeMinMaxOp` was originally designed for loop peeling, but it is not SCF-specific and can be used to simplify any affine.min/max ops.
Various functions in SCF/Transforms are simplified by dropping unnecessary parameters.
Differential Revision: https://reviews.llvm.org/D140962
AbstractDenseDataFlowAnalysis::visitOperation controls how the dataflow
analysis proceeds around control flow. In particular, conservative
assumptions are made about call operations, which can prevent some
analyses from succeeding.
The motivating case for this change is https://reviews.llvm.org/D140415,
for which it is correct and necessary for the lattice to be preserved
after call operations.
Some renaming was necessary to avoid confusion with
DenseDataFlowAnalysis::visitOperation.
AbstractDenseDataFlowAnalysis::visitRegionBranchOperation and
DenseDataFlowAnalysis::visitOperationImpl are also made protected
so that implementations of AbstractDenseDataFlowAnalysis::visitOperation
can call them,
although I did not need these to be virtual.
Differential Revision: https://reviews.llvm.org/D140879
When the number of elements of two shapes is not equal, a Reshape operation cannot be used to transform one into the other.
The function findIntermediateShape(...) could trigger an out-of-bounds operator[] call when the above condition occurred.
The test case I used now causes no error, as its root cause was an issue in the TOSA dialect lowering of padded Conv2D operations, which was already solved in commit 69c984b6.
Reviewed By: rsuderman
Differential Revision: https://reviews.llvm.org/D140013
Noticed one of the ops had its arguments overridden (two consecutive let
statements), so fixed that and then went through the file fitting it to
80 columns and making the documentation paragraphs more consistent.
This revision extends the LLVMImportDialectInterface to make the import
of LLVM IR instruction-level metadata extensible. It extends the
signature of the existing dialect interface to provide a method to
import specific metadata kinds and attach them to the imported
operation. The conversion function can rely on the ModuleImport class
to perform support tasks.
The revision implements the second part of the
"extensible llvm ir import" rfc:
https://discourse.llvm.org/t/rfc-extensible-llvm-ir-import/67256/6
The interface method names changed a bit compared to the suggested
design. The hook to set the instruction level metadata is now called
setMetadataAttrs and takes the metadata kind as an additional parameter.
We do not hand in the original LLVM IR instruction since it is not used
at this point. Importing named module-level metadata can be added at a
later stage after gaining some experience with this extension mechanism.
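A rough sketch of what a dialect-specific hook could look like; the
header path, the full parameter list, and the class name below are
assumptions based on the description above, not verbatim from the patch.

```
// Hypothetical dialect interface implementing the new metadata hook. The
// signature of setMetadataAttrs below is an assumption based on the
// description above (metadata kind passed in, no LLVM IR instruction).
#include "mlir/Target/LLVMIR/LLVMImportInterface.h" // assumed header path

using namespace mlir;

class MyDialectLLVMImportInterface : public LLVMImportDialectInterface {
public:
  using LLVMImportDialectInterface::LLVMImportDialectInterface;

  // Attach dialect attributes for a supported metadata kind to `op`.
  LogicalResult setMetadataAttrs(OpBuilder &builder, unsigned kind,
                                 llvm::MDNode *node, Operation *op,
                                 LLVM::ModuleImport &moduleImport) const {
    if (kind != llvm::LLVMContext::MD_prof)
      return failure(); // not a kind this dialect understands
    // Translate `node` into a dialect attribute; a unit attribute is used
    // here as a placeholder for the real conversion.
    op->setAttr("mydialect.prof", builder.getUnitAttr());
    return success();
  }
};
```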
Depends on D140374
Reviewed By: ftynse, Dinistro
Differential Revision: https://reviews.llvm.org/D140556
There's currently no way to get accurate cube roots in the math dialect.
powf(x, 1/3.0) is too inaccurate in some cases.
Reviewed By: akuegel
Differential Revision: https://reviews.llvm.org/D140842
1. When converting from the GPU dialect to the ROCDL dialect, if the
function that contains a gpu.thread_id or gpu.block_id op is annotated
with gpu.known_{block,grid}_size, use that size to set a "range"
attribute on the corresponding rocdl intrinsic so that the LLVM
frontend can optimize based on that range information.
1b. When translating from the rocdl dialect to LLVM IR, use the
"range" attribute, if present, to set !range metadata on the relevant
function call.
2. Deprecate the old rocdl.max_flat_work_group_size attribute, which
was used in a tensorflow backend. Instead, use
rocdl.flat_work_group_size going forward to allow kernel generators to
specify the minimum and maximum work group sizes a kernel may be
launched with in one attribute, thus more closely matching the backend.
3. When translating from gpu.func to llvm.func within gpu-to-rocdl,
copy the known_block_size attribute as rocdl.reqd_work_group_size to
enable further translations to set the corresponding metadata on the
LLVM IR function. Also, set the rocdl.flat_work_group_size attribute
to ensure that the reqd_work_group_size metadata and the
amdgpu-flat-work-group-size metadata are consistent.
3b. Extend the ROCDL to LLVM IR translation to set the
!reqd_work_group_size metadata on LLVM functions.
Also update tests and add functions to the ROCDL dialect to ensure
attribute names are used consistently.
Depends on D139865
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D139866
The revision introduces the LLVMImportDialectInterface to make the
import of LLVM IR intrinsics extensible. It uses a dialect interface
that enables external projects to provide their own conversion functions
for custom intrinsics. These conversion functions can rely on the
ModuleImport class to perform support tasks such as mapping LLVM
values to MLIR values or converting types between the two worlds.
The implementation largely mirrors the export implementation. One major
difference is the dispatch to the appropriate dialect interface, since
LLVM IR intrinsics have no direct association with an MLIR dialect. The
dialect interfaces thus have to publish the supported intrinsics to
ensure incoming conversion calls are dispatched to the right dialect
interface.
The revision implements the extensible intrinsic import discussed as
part of the "extensible llvm ir import" rfc:
https://discourse.llvm.org/t/rfc-extensible-llvm-ir-import/67256/6
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D140374
Explicit instantiations should be declared. Found with -Wundefined-func-template.
Reviewed By: dblaikie, rriddle
Differential Revision: https://reviews.llvm.org/D140594
The greedy pattern rewriter consists of two nested loops. `config.maxIterations` (which is configurable on the CanonicalizerPass) controls the maximum number of iterations of the outer loop.
```
/// This specifies the maximum number of times the rewriter will iterate
/// between applying patterns and simplifying regions. Use `kNoLimit` to
/// disable this iteration limit.
int64_t maxIterations = 10;
```
This change adds `config.maxNumRewrites`, which controls the maximum number of pattern rewrites within an iteration. (It effectively controls the maximum number of iterations of the inner loop.)
This flag is meant for debugging and is useful in cases where one or multiple faulty patterns can be applied indefinitely, resulting in an infinite loop.
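For example, a direct use of the greedy driver with both limits might
look roughly like this; the limit values and the helper function are
placeholders.

```
// Sketch: capping the number of rewrites per iteration when driving the
// greedy rewriter directly. The limit value 100 is an arbitrary example.
#include "mlir/IR/PatternMatch.h"
#include "mlir/Transforms/GreedyPatternRewriteDriver.h"

using namespace mlir;

static LogicalResult simplify(Operation *op, RewritePatternSet &&patterns) {
  GreedyRewriteConfig config;
  config.maxIterations = 10;   // outer loop limit (the existing default)
  config.maxNumRewrites = 100; // new inner loop limit, useful for debugging
  return applyPatternsAndFoldGreedily(op, std::move(patterns), config);
}
```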
Differential Revision: https://reviews.llvm.org/D140525
In many cases, the number of workgroups (the grid size) and the
number of workitems within each group (the block size) that a GPU
kernel will be launched with are known. For example, if gpu.launch is
called with constant block and grid sizes, we know that those are the
only possible sizes that will be used to launch that kernel. In other
cases, a custom code-generation pipeline that eventually produces GPU
kernels may know the launch dimensions of those kernels, or at least
may be able to provide an upper bound on them.
Other GPU programming systems, such as OpenCL, allow capturing such
information to enable compiler optimizations (see reqd_work_group_size),
but MLIR currently has no mechanism for doing so.
This set of attributes is the first step in enabling optimizations
based on the known launch dimensions of kernels. It extends the kernel
outline pass to set these bounds on kernels with constant launch
dimensions and extends integer range inference for GPU index
operations to account for the bounds when they are known.
Subsequent revisions will use this data when lowering GPU operations
to the ROCDL dialect.
Reviewed By: antiagainst
Differential Revision: https://reviews.llvm.org/D139865