This commit makes reductions part of the terminator. Instead of
`scf.yield`, `scf.reduce` now terminates the body of `scf.parallel` ops.
`scf.reduce` may contain an arbitrary number of reductions, with one
region per reduction.
Example:
```mlir
%init = arith.constant 0.0 : f32
%r:2 = scf.parallel (%iv) = (%lb) to (%ub) step (%step) init (%init, %init)
-> f32, f32 {
%elem_to_reduce1 = load %buffer1[%iv] : memref<100xf32>
%elem_to_reduce2 = load %buffer2[%iv] : memref<100xf32>
scf.reduce(%elem_to_reduce1, %elem_to_reduce2 : f32, f32) {
^bb0(%lhs : f32, %rhs: f32):
%res = arith.addf %lhs, %rhs : f32
scf.reduce.return %res : f32
}, {
^bb0(%lhs : f32, %rhs: f32):
%res = arith.mulf %lhs, %rhs : f32
scf.reduce.return %res : f32
}
}
```
`scf.reduce` operations can no longer be interleaved with other ops in
the body of `scf.parallel`. This simplifies the op and makes it possible
to assign the `RecursiveMemoryEffects` trait to `scf.reduce`. (This was
not possible before because the op was not a terminator, causing the op
to be DCE'd.)
This commit implements `LoopLikeOpInterface` on `scf.while`. This
enables LICM (and potentially other transforms) on `scf.while`.
`LoopLikeOpInterface::getLoopBody()` is renamed to `getLoopRegions` and
can now return multiple regions.
Also fix a bug in the default implementation of
`LoopLikeOpInterface::isDefinedOutsideOfLoop()`, which returned "false"
for some values that are defined outside of the loop (in a nested op, in
such a way that the value does not dominate the loop). This interface is
currently only used for LICM and there is no way to trigger this bug, so
no test is added.
This is generated by running
```
sed --in-place 's/[[:space:]]\+$//' mlir/**/*.td
sed --in-place 's/[[:space:]]\+$//' mlir/**/*.mlir
```
Reviewed By: rriddle, dcaballe
Differential Revision: https://reviews.llvm.org/D138866
Pack and Unpack return new tensors within which the individual elements
are reshuffled according to the packing specification. This has the
consequence of modifying the canonical order in which a given operator
(i.e., Matmul) accesses the individual elements. After bufferization,
this typically translates to increased access locality and cache
behavior improvement, e.g., eliminating cache line splitting.
Co-authored-by: Mahesh Ravishankar <ravishankarm@google.com>
Co-authored-by: Han-Chung Wang <hanchung@google.com>
RFC: https://discourse.llvm.org/t/rfc-tensor-pack-and-tensor-unpack/66408/1
Reviewed By: nicolasvasilache, rengolin, hanchung
Differential Revision: https://reviews.llvm.org/D138119
for (I = Start; I < End; I += 1) always terminates so mark
{scf|affine}.for as RecursivelySpeculatable when step is known to be
1.
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D136376
These operations have undefined behavior if the index is not less than the rank of the source tensor / memref, so they cannot be freely speculated like they were before this patch. After this patch we speculate them only if we can prove that they don't have UB.
Depends on D135505.
Reviewed By: mravishankar
Differential Revision: https://reviews.llvm.org/D135748
This patch takes the first step towards a more principled modeling of undefined behavior in MLIR as discussed in the following discourse threads:
1. https://discourse.llvm.org/t/semantics-modeling-undefined-behavior-and-side-effects/4812
2. https://discourse.llvm.org/t/rfc-mark-tensor-dim-and-memref-dim-as-side-effecting/65729
This patch in particular does the following:
1. Introduces a ConditionallySpeculatable OpInterface that dynamically determines whether an Operation can be speculated.
2. Re-defines `NoSideEffect` to allow undefined behavior, making it necessary but not sufficient for speculation. Also renames it to `NoMemoryEffect`.
3. Makes LICM respect the above semantics.
4. Changes all ops tagged with `NoSideEffect` today to additionally implement ConditionallySpeculatable and mark themselves as always speculatable. This combined trait is named `Pure`. This makes this change NFC.
For out of tree dialects:
1. Replace `NoSideEffect` with `Pure` if the operation does not have any memory effects, undefined behavior or infinite loops.
2. Replace `NoSideEffect` with `NoSideEffect` otherwise.
The next steps in this process are (I'm proposing to do these in upcoming patches):
1. Update operations like `tensor.dim`, `memref.dim`, `scf.for`, `affine.for` to implement a correct hook for `ConditionallySpeculatable`. I'm also happy to update ops in other dialects if the respective dialect owners would like to and can give me some pointers.
2. Update other passes that speculate operations to consult `ConditionallySpeculatable` in addition to `NoMemoryEffect`. I could not find any other than LICM on a quick skim, but I could have missed some.
3. Add some documentation / FAQs detailing the differences between side effects, undefined behavior, speculatabilty.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D135505
Many tests still depend on specific names of SSA values (!!).
This commit is a best effort cleanup that will set the stage for adding some pretty SSA result names.
Changes the algorithm of LICM to support graph regions (no guarantee of topologically sorted order). Also fixes an issue where ops with recursive side effects and regions would not be hoisted if any nested ops used operands that were defined within the nested region.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D122465
LICM checks that nested ops depend only on values defined outside
before performing hoisting.
However, it specifically omits to check for terminators which can
lead to SSA violations.
This revision fixes the incorrect behavior.
Differential Revision: https://reviews.llvm.org/D116657
Precursor: https://reviews.llvm.org/D110200
Removed redundant ops from the standard dialect that were moved to the
`arith` or `math` dialects.
Renamed all instances of operations in the codebase and in tests.
Reviewed By: rriddle, jpienaar
Differential Revision: https://reviews.llvm.org/D110797
This commit introduced a cyclic dependency:
Memref dialect depends on Standard because it used ConstantIndexOp.
Std depends on the MemRef dialect in its EDSC/Intrinsics.h
Working on a fix.
This reverts commit 8aa6c3765b.
Create the memref dialect and move several dialect-specific ops without
dependencies to other ops from std dialect to this dialect.
Moved ops:
AllocOp -> MemRef_AllocOp
AllocaOp -> MemRef_AllocaOp
DeallocOp -> MemRef_DeallocOp
MemRefCastOp -> MemRef_CastOp
GetGlobalMemRefOp -> MemRef_GetGlobalOp
GlobalMemRefOp -> MemRef_GlobalOp
PrefetchOp -> MemRef_PrefetchOp
ReshapeOp -> MemRef_ReshapeOp
StoreOp -> MemRef_StoreOp
TransposeOp -> MemRef_TransposeOp
ViewOp -> MemRef_ViewOp
The roadmap to split the memref dialect from std is discussed here:
https://llvm.discourse.group/t/rfc-split-the-memref-dialect-from-std/2667
Differential Revision: https://reviews.llvm.org/D96425
This revision refactors the way that attributes/types are considered when generating aliases. Instead of considering all of the attributes/types of every operation, we perform a "fake" print step that prints the operations using a dummy printer to collect the attributes and types that would actually be printed during the real process. This removes a lot of attributes/types from consideration that generally won't end up in the final output, e.g. affine map attributes in an `affine.apply`/`affine.for`.
This resolves a long standing TODO w.r.t aliases, and helps to have a much cleaner textual output format. As a datapoint to the latter, as part of this change several tests were identified as testing for the presence of attributes aliases that weren't actually referenced by the custom form of any operation.
To ensure that this wouldn't cause a large degradation in compile time due to the second full print, I benchmarked this change on a very large module with a lot of operations(The file is ~673M/~4.7 million lines long). This file before this change take ~6.9 seconds to print in the custom form, and ~7 seconds after this change. In the custom assembly case, this added an average of a little over ~100 miliseconds to the compile time. This increase was due to the way that argument attributes on functions are structured and how they get printed; i.e. with a better representation the negative impact here can be greatly decreased. When printing in the generic form, this revision had no observable impact on the compile time. This benchmarking leads me to believe that the impact of this change on compile time w.r.t printing is closely related to `print` methods that perform a lot of additional/complex processing outside of the OpAsmPrinter.
Differential Revision: https://reviews.llvm.org/D90512
Previously they were separated into "instance" and "kind" aliases, and also required that the dialect know ahead of time all of the instances that would have a corresponding alias. This approach was very clunky and not ergonomic to interact with. The new approach is to provide the dialect with an instance of an attribute/type to provide an alias for, fully replacing the original split approach.
Differential Revision: https://reviews.llvm.org/D89354
All ops of the SCF dialect now use the `scf.` prefix instead of `loop.`. This
is a part of dialect renaming.
Differential Revision: https://reviews.llvm.org/D79844
Summary:
This is to allow optimizations like loop invariant code motion to work
on the ParallelOp.
Additional small cleanup on the ForOp implementation of
LoopLikeInterface and the test file of loop-invariant-code-motion.
Differential Revision: https://reviews.llvm.org/D77128
Summary:
The old interface was a temporary stopgap to allow for implementing simple LICM that took side effects of region operations into account. Now that MLIR has proper support for specifying memory effects, this interface can be deleted.
Differential Revision: https://reviews.llvm.org/D74441
Summary: The current syntax for AffineMapAttr and IntegerSetAttr conflict with function types, making it currently impossible to round-trip function types(and e.g. FuncOp) in the IR. This revision changes the syntax for the attributes by wrapping them in a keyword. AffineMapAttr is wrapped with `affine_map<>` and IntegerSetAttr is wrapped with `affine_set<>`.
Reviewed By: nicolasvasilache, ftynse
Differential Revision: https://reviews.llvm.org/D72429
Change the AsmPrinter to number values breadth-first so that values in adjacent regions can have the same name. This allows for ModuleOp to contain operations that produce results. This also standardizes the special name of region entry arguments to "arg[0-9+]" now that Functions are also operations.
PiperOrigin-RevId: 257225069
In most places, this is just a name change (with the exception of affine.dma_start swapping the operand positions of its tag memref and num_elements operands).
Significant code changes occur here:
*) Vectorization: LoopAnalysis.cpp, Vectorize.cpp
*) Affine Transforms: Transforms/Utils/Utils.cpp
PiperOrigin-RevId: 256395088
These were likely added in error because of confusion about the flag when it was just called "-verify". The extra flag doesn't cause much harm, but it does make mlir-opt do more work and clutter the RUN line
PiperOrigin-RevId: 254037016
This name has caused some confusion because it suggests that it's running op verification (and that this verification isn't getting run by default).
PiperOrigin-RevId: 254035268
Trying to activate both LLVM and MLIR passes in mlir-cpu-runner showed name collisions when registering pass names.
One possible way of disambiguating that should also work across dialects is to prepend the dialect name to the passes that specifically operate on that dialect.
With this CL, mlir-cpu-runner tests still run when both LLVM and MLIR passes are registered
--
PiperOrigin-RevId: 246539917