The memref.reinterpret_cast operation used to accept input like:
```
%out = memref.reinterpret_cast %in to offset: [%offset], sizes: [10],
    strides: [1]
    : memref<?xf32> to memref<10xf32>
```
The problem: during lowering, the true offset of %out is %offset, but its
type indicates an offset of 0. Permitting this inconsistency can result in
incorrect outcomes, as certain passes might erroneously extract the offset
from the type of %out.
This patch fixes the issue by enforcing that the result type aligns with
the input operands.
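For example, with this change a dynamic %offset has to be reflected in the
result type; a hypothetical well-formed version of the op above would be:
```
%out = memref.reinterpret_cast %in to offset: [%offset], sizes: [10],
    strides: [1]
    : memref<?xf32> to memref<10xf32, strided<[1], offset: ?>>
```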
This patch adds operand bundle support for `llvm.intr.assume`.
This patch actually contains two parts:
- `llvm.intr.assume` now accepts operand bundle related attributes and
operands. `llvm.intr.assume` does not impose constraints on the operand
bundles, but in practice only a small set of operand bundles is meaningful.
I plan to add some of those (e.g. `aligned` and `separate_storage` are
what interest me but other people may be interested in other operand
bundles as well) in future patches.
- The definitions of `llvm.call`, `llvm.invoke`, and
`llvm.call_intrinsic` define `op_bundle_tags` as an operation
property. It turns out this approach would introduce some unnecessary
burden if applied equally to the intrinsic operations, because properties
are not available through `Operation *`, yet we have to operate on
`Operation *` during the import/export of intrinsics. This PR therefore
changes it from a property to an array attribute.
This patch relands commit d8fadad07c.
This patch adds operand bundle support for `llvm.intr.assume`.
This patch actually contains two parts:
- `llvm.intr.assume` now accepts operand bundle related attributes and
operands. `llvm.intr.assume` does not impose constraints on the operand
bundles, but in practice only a small set of operand bundles is meaningful.
I plan to add some of those (e.g. `aligned` and `separate_storage` are
what interest me but other people may be interested in other operand
bundles as well) in future patches.
- The definitions of `llvm.call`, `llvm.invoke`, and
`llvm.call_intrinsic` define `op_bundle_tags` as an operation
property. It turns out this approach would introduce some unnecessary
burden if applied equally to the intrinsic operations, because properties
are not available through `Operation *`, yet we have to operate on
`Operation *` during the import/export of intrinsics. This PR therefore
changes it from a property to an array attribute.
The `memref.alloca` lowering computed the allocation size incorrectly
when one of the dimensions had size 0.
Previously:
```
memref.alloca() : memref<10x0x2xf32>
--> llvm.alloca 20xf32
```
Now:
```
memref.alloca() : memref<10x0x2xf32>
--> llvm.alloca 0xf32
```
From the `llvm.alloca` documentation:
```
Allocating zero bytes is legal, but the returned pointer may not be unique.
```
Remove a TODO in the dialect conversion code base when materializing
unresolved conversions:
```
// FIXME: Determine a suitable insertion location when there are multiple
// inputs.
```
The implementation used to select an insertion point as follows:
- If the cast has exactly one operand: right after the definition of the
SSA value.
- Otherwise: right before the cast op.
However, it is not necessary to change the insertion point. Unresolved
materializations (`UnrealizedConversionCastOp`) are built during
`buildUnresolvedArgumentMaterialization` or
`buildUnresolvedTargetMaterialization`. In the former case, the op is
inserted at the beginning of the block. In the latter case, only one
operand is supported in the dialect conversion, and the op is inserted
right after the definition of the SSA value. I.e., the
`UnrealizedConversionCastOp` is already inserted at the right place and
it is not necessary to change the insertion point for the resolved
materialization op.
Note: The IR changes slightly because the
`unrealized_conversion_cast` ops at the beginning of a block are no
longer doubly-inverted (by setting the insertion point to the beginning of the
block when inserting the `unrealized_conversion_cast` and again when
inserting the resolved conversion op). All affected test cases were
fixed by using `CHECK-DAG` instead of `CHECK`.
Also improve the quality of multiple test cases that did not check for
the correct operands.
Note: This commit is in preparation of decoupling the
argument/source/target materialization logic of the type converter from
the dialect conversion (to reduce its complexity and make that
functionality usable from a new dialect conversion driver).
memref.extract_aligned_pointer_as_index currently does not support
unranked inputs. This lack of support interferes with the folding
operations in the expand-strided-metadata pass.
```
%r = memref.reinterpret_cast %arg0 to
    offset: [0],
    sizes: [],
    strides: [] : memref<*xf32> to memref<f32>
%i = memref.extract_aligned_pointer_as_index %r : memref<f32> -> index
```
Patterns like this occur when bufferizing operations on unranked
tensors.
This change modifies the extract_aligned_pointer_as_index operation to
support unranked inputs with corresponding support in the MemRef->LLVM
conversion.
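With this change, the pointer can be taken directly from the unranked memref,
so patterns like the one above can fold away the reinterpret_cast (sketch with
illustrative names):
```
%i = memref.extract_aligned_pointer_as_index %arg0 : memref<*xf32> -> index
```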
Co-authored-by: Spenser Bauman <sabauma@fastmail>
This patch generalizes tensor.expand_shape and memref.expand_shape to
consume the output shape as a list of SSA values. This enables us to
implement generic reshape operations with dynamic shapes using
collapse_shape/expand_shape pairs.
The output_shape input to expand_shape follows the static/dynamic
representation that's also used in `tensor.extract_slice`.
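A sketch of the generalized form, roughly following the op documentation
(names and shapes are illustrative):
```
%b = tensor.expand_shape %a [[0, 1], [2]] output_shape [%sz0, 4, 32]
    : tensor<?x32xf32> into tensor<?x4x32xf32>
```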
Differential Revision: https://reviews.llvm.org/D140821
---------
Signed-off-by: Gaurav Shukla <gaurav.shukla@amd.com>
Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
This patch generalizes tensor.expand_shape and memref.expand_shape to
consume the output shape as a list of SSA values. This enables us to
implement generic reshape operations with dynamic shapes using
collapse_shape/expand_shape pairs.
The output_shape input to expand_shape follows the static/dynamic
representation that's also used in `tensor.extract_slice`.
Differential Revision: https://reviews.llvm.org/D140821
Co-authored-by: Ramiro Leal-Cavazos <ramiroleal050@gmail.com>
I believe the semantics should be the same, but this saves 1 op and simplifies the code.
For example, the following two instructions:
```
%2 = cmp sgt %0, %1
%3 = select %2, %0, %1
```
Are equivalent to:
```
%2 = maxsi %0, %1
```
Data layout queries may be issued for types whose size exceeds the range
of a 32-bit integer, as well as for types that don't have a size known at
compile time, such as scalable vectors. Use best practices from LLVM IR
and adopt `llvm::TypeSize` for size-related queries and `uint64_t` for
alignment-related queries.
See #72678.
The memref.atomic_rmw lowering used to fail for memref types that have an offset because they do not have identity maps. This restriction is overly conservative, so this patch relaxes it to require only a strided memref type.
Fixes #70160
The issue is resolved by:
1. Changing the call to address space conversion to use the correct
return type, preventing the code from moving past the if and into the
crashing optional dereference.
2. Adding handling to the AllocLikeOp rewriter for the case where the
underlying buffer allocation fails.
This revision replaces the LLVM dialect NullOp by the recently
introduced ZeroOp. The ZeroOp is more generic in the sense that it
represents zero values of any LLVM type rather than null pointers only.
This is a follow-up to https://github.com/llvm/llvm-project/pull/65508
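For illustration, a minimal sketch of the new op (assuming `llvm.mlir.zero`
accepts any LLVM-compatible type that can be zero-initialized):
```
// Null pointer, previously spelled llvm.mlir.null.
%null = llvm.mlir.zero : !llvm.ptr
// Zero value of a non-pointer type.
%zero = llvm.mlir.zero : i64
```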
memref.copy is sometimes lowered to a function call; this function
is passed the element size of the memref in bytes as an argument.
The element size passed to the copyMemRef() function call can be
miscalculated if the LLVM IR uses aligned access to the memory.
This can be fixed by using llvm.getelementptr to compute the element
size natively, as is already done in the other lowering path that lowers
to an intrinsic.
Fix https://github.com/llvm/llvm-project/issues/64072
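A sketch of the GEP-based size computation, shown with illustrative names in
present-day opaque-pointer notation (not the exact emitted IR):
```
// "Index one element past null" and convert to an integer to obtain the
// element size in bytes as laid out by LLVM.
%null = llvm.mlir.zero : !llvm.ptr
%gep  = llvm.getelementptr %null[1] : (!llvm.ptr) -> !llvm.ptr, f32
%size = llvm.ptrtoint %gep : !llvm.ptr to i64
```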
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D156126
The lowering pattern to LLVM for memref.transpose has a bug where
instead of transposing from (source) -> (dest) it actually transposes
(dest) -> (source). This patch fixes the bug and updates the test.
Fix https://github.com/llvm/llvm-project/issues/65145
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D159290
There are two motivations for this change:
1. It considerably simplifies adding support for the realloc operation to the
new buffer deallocation pass by lowering the realloc such that no
deallocation operation is inserted and the deallocation pass itself can
insert that dealloc
2. The lowering is expressed on a higher level and thus easier to understand,
and the lowerings of the memref operations it is composed of don't have to
be duplicated in the MemRefToLLVM lowering (also see discussion in
https://reviews.llvm.org/D133424)
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D159430
Also add a new pass option to `ConvertToLLVMPass` to populate only patterns from the specified dialects. This is needed because the existing test cases expect that only ops from certain dialects are lowered. (E.g., "arith-to-llvm" expects that only "arith" ops are lowered but not "func" ops.)
Differential Revision: https://reviews.llvm.org/D157627
Most `*-to-llvm` conversion patterns require a type converter. This
revision adds a type converter to the
`populateConvertToLLVMConversionPatterns` function and implements the
interface for the MemRef dialect.
Differential Revision: https://reviews.llvm.org/D157387
This commit changes intrinsics that have immarg parameter attributes to
model these parameters as attributes, instead of operands. Using
operands only works if the operand is defined by an `llvm.mlir.constant`;
otherwise the exported LLVM IR is invalid.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D151692
With this change, more `memref.copy` ops will be lowered to the efficient `memcpy`. For example,
```
memref.copy %subview, %alloc : memref<1x576xf32, strided<[704, 1]>> to memref<1x576xf32>
```
Differential Revision: https://reviews.llvm.org/D150448
This patch pushes the computation of the start address of a memref into one
place (a method on MemRefDescriptor).
This allows all the (indirect) users of this method to produce the start
address in the same way.
Thanks to this change, we expose more CSE opportunities, and thanks to
that, the backend is able to properly find the `llvm.assume` expression
related to the base address, as demonstrated in the added test.
Differential Revision: https://reviews.llvm.org/D148947
`memref.assume_alignment` annotates the alignment of the source buffer,
not the base pointer.
Put differently, prior to this patch `memref.assume_alignment` would lower
to `llvm.assume %buffer.base.isAligned(X)`, whereas what we want is
`llvm.assume (%buffer.base + %buffer.offset).isAligned(X)`.
In other words, we were failing to include the offset in the expression
checked by the `llvm.assume`.
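A sketch of the intended lowering for an alignment of 16, with illustrative
names and opaque-pointer notation (the exact emitted IR may differ):
```
// The pointer whose alignment is assumed is base + offset, not the base alone.
%aligned_ptr = llvm.getelementptr %base[%offset] : (!llvm.ptr, i64) -> !llvm.ptr, f32
%intptr = llvm.ptrtoint %aligned_ptr : !llvm.ptr to i64
%mask   = llvm.mlir.constant(15 : i64) : i64  // alignment 16 -> mask 15
%rem    = llvm.and %intptr, %mask : i64
%zero   = llvm.mlir.constant(0 : i64) : i64
%cond   = llvm.icmp "eq" %rem, %zero : i64
// %cond then feeds the llvm.intr.assume.
```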
Differential Revision: https://reviews.llvm.org/D148930
This is permitted by the op, but the current lowering generates invalid IR.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D144090
Although specifying an index that is out of bounds for both `memref.dim`
and `tensor.dim` produces undefined behavior, it is still valid IR.
In particular, we could expose an out-of-bounds index because of some
optimizations, for instance as demonstrated in
https://github.com/llvm/llvm-project/issues/60295, and this shouldn't
cause the compiler to abort.
This patch removes the overzealous verifier checks and properly handles
out-of-bounds indices (i.e., the compiler no longer crashes, but the
result is still UB).
This fixes https://github.com/llvm/llvm-project/issues/60295.
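For example, the following is valid IR whose result is undefined (the constant
index 1 is out of bounds for the rank-1 tensor):
```
%c1 = arith.constant 1 : index
%d  = tensor.dim %t, %c1 : tensor<?xf32>
```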
Note: `shape.dim` has a similar problem, but we're not supposed to
produce UB in that case. Instead, we're supposed to propagate an error in
the resulting value, and I don't know how to do that at the moment. Hence I
left that part out of the patch.
Differential Revision: https://reviews.llvm.org/D143999
Address space casts are present in common MLIR targets (LLVM, SPIRV).
Some planned rewrites (such as one of the potential fixes to the fact
that the AMDGPU backend requires alloca() to live in address space 5 /
the GPU private memory space) may require such casts to be inserted
into MLIR code, where those address spaces could be represented by
arbitrary memory space attributes.
Therefore, we define memref.memory_space_cast and its lowerings.
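A minimal usage sketch (the memory space `1` stands in for an arbitrary memory
space attribute):
```
%cast = memref.memory_space_cast %buf : memref<16xf32> to memref<16xf32, 1>
```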
Depends on D141293
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D141148
The code for unranked memref descriptors assumed that
sizeof(!llvm.ptr) == sizeof(!llvm.ptr<N>) for all address spaces N.
This is not always true (e.g., the AMDGPU compiler backend has
sizeof(!llvm.ptr) = 64 bits but sizeof(!llvm.ptr<5>) = 32 bits, where
address space 5 is used for stack allocations). While this is merely
an overallocation in the case where a non-0 address space has pointers
smaller than the default, the existing code could cause OOB memory
accesses when sizeof(!llvm.ptr<N>) > sizeof(!llvm.ptr).
So, add an address spaces parameter to computeSizes in order to
partially resolve this class of bugs. Note that the LLVM data layout
in the conversion passes is currently set to "" and not constructed
from the MLIR data layout or some other source, but this could change
in the future.
Depends on D142159
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D141293
Remapping memory spaces is a function often needed in type
conversions, most often when going to LLVM or to/from SPIR-V (a future
commit), and it is possible that such remappings may become more
common in the future as dialects take advantage of the more generic
memory space infrastructure.
Currently, memory space remappings are handled by running a
special-purpose conversion pass before the main conversion that
changes the address space attributes. In this commit, this approach is
replaced by adding a notion of type attribute conversions to the
TypeConverter, which is then used to convert memory space attributes.
Then, we use this infrastructure throughout the *ToLLVM conversions.
This has the advantage of loosening the requirements on the inputs to
those passes from "all address spaces must be integers" to "all
memory spaces must be convertible to integer spaces", a looser
requirement that reduces the coupling between portions of MLIR.
On top of that, this change leads to the removal of most of the calls
to getMemorySpaceAsInt(), bringing us closer to removing it.
(A rework of the SPIR-V conversions to use this new system will be in
a follow-up commit.)
As a note, one long-term motivation for this change is that I would
eventually like to add an allocaMemorySpace key to MLIR data layouts
and then call getMemRefAddressSpace(allocaMemorySpace) in the
relevant *ToLLVM passes in order to ensure all alloca()s, whether incoming or
produced during the LLVM lowering, have the correct address space for
a given target.
I expect that the type attribute conversion system may be useful in
other contexts.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D142159
This is the first patch in a series of patches part of this RFC: https://discourse.llvm.org/t/rfc-switching-the-llvm-dialect-and-dialect-lowerings-to-opaque-pointers/68179
This patch adds the ability to lower the memref dialect to the LLVM Dialect with the use of opaque pointers instead of typed pointers. The latter are being phased out of LLVM and this patch is part of an effort to phase them out of MLIR as well. To do this, we'll need to support both typed and opaque pointers in lowering passes, to allow downstream projects to change without breakage.
The gist of changes required to change a conversion pass are:
* Change any `LLVM::LLVMPointerType::get` calls to NOT use an element type if opaque pointers are to be used.
* Use the `build` method of `llvm.load` with the explicit result type. Since the pointer does not have an element type anymore it has to be specified explicitly.
* Use the `build` method of `llvm.getelementptr` with the explicit `basePtrType`. Ditto to the above: we now have to specify the element type so that GEP can do its indexing calculations.
* Use the `build` method of `llvm.alloca` with the explicit `elementType`. Ditto to the above, alloca needs to know how many bytes to allocate through the element type.
* Get rid of any `llvm.bitcast`s
* Adapt the tests to the above. Note that `llvm.store` changes syntax as well when using opaque pointers (see the sketch below).
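For illustration, the load/store syntax in both forms (illustrative SSA names):
```
// Typed-pointer form (being phased out):
%0 = llvm.load %p : !llvm.ptr<f32>
llvm.store %v, %p : !llvm.ptr<f32>
// Opaque-pointer form: the element/result type is spelled out explicitly.
%1 = llvm.load %q : !llvm.ptr -> f32
llvm.store %v, %q : f32, !llvm.ptr
```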
I'd like to note that the 3 `build` method changes work for both opaque and typed pointers, so unconditionally using the explicit element type form is always correct.
For the testsuite a practical approach suggested by @ftynse was taken: I created a separate test file for testing the typed pointer lowering of Ops. This mostly comes down to checking that bitcasts have been created at the appropriate places, since these are required for typed pointer support.
Differential Revision: https://reviews.llvm.org/D143268
alloc uses either `malloc` or a pluggable allocation function for allocating the required memory. Both of these functions always return a `llvm.ptr<i8>`, aka a pointer in the default address space. When allocating for a memref in a different memory space, however, no address space cast is created, leading to invalid LLVM IR being generated.
This is currently not caught by the verifier, since the pointer to the memory is always bitcast and `llvm.bitcast` currently lacks a verifier disallowing address space casts. Translating to actual LLVM IR would trip LLVM's verifier, since bitcast cannot convert from one address space to another: https://godbolt.org/z/3a1z97rc9
This patch fixes that issue by generating an address space cast if the address space of the allocation function does not match the address space of the resulting memref.
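A sketch of the generated IR, with illustrative names and shown in
opaque-pointer notation:
```
// The allocation function returns a pointer in the default address space,
// which is then cast to the memref's memory space (here: 1).
%raw = llvm.call @malloc(%size) : (i64) -> !llvm.ptr
%buf = llvm.addrspacecast %raw : !llvm.ptr to !llvm.ptr<1>
```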
Not sure whether this is actually a real-life problem. I found this issue while converting the pass to use opaque pointers, which gets rid of all the bitcasts and hence caused type errors without the address space cast.
Differential Revision: https://reviews.llvm.org/D143341
The `llvm.load` op has a nonTemporal field which is missing from `memref.load` and `memref.store`. This revision first adds a nonTemporal field to memref's load/store ops, then lowers the field to the llvm.load/store ops.
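A usage sketch, assuming the field surfaces on the memref ops as a boolean
attribute named `nontemporal` (the exact attribute name and form are an
assumption here):
```
// Hint that the accessed data is not expected to be reused soon.
%v = memref.load %m[%i] {nontemporal = true} : memref<16xf32>
memref.store %v, %m[%i] {nontemporal = true} : memref<16xf32>
```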
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D142616
This revision adapts the printers and parsers of the LLVM Dialect
AtomicRMWOp, AtomicCmpXchgOp, CallOp, and InvokeOp to support both
opaque and typed pointers by printing the pointer types explicitly.
Previously, the printers and parser of these operations silently assumed
typed pointers. This assumption is problematic if a lowering or the
LLVM IR import produce LLVM Dialect with opaque pointers and the IR is
then printed and parsed, for example, when running mlir-translate. In
LLVM IR itself all tests with typed pointers are already gone. It is
thus important to start switching to opaque pointers.
This revision can be seen as a preparation step for the switch of the
LLVM Dialect to opaque pointers. Once printing and parsing works
seamlessly, all lowerings to LLVM Dialect can be switched to produce
opaque pointers. After a transition period, LLVM Dialect itself can be
simplified to support opaque pointers only.
Reviewed By: ftynse, Dinistro
Differential Revision: https://reviews.llvm.org/D142884
Since the recent MemRef refactoring that centralizes the lowering of
complex MemRef operations outside of the conversion framework, the
MemRefToLLVM pass doesn't directly convert these complex operations.
Instead, to fully convert the whole MemRef dialect space, MemRefToLLVM
needs to run after `expand-strided-metadata`.
Make this more obvious by changing the name of the pass and the option
associated with it from `convert-memref-to-llvm` to
`finalize-memref-to-llvm`.
The word "finalize" conveys that this pass needs to run after something
else and that something else is documented in its tablegen description.
This is a follow-up patch related to the conversation at:
https://discourse.llvm.org/t/psa-you-need-to-run-expand-strided-metadata-before-memref-to-llvm-now/66956/14
Differential Revision: https://reviews.llvm.org/D142463
collapse/expand_shape are supposed to be expanded before we hit the
lowering code.
The expansion is done with the pass called expand-strided-metadata.
This patch is NFC in spirit but not in practice because
expand-strided-metadata won't try to accommodate "invalid" strides
for dynamic sizes that are 1 at runtime.
The previous code was broken in that respect too, but differently: it
handled only the case of row-major layouts.
That whole part is being reworked separately.
Differential Revision: https://reviews.llvm.org/D136483
This reverts commit d0650d1089.
Original commit message:
Subviews are supposed to be expanded before we hit the lowering
code.
The expansion is done with the pass called
expand-strided-metadata.
Add a test that demonstrates how these passes can be linked up to achieve
the desired lowering.
This patch is NFC in spirit but not in practice because `subview` gets
lowered into `reinterpret_cast(extract_strided_metadata, <some math>)`,
which lowers into two memref descriptors (one for `reinterpret_cast` and
one for `extract_strided_metadata`). This creates some noise of the
form `extractvalue(unrealized_cast(extractvalue[0]))[0]` that is
currently not simplified within MLIR but is really just a noop in
that case.
Differential Revision: https://reviews.llvm.org/D136377
This reverts commit c8e15afa4c.
This breaks some integration tests, see
https://lab.llvm.org/buildbot/#/builders/220/builds/10446
I have to update a bunch of RUN lines in the tests to use the new
lowering scheme. Nothing complicated but let's keep the build clean
while I'm fixing that.
Subviews are supposed to be expanded before we hit the lowering
code.
The expansion is done with the pass called
expand-strided-metadata.
Add a test that demonstrates how these passes can be linked up to achieve
the desired lowering.
This patch is NFC in spirit but not in practice because `subview` gets
lowered into `reinterpret_cast(extract_strided_metadata, <some math>)`,
which lowers into two memref descriptors (one for `reinterpret_cast` and
one for `extract_strided_metadata`). This creates some noise of the
form `extractvalue(unrealized_cast(extractvalue[0]))[0]` that is
currently not simplified within MLIR but is really just a noop in
that case.
Differential Revision: https://reviews.llvm.org/D136377
This is generated by running
```
sed --in-place 's/[[:space:]]\+$//' mlir/**/*.td
sed --in-place 's/[[:space:]]\+$//' mlir/**/*.mlir
```
Reviewed By: rriddle, dcaballe
Differential Revision: https://reviews.llvm.org/D138866
The first result of the extract_strided_metadata operation is a MemRef,
not a naked pointer.
This patch fixes the lowering of this operation in MemRefToLLVM so that
we properly materialize the full MemRef structure and not just the naked
base pointer.
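For reference, a sketch of the op's results based on its documentation: the
first result is a 0-d memref wrapping the base buffer, followed by the offset,
sizes, and strides as indices.
```
%base, %offset, %sizes:2, %strides:2 =
    memref.extract_strided_metadata %m
    : memref<?x?xf32> -> memref<f32>, index, index, index, index, index
```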
Differential Revision: https://reviews.llvm.org/D137364
In D134622 the printed form of a pass manager is changed to include the
name of the op that the pass manager is anchored on. This updates the
`-pass-pipeline` argument format to include the anchor op as well, so
that the printed form of a pipeline can be directly passed to
`-pass-pipeline`. In most cases this requires updating
`-pass-pipeline='pipeline'` to
`-pass-pipeline='builtin.module(pipeline)'`.
This also fixes an outdated assert that prevented running a
`PassManager` anchored on `'any'`.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D134900
The `extract_strided_metadata` operation literally breaks down a memory
descriptor into its different components.
Teach the MemRefToLLVM conversion framework this fact.
Differential Revision: https://reviews.llvm.org/D136304
The MemRef to LLVM conversion pass emits `llvm.alloca` operations to promote MemRef descriptors to the stack when lowering `memref.copy` operations for operands which do not have a contiguous layout in memory. The original stack position is never restored after the allocations, which creates an issue when the copy operation is embedded into a loop with a high trip count, ultimately resulting in a segmentation fault due to the stack growing too large.
Below is a minimal example illustrating the issue:
```
module {
  func.func @main() {
    %arg0 = memref.alloc() : memref<32x64xi64>
    %arg1 = memref.alloc() : memref<16x32xi64>
    %lb = arith.constant 0 : index
    %ub = arith.constant 100000 : index
    %step = arith.constant 1 : index
    %slice = memref.subview %arg0[16,32][16,32][1,1] :
      memref<32x64xi64> to memref<16x32xi64, #map>
    scf.for %i = %lb to %ub step %step {
      memref.copy %slice, %arg1 :
        memref<16x32xi64, #map> to memref<16x32xi64>
    }
    return
  }
}
```
When running the code above, e.g., with mlir-cpu-runner, the execution crashes with a segmentation fault:
```
$ mlir-opt \
--convert-scf-to-cf \
--convert-memref-to-llvm \
--convert-func-to-llvm \
--convert-cf-to-llvm \
--reconcile-unrealized-casts <file> | \
mlir-cpu-runner \
-e main -entry-point-result=void \
--shared-libs=$PWD/build/lib/libmlir_c_runner_utils.so
[...]
Segmentation fault
```
This patch causes the code lowering a `memref.copy` operation in the MemRefToLLVM pass to emit a pair of matching `llvm.intr.stacksave` and `llvm.intr.stackrestore` operations around the promotion of memory descriptors and the subsequent call to `memrefCopy` in order to restore the stack to its original position after the call.
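A sketch of the emitted guard, with illustrative names and opaque-pointer
notation:
```
// Save the stack pointer, promote the descriptors with llvm.alloca, call the
// memrefCopy runtime function, then restore the stack to its original state.
%sp = llvm.intr.stacksave : !llvm.ptr
// ... llvm.alloca promotions and the call to memrefCopy go here ...
llvm.intr.stackrestore %sp : !llvm.ptr
```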
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D135756
One constant generated in MemRefToLLVM had a hardcoded bitwidth of
64 bits. The fix uses the typeConverter to create a constant that
matches the bitwidth provided by the data layout. The issue was
detected in an attempt to add a verifier to the LLVM ICmp operation that
checks that the types of the compared arguments match.
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D135775