This revision adds support for empty tensor elimination to
"bufferization.materialize_in_destination" by implementing the
`SubsetInsertionOpInterface`.
Furthermore, the One-Shot Bufferize conflict detection is improved for
"bufferization.materialize_in_destination".
Before this change, the `ByteCode` test failed on CentOS 7 with
devtoolset-9, because strings happen to be only 8 byte aligned. In
general though, strings have no alignment guarantee.
Increase resource alignment in test to 32 bytes.
Adjust test to sufficiently align buffer.
Add test to check error when buffer is insufficiently aligned.
Since ownership based buffer deallocation requires a few passes to be run in a somewhat fixed sequence, it makes sense to have a pipeline for convenience (and to reduce the number of transform ops to represent default deallocation).
This new interface allows operations to implement custom handling of ownership values and insertion of dealloc operations which is useful when an op cannot implement the interfaces supported by default by the buffer deallocation pass (e.g., because they are not exactly compatible or because there are some additional semantics to it that would render the default implementations in buffer deallocation invalid, or because no interfaces exist for this
kind of behavior and it's not worth introducing one plus a default implementation in buffer deallocation). Additionally, it can also be used to provide more efficient handling for a specific op than the interface based default
implementations can.
Add a new Buffer Deallocation pass with the intend to replace the old
one. For now it is added as a separate pass alongside in order to allow
downstream users to migrate over gradually. This new pass has the goal
of inserting fewer clone operations and supporting additional use-cases.
Please refer to the Buffer Deallocation section in the updated
Bufferization.md file for more information on how this new pass works.
This commit generalizes the special
tensor.extract_slice/tensor.insert_slice bufferization rules to tensor
subset ops.
Ops that insert a tensor into a tensor at a specified subset (e.g.,
tensor.insert_slice, tensor.scatter) can implement the
`SubsetInsertionOpInterface`.
Apart from adding a new op interface (extending the API), this change is
NFC. The only ops that currently implement the new interface are
tensor.insert_slice and tensor.parallel_insert_slice, and those ops were
are supported by One-Shot Bufferize.
Since buffer deallocation requires a few passes to be run in a somewhat fixed
sequence, it makes sense to have a pipeline for convenience (and to reduce the
number of transform ops to represent default deallocation).
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D159432
This new interface allows operations to implement custom handling of ownership
values and insertion of dealloc operations which is useful when an op cannot
implement the interfaces supported by default by the buffer deallocation pass
(e.g., because they are not exactly compatible or because there are some
additional semantics to it that would render the default implementations in
buffer deallocation invalid, or because no interfaces exist for this kind of
behavior and it's not worth introducing one plus a default implementation in
buffer deallocation). Additionally, it can also be used to provide more
efficient handling for a specific op than the interface based default
implementations can.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D158756
Add a new Buffer Deallocation pass replacing the old one with the goal of
inserting fewer clone operations and supporting additional use-cases.
Please refer to the Buffer Deallocation section in the updated
Bufferization.md file for more information on how this new pass works.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D158421
The OpenACC standard specifies an `atomic` construct in section 2.12 (of
3.3 spec), used to ensure that a specific location is accessed or
updated atomically. Four different clauses are allowed: `read`, `write`,
`update`, or `capture`. If no clause appears, it is as if `update` is
used.
The OpenMP specification defines the same clauses for `omp atomic`. The
types of expression and the clauses in the OpenACC spec match the OpenMP
spec exactly. The main difference is that the OpenMP specification is a
superset - it includes clauses for `hint` and `memory order`. It also
allows conditional expression statements. But otherwise, the expression
definition matches.
Thus, for OpenACC, we refactor and reuse the OpenMP implementation as
follows:
* The atomic operations are duplicated in OpenACC dialect. This is
preferable so that each language's semantics are precisely represented
even if specs have divergence.
* However, since semantics overlap, a common interface between the
atomic operations is being added. The semantics for the interfaces are
not generic enough to be used outside of OpenACC and OpenMP, and thus
new folders were added to hold common pieces of the two dialects.
* The atomic interfaces define common accessors (such as getting `x` or
`v`) which match the OpenMP and OpenACC specs. It also adds common
verifiers intended to be called by each dialect's operation verifier.
* The OpenMP write operation was updated to use `x` and `expr` to be
consistent with its other operations (that use naming based on spec).
The frontend lowering necessary to generate the dialect can also be
reused. This will be done in a follow up change.
`type_traits` and `utility` refer to each other in their
implementations. Also `type_traits` starts to become too big to be
manageable. This PR splits each function into individual files. FTR this
is [how libcxx handles large headers as
well](https://github.com/llvm/llvm-project/tree/main/libcxx/include/__type_traits).
The reland adds two missing functions : is_destructible_v and is_reference_v
`type_traits` and `utility` refer to each other in their
implementations. Also `type_traits` starts to become too big to be
manageable. This PR splits each function into individual files. FTR this
is [how libcxx handles large headers as
well](https://github.com/llvm/llvm-project/tree/main/libcxx/include/__type_traits).
There are two motivations for this change:
1. It considerably simplifies adding support for the realloc operation to the
new buffer deallocation pass by lowering the realloc such that no
deallocation operation is inserted and the deallocation pass itself can
insert that dealloc
2. The lowering is expressed on a higher level and thus easier to understand,
and the lowerings of the memref operations it is composed of don't have to
be duplicated in the MemRefToLLVM lowering (also see discussion in
https://reviews.llvm.org/D133424)
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D159430
Incorporated two header files directly into other since
other parts were used (and it makes it hard to find the
definitions). Removed TODOs that are less likely to be done.
Reviewed By: yinying-lisa-li
Differential Revision: https://reviews.llvm.org/D159381
Incorporated two header files directly into other since
other parts were used (and it makes it hard to find the
definitions). Removed TODOs that are less likely to be done.
Reviewed By: Peiming
Differential Revision: https://reviews.llvm.org/D159330