This partially reverts #66380. The assertion that the underlying buffer
of an EncodingReader is aligned to any required alignments for resource
sections. Resources know their own alignment and pad their buffers
accordingly, but the bytecode reader doesn't know that ahead of time.
Consequently, it cannot give the resource EncodingReader a base buffer
aligned to the maximum required alignment.
A simple example from the test fails without this:
```mlir
module @TestDialectResources attributes {
bytecode.test = dense_resource<resource> : tensor<4xi32>
} {}
{-#
dialect_resources: {
builtin: {
resource: "0x2000000001000000020000000300000004000000",
resource_2: "0x2000000001000000020000000300000004000000"
}
}
```
Before this change, the `ByteCode` test failed on CentOS 7 with
devtoolset-9, because strings happen to be only 8 byte aligned. In
general though, strings have no alignment guarantee.
Increase resource alignment in test to 32 bytes.
Adjust test to sufficiently align buffer.
Add test to check error when buffer is insufficiently aligned.
Leaving this field unitialized could led to crashes when it'll diverge from the
IRNumbering phase.
Differential Revision: https://reviews.llvm.org/D156965
[mlir] Expose a mechanism to provide a callback for encoding types and attributes in MLIR bytecode.
Two callbacks are exposed, respectively, to the BytecodeWriterConfig and to the ParserConfig. At bytecode parsing/printing, clients have the ability to specify a callback to be used to optionally read/write the encoding. On failure, fallback path will execute the default parsers and printers for the dialect.
Testing shows how to leverage this functionality to support back-deployment and backward-compatibility usecases when roundtripping to bytecode a client dialect with type/attributes dependencies on upstream.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D153383
[mlir] Expose a mechanism to provide a callback for encoding types and attributes in MLIR bytecode.
Two callbacks are exposed, respectively, to the BytecodeWriterConfig and to the ParserConfig. At bytecode parsing/printing, clients have the ability to specify a callback to be used to optionally read/write the encoding. On failure, fallback path will execute the default parsers and printers for the dialect.
Testing shows how to leverage this functionality to support back-deployment and backward-compatibility usecases when roundtripping to bytecode a client dialect with type/attributes dependencies on upstream.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D153383
Zero region operations return true for both isBeforeAllRegions and
isAfterAllRegions when using WalkStage. The bytecode walk only
expects region holding operations in the after regions path, so
guard against that.
We currently only support lazy loading for regions that
statically implement the IsolatedFromAbove trait, but that
limits the amount of operations that can be lazily loaded. This review
lifts that restriction by computing which operations have isolated
regions when numbering, allowing any operation to be lazily loaded
as long as it doesn't use values defined above.
Differential Revision: https://reviews.llvm.org/D156199
We currently encode each region as a separate section, but
the reader expects all of the regions to be in the same section.
This updates the writer to match the behavior that the reader
expects.
Differential Revision: https://reviews.llvm.org/D156198
The operand_segment_sizes and result_segment_sizes Attributes are now inlined
in the operation as native propertie. We continue to support building an
Attribute on the fly for `getAttr("operand_segment_sizes")` and setting the
property from an attribute with `setAttr("operand_segment_sizes", attr)`.
A new bytecode version is introduced to support backward compatibility and
backdeployments.
Differential Revision: https://reviews.llvm.org/D155919
The operand_segment_sizes and result_segment_sizes Attributes are now inlined
in the operation as native propertie. We continue to support building an
Attribute on the fly for `getAttr("operand_segment_sizes")` and setting the
property from an attribute with `setAttr("operand_segment_sizes", attr)`.
A new bytecode version is introduced to support backward compatibility and
backdeployments.
Differential Revision: https://reviews.llvm.org/D155919
At the moment, only the trailing dimensions in the vector type can be
scalable, i.e. this is supported:
vector<2x[4]xf32>
and this is not allowed:
vector<[2]x4xf32>
This patch extends the vector type so that arbitrary dimensions can be
scalable. To this end, an array of bool values is added to every vector
type to denote whether the corresponding dimensions are scalable or not.
For example, for this vector:
vector<[2]x[3]x4xf32>
the following array would be created:
{true, true, false}.
Additionally, the current syntax:
vector<[2x3]x4xf32>
is replaced with:
vector<[2]x[3]x4xf32>
This is primarily to simplify parsing (this way, the parser can easily
process one dimension at a time rather than e.g. tracking whether
"scalable block" has been entered/left).
NOTE: The `isScalableDim` parameter of `VectorType` (introduced in this
patch) makes `numScalableDims` redundant. For the time being,
`numScalableDims` is preserved to facilitate the transition between the
two parameters. `numScalableDims` will be removed in one of the
subsequent patches.
This change is a part of a larger effort to enable scalable
vectorisation in Linalg. See this RFC for more context:
* https://discourse.llvm.org/t/rfc-scalable-vectorisation-in-linalg/
Differential Revision: https://reviews.llvm.org/D153372
The bytecode reader currently assumes all regions can be lazy
loaded, which breaks reading any non-isolated region. This patch
fixes that by properly handling nested non-lazy regions, and only
considers isolated regions as lazy.
Differential Revision: https://reviews.llvm.org/D153795
This makes the bytecode reader/writer work on big-endian platforms.
The only problem was related to encoding of multi-byte integers,
where both reader and writer code make implicit assumptions about
endianness of the host platform.
This fixes the current test failures on s390x, and in addition allows
to remove the UNSUPPORTED markers from all other bytecode-related
test cases - they now also all pass on s390x.
Also adding a GFAIL_SKIP to the MultiModuleWithResource unit test,
as this still fails due to an unrelated endian bug regarding
decoding of external resources.
Differential Revision: https://reviews.llvm.org/D153567
Reviewed By: mehdi_amini, jpienaar, rriddle
Currently desired bytecode version is clamped to the maximum. This allows requesting bytecode versions that do not exist. We have added callsite validation for this in StableHLO to ensure we don't pass an invalid version number, probably better if this is managed upstream. If a user wants to use the current version, then omitting `setDesiredBytecodeVersion` is the best way to do that (as opposed to providing a large number).
Adding this check will also properly error on older version numbers as we increment the minimum supported version. Silently claming on minimum version would likely lead to unintentional forward incompatibilities.
Separately, due to bytecode version being `int64_t` and using methods to read/write uints, we can generate payloads with invalid version numbers:
```
mlir-opt file.mlir --emit-bytecode --emit-bytecode-version=-1 | mlir-opt
<stdin>:0:0: error: bytecode version 18446744073709551615 is newer than the current version 5
```
This is fixed with version bounds checking as well.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D151838
/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1007:39: error: non-const lvalue reference to type 'uint64_t' (aka 'unsigned long long') cannot bind to a value of unrelated type 'size_t' (aka 'unsigned long')
if (failed(propReader.parseVarInt(count)))
^~~~~
/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:191:39: note: passing argument to parameter 'result' here
LogicalResult parseVarInt(uint64_t &result) {
^
/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1020:44: error: non-const lvalue reference to type 'uint64_t' (aka 'unsigned long long') cannot bind to a value of unrelated type 'size_t' (aka 'unsigned long')
if (failed(offsetsReader.parseVarInt(dataSize)) ||
^~~~~~~~
/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:191:39: note: passing argument to parameter 'result' here
LogicalResult parseVarInt(uint64_t &result) {
^
2 errors generated.
This is adding a new interface (`BytecodeOpInterface`) to allow operations to
opt-in skipping conversion to attribute and serializing properties to native
bytecode.
The scheme relies on a new section where properties are stored in sequence
{ size, serialize_properties }, ...
The operations are storing the index of a properties, a table of offset is
built when loading the properties section the first time.
This is a re-commit of 837d1ce0dc which conflicted with another patch upgrading
the bytecode and the collision wasn't properly resolved before.
Differential Revision: https://reviews.llvm.org/D151065
This reverts commit ca5a12fd69
and follow-up fixes:
df34c288c407dc906883ab80ad0095837d1ce0dc
The first commit was incomplete and broken, I'll prepare a new version
later, in the meantime pull this work out of tree.
This was an oversight in the development of bytecode version 5, which was
caught by downstream StableHLO compatibility tests.
Differential revision: https://reviews.llvm.org/D151531
/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1007:39: error: non-const lvalue reference to type 'uint64_t' (aka 'unsigned long long') cannot bind to a value of unrelated type 'size_t' (aka 'unsigned long')
if (failed(propReader.parseVarInt(count)))
^~~~~
/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:191:39: note: passing argument to parameter 'result' here
LogicalResult parseVarInt(uint64_t &result) {
^
/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1033:41: error: non-const lvalue reference to type 'uint64_t' (aka 'unsigned long long') cannot bind to a value of unrelated type 'size_t' (aka 'unsigned long')
if (failed(dialectReader.readVarInt(propertiesIdx)))
^~~~~~~~~~~~~
/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:926:38: note: passing argument to parameter 'result' here
LogicalResult readVarInt(uint64_t &result) override {
^
2 errors generated.
/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:1033:41: error: non-const lvalue reference to type 'uint64_t' (aka 'unsigned long long') cannot bind to a value of unrelated type 'size_t' (aka 'unsigned long')
if (failed(dialectReader.readVarInt(propertiesIdx)))
^~~~~~~~~~~~~
/Users/jiefu/llvm-project/mlir/lib/Bytecode/Reader/BytecodeReader.cpp:926:38: note: passing argument to parameter 'result' here
LogicalResult readVarInt(uint64_t &result) override {
^
1 error generated.
This is adding a new interface (`BytecodeOpInterface`) to allow operations to
opt-in skipping conversion to attribute and serializing properties to native
bytecode.
The scheme relies on a new section where properties are stored in sequence
{ size, serialize_properties }, ...
The operations are storing the index of a properties, a table of offset is
built when loading the properties section the first time.
Back-deployment to version prior to 4 are relying on getAttrDictionnary() which
we intend to deprecate and remove: that is putting a de-factor end-of-support
horizon for supporting deployments to version older than 4.
Differential Revision: https://reviews.llvm.org/D151065
For block arg locs a common case is no/uknown location (where the producer
signifies they don't care about blockarg location). Also avoid needing to
dynamically resize opnames during parsing.
Assumed to be post lazy loading change, so chose version 3.
Differential Revision: https://reviews.llvm.org/D151038
The bytecode reader didn't handle properly the case where resource names
conflicted and were renamed, leading to orphan handles in the IR as well
as overwriting the exiting resources.
Differential Revision: https://reviews.llvm.org/D151408
At the moment we accept (in tests) unregistered dialects and in particular:
"new_processor_id_and_range"()
where there is no `.` separator. We probably will remove support for this
from the parser, but for now we're adding compatibility support in the
reader.
Differential Revision: https://reviews.llvm.org/D151386
This patch implements a mechanism to read/write use-list orders from/to the mlir bytecode format. When producing bytecode, use-list orders are appended to each value of the IR. When reading bytecode, use-lists orders are loaded in memory and used at the end of parsing to sort the existing use-list chains.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D149755
IsolatedRegions are emitted in sections in order for the reader to be
able to skip over them. A new class is exposed to manage the state and
allow the readers to load these IsolatedRegions on-demand.
Differential Revision: https://reviews.llvm.org/D149515
The MLIR classes Type/Attribute/Operation/Op/Value support
cast/dyn_cast/isa/dyn_cast_or_null functionality through llvm's doCast
functionality in addition to defining methods with the same name.
This change begins the migration of uses of the method to the
corresponding function call as has been decided as more consistent.
Note that there still exist classes that only define methods directly,
such as AffineExpr, and this does not include work currently to support
a functional cast/isa call.
Caveats include:
- This clang-tidy script probably has more problems.
- This only touches C++ code, so nothing that is being generated.
Context:
- https://mlir.llvm.org/deprecation/ at "Use the free function variants
for dyn_cast/cast/isa/…"
- Original discussion at https://discourse.llvm.org/t/preferred-casting-style-going-forward/68443
Implementation:
This first patch was created with the following steps. The intention is
to only do automated changes at first, so I waste less time if it's
reverted, and so the first mass change is more clear as an example to
other teams that will need to follow similar steps.
Steps are described per line, as comments are removed by git:
0. Retrieve the change from the following to build clang-tidy with an
additional check:
https://github.com/llvm/llvm-project/compare/main...tpopp:llvm-project:tidy-cast-check
1. Build clang-tidy
2. Run clang-tidy over your entire codebase while disabling all checks
and enabling the one relevant one. Run on all header files also.
3. Delete .inc files that were also modified, so the next build rebuilds
them to a pure state.
4. Some changes have been deleted for the following reasons:
- Some files had a variable also named cast
- Some files had not included a header file that defines the cast
functions
- Some files are definitions of the classes that have the casting
methods, so the code still refers to the method instead of the
function without adding a prefix or removing the method declaration
at the same time.
```
ninja -C $BUILD_DIR clang-tidy
run-clang-tidy -clang-tidy-binary=$BUILD_DIR/bin/clang-tidy -checks='-*,misc-cast-functions'\
-header-filter=mlir/ mlir/* -fix
rm -rf $BUILD_DIR/tools/mlir/**/*.inc
git restore mlir/lib/IR mlir/lib/Dialect/DLTI/DLTI.cpp\
mlir/lib/Dialect/Complex/IR/ComplexDialect.cpp\
mlir/lib/**/IR/\
mlir/lib/Dialect/SparseTensor/Transforms/SparseVectorization.cpp\
mlir/lib/Dialect/Vector/Transforms/LowerVectorMultiReduction.cpp\
mlir/test/lib/Dialect/Test/TestTypes.cpp\
mlir/test/lib/Dialect/Transform/TestTransformDialectExtension.cpp\
mlir/test/lib/Dialect/Test/TestAttributes.cpp\
mlir/unittests/TableGen/EnumsGenTest.cpp\
mlir/test/python/lib/PythonTestCAPI.cpp\
mlir/include/mlir/IR/
```
Differential Revision: https://reviews.llvm.org/D150123
We were querying the wrong EncReader along some paths that resulted in
failures depending on if one encountered an Attribute from an unloaded
dialect before encountering an operation from that dialect.
Also fix error where we were able to emit "custom" form for an attribute
without custom form in TestDialect.
Differential Revision: https://reviews.llvm.org/D150260
Can't return a well-formed IR output while enabling version to be bumped
up during emission. Previously it would return min version but
potentially invalid IR which was confusing, instead make it return
error and abort immediately instead.
Differential Revision: https://reviews.llvm.org/D149569
Add method to set a desired bytecode file format to generate. Change
write method to be able to return status including the minimum bytecode
version needed by reader. This enables generating an older version of
the bytecode (not dialect ops, attributes or types). But this does not
guarantee that an older version can always be generated, e.g., if a
dialect uses a new encoding only available at later bytecode version.
This clamps setting to at most current version.
Differential Revision: https://reviews.llvm.org/D146555
Replace references to enumerate results with either result_pairs
(reference wrapper type) or structured bindings. I did not use
structured bindings everywhere as it wasn't clear to me it would
improve readability.
This is in preparation to the switch to zip semantics which won't
support non-const lvalue reference to elements:
https://reviews.llvm.org/D144503.
I chose to use values instead of const lvalue-refs because MLIR is
biased towards avoiding `const` local variables. This won't degrade
performance because currently `result_pair` is cheap to copy (size_t
+ iterator), and in the future, the enumerator iterator dereference
will return temporaries anyway.
Reviewed By: dblaikie
Differential Revision: https://reviews.llvm.org/D146006
A dialect can opt-in to handle versioning through the
`BytecodeDialectInterface`. Few hooks are exposed to the dialect to allow
managing a version encoded into the bytecode file. The version is loaded
lazily and allows to retrieve the version information while parsing the input
IR, and gives an opportunity to each dialect for which a version is present
to perform IR upgrades post-parsing through the `upgradeFromVersion` method.
Custom Attribute and Type encodings can also be upgraded according to the
dialect version using readAttribute and readType methods.
There is no restriction on what kind of information a dialect is allowed to
encode to model its versioning. Currently, versioning is supported only for
bytecode formats.
Reviewed By: rriddle, mehdi_amini
Differential Revision: https://reviews.llvm.org/D143647
`parseAttribute` and `parseType` require null-terminated strings as
input, but this isn't great considering the argument type is
`StringRef`. This changes them to copy to a null-terminated buffer by
default, with a `isKnownNullTerminated` flag added to disable the
copying.
closes#58964
Reviewed By: rriddle, kuhar, lattner
Differential Revision: https://reviews.llvm.org/D145182
Currently these functions report errors directly to stderr, this updates
them to use diagnostics instead. This also makes partially-consumed
strings an error if the `numRead` parameter isn't provided (the
docstrings already claimed this happened, but it didn't.)
While here I also tried to reduce the number of overloads by switching
to using default parameters.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D144804
Check if rhs is the dialect to be ordered first, ensuring that
we don't inadvertantly order something before it by
falling back to pure number comparison.
This only shows up depending on the implementation of
stable_sort. This was hit in a build of MSVC that was
checking for strict ordering.
This patch drops the ZeroBehavior parameter from bit counting
functions like countLeadingZeros. ZeroBehavior specifies the behavior
when the input to count{Leading,Trailing}Zeros is zero and when the
input to count{Leading,Trailing}Ones is all ones.
ZeroBehavior was first introduced on May 24, 2013 in commit
eb91eac9fb. While that patch did not
state the intention, I would guess ZeroBehavior was for performance
reasons. The x86 machines around that time required a conditional
branch to implement countLeadingZero<uint32_t> that returns the 32 on
zero:
test edi, edi
je .LBB0_2
bsr eax, edi
xor eax, 31
.LBB1_2:
mov eax, 32
That is, we can remove the conditional branch if we don't care about
the behavior on zero.
IIUC, Intel's Haswell architecture, launched on June 4, 2013,
introduced several bit manipulation instructions, including lzcnt and
tzcnt, which eliminated the need for the conditional branch.
I think it's time to retire ZeroBehavior as its utility is very
limited. If you care about compilation speed, you should build LLVM
with an appropriate -march= to take advantage of lzcnt and tzcnt.
Even if not, modern host compilers should be able to optimize away
quite a few conditional branches because the input is often known to
be nonzero from dominating conditional branches.
Differential Revision: https://reviews.llvm.org/D141798
Similar to how `makeArrayRef` is deprecated in favor of deduction guides, do the
same for `makeMutableArrayRef`.
Once all of the places in-tree are using the deduction guides for
`MutableArrayRef`, we can mark `makeMutableArrayRef` as deprecated.
Differential Revision: https://reviews.llvm.org/D141814
The bytecode reader currently has no mechanism that allows for directly referencing
data from the input buffer safely. This commit adds shared_ptr<SourceMgr> overloads
that provide an explicit and safe way of extending the lifetime of the input. The usage of
these new overloads is adopted in all of our tooling, and is implicitly used in the filename
only parser methods.
Differential Revision: https://reviews.llvm.org/D139366
This addresses the TODO in the code previously and checks that the address of `dataIt` is properly aligned to the requested alignment.
Reviewed By: rriddle
Differential Revision: https://reviews.llvm.org/D137855