Adds two new CMake functions to query the host system:
* `check_hwcap`,
* `check_emulator`.
Together, these functions are used to check whether a given set of MLIR
integration tests require an emulator. If yes, then the corresponding
CMake var that defies the required emulator executable is also checked.
`check_hwcap` relies on ELF_HWCAP for discovering CPU features from
userspace on Linux systems. This is the recommended approach for Arm
CPUs running on Linux as outlined in this blog post:
* https://community.arm.com/arm-community-blogs/b/operating-systems-blog/posts/runtime-detection-of-cpu-features-on-an-armv8-a-cpu
Other operating systems (e.g. Android) and CPU architectures will
most likely require some other approach. Right now these new hooks are
only used for SVE and SME integration tests.
This relands #86489 with the following changes:
* Replaced:
`set(hwcap_test_file ${CMAKE_BINARY_DIR}/${CMAKE_FILES_DIRECTORY}/hwcap_check.c)`
with:
`set(hwcap_test_file ${CMAKE_BINARY_DIR}/temp/hwcap_check.c)`
The former would trigger an infinite loop when running `ninja`
(after the initial CMake configuration).
* Fixed commit msg. Previous one was taken from the initial GH PR
commit rather than the final re-worked solution (missed this when
merging via GH UI).
* A couple more NFCs/tweaks.
References to headings need to be preceded with a slash. Also,
references to headings on the same page do not need to contain the name
of the document (omitting the document name means if the name changes
the links will still be valid).
I double checked the links by building [the
website](https://github.com/llvm/mlir-www):
```shell
./mlir-www-helper.sh --install-docs ../llvm-project website
cd website && hugo serve
```
Integration tests for ArmSME require an emulator (there's no hardware
available). Make sure that CMake complains if `MLIR_RUN_ARM_SME_TESTS`
is set while `ARM_EMULATOR_EXECUTABLE` is empty.
I'm also adding a note in the docs for future reference.
- Fixed OpenACC's spec link format
- Add missed `OpenACCPasses.md` into Passes.md
- Add missed `MyExtensionCh4.md` into Ch4.md of tutorial of transform
As part of the renaming the Standard dialect to Func dialect, *support*
for the `func.constant` operation was added to the emitter. However, the
emitter cannot emit function types. Hence the emission for a snippet
like
```
%0 = func.constant @myfn : (f32) -> f32
func.func private @myfn(%arg0: f32) -> f32 {
return %arg0 : f32
}
```
failes with `func.mlir:1:6: error: cannot emit type '(f32) -> f32'`.
This removes `func.constant` from the emitter.
* Split out `MeshDialect.h` form `MeshOps.h` that defines the dialect
class. Reduces include clutter if you care only about the dialect and
not the ops.
* Expose functions `getMesh` and `collectiveProcessGroupSize`. There
functions are useful for outside users of the dialect.
* Remove unused code.
* Remove examples and tests of mesh.shard attribute in tensor encoding.
Per the decision that Spmdization would be performed on sharding
annotations and there will be no tensors with sharding specified in the
type.
For more info see this RFC comment:
https://discourse.llvm.org/t/rfc-sharding-framework-design-for-device-mesh/73533/81
Introduce a new extension for simple print-debugging of the transform
dialect scripts. The initial version of this extension consists of two
ops that are printing the payload objects associated with transform
dialect values. Similar ops were already available in the test extenion
and several downstream projects, and were extensively used for testing.
After PR#75548, the OpenACC documentation on the MLIR website has a few
issues. This change corrects them:
- Renames OpenACC.md to OpenACCDialect.md so that links remain
unchanged. In its current state, the links to
https://mlir.llvm.org/docs/Dialects/OpenACCDialect/ no longer work.
- Since the old OpenACCDialect.md (the one with operation definitions)
is being included in the new file, rename the old file to prevent name
ambiguity.
- A header is needed in the .md file, otherwise the index on website is
not properly created.
- Add a new section before including the operations .md file because
otherwise the separation is not clear.
This document captures the design philosophy of the acc dialect. It also
shares the rationale behind the design and implementation of various
operations - and ties that back to the dialect design goals.
Co-authored-by: Valentin Clement <clementval@gmail.com>
Co-authored-by: Slava Zakharin <szakharin@nvidia.com>
This PR improves the documentation for the `gpu-lower-to-nvvm-pipeline`
(as it was remaning item for #75775)
- Changes pipeline `gpu-lower-to-nvvm` -> `gpu-lower-to-nvvm-pipeline`
- Adds a section in GPU Dialect in website. It clarifies the pipeline's
functionality in lowering primary dialects to NVVM targets.
This renames the `emitc.call` op to `emitc.call_opaque` as the existing
call op does not refer to the callee by symbol. The rename allows to
introduce a new call op alongside with a future `emitc.func` op to model
and facilitate functions and function calls.
The auto-generated summaries were hard to read (and pretty unhelpful), a
SME tile was:
```
vector<[16]x[16]xi8> of 8-bit signless integer values or vector<[8]x[8]xi16> of 16-bit signless integer values or vector<[4]x[4]xi32> of 32-bit signless integer values or vector<[2]x[2]xi64> of 64-bit signless integer values or vector<[1]x[1]xi128> of 128-bit signless integer values or vector<[8]x[8]xf16> of 16-bit float values or vector<[8]x[8]xbf16> of bfloat16 type values or vector<[4]x[4]xf32> of 32-bit float values or vector<[2]x[2]xf64> of 64-bit float values
```
...and a SVE vector was:
```
of ranks 1scalable vector of 8-bit signless integer or 16-bit signless integer or 32-bit signless integer or 64-bit signless integer or 128-bit signless integer or 16-bit float or bfloat16 type or 32-bit float or 64-bit float values of length 16/8/4/2/1
```
Note: The descriptions added here won't yet be shown on the MLIR docs
(only the short summaries), but this should be easy to enable like it
was for attribute descriptions in #67009.
A table of contents (TOC) is also added to the ArmSME docs page to make
it easier to navigate.
Add an emitc.for op to the EmitC dialect as a lowering target for
scf.for, replacing its current direct translation to C; The translator
now handles emitc.for instead.
The vector.extract assembly format currently only contains the source
type, for example:
%1 = vector.extract %0[1] : vector<3x7x8xf32>
it's not immediately obvious if this is the source or result type. This
patch improves the assembly format to make this clearer, so the above
becomes:
%1 = vector.extract %0[1] : vector<7x8xf32> from vector<3x7x8xf32>
Add an emitc.if op to the EmitC dialect. A new convert-scf-to-emitc
pass replaces the existing direct translation of scf.if to C; The
translator now handles emitc.if instead.
The emitc.if op doesn't return any value and its then/else regions are
terminated with a new scf.yield op. Values returned by scf.if are
lowered using emitc.variable ops, assigned to in the then/else regions
using a new emitc.assign op.
This revision replaces the LLVM dialect NullOp by the recently
introduced ZeroOp. The ZeroOp is more generic in the sense that it
represents zero values of any LLVM type rather than null pointers only.
This is a follow to https://github.com/llvm/llvm-project/pull/65508
Adds documentation to the GPU dialect docs giving a general overview of the new
compilation mechanism introduced in the patch series ending in D154153.
Reviewed By: mehdi_amini
Differential Revision: https://reviews.llvm.org/D157461
Introduces a generic parameter type intended for transform dialect use
cases that uses/manipulates the underlying attribute (e.g. op
annotation). Includes a disclaimer that mixing parameter based and
attribute based control is discouraged.
Differential Revision: https://reviews.llvm.org/D155980
A distinct attribute associates a referenced attribute with a unique
identifier. Every call to its create function allocates a new
distinct attribute instance. The address of the attribute instance
temporarily serves as its unique identifier. Similar to the names
of SSA values, the final unique identifiers are generated during
pretty printing.
Examples:
#distinct = distinct[0]<42.0 : f32>
#distinct1 = distinct[1]<42.0 : f32>
#distinct2 = distinct[2]<array<i32: 10, 42>>
This mechanism is meant to generate attributes with a unique
identifier, which can be used to mark groups of operations
that share a common properties such as if they are aliasing.
The design of the distinct attribute ensures minimal memory
footprint per distinct attribute since it only contains a reference
to another attribute. All distinct attributes are stored outside of
the storage uniquer in a thread local store that is part of the
context. It uses one bump pointer allocator per thread to ensure
distinct attributes can be created in-parallel.
Reviewed By: rriddle, Dinistro, zero9178
Differential Revision: https://reviews.llvm.org/D153360
The transform dialect has been around for a while and is sufficiently
stable at this point. Add the first three chapters of the tutorial
describing its usage and extension.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D151491
- Added missing TensorTransformOps to the Transform doc
- Added missing AMDGPUPasses to the Passes doc
- Place `async dialect` in alphabetical order in the Passes doc
Reviewed By: ftynse
Differential Revision: https://reviews.llvm.org/D150341
Add a set of transform operations into the "structured" extension of the
Transform dialect that allow one to select transformation targets more
specifically than the currently available matching. In particular, add
the mechanism for identifying the producers of operands (input and init
in destination-passing style) and users of results, as well as
mechanisms for reasoning about the shape of the iteration space.
Additionally, add several transform operations to manipulate parameters
that could be useful to implement more advanced selectors. Specifically,
new operations let one produce and compare parameter values to implement
shape-driven transformations.
New operations are placed in separate files to decrease compilation
time. Some relayering of the extension is necessary to avoid repeated
generation of enums.
Depends on D148013
Depends on D148014
Depends on D148015
Reviewed By: chelini
Differential Revision: https://reviews.llvm.org/D148017
This patch introduces the poison constant from LLVM in the LLVM IR dialect. It also adds import and export support for it, along with roundtrip tests.
Reviewed By: gysit
Differential Revision: https://reviews.llvm.org/D146631
Introduce support for the third kind of values in the transform dialect:
value handles. Similarly to operation handles, value handles are
pointing to a set of values in the payload IR. This enables
transformation to be targeted at specific values, such as individual
results of a multi-result payload operation without indirecting through
the producing op or block arguments that previously could not be easily
addressed. This is expected to support a broad class of memory-oriented
transformations such as selective bufferization, buffer assignment, and
memory transfer management.
Value handles are functionally similar to operation handles and require
similar implementation logic. The most important change concerns the
handle invalidation mechanism where operation and value handles can
affect each other.
This patch includes two cleanups that make it easier to introduce value
handles:
- `RaggedArray` structure that encapsulates the SmallVector of
ArrayRef backed by flat SmallVector logic, frequently used in the
transform interfaces implementation;
- rewrite the tests that associated payload handles with an integer
value `reinterpret_cast`ed as a pointer, which were a frequent
source of confusion and crashes when adding more debugging
facilities that can inspect the payload.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D143385
- Modified single-quote to back-quote at op name, etc.
- Remove a duplicated `affine.store` op's doc
- Fix broken links
- Move Syntax of `StoreOp` and `LoadOp` from Affine.md to AffineOps.td
Reviewed By: bondhugula, dcaballe
Differential Revision: https://reviews.llvm.org/D142858
`applyTransforms` now takes an optional mapping to be associated with
trailing block arguments of the top-level transform op, in addition to
the payload root. This allows for more advanced forms of communication
between C++ code and the transform dialect interpreter, in particular
supplying operations without having to re-match them during
interpretation.
Reviewed By: shabalin
Differential Revision: https://reviews.llvm.org/D142559