This reverts commit f681225700. That
commit changed the organization of the tests of the transform dialect
interpreter but did not take into account some tests that were added in
the meantime.
A recent commit (#69190) broke the Bazel builds. It turns out that Bazel
uses symlinks for providing the test files, which the path expansion of
the module loading mechanism did not handle correctly. This PR fixes
that.
It also reorganizes the tests: all `.mlir` files that are included by
some other test now live in a common `include` folder. This greatly
simplifies the definition of the dependencies between the different
`.mlir` files in Bazel's `BUILD` file. The commit also adds a comment to
all included files explaining why they aren't tested directly, and uses
the `%{fs-sep}` expansion for paths more consistently. Finally, it
uncomments all but one of the tests excluded in Bazel because they seem
to run now. (The remaining one includes a file that is itself a test, so
it would have to live both *inside* and *outside* of the `include` folder.)
This cleans up all external entry points that will have to deal with
non-permutations, making any subsequent refactoring much more local to
the lib files.
Recent changes (https://github.com/llvm/llvm-project/pull/66930)
disabled hoisting of vector transfer ops with view-like intermediate
ops. The recommended approach is to fold subview ops into transfer op
indices before invoking hoisting. As a result, transfer op indices now
involve dynamic values instead of the static constants they had with
subview ops, so hoisting no longer kicks in. This breaks downstream
users.
To fix it, this commit enables hoisting transfer ops with dynamic
indices by using `ValueBoundsConstraintSet` to prove ranges are disjoint
in `isDisjointTransferIndices`. Because that utility is used in many
places, including op folders, this commit adds a flag to it that is set
to true only for "heavy" transforms in hoisting and load-store
forwarding.
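The disjointness argument can be illustrated with a small Python sketch. It is not the MLIR implementation and does not use `ValueBoundsConstraintSet`. It assumes each index is an affine expression modeled as a hypothetical `(coeffs, const)` pair, and it treats two accesses as provably disjoint when, in some dimension, the index difference is a known constant at least as large as the transferred extent. The names `affine_diff` and `provably_disjoint` are invented for illustration.

```python
def affine_diff(a, b):
    """Return a - b for affine expressions given as (coeffs dict, constant)."""
    coeffs = dict(a[0])
    for var, c in b[0].items():
        coeffs[var] = coeffs.get(var, 0) - c
    # Drop variables that cancelled out.
    coeffs = {v: c for v, c in coeffs.items() if c != 0}
    return coeffs, a[1] - b[1]


def provably_disjoint(indices_a, indices_b, extents):
    """Conservatively decide whether two accesses cannot overlap.

    Returns True only if, in at least one dimension, the two indices differ
    by a constant whose magnitude is >= the extent transferred in that
    dimension. Returns False whenever nothing can be proven.
    """
    for ia, ib, extent in zip(indices_a, indices_b, extents):
        coeffs, const = affine_diff(ia, ib)
        if not coeffs and abs(const) >= extent:
            return True
    return False


# Dynamic index %i and %i + 4, each transferring 4 elements: disjoint.
i = ({"i": 1}, 0)
i_plus_4 = ({"i": 1}, 4)
print(provably_disjoint([i], [i_plus_4], [4]))
```

The check is deliberately one-sided: a `False` result means "unknown", not "overlapping", which matches how a hoisting transform would have to treat an unproven case.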
This revision provides the ability to use an arbitrary named sequence op
as the entry point to a transform dialect strategy.
It is also a step towards better transform dialect usage in pass
pipelines that need to preload a transform library rather than parse it
on the fly.
The interpreter itself is significantly simpler than its testing
counterpart because it avoids payload/debug root tags and multiple
shared modules.
In the process, the NamedSequenceOp::apply function is adapted to allow
it to be an entry point.
NamedSequenceOp is **not** extended to take the PossibleTopLevelTrait at
this time, because the implementation of the trait is specific to
allowing one top-level dangling op with a region such as SequenceOp or
AlternativesOp.
In particular, the verifier of PossibleTopLevelTrait does not allow for
an empty body, which is necessary to declare a NamedSequenceOp that gets
linked in separately before application.
In the future, we should dispense with the PossibleTopLevelTrait
altogether and always enter the interpreter with a NamedSequenceOp.
Lastly, relevant TD linking utilities are moved to
TransformInterpreterUtils and reused from there.
The `release` flag is misleading and its semantics are not well defined.
Originally this was meant to allow for different `LIBC_NAMESPACE`
values depending on whether the code was considered stable and released
or unstable. It appears that we may have a canary environment that is
neither released nor dev. As a consequence, we move the `LIBC_NAMESPACE`
definition to its own file, and each environment can override this file
with whatever makes sense.
It's no longer possible to submit bitcode apps to the Apple App Store.
The tools used to create xar archived bitcode sections inside MachO
files have been discontinued. Additionally, the xar APIs have been
deprecated since macOS 12, so this change removes unnecessary code from
objdump and all dependencies on libxar.
This fixes rdar://116600767
This revision introduces a MapRef, which will support a future
generalization beyond permutations (e.g. block sparsity). This revision
also unifies the conversion/codegen paths for the sparse_tensor.new
operation from file (e.g. the readers). Note that more unification is
planned, as well as general affine dim2lvl and lvl2dim (all marked with
TODOs).
This PR introduces a new Op called `warpgroup.mma.store` to the NVGPU
dialect of MLIR. The purpose of this operation is to facilitate storing
the fragmented result(s) of type `nvgpu.warpgroup.accumulator` produced
by `warpgroup.mma` to the given memref.
An example of a fragmented matrix is given here:
https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#wgmma-64n16-d
The `warpgroup.mma.store` op does the following:
1) Takes one or more values of `nvgpu.warpgroup.accumulator` type
(fragmented result matrices)
2) Calculates indices per thread in the warp-group and stores the data
into the given memref.
Here's an example usage:
```
// A warpgroup performs GEMM, results in fragmented matrix
%result1, %result2 = nvgpu.warpgroup.mma ...
// Stores the fragmented result to memref
nvgpu.warpgroup.mma.store [%result1, %result2], %matrixD :
!nvgpu.warpgroup.accumulator< fragmented = vector<64x128xf32>>,
!nvgpu.warpgroup.accumulator< fragmented = vector<64x128xf32>>
to memref<128x128xf32,3>
```
Implement the double-precision expm1 function based on the exp function
algorithm:
- Reduce x = log(2) * (hi + mid1 + mid2) + lo, where:
  * hi is an integer
  * mid1 * 2^6 is an integer
  * mid2 * 2^12 is an integer
  * |lo| < 2^-13 + 2^-30
- Then exp(x) - 1 = 2^hi * 2^mid1 * 2^mid2 * exp(lo) - 1 ~ 2^hi *
  (2^mid1 * 2^mid2 * (1 + lo * P(lo)) - 2^(-hi))
- In the fast path, P(lo) is a degree-3 Taylor polynomial of
  (e^lo - 1) / lo evaluated in double precision
- If the Ziv accuracy test fails, we use a degree-6 Taylor polynomial of
  (e^lo - 1) / lo in double-double precision
- If the Ziv accuracy test still fails, we re-evaluate everything in
  128-bit precision.
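The range reduction and fast path above can be sketched in Python as follows. This is only an illustration of the arithmetic, not the libc implementation: it assumes the reduction uses the natural logarithm of 2, computes 2^mid1 and 2^mid2 with `**` instead of lookup tables, and omits the Ziv test, the double-double and 128-bit fallbacks, and any special handling of tiny, huge, or non-finite inputs. The function names `reduce_argument` and `expm1_sketch` are hypothetical.

```python
import math

LN2 = math.log(2.0)


def reduce_argument(x):
    """Split x as LN2 * (hi + mid1 + mid2) + lo."""
    # k counts multiples of LN2 / 2^12 in x, rounded to nearest.
    k = round(x * 4096.0 / LN2)
    hi = k >> 12                    # integer part (floor, also for k < 0)
    mid1 = ((k >> 6) & 63) / 64.0   # multiple of 2^-6 in [0, 1)
    mid2 = (k & 63) / 4096.0        # multiple of 2^-12 in [0, 2^-6)
    lo = x - k * (LN2 / 4096.0)     # small remainder
    return hi, mid1, mid2, lo


def expm1_sketch(x):
    """Fast-path style evaluation of exp(x) - 1 for moderate |x|."""
    hi, mid1, mid2, lo = reduce_argument(x)
    # Degree-3 Taylor polynomial of (e^lo - 1) / lo.
    p = 1.0 + lo * (0.5 + lo * (1.0 / 6.0 + lo / 24.0))
    scaled = 2.0 ** mid1 * 2.0 ** mid2 * (1.0 + lo * p)
    return 2.0 ** hi * (scaled - 2.0 ** (-hi))


for x in (0.5, 1.0, -2.0, 5.0):
    print(x, expm1_sketch(x), math.expm1(x))
```

For inputs very close to zero the subtraction in the last step cancels badly in plain double precision, which is one reason the algorithm described above falls back to higher-precision evaluation when the accuracy test fails.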