Commit Graph

107 Commits

Author SHA1 Message Date
Prashant Kumar
5b702be1e8 [mlir][math] Convert math.fpowi to math.powf in case of non constant (#87472)
Convert math.fpowi to math.powf by converting dtype of power operand to
floating point.
2024-04-03 22:19:26 +05:30
Prashant Kumar
10a57f3aff [mlir][math] Expand powfI operation for constant power operand. (#87081)
-- Convert `math.fpowi` to a series of `arith.mulf` operations.
-- If the power is negative, we divide 1 by the result.
2024-04-01 13:18:27 +05:30
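The constant-power expansion in 10a57f3aff can be pictured with a small scalar sketch. This is only an illustration in plain C++, not the MLIR rewrite itself; the function name `expand_fpowi_constant` is hypothetical, and each loop step stands in for one emitted `arith.mulf`.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical scalar model of expanding fpowi with a known constant power.
// Each multiplication corresponds to one arith.mulf in the expanded IR.
float expand_fpowi_constant(float base, int32_t power) {
  // Work with the magnitude of the power; widen first so INT32_MIN is safe.
  int64_t n = power < 0 ? -static_cast<int64_t>(power)
                        : static_cast<int64_t>(power);
  float result = 1.0f;
  for (int64_t i = 0; i < n; ++i)
    result *= base;
  // Negative power: take the reciprocal of the accumulated product.
  return power < 0 ? 1.0f / result : result;
}
```

For example, `expand_fpowi_constant(2.0f, -2)` multiplies twice and then takes the reciprocal.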
srcarroll
d39ac3a8e0 [mlir][math] Reland 58ef9bec07 (#85436)
The previous implementation decomposes tanh(x) into
`(exp(2x) - 1)/(exp(2x)+1), x < 0`
`(1 - exp(-2x))/(1 + exp(-2x)), x >= 0`
This is fine as it avoids overflow with the exponential, but the whole
decomposition is computed for both cases unconditionally, then the
result is chosen based off the sign of the input. This results in doing
two expensive exp computations.

The proposed change avoids doing the whole computation twice by
exploiting the reflection symmetry `tanh(-x) = -tanh(x)`. We can
"normalize" the input to be positive by setting `y = sign(x) * x`, where
the sign of `x` is computed as `sign(x) = (float)(x < 0) * (-2) + 1`.
Then compute `z = tanh(y)` with the decomposition above for `x >= 0` and
"denormalize" the result `z * sign(x)` to retain the sign. The reason it
is done this way is that it is very amenable to vectorization.

This method trades the duplicate decomposition computations (which take
5 instructions including an extra expensive exp and div) for 4 cheap
instructions to compute the sign value
1. `arith.cmpf` (which is a pre-existing instruction in the previous
impl)
2. `arith.sitofp`
3. `arith.mulf`
4. `arith.addf`

and 1 more instruction to get the right sign in the result
5. `arith.mulf`.
Moreover, numerically, this implementation will yield the exact same
results as the previous implementation.

As part of the relanding, a casting issue from the original commit has
been fixed, i.e. casting bool to float with `uitofp`. Additionally a
correctness test with `mlir-cpu-runner` has been added.
2024-03-17 11:23:30 -05:00
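A scalar sketch of the sign-symmetry trick described in d39ac3a8e0 follows. It is an illustration in plain C++ rather than the actual MLIR pattern; the name `tanh_via_sign_symmetry` is hypothetical. The bool-to-float cast here yields 0.0 or 1.0, which is the behavior an unsigned cast gives, so negative inputs map to a sign of -1 and all others to +1.

```c++
#include <cassert>
#include <cmath>

// Hypothetical scalar model: one exp instead of two, using tanh(-x) = -tanh(x).
float tanh_via_sign_symmetry(float x) {
  // cmpf + bool-to-float + mulf + addf: sign is -1 for x < 0, else +1.
  float sign = static_cast<float>(x < 0.0f) * -2.0f + 1.0f;
  float y = sign * x;                 // normalized, non-negative input
  float e = std::exp(-2.0f * y);      // in (0, 1], so no overflow
  float z = (1.0f - e) / (1.0f + e);  // tanh(y) for y >= 0
  return z * sign;                    // restore the original sign
}
```

Because the sign is computed arithmetically and applied with a multiply, there is no branch on the input, which is what makes the scheme easy to vectorize.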
Benjamin Maxwell
e74bcecd36 [mlir][math] Propagate scalability in polynomial approximation (#84949)
This simply updates the rewrites to propagate the scalable flags (which
as they do not alter the vector shape, is pretty simple).

The added tests are simply scalable versions of the existing vector
tests.
2024-03-15 20:08:52 +00:00
srcarroll
f75d164eea Revert "[mlir][math] Implement alternative decomposition for tanh (#8… (#85429)
…5025)"

This reverts commit 58ef9bec07.

There is a bool to float casting issue that needs to be sorted out to
make sure this is target independent
2024-03-15 14:39:57 -05:00
srcarroll
58ef9bec07 [mlir][math] Implement alternative decomposition for tanh (#85025)
The previous implementation decomposes `tanh(x)` into
`(exp(2x) - 1)/(exp(2x)+1), x < 0`
`(1 - exp(-2x))/(1 + exp(-2x)), x >= 0`
This is fine as it avoids overflow with the exponential, but the whole
decomposition is computed for both cases unconditionally, then the
result is chosen based off the sign of the input. This results in doing
two expensive `exp` computations.

The proposed change avoids doing the whole computation twice by
exploiting the reflection symmetry `tanh(-x) = -tanh(x)`. We can
"normalize" the input to be positive by setting `y = sign(x) * x`, where
the sign of `x` is computed as `sign(x) = (float)(x < 0) * (-2) + 1`.
Then compute `z = tanh(y)` with the decomposition above for `x >= 0` and
"denormalize" the result `z * sign(x)` to retain the sign. The reason it
is done this way is that it is very amenable to vectorization.

This method trades the duplicate decomposition computations (which take
5 instructions including an extra expensive `exp` and `div`) for 4 cheap
instructions to compute the sign value
1. `arith.cmpf` (which is a pre-existing instruction in the previous
impl)
2. `arith.sitofp`
3. `arith.mulf`
4. `arith.addf`

and 1 more instruction to get the right sign in the result
5. `arith.mulf`. Moreover, numerically, this implementation will yield
the exact same results as the previous implementation.
2024-03-14 19:18:56 -05:00
Krzysztof Drewniak
05e85e4fc5 [mlir][Math] Add pass to legalize math functions to f32-or-higher (#78361)
Since most of the operations in the `math` dialect don't have
low-precision implementations, add the -math-legalize-to-f32 pass that
goes through and brackets low-precision math functions (like `math.sin
%0 : f16`) with `arith.extf` and `arith.truncf`. This preserves the
original semantics of the math operation but allows lowering to proceed.

Versions of this lowering are already implicitly present in some passes,
like ConvertGPUToROCDL. However, because those are implicit rewrites,
they hide the floating-point extension and truncation, preventing anyone
from writing passes that operate on those implicit extf/truncf pairs.

Exposing this legalization explicitly is needed to allow lowering 8-bit
floats on AMD GPUs, as the implementation of extf and truncf on that
platform requires the complex logic found in ArithToAMDGPU, which runs
before the GPU to ROCDL lowering.
2024-01-18 09:37:43 -06:00
Matthias Springer
bb6d5c2200 [mlir][Transforms] GreedyPatternRewriteDriver: Do not CSE constants during iterations (#75897)
The `GreedyPatternRewriteDriver` tries to iteratively fold ops and apply
rewrite patterns to ops. It has special handling for constants: they are
CSE'd and sometimes moved to parent regions to allow for additional
CSE'ing. This happens in `OperationFolder`.

To allow for efficient CSE'ing, `OperationFolder` maintains an internal
lookup data structure to find the existing constant ops with the same
value for each `IsolatedFromAbove` region:
```c++
/// A mapping between an insertion region and the constants that have been
/// created within it.
DenseMap<Region *, ConstantMap> foldScopes;
```

Rewrite patterns are allowed to modify operations. In particular, they
may move operations (including constants) from one region to another
one. Such an IR rewrite can make the above lookup data structure
inconsistent.

We encountered such a bug in a downstream project. This bug materialized
in the form of an op that uses the result of a constant op from a
different `IsolatedFromAbove` region (that is not accessible).

This commit changes the behavior of the `GreedyPatternRewriteDriver`
such that `OperationFolder` is used to CSE constants at the beginning of
each iteration (as the worklist is populated), but no longer during an
iteration. `OperationFolder` is no longer used after populating the
worklist, so we do not have to care about inconsistent state in the
`OperationFolder` due to IR rewrites. The `GreedyPatternRewriteDriver`
now performs the op folding by itself instead of calling
`OperationFolder::tryToFold`.

This change changes the order of constant ops in test cases, but not the
region in which they appear. All broken test cases were fixed by turning
`CHECK` into `CHECK-DAG`.

Alternatives considered: The state of `OperationFolder` could be
partially invalidated with every `notifyOperationModified` notification.
That is more fragile than the solution in this commit because incorrect
rewriter API usage can lead to missing notifications and hard-to-debug
`IsolatedFromAbove` violations. (It did not fix the above-mentioned bug in
a downstream project, which could be due to incorrect rewriter API usage
or due to another conceptual problem that I missed.) Moreover, ops are
frequently getting modified during a greedy pattern rewrite, so we would
likely keep invalidating large parts of the state of `OperationFolder`
over and over.

Migration guide: Turn `CHECK` into `CHECK-DAG` in test cases. Constant
ops are no longer folded during a greedy pattern rewrite. If you rely on
folding (and rematerialization) of constant ops during a greedy pattern
rewrite, turn the folder into a pattern.
2024-01-05 09:22:18 +01:00
Cullen Rhodes
9816edc9f3 [mlir][vector] add result type to vector.extract assembly format (#66499)
The vector.extract assembly format currently only contains the source
type, for example:

  %1 = vector.extract %0[1] : vector<3x7x8xf32>

it's not immediately obvious if this is the source or result type. This
patch improves the assembly format to make this clearer, so the above
becomes:

  %1 = vector.extract %0[1] : vector<7x8xf32> from vector<3x7x8xf32>
2023-09-28 11:11:16 +01:00
Ivan Butygin
5dce74817b [mlir][ub] Add poison support to CommonFolders.h
Return poison from foldBinary/unary if argument(s) is poison. Add ub dialect as dependency to affected dialects (arith, math, spirv, shape).
Add poison materialization to dialects. Add tests for some ops from each dialect.
Not all affected ops are covered as it will involve a huge copypaste.

Differential Revision: https://reviews.llvm.org/D159013
2023-09-07 12:30:29 +02:00
Balaji V. Iyer
f66e4bd67a [mlir][math] Modify math.powf to handle negative bases.
Powf expansion currently returns NaN when the base is negative.
This is because taking natural log of a negative number gives
NaN. This patch will square the base and halve the exponent, thereby
getting around the negative base problem.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D158797
2023-08-25 15:35:05 -07:00
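The idea in f66e4bd67a rests on the identity `|a|^b = exp((b / 2) * log(a * a))`, which keeps the argument of the log non-negative. A minimal scalar sketch, assuming plain C++ and a hypothetical name `powf_squared_base`, is shown below. It only computes the magnitude; how the sign of the result is chosen for negative bases is not described in the commit message and is left out here.

```c++
#include <cassert>
#include <cmath>

// Hypothetical scalar model: square the base and halve the exponent so that
// log never sees a negative argument. Returns |a|^b (magnitude only).
float powf_squared_base(float a, float b) {
  float log_a_squared = std::log(a * a);  // defined for negative a as well
  return std::exp(0.5f * b * log_a_squared);
}
```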
Alexander Shaposhnikov
fe355a44e7 [MLIR][Math] Add support for f64 in the expansion of math.roundeven
Add support for f64 in the expansion of math.roundeven.
Associated GitHub issue: https://github.com/openxla/iree/issues/13522
This is based on the offline discussion and essentially recommits
https://reviews.llvm.org/D158234.

Test plan: ninja check-mlir check-all
2023-08-24 21:41:26 +00:00
Alexander Shaposhnikov
d22883e384 Revert "[MLIR][Math] Add support for f16 in the expansion of math.roundeven"
This reverts commit 40bf36319e.
The build bot ppc64le-mlir-rhel-test got broken by these changes,
see https://lab.llvm.org/buildbot#builders/88/builds/61048 .
2023-08-18 18:20:52 +00:00
Alexander Shaposhnikov
40bf36319e [MLIR][Math] Add support for f16 in the expansion of math.roundeven
Add support for f16 in the expansion of math.roundeven.
Associated GitHub issue: https://github.com/openxla/iree/issues/13522
This version addresses the build issues on Windows reported on
https://reviews.llvm.org/D157204

Test plan: ninja check-mlir check-all

Differential revision: https://reviews.llvm.org/D158234
2023-08-18 17:48:34 +00:00
Alexander Shaposhnikov
f745c91f61 Revert "[MLIR][Math] Add support for f16 in the expansion of math.roundeven"
This reverts commit b96f6cf629
since it has broken some Windows build bots
(see https://reviews.llvm.org/D157204).
Will recommit a fixed version later.
2023-08-17 23:22:05 +00:00
Alexander Shaposhnikov
b96f6cf629 [MLIR][Math] Add support for f16 in the expansion of math.roundeven
Add support for f16 in the expansion of math.roundeven.
Associated GitHub issue: https://github.com/openxla/iree/issues/13522

Test plan: ninja check-mlir check-all

Differential revision: https://reviews.llvm.org/D157204
2023-08-17 18:40:26 +00:00
Robert Suderman
0bedb667af [mlir][math] Improved math.atan approximation
Used the cephes numerical approximation for `math.atan`. This is a
significant accuracy improvement over the previous taylor series
approximation.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D153656
2023-06-23 17:25:34 -07:00
Robert Suderman
710dc7282a [mlir][math] Modified the 'math.exp' lowering for higher precision
The existing lowering has lower precision for certain use cases, e.g.
tanh. The improved version should demonstrate an overall higher level of precision.

Reviewed By: cota, jpienaar

Differential Revision: https://reviews.llvm.org/D153592
2023-06-23 12:25:18 -07:00
Ivan Butygin
ee8b8d6b58 [mlir][math] Uplift from arith to math.fma
Add pass to uplift from arith mulf + addf ops to math.fma if fastmath flags allow it.

Differential Revision: https://reviews.llvm.org/D152633
2023-06-18 17:11:21 +02:00
Ramiro Leal-Cavazos
44baa65589 Revert "Revert "Fix handling of special and large vals in expand pattern for round" and "Add pattern that expands math.roundeven into math.round + arith""
This reverts commit 87cef78fa1.

The issue in the original revert is that a lit test expecting a `-nan`
as an output was failing on M2. Since the IEEE 754-2008 standard does
not require the sign to be printed when displaying a `nan`, this
commit changes the `CHECK` for `-nan` to one that checks the result
value bitcasted to an `i32` to ensure that input is being left
unchanged. This check should now be independent of platform being used
to run test.

Reviewed By: jpienaar, mehdi_amini

Differential Revision: https://reviews.llvm.org/D148941
2023-04-22 07:15:40 -07:00
Mehdi Amini
87cef78fa1 Revert "Fix handling of special and large vals in expand pattern for round" and "Add pattern that expands math.roundeven into math.round + arith"
This reverts commit 8d2bae9abd and
commit ab2fc9521e.

Tests are broken on Mac M2
2023-04-21 00:16:32 -06:00
Ramiro Leal-Cavazos
8d2bae9abd Add pattern that expands math.roundeven into math.round + arith
This commit adds a pattern that expands `math.roundeven` into
`math.round` + some ops from `arith`. This is needed to be able to run
`math.roundeven` in a vectorized manner.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D148285
2023-04-20 12:48:12 -07:00
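One way to build round-half-to-even on top of round-half-away-from-zero, in the spirit of 8d2bae9abd, is sketched below. This is an assumption about the general technique, not a transcription of the actual pattern, and the name `expand_roundeven` is hypothetical: round normally, then step back by one toward zero when the input was an exact tie and the rounded value is odd.

```c++
#include <cassert>
#include <cmath>

// Hypothetical scalar model of roundeven built from round plus arithmetic.
float expand_roundeven(float x) {
  float r = std::round(x);                 // ties round away from zero
  bool is_tie = std::fabs(r - x) == 0.5f;  // x was exactly halfway
  float half_r = r * 0.5f;
  bool is_odd = std::trunc(half_r) != half_r;  // r / 2 is not an integer
  // On an odd tie, move one unit back toward zero to reach the even neighbor.
  return (is_tie && is_odd) ? r - std::copysign(1.0f, x) : r;
}
```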
Ramiro Leal-Cavazos
ab2fc9521e Fix handling of special and large vals in expand pattern for round
The current expand pattern for `math.round` does not handle the
special values -0.0, +-inf, and +-nan correctly. It also does not
properly handle values with magnitude |x| >= 2^23. Lastly, the pattern
generates invalid IR when the input to `math.round` is a vector. This
patch fixes these issues.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D148398
2023-04-20 18:08:19 +00:00
Balaji V. Iyer
2d4e856709 [mlir][math] Expand math.powf to exp, log and multiply
Powf functions are pushed directly to libm. This is problematic for
situations where libm is not available. This patch will decompose the
powf function into the exponent multiplied by the log of the base, and
then take the exp of that product.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D148164
2023-04-14 14:04:19 +00:00
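The decomposition in 2d4e856709 corresponds to the identity `a^b = exp(b * log(a))`. A minimal scalar sketch in plain C++ (the name `expand_powf` is hypothetical) makes the limitation for negative bases visible: the log of a negative number is NaN, so the result is NaN too.

```c++
#include <cassert>
#include <cmath>

// Hypothetical scalar model of powf expanded to exp, log and a multiply.
float expand_powf(float a, float b) {
  return std::exp(b * std::log(a));  // NaN when a is negative
}
```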
Balaji V. Iyer
be9115788c [mlir][math] Expand math.round to truncate, compare and increment.
Round functions are pushed directly to libm. This is problematic for
situations where libm is not available. This patch will decompose the
roundf function by adding 0.5 to the input for positive numbers
(subtracting for negative numbers), followed by a truncate.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D148026
2023-04-13 18:02:10 +00:00
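The expansion in be9115788c can be sketched in scalar C++ as follows; the name `expand_round` is hypothetical. As the later fix ab2fc9521e notes, this simple form does not handle special values or very large magnitudes correctly, so it should be read as the basic idea only.

```c++
#include <cassert>
#include <cmath>

// Hypothetical scalar model: add 0.5 with the sign of the input, then truncate.
float expand_round(float x) {
  float half = std::copysign(0.5f, x);  // +0.5 for positive, -0.5 for negative
  return std::trunc(x + half);
}
```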
Balaji V. Iyer
4da96515ea [mlir][math] Expand math.exp2 to use math.exp.
Exp2 functions are pushed directly to libm. This is problematic for
situations where libm is not available. This patch will expand the exp2
function to use exp with the input multiplied by ln2 (the natural log of 2).

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D148064
2023-04-13 16:06:04 +00:00
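The rewrite in 4da96515ea follows from `2^x = exp(x * ln 2)`. A minimal scalar sketch in plain C++, with the hypothetical name `expand_exp2`:

```c++
#include <cassert>
#include <cmath>

// Hypothetical scalar model of exp2 expressed through exp.
float expand_exp2(float x) {
  const float ln2 = 0.693147180559945309f;  // natural log of 2
  return std::exp(x * ln2);
}
```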
Balaji V. Iyer
2217888d2c [mlir][math] Expand math.ceilf to truncate, compares and increments
Ceilf functions are pushed directly to libm. This is problematic for
situations where libm is not available. This patch will break down
a ceilf function to truncate followed by an increment if the
truncated value is smaller than the input value.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D147974
2023-04-11 13:52:45 +00:00
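The ceil expansion in 2217888d2c can be sketched in scalar C++ like this; the name `expand_ceil` is hypothetical, and special values such as NaN or infinity are not considered.

```c++
#include <cassert>
#include <cmath>

// Hypothetical scalar model: truncate, then add 1 if truncation moved below x.
float expand_ceil(float x) {
  float truncated = std::trunc(x);
  return truncated < x ? truncated + 1.0f : truncated;
}
```

Only positive non-integers trigger the increment, since truncation of a negative value already moves it upward.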
Alex Zinenko
76c3908273 [mlir] rename Math/dependent-dialect.mlir to depends-on-arith.mlir
2023-04-11 12:35:46 +02:00
Alex Zinenko
56a275b999 [mlir] make Math dialect depend on Arith dialect
Ops from the Math dialect use fastmath attributes defined in Arith.
Therefore Math dialect must declare a dependency on Arith for proper
construction and parsing.

Reviewed By: tpopp

Differential Revision: https://reviews.llvm.org/D147999
2023-04-11 12:34:51 +02:00
Balaji V. Iyer
af9eb1e384 [mlir][math] Expand math.floorf to truncate, compares and increments
Floorf functions are pushed directly to libm. This is problematic for
situations where libm is not available. This patch will break down
a floorf function to truncate followed by an increment for negative
values, if necessary.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D147966
2023-04-10 21:04:27 +00:00
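The floor expansion in af9eb1e384 mirrors the ceil case. In the scalar C++ sketch below (hypothetical name `expand_floor`), the adjustment for negative non-integers is written as adding -1; special values are not considered.

```c++
#include <cassert>
#include <cmath>

// Hypothetical scalar model: truncate, then adjust by -1 if truncation moved
// above x, which only happens for negative non-integers.
float expand_floor(float x) {
  float truncated = std::trunc(x);
  return truncated > x ? truncated + -1.0f : truncated;
}
```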
Balaji V. Iyer
a7c2102d98 [mlir][math]Expand Fused math.fmaf to a multiply-add
Fused multiply and add are being pushed directly to the libm. This is problematic
for situations where libm is not available. This patch will break down a fused multiply and
add into a multiply followed by an add.

Reviewed By: rsuderman

Differential Revision: https://reviews.llvm.org/D147811
2023-04-07 22:14:56 +00:00
Robert Suderman
711c58938f [mlir][math] Update math arith expansions for vectorization
The math arithmetic expansions do not support vectorized types.
Updated the lowerings so that they support vectorized types. This
includes a different implementation for `math.ctlz` to be a binary
search and not have variable termination time.

Reviewed By: jpienaar, NatashaKnk

Differential Revision: https://reviews.llvm.org/D147289
2023-04-06 18:42:01 +00:00
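A count-leading-zeros routine with a fixed number of steps, as motivated in 711c58938f, can be sketched as a binary search over the bit width. This is an illustrative 32-bit scalar version in plain C++, not the actual lowering; the name `ctlz32_binary_search` is hypothetical.

```c++
#include <cassert>
#include <cstdint>

// Hypothetical scalar model: binary search for the highest set bit.
// Always runs the same five steps plus one final check, whatever the input.
uint32_t ctlz32_binary_search(uint32_t x) {
  uint32_t count = 0;
  for (uint32_t shift = 16; shift != 0; shift >>= 1) {
    // If the top `shift` bits are all zero, count them and shift them out.
    if ((x >> (32 - shift)) == 0) {
      count += shift;
      x <<= shift;
    }
  }
  // After the loop the top bit is set unless the input was zero.
  count += ((x >> 31) == 0) ? 1u : 0u;
  return count;
}
```

In a vectorized lowering, the conditional inside the loop would be expressed with selects, so every lane does the same amount of work.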
Robert Suderman
57e1943e8f [mlir] Add support for non-f32 polynomial approximation
Polynomial approximations assume F32 values. We can convert all non-f32
cases to operate on f32s with intermediate casts.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D146677
2023-03-27 21:57:57 +00:00
Robert Suderman
6b53881048 [mlir][math] Add math.cbrt polynomial approximation
Cbrt can be approximated with some relatively simple polynomial
operators. This includes a lit test validating the implementation
and some run tests that validate numerical correctness.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D145019
2023-03-06 13:29:49 -08:00
Robert Suderman
740e2e908c [mlir][math] Math expansion for math.tan
We can implement a polynomial approximation of math.tan by
decomposing to `math.sin` and `math.cos`. While it is not
technically a polynomial approximation it should be the most
straightforward approximation.

Reviewed By: jpienaar

Differential Revision: https://reviews.llvm.org/D144980
2023-03-01 01:13:54 +00:00
Matthias Springer
ed9194be6d [mlir] GreedyPatternRewriter: Add ancestors to worklist
When adding an op to the worklist, also add its ancestors to the worklist. This allows for RewritePatterns to match an op `a` based on what is inside of the body of `a`.

This change fixes a problem that became apparent with `vector.warp_execute_on_lane_0`, but could probably be triggered with similar patterns. The pattern extracts an op `b` with `eligible = true` from the body of an op `a`:
```
test.a {
  %0 = test.b() {eligible = true}
  yield %0
}
```

Afterwards:
```
%0 = test.b() {eligible = true}
test.a {
  yield %0
}
```

The pattern is an `OpRewritePattern<OpA>`. For some reason, `test.a` is not on the GreedyPatternRewriter's worklist. E.g., because no pattern could be applied and it was removed. Now, another pattern updates `test.b`, so that `eligible` is changed from `true` to `false`. The `OpRewritePattern<OpA>` could now be applied, but (without this revision) `test.a` is still not on the worklist.

Note: In the above example, an `OpRewritePattern<OpB>` could have been used instead of an `OpRewritePattern<OpA>`. With such a design, we can run into the same problem (when the `eligible` attr is on `test.a` and `test.b` is removed from the worklist because no patterns could be applied).

Note: This change uncovered an unrelated bug in TestSCFUtils.cpp that was triggered due to a change in the order in which ops are processed. A TODO is added to the broken code and test cases are adapted so that the bug is no longer triggered.

Differential Revision: https://reviews.llvm.org/D140304
2023-01-13 10:51:28 +01:00
Matthias Springer
e7790fbed3 [mlir] Add test-convergence option to Canonicalizer tests
This new option is set to `false` by default. It should be set only in Canonicalizer tests to detect faulty canonicalization patterns, i.e., patterns that prevent the canonicalizer from converging. The canonicalizer should always converge on such small unit tests that we have in `canonicalize.mlir`.

Two faulty canonicalization patterns were detected and fixed with this change.

Differential Revision: https://reviews.llvm.org/D140873
2023-01-04 12:02:21 +01:00
Johannes Reifferscheid
998a3a3894 Add a math.cbrt instruction and lowering to libm.
There's currently no way to get accurate cube roots in the math dialect.
powf(x, 1/3.0) is too inaccurate in some cases.

Reviewed By: akuegel

Differential Revision: https://reviews.llvm.org/D140842
2023-01-03 08:44:12 +01:00
Slava Zakharin
07a4c4d601 [mlir][math] Added arith::FastMathAttr support for math::FPowI.
Differential Revision: https://reviews.llvm.org/D139805
2022-12-13 20:47:20 -08:00
Slava Zakharin
095ce655ec [mlir][math] Simplify pow(x, 0.75) into sqrt(sqrt(x)) * sqrt(x).
Trivial simplification for CPU2017/503.bwaves resulting in 3.89%
speed-up on icelake.

Differential Revision: https://reviews.llvm.org/D137351
2022-11-04 10:48:19 -07:00
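The simplification in 095ce655ec relies on `x^0.75 = x^0.25 * x^0.5`. A minimal scalar sketch in plain C++, with the hypothetical name `pow_three_quarters`:

```c++
#include <cassert>
#include <cmath>

// Hypothetical scalar model: pow(x, 0.75) as sqrt(sqrt(x)) * sqrt(x).
float pow_three_quarters(float x) {
  float root = std::sqrt(x);      // x^(1/2)
  return std::sqrt(root) * root;  // x^(1/4) * x^(1/2)
}
```

The square root of `x` is computed once and reused, so the whole expression costs two square roots and one multiply.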
Slava Zakharin
589764a382 [mlir][math] Initial support for fastmath flag attributes for Math dialect.
Added arith::FastMathAttr and ArithFastMathInterface support for Math dialect
floating point operations.

This change-set creates ArithCommon conversion utils that currently
provide classes and methods to aid with arith::FastMathAttr conversion
into LLVM::FastmathFlags. These utils are used in ArithToLLVM and
MathToLLVM convertors, but may eventually be used by other converters
that need to convert fast math attributes.

Since Math dialect operations use arith::FastMathAttr, MathOps.td now
has to include enum and attributes definitions from Arith dialect.
To minimize the amount of TD code included from Arith dialect,
I moved FastMathAttr definition into ArithBase.td.

Differential Revision: https://reviews.llvm.org/D136312
2022-11-04 10:41:56 -07:00
jacquesguan
a4ace22c05 [mlir][Math] Change regex to match fp value on different target.
Link: https://github.com/llvm/llvm-project/issues/58048

Reviewed By: ftynse, Mogball

Differential Revision: https://reviews.llvm.org/D134850
2022-10-12 15:08:21 +08:00
jacquesguan
6eebdc46e4 [mlir][Math] Add constant folder for ErfOp.
This patch adds constant folder for ErfOp by using erf/erff of libm.

Reviewed By: ftynse, Mogball

Differential Revision: https://reviews.llvm.org/D134017
2022-09-19 10:55:16 +08:00
jacquesguan
71e52a125c [mlir][Math] Add constant folder for SinOp.
This patch adds constant folder for SinOp by using sin/sinf of libm.

Reviewed By: ftynse

Differential Revision: https://reviews.llvm.org/D133915
2022-09-16 14:30:05 +08:00
Jeff Niu
3108249dea [MLIR][math] Use approximate matches for folded ops
LibM implementations differ, so the folders can have different results
on different platforms. For instance, the `cos` folder was failing on M1
mac. I chose to match the constant floats to 2(.5) significant digits.

Reviewed By: jacquesguan

Differential Revision: https://reviews.llvm.org/D133797
2022-09-14 08:39:41 -07:00
jacquesguan
9d0b90e933 [mlir][Math] Add TruncOp.
This patch adds TruncOp for Math. It returns the operand rounded to the nearest integer not larger in magnitude than the operand. This patch also adds the corresponding LLVM intrinsic op.

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D133342
2022-09-09 10:01:28 +08:00
Kai Sasaki
5bb621056b [mlir][math] Canonicalization for math.floor op
Support constant folding for math.floor op as well as math.ceil.

Reviewed By: Mogball

Differential Revision: https://reviews.llvm.org/D133398
2022-09-09 10:21:48 +09:00
jacquesguan
b53bccb18d [mlir][Math] Add constant folder for RoundOp.
This patch uses round/roundf of libm to fold RoundOp of constant.

Differential Revision: https://reviews.llvm.org/D133401
2022-09-08 14:51:17 +08:00
jacquesguan
ac66d87c4b [mlir][Math] Add constant folder for RoundEvenOp.
This patch uses roundeven/roundevenf of libm to fold RoundEvenOp of constant.

Differential Revision: https://reviews.llvm.org/D133344
2022-09-07 11:13:00 +08:00
jacquesguan
e3434a8627 [mlir][Math] Add constant folder for CosOp.
This patch adds constant folder for CosOp which only supports single and double precision floating-point.

Differential Revision: https://reviews.llvm.org/D131233
2022-09-07 10:54:08 +08:00