- added all variations of ffma and fdiv
- will add all new headers into yaml for next patch
- only fsub is left then all basic operations for float is complete
---------
Co-authored-by: OverMighty <its.overmighty@gmail.com>
- fadd removed because I need to add for different input types
- finishing rest of basic operations
- noticed duplicates will remove
---------
Co-authored-by: OverMighty <its.overmighty@gmail.com>
Division-less Newton iterations algorithm for cube roots.
1. **Range reduction**
For `x = (-1)^s * 2^e * (1.m)`, we get 2 reduced arguments `x_r` and `a`
as:
```
x_r = 1.m
a = (-1)^s * 2^(e % 3) * (1.m)
```
Then `cbrt(x) = x^(1/3)` can be computed as:
```
x^(1/3) = 2^(e / 3) * a^(1/3).
```
In order to avoid division, we compute `a^(-2/3)` using Newton method
and then
multiply the results by a:
```
a^(1/3) = a * a^(-2/3).
```
2. **First approximation to a^(-2/3)**
First, we use a degree-7 minimax polynomial generated by Sollya to
approximate `x_r^(-2/3)` for `1 <= x_r < 2`.
```
p = P(x_r) ~ x_r^(-2/3),
```
with relative errors bounded by:
```
| p / x_r^(-2/3) - 1 | < 1.16 * 2^-21.
```
Then we multiply with `2^(e % 3)` from a small lookup table to get:
```
x_0 = 2^(-2*(e % 3)/3) * p
~ 2^(-2*(e % 3)/3) * x_r^(-2/3)
= a^(-2/3)
```
with relative errors:
```
| x_0 / a^(-2/3) - 1 | < 1.16 * 2^-21.
```
This step is done in double precision.
3. **First Newton iteration**
We follow the method described in:
Sibidanov, A. and Zimmermann, P., "Correctly rounded cubic root
evaluation
in double precision", https://core-math.gitlabpages.inria.fr/cbrt64.pdf
to derive multiplicative Newton iterations as below:
Let `x_n` be the nth approximation to `a^(-2/3)`. Define the n^th error
as:
```
h_n = x_n^3 * a^2 - 1
```
Then:
```
a^(-2/3) = x_n / (1 + h_n)^(1/3)
= x_n * (1 - (1/3) * h_n + (2/9) * h_n^2 - (14/81) * h_n^3 + ...)
```
using the Taylor series expansion of `(1 + h_n)^(-1/3)`.
Apply to `x_0` above:
```
h_0 = x_0^3 * a^2 - 1
= a^2 * (x_0 - a^(-2/3)) * (x_0^2 + x_0 * a^(-2/3) + a^(-4/3)),
```
it's bounded by:
```
|h_0| < 4 * 3 * 1.16 * 2^-21 * 4 < 2^-17.
```
So in the first iteration step, we use:
```
x_1 = x_0 * (1 - (1/3) * h_n + (2/9) * h_n^2 - (14/81) * h_n^3)
```
Its relative error is bounded by:
```
| x_1 / a^(-2/3) - 1 | < 35/242 * |h_0|^4 < 2^-70.
```
Then we perform Ziv's rounding test and check if the answer is exact.
This step is done in double-double precision.
4. **Second Newton iteration**
If the Ziv's rounding test from the previous step fails, we define the
error
term:
```
h_1 = x_1^3 * a^2 - 1,
```
And perform another iteration:
```
x_2 = x_1 * (1 - h_1 / 3)
```
with the relative errors exceed the precision of double-double.
We then check the Ziv's accuracy test with relative errors < 2^-102 to
compensate for rounding errors.
5. **Final iteration**
If the Ziv's accuracy test from the previous step fails, we perform
another
iteration in 128-bit precision and check for exact outputs.
Summary:
This function is used by the CUDA / HIP / OpenMP headers and exists as
an NVIDIA extension basically. This function is implemented in the C23
standard as `pown`, but for now we need to provide `powi` for backwards
compatibility. In the future this entrypoint will just be a redirect to
`pown` once that is implemented.
Fixes https://github.com/llvm/llvm-project/issues/92874
Algorithm: Let `x = (-1)^s * 2^e * (1 + m)`.
- Step 1: Range reduction: reduce the exponent with:
```
y = cbrt(x) = (-1)^s * 2^(floor(e/3)) * 2^((e % 3)/3) * (1 + m)^(1/3)
```
- Step 2: Use the first 4 bit fractional bits of `m` to look up for a
degree-7 polynomial approximation to:
```
(1 + m)^(1/3) ~ 1 + m * P(m).
```
- Step 3: Perform the multiplication:
```
2^((e % 3)/3) * (1 + m)^(1/3).
```
- Step 4: Check for exact cases to prevent rounding and clear
`FE_INEXACT` floating point exception.
- Step 5: Combine with the exponent and sign before converting down to
`float` and return.
I also fixed a comment in sinpif.cpp in the first commit. Should this be
included in this PR?
All tests were passed, including the exhaustive test.
CC: @lntue
Using the same range reduction as `sin`, `cos`, and `sincos`:
1) Reducing `x = k*pi/128 + u`, with `|u| <= pi/256`, and `u` is in
double-double.
2) Approximate `tan(u)` using degree-9 Taylor polynomial.
3) Compute
```
tan(x) ~ (sin(k*pi/128) + tan(u) * cos(k*pi/128)) / (cos(k*pi/128) - tan(u) * sin(k*pi/128))
```
using the fast double-double division algorithm in [the CORE-MATH
project](https://gitlab.inria.fr/core-math/core-math/-/blob/master/src/binary64/tan/tan.c#L1855).
4) Perform relative-error Ziv's accuracy test
5) If the accuracy tests failed, we redo the computations using 128-bit
precision `DyadicFloat`.
Fixes https://github.com/llvm/llvm-project/issues/96930
- Algorithm:
- Step 1 - Range reduction: for a double precision input `x`, return `k`
and `u` such that
- k is an integer
- u = x - k * pi / 128, and |u| < pi/256
- Step 2 - Calculate `sin(u)` and `cos(u)` in double-double using Taylor
polynomials with errors < 2^-70 with FMA or < 2^-66 w/o FMA.
- Step 3 - Calculate `sin(x) = sin(k*pi/128) * cos(u) + cos(k*pi/128) *
sin(u)` using look-up table for `sin(k*pi/128)` and `cos(k*pi/128)`.
- Step 4 - Use Ziv's rounding test to decide if the result is correctly
rounded.
- Step 4' - If the Ziv's rounding test failed, redo step 1-3 using
128-bit precision.
- Currently, without FMA instructions, the large range reduction only
works correctly for the default rounding mode (FE_TONEAREST).
- Provide `LIBC_MATH` flag so that users can set `LIBC_MATH =
LIBC_MATH_SKIP_ACCURATE_PASS` to build the `sin` function without step 4
and 4'.