Implements the functions `roundeven()`, `roundevenf()`, `roundevenl()`
from the roundeven family of functions introduced in C23. Also
implements `roundevenf128()`.
Re-organizes the tables that listed libc's support for math functions,
and adds two new columns to the tables indicating where the respective
function definitions and error handling methods are located in the C23
standard draft WG14-N3096.
We compute atan2f(y, x) in 2 stages:
- Fast step: perform computations in double precision , with relative
errors < 2^-50
- Accurate step: if the result from the Fast step fails Ziv's rounding
test, then we perform computations in double-double precision, with
relative errors < 2^-100.
On Ryzen 5900X, worst-case latency is ~ 200 clocks, compared to average
latency ~ 60 clocks, and average reciprocal throughput ~ 20 clocks.
Continuing #84689, this one required more changes than the others, so I
am making it a separate PR.
Extends some stuff in `str_to_float.h`, `str_to_integer.h` to work on
types wider than `unsigned long long` and `uint64_t`.
cc @lntue for review.
- Allow `FMod` template to have different computational types and make
it work for 80-bit long double.
- Switch to use `uint64_t` as the intermediate computational types for
`float`, significantly reduce the latency of `fmodf` when the exponent
difference is large.
In particular, we have internal customers that would like to use nanf
and
scalbnf.
The differences between various entrypoint files can be checked via:
$ comm -3 <(grep libc\.src path/to/entrypoints.txt | sort) \
<(grep libc\.src path/to/other/entrypoints.txt | sort)
Implements the `nexttoward`, `nexttowardf` and `nexttowardl` functions.
Also, raise excepts required by the standard in `nextafter` functions.
cc: @lntue
We compute `pow(x, y)` using the formula
```
pow(x, y) = x^y = 2^(y * log2(x))
```
We follow similar steps as in `log2f(x)` and `exp2f(x)`, by breaking
down into `hi + mid + lo` parts, in which `hi` parts are computed using
the exponent field directly, `mid` parts will use look-up tables, and
`lo` parts are approximated by polynomials.
We add some speedup for common use-cases:
```
pow(2, y) = exp2(y)
pow(10, y) = exp10(y)
pow(x, 2) = x * x
pow(x, 1/2) = sqrt(x)
pow(x, -1/2) = rsqrt(x) - to be added
```
Implementing expm1 function for double precision based on exp function
algorithm:
- Reduced x = log2(e) * (hi + mid1 + mid2) + lo, where:
* hi is an integer
* mid1 * 2^-6 is an integer
* mid2 * 2^-12 is an integer
* |lo| < 2^-13 + 2^-30
- Then exp(x) - 1 = 2^hi * 2^mid1 * 2^mid2 * exp(lo) - 1 ~ 2^hi *
(2^mid1 * 2^mid2 * (1 + lo * P(lo)) - 2^(-hi) )
- We evaluate fast pass with P(lo) is a degree-3 Taylor polynomial of
(e^lo - 1) / lo in double precision
- If the Ziv accuracy test fails, we use degree-6 Taylor polynomial of
(e^lo - 1) / lo in double double precision
- If the Ziv accuracy test still fails, we re-evaluate everything in
128-bit precision.
Implement double precision exp2 function correctly rounded for all
rounding modes. Using the same algorithm as double precision exp function in
https://reviews.llvm.org/D158551.
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D158812
Implement double precision exp function correctly rounded for all
rounding modes. Using 4 stages:
- Range reduction: reduce to `exp(x) = 2^hi * 2^mid1 * 2^mid2 * exp(lo)`.
- Use 64 + 64 LUT for 2^mid1 and 2^mid2, and use cubic Taylor polynomial to
approximate `(exp(lo) - 1) / lo` in double precision. Relative error in this
step is bounded by 1.5 * 2^-63.
- If the rounding test fails, use degree-6 Taylor polynomial to approximate
`exp(lo)` in double-double precision. Relative error in this step is bounded by
2^-99.
- If the rounding test still fails, use degree-7 Taylor polynomial to compute
`exp(lo)` in ~128-bit precision.
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D158551
Simplify the range reduction steps by choosing the reduction constants
carefully so that the reduced arguments v = r*m_x - 1 and v^2 are exact in double
precision, even without FMA instructions, and -2^-8 <= v < 2^-7. This allows the
polynomial evaluations to be parallelized more efficiently.
Reviewed By: zimmermann6
Differential Revision: https://reviews.llvm.org/D147676