This patch adds the malloc.h header, declaring Scudo's mallopt
entrypoint when built with LLVM_LIBC_INCLUDE_SCUDO, as well as two
constants that can be passed to it (M_PURGE and M_PURGE_ALL).
Due to limitations of the current build system, only the declaration
of mallopt is gated by LLVM_LIBC_INCLUDE_SCUDO, and the two new
constants are defined regardless of it. We may need to refine
this in the future.
Note that some allocators other than Scudo may offer a mallopt
implementation too (e.g. man 3 mallopt), albeit with different
supported input values. This patch only supports the specific case of
LLVM_LIBC_INCLUDE_SCUDO.
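For reference, a call site on a Scudo-backed build might look like this
(a minimal sketch; the exact semantics of the two parameters are defined
by Scudo, and the value argument is just a placeholder here):
```
#include <malloc.h>

int main(void) {
  /* With LLVM_LIBC_INCLUDE_SCUDO, ask the allocator to release unused
   * memory back to the OS; M_PURGE_ALL is the more aggressive variant. */
  mallopt(M_PURGE, 0);
  mallopt(M_PURGE_ALL, 0);
  return 0;
}
```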
- Added all variations of ffma and fdiv.
- Will add all the new headers to the YAML in the next patch.
- Only fsub is left; then all basic operations for float will be complete.
---------
Co-authored-by: OverMighty <its.overmighty@gmail.com>
- Removed fadd because I need to add versions for different input types.
- Finishing the rest of the basic operations.
- Noticed duplicates; will remove them.
---------
Co-authored-by: OverMighty <its.overmighty@gmail.com>
Division-free Newton iteration algorithm for cube roots.
1. **Range reduction**
For `x = (-1)^s * 2^e * (1.m)`, we get 2 reduced arguments `x_r` and `a`
as:
```
x_r = 1.m
a = (-1)^s * 2^(e % 3) * (1.m)
```
Then `cbrt(x) = x^(1/3)` can be computed as:
```
x^(1/3) = 2^(e / 3) * a^(1/3).
```
In order to avoid division, we compute `a^(-2/3)` using Newton's method
and then multiply the result by `a` (see the plain-double sketch after
step 5 below):
```
a^(1/3) = a * a^(-2/3).
```
2. **First approximation to a^(-2/3)**
First, we use a degree-7 minimax polynomial generated by Sollya to
approximate `x_r^(-2/3)` for `1 <= x_r < 2`.
```
p = P(x_r) ~ x_r^(-2/3),
```
with relative errors bounded by:
```
| p / x_r^(-2/3) - 1 | < 1.16 * 2^-21.
```
Then we multiply by `2^(-2*(e % 3)/3)`, taken from a small lookup table, to get:
```
x_0 = 2^(-2*(e % 3)/3) * p
~ 2^(-2*(e % 3)/3) * x_r^(-2/3)
= a^(-2/3)
```
with relative errors:
```
| x_0 / a^(-2/3) - 1 | < 1.16 * 2^-21.
```
This step is done in double precision.
3. **First Newton iteration**
We follow the method described in:
Sibidanov, A. and Zimmermann, P., "Correctly rounded cubic root evaluation
in double precision", https://core-math.gitlabpages.inria.fr/cbrt64.pdf
to derive multiplicative Newton iterations as below:
Let `x_n` be the n-th approximation to `a^(-2/3)`. Define the n-th error as:
```
h_n = x_n^3 * a^2 - 1
```
Then:
```
a^(-2/3) = x_n / (1 + h_n)^(1/3)
= x_n * (1 - (1/3) * h_n + (2/9) * h_n^2 - (14/81) * h_n^3 + ...)
```
using the Taylor series expansion of `(1 + h_n)^(-1/3)`.
Applying this to `x_0` above:
```
h_0 = x_0^3 * a^2 - 1
= a^2 * (x_0 - a^(-2/3)) * (x_0^2 + x_0 * a^(-2/3) + a^(-4/3)),
```
which is bounded by:
```
|h_0| < 4 * 3 * 1.16 * 2^-21 < 2^-17.
```
So in the first iteration step, we use:
```
x_1 = x_0 * (1 - (1/3) * h_0 + (2/9) * h_0^2 - (14/81) * h_0^3)
```
Its relative error is bounded by:
```
| x_1 / a^(-2/3) - 1 | < 35/243 * |h_0|^4 < 2^-70.
```
Then we perform Ziv's rounding test and check if the answer is exact.
This step is done in double-double precision.
4. **Second Newton iteration**
If Ziv's rounding test from the previous step fails, we define the error
term:
```
h_1 = x_1^3 * a^2 - 1,
```
and perform another iteration:
```
x_2 = x_1 * (1 - h_1 / 3)
```
with relative errors beyond the precision of double-double.
We then apply Ziv's accuracy test with relative error < 2^-102 to
compensate for rounding errors.
5. **Final iteration**
If Ziv's accuracy test from the previous step fails, we perform another
iteration in 128-bit precision and check for exact outputs.
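A minimal plain-double sketch of steps 1-3, in which `pow()` and `exp2()`
stand in for the degree-7 Sollya polynomial and the `2^(-2*(e % 3)/3)`
lookup table; the actual implementation keeps everything in double-double
and adds the accurate passes of steps 4-5 (the helper name is hypothetical):
```
#include <math.h>

/* Plain-double sketch of steps 1-3 of the cbrt algorithm above. */
double cbrt_sketch(double x) {
  if (x == 0.0 || !isfinite(x))
    return x; /* cbrt(+-0) = +-0, cbrt(+-inf) = +-inf, cbrt(NaN) = NaN. */

  /* Step 1: |x| = x_r * 2^e with x_r = 1.m in [1, 2). */
  int e;
  double x_r = 2.0 * frexp(fabs(x), &e);
  e -= 1;
  /* Split e = 3*q + r with 0 <= r < 3, so cbrt(|x|) = 2^q * a^(1/3)
   * where a = 2^r * x_r. */
  int q = (e < 0) ? -((2 - e) / 3) : e / 3;
  int r = e - 3 * q;
  double a = ldexp(x_r, r);

  /* Step 2: first approximation x_0 ~ a^(-2/3) = x_r^(-2/3) * 2^(-2*r/3). */
  double x_0 = pow(x_r, -2.0 / 3.0) * exp2(-2.0 * r / 3.0);

  /* Step 3: one multiplicative Newton step:
   *   h_0 = x_0^3 * a^2 - 1,
   *   x_1 = x_0 * (1 - h_0/3 + (2/9) h_0^2 - (14/81) h_0^3). */
  double h_0 = x_0 * x_0 * x_0 * a * a - 1.0;
  double x_1 =
      x_0 * (1.0 - h_0 * (1.0 / 3.0 - h_0 * (2.0 / 9.0 - h_0 * (14.0 / 81.0))));

  /* a^(1/3) = a * a^(-2/3); recombine the exponent and the sign. */
  return copysign(ldexp(a * x_1, q), x);
}
```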
Fixes https://github.com/llvm/llvm-project/issues/92874
Algorithm: Let `x = (-1)^s * 2^e * (1 + m)`.
- Step 1: Range reduction: reduce the exponent with:
```
y = cbrt(x) = (-1)^s * 2^(floor(e/3)) * 2^((e % 3)/3) * (1 + m)^(1/3)
```
- Step 2: Use the first 4 fractional bits of `m` to look up a degree-7
polynomial approximation to:
```
(1 + m)^(1/3) ~ 1 + m * P(m).
```
- Step 3: Perform the multiplication:
```
2^((e % 3)/3) * (1 + m)^(1/3).
```
- Step 4: Check for exact cases to prevent rounding them and to clear the
`FE_INEXACT` floating point exception.
- Step 5: Combine with the exponent and sign before converting down to
`float` and returning. (A small numeric check of the decomposition follows
this list.)
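As a concrete instance of steps 1 and 3 (a worked example, not from the
original patch): for `x = 24.0` we have `e = 4`, `m = 0.5`, `floor(e/3) = 1`
and `e % 3 = 1`, so `cbrt(24) = 2 * 2^(1/3) * 1.5^(1/3)`. A quick numeric
check:
```
#include <math.h>
#include <stdio.h>

int main(void) {
  double lhs = cbrt(24.0);                  /* cbrt(x) directly */
  double rhs = 2.0 * cbrt(2.0) * cbrt(1.5); /* 2^floor(e/3) * 2^((e%3)/3) * (1+m)^(1/3) */
  printf("%.9f %.9f\n", lhs, rhs);          /* both print ~2.884499141 */
  return 0;
}
```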
Using the same range reduction as `sin`, `cos`, and `sincos`:
1) Reduce `x = k*pi/128 + u`, with `|u| <= pi/256`, where `u` is kept in
double-double (a plain-double sketch of steps 1-3 follows this list).
2) Approximate `tan(u)` using a degree-9 Taylor polynomial.
3) Compute
```
tan(x) ~ (sin(k*pi/128) + tan(u) * cos(k*pi/128)) / (cos(k*pi/128) - tan(u) * sin(k*pi/128))
```
using the fast double-double division algorithm in [the CORE-MATH
project](https://gitlab.inria.fr/core-math/core-math/-/blob/master/src/binary64/tan/tan.c#L1855).
4) Perform a relative-error Ziv accuracy test.
5) If the accuracy test fails, redo the computations using 128-bit
precision `DyadicFloat`.
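A minimal plain-double sketch of steps 1-3 (assumptions: libm `sin`/`cos`
stand in for the `k*pi/128` lookup table, the double-double bookkeeping and
the CORE-MATH division algorithm are omitted, and the function name is
hypothetical):
```
#include <math.h>

double tan_sketch(double x) {
  const double pi = 3.14159265358979323846;
  /* Step 1: x = k*(pi/128) + u, |u| <= pi/256 (moderate x only; the real
   * code has a dedicated large range reduction). */
  double k = round(x * (128.0 / pi));
  double u = x - k * (pi / 128.0);
  /* Step 2: degree-9 Taylor polynomial
   * tan(u) = u + u^3/3 + 2u^5/15 + 17u^7/315 + 62u^9/2835. */
  double u2 = u * u;
  double tan_u =
      u * (1.0 + u2 * (1.0 / 3.0 +
           u2 * (2.0 / 15.0 + u2 * (17.0 / 315.0 + u2 * (62.0 / 2835.0)))));
  /* Step 3: reconstruct with the tangent addition formula; sin()/cos()
   * stand in for the sin(k*pi/128), cos(k*pi/128) table entries. */
  double sin_k = sin(k * (pi / 128.0));
  double cos_k = cos(k * (pi / 128.0));
  return (sin_k + tan_u * cos_k) / (cos_k - tan_u * sin_k);
}
```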
Fixes https://github.com/llvm/llvm-project/issues/96930
- Algorithm:
- Step 1 - Range reduction: for a double precision input `x`, return `k`
and `u` such that
- k is an integer
- u = x - k * pi / 128, and |u| < pi/256
- Step 2 - Calculate `sin(u)` and `cos(u)` in double-double using Taylor
polynomials with errors < 2^-70 with FMA or < 2^-66 w/o FMA.
- Step 3 - Calculate `sin(x) = sin(k*pi/128) * cos(u) + cos(k*pi/128) *
sin(u)` using look-up table for `sin(k*pi/128)` and `cos(k*pi/128)`.
- Step 4 - Use Ziv's rounding test to decide if the result is correctly
rounded.
- Step 4' - If Ziv's rounding test fails, redo steps 1-3 using 128-bit
precision (a plain-double sketch of steps 1-3 follows this list).
- Currently, without FMA instructions, the large range reduction only
works correctly for the default rounding mode (FE_TONEAREST).
- Provide the `LIBC_MATH` flag so that users can set `LIBC_MATH =
LIBC_MATH_SKIP_ACCURATE_PASS` to build the `sin` function without steps 4
and 4'.
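A minimal plain-double sketch of steps 1-3 (assumptions: libm `sin`/`cos`
stand in for the double-double `sin(k*pi/128)`/`cos(k*pi/128)` table, the
large range reduction and the accurate pass are omitted, and the function
name is hypothetical):
```
#include <math.h>

double sin_sketch(double x) {
  const double pi = 3.14159265358979323846;
  /* Step 1: x = k*(pi/128) + u, |u| <= pi/256 (moderate x only). */
  double k = round(x * (128.0 / pi));
  double u = x - k * (pi / 128.0);
  /* Step 2: short Taylor polynomials for sin(u), cos(u) on |u| <= pi/256. */
  double u2 = u * u;
  double sin_u = u * (1.0 - u2 / 6.0 * (1.0 - u2 / 20.0 * (1.0 - u2 / 42.0)));
  double cos_u = 1.0 - u2 / 2.0 * (1.0 - u2 / 12.0 * (1.0 - u2 / 30.0));
  /* Step 3: sin(x) = sin(k*pi/128) * cos(u) + cos(k*pi/128) * sin(u). */
  return sin(k * (pi / 128.0)) * cos_u + cos(k * (pi / 128.0)) * sin_u;
}
```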
Note: our baremetal arm configuration compiles this as
`--target=arm-none-eabi`, so this code is built in -marm mode. It could be
smaller with `--target=armv7-none-eabi -mthumb`. The assembly is valid
ARMv5 or Thumb-2, but not Thumb-1.
This PR implements a part of WG14 N2653:
- Define C23 char8_t
- Define C11 char16_t
- Define C11 char32_t
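The three definitions roughly take the following shape (a sketch of the
`<uchar.h>` declarations following the C standard, not the generated
header itself):
```
#include <stdint.h>

typedef unsigned char char8_t;   /* C23 (WG14 N2653) */
typedef uint_least16_t char16_t; /* C11 */
typedef uint_least32_t char32_t; /* C11 */
```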
Missing goals are:
- The type of UTF-8 character literals is changed from unsigned char to
char8_t. (Since UTF-8 character literals already have type unsigned
char, this is not a semantic change).
- New mbrtoc8() and c8rtomb() functions declared in <uchar.h> enable
conversions between multibyte characters and UTF-8.
- A new ATOMIC_CHAR8_T_LOCK_FREE macro.
- A new atomic_char8_t typedef name.
We compute atan2f(y, x) in 2 stages:
- Fast step: perform computations in double precision, with relative
errors < 2^-50
- Accurate step: if the result from the Fast step fails Ziv's rounding
test (sketched below), then we perform computations in double-double
precision, with relative errors < 2^-100.
On a Ryzen 5900X, the worst-case latency is ~200 clocks, compared to an
average latency of ~60 clocks and an average reciprocal throughput of ~20
clocks.
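The fast/accurate split hinges on Ziv's rounding test; a minimal sketch of
the idea (not the libc's internal helper) for a double intermediate result
rounded down to float:
```
#include <stdbool.h>

/* r approximates the exact value with |error| <= err; if both ends of the
 * error interval round to the same float, the rounding is already correct
 * and the accurate step can be skipped. */
bool ziv_test_passes(double r, double err) {
  return (float)(r - err) == (float)(r + err);
}
```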
Summary:
I've noticed one problem: when the user includes `stdint.h`, the
compiler will do `#include_next <stdint.h>`, potentially pulling in a
conflicting implementation on systems with multiple headers installed.
The `clang` header is standards compliant and works with `clang` and
`gcc`, which are both targets of ours, so I simply copied it here. This
has the effect of making `#include <stdint.h>` on clang / LLVM libc
behave the same as in `-ffreestanding` mode.
The epoll_wait functions are syscall wrappers that were requested by
upstream users. This patch adds them, as well as their header and types.
The tests are currently incomplete since they require epoll_create to
properly test epoll_wait. That will be added in a followup patch since
this one is already very large.
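For reference, a typical call site looks like this (a sketch; `epfd` is
assumed to come from `epoll_create`/`epoll_create1`, which are not part of
this patch, and the helper name is hypothetical):
```
#include <sys/epoll.h>

int wait_once(int epfd) {
  struct epoll_event events[8];
  /* Block for up to 1000 ms for events on the fds registered with epfd. */
  int n = epoll_wait(epfd, events, 8, 1000);
  return n; /* number of ready events, 0 on timeout, -1 on error */
}
```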