91 Commits

Author SHA1 Message Date
Tue Ly
484319f497 [libc] Make expm1f correctly rounded when the targets have no FMA instructions.
Add another exceptional value and fix the case when |x| is small.

Performance tests with CORE-MATH project scripts:
With FMA instructions on Ryzen 1700:
```
$ ./perf.sh expm1f
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
CORE-MATH reciprocal throughput   : 15.362
System LIBC reciprocal throughput : 53.194
LIBC reciprocal throughput        : 14.595
$ ./perf.sh expm1f --latency
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
CORE-MATH latency   : 57.755
System LIBC latency : 147.020
LIBC latency        : 60.269
```
Without FMA instructions:
```
$ ./perf.sh expm1f
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
CORE-MATH reciprocal throughput   : 15.362
System LIBC reciprocal throughput : 53.300
LIBC reciprocal throughput        : 18.020
$ ./perf.sh expm1f --latency
LIBC-location: /home/lnt/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
CORE-MATH latency   : 57.758
System LIBC latency : 147.025
LIBC latency        : 70.304
```

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D123440
2022-06-03 15:57:48 -04:00
Tue Ly
614567a7bf [libc] Automatically add -mfma flag for architectures supporting FMA.
Detect if the architecture supports FMA instructions and if
the targets depend on fma.

Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D123615
2022-06-03 01:21:20 -04:00
Michael Jones
1170951c73 [libc] add uint128 implementation
Some platforms don't support proper 128-bit integers, but some
algorithms use them, such as any that use long doubles. This patch
modifies the existing UInt class to support the necessary operators.
It does not put the new class into use yet; that will come in follow-up
patches.
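The sketch below shows the kind of operators such a class needs. It assumes a hypothetical non-template `UInt128` built from two 64-bit limbs. The patch itself extends the existing `UInt` class, whose interface is not reproduced here. Addition propagates a carry, and a full 64x64-bit product is assembled from four 32x32-bit partial products:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified 128-bit unsigned integer made of two 64-bit limbs.
// It only illustrates carry handling; it is not the UInt class from the patch.
struct UInt128 {
  uint64_t lo = 0;
  uint64_t hi = 0;

  constexpr UInt128() = default;
  constexpr UInt128(uint64_t low, uint64_t high) : lo(low), hi(high) {}

  // Add limb by limb; unsigned wrap-around in the low limb signals a carry.
  constexpr UInt128 operator+(const UInt128 &other) const {
    uint64_t new_lo = lo + other.lo;
    uint64_t carry = new_lo < lo ? 1 : 0;
    return UInt128(new_lo, hi + other.hi + carry);
  }

  constexpr bool operator==(const UInt128 &other) const {
    return lo == other.lo && hi == other.hi;
  }
};

// Full 64x64 -> 128-bit product from four 32x32 -> 64-bit partial products.
constexpr UInt128 mul_64x64(uint64_t a, uint64_t b) {
  const uint64_t mask = 0xFFFFFFFFULL;
  uint64_t a_lo = a & mask, a_hi = a >> 32;
  uint64_t b_lo = b & mask, b_hi = b >> 32;

  uint64_t lo_lo = a_lo * b_lo;
  uint64_t hi_lo = a_hi * b_lo;
  uint64_t lo_hi = a_lo * b_hi;
  uint64_t hi_hi = a_hi * b_hi;

  // Middle 32-bit column; this sum cannot overflow 64 bits.
  uint64_t cross = (lo_lo >> 32) + (hi_lo & mask) + lo_hi;
  uint64_t low = (cross << 32) | (lo_lo & mask);
  uint64_t high = (hi_lo >> 32) + (cross >> 32) + hi_hi;
  return UInt128(low, high);
}
```

The same limb-and-carry pattern generalizes to wider integers by looping over an array of limbs.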

Reviewed By: sivachandra, lntue

Differential Revision: https://reviews.llvm.org/D124959
2022-05-12 11:16:53 -07:00
Tue Ly
c5f8a0a1e9 [libc] Add support for x86-64 targets that do not have FMA instructions.
Make FMA flag checks more accurate for x86-64 targets, and refactor
polyeval to use multiply and add instead when FMA instructions are not
available.
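The sketch below shows the two evaluation strategies. It assumes a hypothetical `polyeval` that takes coefficients in ascending order; this is not the signature of the libc helper. With FMA, each Horner step rounds once. Without it, a separate multiply and add round twice:

```cpp
#include <cassert>
#include <cmath>

// Horner evaluation of c[0] + c[1]*x + ... + c[n-1]*x^(n-1).
// Hypothetical helpers for illustration only.

// One fused multiply-add per step: a single rounding per step.
inline double polyeval_fma(double x, const double *coeffs, int n) {
  double acc = coeffs[n - 1];
  for (int i = n - 2; i >= 0; --i)
    acc = std::fma(acc, x, coeffs[i]);
  return acc;
}

// Separate multiply and add: two roundings per step, but no dependence on
// FMA hardware (std::fma may fall back to slow emulation without it).
inline double polyeval_mul_add(double x, const double *coeffs, int n) {
  double acc = coeffs[n - 1];
  for (int i = n - 2; i >= 0; --i)
    acc = acc * x + coeffs[i];
  return acc;
}

// Compile-time selection: GCC and Clang define __FMA__ on x86 when FMA
// instructions are enabled (for example with -mfma).
inline double polyeval(double x, const double *coeffs, int n) {
#if defined(__FMA__)
  return polyeval_fma(x, coeffs, n);
#else
  return polyeval_mul_add(x, coeffs, n);
#endif
}
```

Compilers may still contract `acc * x + coeffs[i]` into an FMA, depending on flags such as -ffp-contract. The mul-add variant therefore keeps its two-rounding form only when contraction is disabled.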

Reviewed By: michaelrj, sivachandra

Differential Revision: https://reviews.llvm.org/D123335
2022-04-08 14:12:24 -04:00
Tue Ly
a5466f0436 [libc] Improve the performance of expm1f.
Improve the performance of expm1f:
- Rearrange the selection logic for different cases to improve the overall
throughput.
- Use the same degree-4 polynomial for large inputs as `expf`
(https://reviews.llvm.org/D122418), reduced from a degree-7 polynomial.

Performance benchmark using perf tool from CORE-MATH project
(https://gitlab.inria.fr/core-math/core-math/-/tree/master):
Before this patch:
```
$ ./perf.sh expm1f

CORE-MATH reciprocal throughput   : 15.362
System LIBC reciprocal throughput : 53.288
LIBC reciprocal throughput        : 54.572

$ ./perf.sh expm1f --latency

CORE-MATH latency   : 57.759
System LIBC latency : 147.146
LIBC latency        : 118.057
```

After this patch:
```
$ ./perf.sh expm1f

CORE-MATH reciprocal throughput   : 15.359
System LIBC reciprocal throughput : 53.188
LIBC reciprocal throughput        : 14.600

$ ./perf.sh expm1f --latency

CORE-MATH latency   : 57.774
System LIBC latency : 147.119
LIBC latency        : 60.280

```

Reviewed By: michaelrj, santoshn, zimmermann6

Differential Revision: https://reviews.llvm.org/D122538
2022-03-30 19:23:25 -04:00
Michael Jones
9276074271 [libc][obvious] Add mfma to log2f
The previous patch, which added -mfma to the functions that need it for Windows
builds, missed log2f.

Differential Revision: https://reviews.llvm.org/D122693
2022-03-29 16:34:52 -07:00
Michael Jones
2f8829aba3 [libc] Add mfma option to functions that use fma
On Windows the functions that use fma don't properly include the fma
intrinsics unless -mfma is added to the compile options. This patch adds
the compile option to all of the functions that need it.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D122689
2022-03-29 16:23:36 -07:00
Tue Ly
6168b42225 [libc] Improve the performance of expf.
Reduce the polynomial's degree from 7 down to 4.

Currently we use a degree-7 minimax polynomial on an interval of length 2^-7
around 0 to compute `expf`. Based on the suggestion of @santoshn and the RLIBM
project (https://github.com/rutgers-apl/rlibm-all/blob/main/source/float/exp.c)
and the improvement we made with `exp2f` in https://reviews.llvm.org/D122346,
it is possible to have a good degree-4 polynomial on a subinterval of length
2^(-7) to approximate e^x.

We tried either reducing the degree of the polynomial to 3 or increasing
the interval size to 2^(-6), but in both cases the number of exceptional values
exploded. So we settled on a degree-4 polynomial on an interval of
size 2^(-7) around 0.
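The sketch below illustrates this reduction under stated assumptions. The input is split as x = k/128 + dx with |dx| <= 2^-8, so each subinterval has length 2^-7, and e^dx comes from a degree-4 polynomial. Taylor coefficients 1, 1, 1/2, 1/6, 1/24 stand in for the actual minimax coefficients, and `std::exp(k / 128.0)` stands in for the table lookups. The truncation error is about (2^-8)^5/5!, roughly 7.6e-15, well below single precision:

```cpp
#include <cassert>
#include <cmath>

// Degree-4 polynomial approximating e^dx for |dx| <= 2^-8, in Horner form.
// Stand-in: Taylor coefficients instead of minimax ones.
inline double exp_poly4(double dx) {
  return 1.0 + dx * (1.0 + dx * (0.5 + dx * (1.0 / 6.0 + dx * (1.0 / 24.0))));
}

// Sketch: e^x = e^(k/128) * e^dx with x = k/128 + dx and |dx| <= 2^-8.
// Assumption: e^(k/128) comes from std::exp here instead of tables.
// No overflow/underflow/NaN handling and no correct-rounding guarantee.
inline float expf_sketch(float xf) {
  double x = xf;
  double k = std::round(x * 128.0);  // nearest multiple of 2^-7, scaled by 128
  double dx = x - k / 128.0;         // reduced argument, |dx| <= 2^-8
  return static_cast<float>(std::exp(k / 128.0) * exp_poly4(dx));
}
```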

Reviewed By: sivachandra, zimmermann6, santoshn

Differential Revision: https://reviews.llvm.org/D122418
2022-03-25 12:20:20 -04:00
Tue Ly
b9d87d7466 [libc] Improve the performance of exp2f.
Reduce the range-reduction table size from 128 entries down to 64 entries, and
reduce the polynomial's degree from 6 down to 4.

Currently we use a degree-6 minimax polynomial on an interval of length 2^-7
around 0 to compute exp2f. Based on the suggestion of @santoshn and the RLIBM
project (https://github.com/rutgers-apl/rlibm-prog/blob/main/libm/float/exp2.c),
it is possible to have a good degree-4 polynomial on a subinterval of length
2^(-6) to approximate 2^x.

We tried either reducing the degree of the polynomial to 3 or increasing
the interval size to 2^(-5), but in both cases the number of exceptional values
exploded. So we settled on a degree-4 polynomial on an interval of
size 2^(-6) around 0.
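A similar sketch for exp2f assumes 2^x = 2^m * 2^(j/64) * 2^dx, with a 64-entry table for 2^(j/64) and |dx| <= 2^-7. The table is filled with `std::exp2` on first use, as a stand-in for precomputed constants. The factor 2^dx = e^(dx*ln2) uses Taylor coefficients in place of the minimax polynomial:

```cpp
#include <array>
#include <cassert>
#include <cmath>

// 64-entry table of 2^(j/64), j = 0..63. Stand-in: filled once with
// std::exp2 rather than stored as precomputed constants.
inline const std::array<double, 64> &exp2_mid_table() {
  static const std::array<double, 64> table = [] {
    std::array<double, 64> t{};
    for (int j = 0; j < 64; ++j) t[j] = std::exp2(j / 64.0);
    return t;
  }();
  return table;
}

// Sketch: 2^x = 2^m * 2^(j/64) * 2^dx with x = m + j/64 + dx, |dx| <= 2^-7.
// Assumes finite x in the normal range; no special cases, no correct rounding.
inline float exp2f_sketch(float xf) {
  const double ln2 = 0x1.62e42fefa39efp-1;
  double x = xf;
  double k = std::round(x * 64.0);                 // x ~= k / 64
  double dx = x - k / 64.0;                        // |dx| <= 2^-7
  int m = static_cast<int>(std::floor(k / 64.0));  // integer part
  int j = static_cast<int>(k) - 64 * m;            // table index, 0..63
  double t = dx * ln2;                             // 2^dx = e^t
  double p = 1.0 + t * (1.0 + t * (0.5 + t * (1.0 / 6.0 + t * (1.0 / 24.0))));
  return static_cast<float>(std::ldexp(exp2_mid_table()[j] * p, m));
}
```

Handling the integer part with `std::ldexp` keeps the table size independent of the input range; only the 64 fractional steps need stored values.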

Reviewed By: michaelrj, sivachandra, zimmermann6, santoshn

Differential Revision: https://reviews.llvm.org/D122346
2022-03-24 18:06:37 -04:00
Tue Ly
64af346b18 [libc] Implement expm1f function that is correctly rounded for all rounding modes.
Implement expm1f function that is correctly rounded for all rounding modes. This is based on the expf implementation.

Exhaustive testing shows that using the expf implementation and subtracting 1.0 before rounding the final result to single precision
gives correctly rounded results for all |x| > 2^-4 with 1 exception. When |x| < 2^-25, we use x + x^2 (implemented with a
single fma). And for 2^-25 <= |x| <= 2^-4, we use a single degree-8 minimax polynomial generated by Sollya.
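A sketch of the three-way split described above follows. The middle range uses Taylor coefficients 1/n! in place of the Sollya minimax polynomial. The large range computes e^x - 1 in double with `std::exp`, as a stand-in for the expf-based path. Neither substitution is claimed to be correctly rounded:

```cpp
#include <cassert>
#include <cmath>

// Sketch of the case split for expm1f (hypothetical helper). No handling of
// overflow, infinities or NaN beyond what the arithmetic does naturally.
inline float expm1f_sketch(float x) {
  float ax = std::fabs(x);
  if (ax < 0x1.0p-25f) {
    if (x == 0.0f) return x;  // keep the sign of -0.0
    // expm1(x) is within half an ulp of x here; fma(x, x, x) = x + x^2 with
    // one rounding lands on x or on its upper neighbour in directed modes.
    return std::fma(x, x, x);
  }
  double xd = x;
  if (ax <= 0x1.0p-4f) {
    // x + x^2/2! + ... + x^8/8!, evaluated in Horner form.
    static const double inv_fact[9] = {1.0, 1.0, 1.0 / 2, 1.0 / 6, 1.0 / 24,
                                       1.0 / 120, 1.0 / 720, 1.0 / 5040,
                                       1.0 / 40320};
    double p = inv_fact[8];
    for (int n = 7; n >= 1; --n) p = p * xd + inv_fact[n];
    return static_cast<float>(p * xd);
  }
  // |x| > 2^-4: e^x in double precision, subtract 1, round once to float.
  return static_cast<float>(std::exp(xd) - 1.0);
}
```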

Reviewed By: sivachandra, zimmermann6

Differential Revision: https://reviews.llvm.org/D121574
2022-03-15 10:24:56 -04:00
Tue Ly
58edd26255 [libc] Include -150 to the special cases at the beginning of exp2f function. 2022-03-14 10:06:27 -04:00
Tue Ly
64721a3312 [libc] Implement exp2f function that is correctly rounded for all rounding modes.
Implement exp2f function that is correctly rounded for all rounding modes.

Reviewed By: sivachandra, zimmermann6

Differential Revision: https://reviews.llvm.org/D121463
2022-03-14 09:42:37 -04:00
Tue Ly
38cadd90b7 [libc] Implement expf function that is correctly rounded for all rounding modes.
Implement expf function that is correctly rounded for all rounding modes.

Reviewed By: sivachandra, zimmermann6

Differential Revision: https://reviews.llvm.org/D121440
2022-03-11 07:16:47 -05:00
Tue Ly
76ec69a911 [libc] Remove the redundant header FPUtil/FEnvUtils.h
Remove the redundant header FPUtil/FEnvUtils.h, use FPUtil/FEnvImpl.h header instead.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D120965
2022-03-04 14:09:47 -05:00
Siva Chandra Reddy
dd33f9cdef [libc] Make the errno macro resolve to the thread local variable directly.
With modern architectures having a thread pointer and language supporting
thread locals, there is no reason to use a function intermediary to access
the thread local errno value.

The entrypoint corresponding to errno has been replaced with an object
library as there is no formal entrypoint for errno anymore.

Reviewed By: jeffbailey, michaelrj

Differential Revision: https://reviews.llvm.org/D120920
2022-03-04 17:29:49 +00:00
Alex Brachet
64f5f6d759 [libc] Use '+' constraint on inline assembly
As suggested by @mcgrathr in D118099

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D119978
2022-02-17 03:00:17 +00:00
Tue Ly
f1ec99f973 [libc] Improve hypotf performance with different algorithm correctly rounded to all rounding modes.
Algorithm for hypotf: compute (a*a + b*b) in double precision, use Dekker's algorithm to find the rounding error, and then correct for it after taking the square root.
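A sketch of this idea follows. It assumes round-to-nearest, where Fast2Sum recovers the addition error exactly, and it treats only infinities as a special case. Both squares are exact in double because a float significand has 24 bits, so the addition is the only rounded step before the square root:

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <utility>

// Hypothetical hypotf sketch; no claim of correct rounding in every mode.
inline float hypotf_sketch(float a, float b) {
  if (std::isinf(a) || std::isinf(b))
    return std::numeric_limits<float>::infinity();
  double x = static_cast<double>(a) * a;  // exact: 24-bit * 24-bit fits in 53
  double y = static_cast<double>(b) * b;  // exact for the same reason
  if (x < y) std::swap(x, y);             // Fast2Sum requires |x| >= |y|
  double sum = x + y;                     // the only rounded step so far
  double err = y - (sum - x);             // exact error of the sum (nearest mode)
  if (sum == 0.0) return 0.0f;
  double r = std::sqrt(sum);
  // First-order correction: sqrt(sum + err) ~= r + err / (2r).
  r += err / (2.0 * r);
  return static_cast<float>(r);
}
```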

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D118157
2022-02-16 09:48:51 -05:00
Guillaume Chatelet
7e7ecef980 [libc] Replace type punning with bit_cast
Although type punning through a union is defined in C, it is UB in C++.
This patch introduces a bit_cast function to convert between types in a safe way.

This is necessary to get llvm-libc to compile with GCC.
This patch is extracted from D119002.
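Below is a minimal sketch of a memcpy-based bit cast that works in C++17. It is named `bit_cast_sketch` to make clear it is hypothetical; C++20 later added `std::bit_cast` in `<bit>`:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Copying the object representation with std::memcpy is well-defined,
// unlike reading an inactive union member in C++.
template <typename To, typename From>
To bit_cast_sketch(const From &from) {
  static_assert(sizeof(To) == sizeof(From), "sizes must match");
  static_assert(std::is_trivially_copyable<To>::value &&
                    std::is_trivially_copyable<From>::value,
                "types must be trivially copyable");
  To to;
  std::memcpy(&to, &from, sizeof(To));
  return to;
}
```

Compilers typically lower the fixed-size memcpy to a plain register move, so this costs nothing at run time.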

Differential Revision: https://reviews.llvm.org/D119145
2022-02-08 20:45:59 +00:00
Tue Ly
e5e93f60ee [libc] Return a float NaN for log1pf instead of double NaN. 2022-02-07 21:07:09 -05:00
Tue Ly
9e7688c71e [libc] Implement log1pf correctly rounded to all rounding modes.
Implement log1pf, correctly rounded to all rounding modes, relying on the logf implementation for exponent > 2^(-8).

Reviewed By: sivachandra, zimmermann6

Differential Revision: https://reviews.llvm.org/D118962
2022-02-07 16:17:18 -05:00
Tue Ly
700aebaf74 [libc] Set default CXX_STANDARD to C++17 and let targets set their own standard if needed.
CMAKE_CXX_STANDARD 14 is set in the llvm-project/llvm folder, and it overrides any -std=c++17 given in COMPILE_OPTIONS. We need to override the CXX_STANDARD property of each target in order to set the correct C++ standard flags.

Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D118871
2022-02-04 09:59:21 -05:00
Tue Ly
ad4ee2d778 [libc] Refactor sqrt implementations and add tests for generic sqrt implementations.
Re-apply https://reviews.llvm.org/D118173 with fix for aarch64.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D118433
2022-01-28 13:39:03 -05:00
Siva Chandra Reddy
4beba3a32a [libc] Revert "Refactor sqrt implementations and add tests for generic sqrt implementations."
This reverts commit 21c4c82c20.
2022-01-27 21:06:14 +00:00
Tue Ly
21c4c82c20 [libc] Refactor sqrt implementations and add tests for generic sqrt implementations.
Refactor sqrt implementations:
- Move architecture specific instructions from `src/math/<arch>` to `src/__support/FPUtil/<arch>` folder.
- Move generic implementation of `sqrt` to `src/__support/FPUtil/generic` folder and add it as a header library.
- Use `src/__support/FPUtil/sqrt.h` for architecture/generic selections.
- Add unit tests for generic implementation of `sqrt`.
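As an illustration of a hardware-independent building block, here is a sketch of the digit-by-digit (shift-and-subtract) integer square root. It is an assumption that this is representative of the generic implementation. A floating-point sqrt would additionally normalize the significand, run such a loop on it, and apply rounding:

```cpp
#include <cassert>
#include <cstdint>

// Returns floor(sqrt(n)), producing one result bit per iteration using only
// shifts, additions and comparisons (no hardware sqrt instruction).
inline uint64_t isqrt_digit_by_digit(uint64_t n) {
  uint64_t result = 0;
  uint64_t bit = uint64_t{1} << 62;  // largest power of four in 64 bits
  while (bit > n) bit >>= 2;         // skip positions above the input
  while (bit != 0) {
    if (n >= result + bit) {
      n -= result + bit;
      result = (result >> 1) + bit;
    } else {
      result >>= 1;
    }
    bit >>= 2;
  }
  return result;
}
```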

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D118173
2022-01-27 11:54:54 -05:00
Tue Ly
82df72cc67 [libc] Make logf function correctly rounded for all rounding modes.
Make logf function correctly rounded for all rounding modes.

Reviewed By: sivachandra, zimmermann6, santoshn, jpl169

Differential Revision: https://reviews.llvm.org/D118149
2022-01-25 15:22:21 -05:00
Alex Brachet
ce368e1aa5 [libc][NFC] Workaround clang assertion in inline asm
The clobber list "cc" is added to inline assembly to work around a clang assertion that triggers when clang itself is built with assertions enabled. See bug [53391](https://github.com/llvm/llvm-project/issues/53391).

See https://godbolt.org/z/z3bc6a9PM showing functionally same output assembly.
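This hypothetical x86-64 example uses both the `+` (read-write) operand constraint and a `cc` clobber on a single `sqrtss` instruction, with a `std::sqrt` fallback on other targets. `sqrtss` does not change the flags, so the clobber only mirrors the workaround; the instruction does not require it:

```cpp
#include <cassert>
#include <cmath>

inline float sqrtf_asm(float x) {
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
  // "+x": x lives in an SSE register that is both read and written.
  // "cc": declares the condition codes clobbered (harmless here).
  __asm__("sqrtss %0, %0" : "+x"(x) : : "cc");
  return x;
#else
  return std::sqrt(x);
#endif
}
```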

Reviewed By: sivachandra, lntue

Differential Revision: https://reviews.llvm.org/D118099
2022-01-25 16:39:55 +00:00
Tue Ly
e581841e8c [libc] Implement log10f correctly rounded for all rounding modes.
Based on the RLIBM implementation, similar to logf and log2f. Most of the exceptional inputs are exact powers of 10.
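The sketch below shows how exact powers of 10 can be special-cased. It assumes a hypothetical `log10f_sketch` that falls back to double-precision `std::log10` for all other inputs, as a stand-in for the RLIBM-based polynomial. 10^n is exactly representable in float for n = 0..10 because 5^10 = 9765625 < 2^24. log10 of such an input is the integer n, which is exact in every rounding mode:

```cpp
#include <cassert>
#include <cmath>

inline float log10f_sketch(float x) {
  // Exact float powers of 10 whose logarithm is an exact integer.
  static const float powers_of_10[] = {1e0f, 1e1f, 1e2f, 1e3f, 1e4f, 1e5f,
                                       1e6f, 1e7f, 1e8f, 1e9f, 1e10f};
  for (int n = 0; n <= 10; ++n)
    if (x == powers_of_10[n]) return static_cast<float>(n);
  // Stand-in for the polynomial path used for all other inputs.
  return static_cast<float>(std::log10(static_cast<double>(x)));
}
```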

Reviewed By: sivachandra, zimmermann6, santoshn, jpl169

Differential Revision: https://reviews.llvm.org/D118093
2022-01-25 10:33:39 -05:00
Tue Ly
1f3f90ab88 [libc] Make log2f correctly rounded for all rounding modes when FMA is not available.
Add two more exceptional cases to log2f that appear when polyeval does not use fma.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D117812
2022-01-20 16:16:11 -05:00
Tue Ly
d4baf3b132 [libc] Use get_round() instead of floating point tricks in generic hypot implementation.
The floating-point tricks used to get the rounding mode require the -frounding-math flag, which behaves differently on aarch64. Revert to using get_round() instead.
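To illustrate the difference, here is a hypothetical trick-based detector next to the standard `std::fegetround()` query. The trick relies on the compiler not folding the arithmetic under a round-to-nearest assumption, which is what `-frounding-math` is meant to guarantee. The direct query does not depend on that. LLVM libc's `get_round()` is its own helper and is not reproduced here:

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical trick: adding or subtracting 2^-25 around 1.0f reveals the
// rounding mode. The volatile load keeps the operands out of constant
// folding. Assumes float arithmetic is done in single precision (e.g. SSE).
inline int rounding_mode_by_trick() {
  volatile float tiny_v = 0x1.0p-25f;  // half an ulp of floats just below 1
  float tiny = tiny_v;
  float up = 1.0f + tiny;    // > 1 only when rounding upward
  float down = 1.0f - tiny;  // a tie: == 1 when rounding to nearest-even
  float neg = -1.0f - tiny;  // < -1 when rounding downward, == -1 toward zero
  if (up > 1.0f) return FE_UPWARD;
  if (down == 1.0f) return FE_TONEAREST;
  return neg < -1.0f ? FE_DOWNWARD : FE_TOWARDZERO;
}

// The direct query provided by the standard <cfenv> header.
inline int rounding_mode_direct() { return std::fegetround(); }
```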

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D117824
2022-01-20 14:54:57 -05:00
Tue Ly
aad04534c4 [libc] Implement correct rounding with all rounding modes for hypot functions.
Update the rounding logic for generic hypot function so that it will round correctly with all rounding modes.

Reviewed By: sivachandra, zimmermann6

Differential Revision: https://reviews.llvm.org/D117590
2022-01-20 13:33:20 -05:00
Siva Chandra Reddy
75d2fcb03f [libc] Add a naming rule for global constants.
Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D117645
2022-01-19 22:11:16 +00:00
Siva Chandra Reddy
d7c8d51f94 [libc][Obvious] Add -Wno-c++17-extensions to sinf, cosf and sincosf targets. 2022-01-19 06:22:17 +00:00
Tue Ly
b0cd3abf03 [libc] Remove as_double usage as constant initializations in sincosf implementation.
Use C++17 hexadecimal floating-point literals instead of as_double for floating-point constant initializations.
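A small illustration of the change follows, with `from_bits` as a hypothetical stand-in for an as_double-style helper. The hexadecimal literal names the same double directly and is usable in constant expressions:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Before: build the constant from its bit pattern at run time.
inline double from_bits(uint64_t bits) {
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// After: pi rounded to double, written as a C++17 hex float literal.
constexpr double PI_DOUBLE = 0x1.921fb54442d18p1;
static_assert(PI_DOUBLE > 3.14 && PI_DOUBLE < 3.15,
              "usable in constant expressions");
```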

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D117628
2022-01-18 23:48:48 -05:00
Tue Ly
63d2df003e [libc] Implement correctly rounded log2f based on RLIBM library.
Implement log2f, correctly rounded for all rounding modes, based on the RLIBM library.

Reviewed By: sivachandra, michaelrj, santoshn, jpl169, zimmermann6

Differential Revision: https://reviews.llvm.org/D115828
2022-01-14 12:40:49 -05:00
Tue Ly
e11e973e68 [libc] Update exhaustive testing documentations. 2022-01-14 11:10:05 -05:00
Michael Jones
3e52096809 [libc][NFC] fix variable name
A variable was named in a way that doesn't match the format. This patch
renames it to match the format.

Differential Revision: https://reviews.llvm.org/D116228
2021-12-23 10:42:30 -08:00
Tue Ly
9369aa1444 [libc][Obvious] Change func_ to <func>_ in add_math_function.md. 2021-12-17 13:32:51 -05:00
Tue Ly
d08a801b5f [libc] Implement correctly rounded logf based on RLIBM library.
Implement correctly rounded logf based on the RLIBM library: https://people.cs.rutgers.edu/~sn349/rlibm/.

Reviewed By: sivachandra, santoshn, jpl169, zimmermann6

Differential Revision: https://reviews.llvm.org/D115408
2021-12-16 13:43:15 -05:00
Tue Ly
a2b3e6bed8 [libc] Add documentation about how to add a math function to LLVM-libc.
Add documentation about how to add a math function to LLVM-libc.

Differential Revision: https://reviews.llvm.org/D115608
2021-12-16 12:12:21 -05:00
Tue Ly
08aa40b9e6 [libc] Add ADD_FMA_FLAG macro to add -mfma flag to functions that require it.
Add ADD_FMA_FLAG macro to add -mfma flag to functions that require it.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D115572
2021-12-11 16:21:33 -05:00
Michael Jones
1c92911e9e [libc] apply new lint rules
This patch applies the lint rules described in the previous patch. There
was also a significant amount of effort put into manually fixing things,
since the templated functions and the structs defined in /spec were
not updated automatically and had to be handled manually.

Reviewed By: sivachandra, lntue

Differential Revision: https://reviews.llvm.org/D114302
2021-12-07 10:49:47 -08:00
Guillaume Chatelet
cca8e1e415 [libc][NFC] Fix typo in CMakeLists documentation 2021-12-03 13:52:09 +01:00
Michael Jones
155f5a6dac [libc][clang-tidy] fix namespace check for externals
Up until now, all references to `errno` were marked with `NOLINT`, since
it was technically calling an external function. This fixes the lint
rules so that `errno`, as well as `malloc`, `calloc`, `realloc`, and
`free` are all allowed to be called as external functions. All of the
relevant `NOLINT` comments have been removed, and the documentation has
been updated.

Reviewed By: sivachandra, lntue, aaron.ballman

Differential Revision: https://reviews.llvm.org/D113946
2021-11-30 11:44:24 -08:00
Siva Chandra Reddy
f362aea42d [libc][NFC] Move utils/CPP to src/__support/CPP.
The idea is to move all pieces related to the actual libc sources to the
"src" directory. This allows downstream users to ship and build just the
"src" directory.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D112653
2021-10-28 15:50:00 +00:00
Siva Chandra Reddy
ca6b354229 [libc] Add range reduction functions based on the Payne and Hanek algorithm.
These functions will be used in a future patch to implement
trigonometric functions. Unit tests have been added, but to the
libc-long-running-tests suite. The unit tests are long running because we
compare against MPFR computations performed at 1280 bits of precision.

Some cleanups or elimination of repeated patterns can be done as follow-up
changes.
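The following is a short derivation of the standard idea behind Payne and Hanek's reduction. It sketches the technique, not this patch's code. Write x with an integer significand M, and expand 2/pi in binary with bits b_i. Every term with i <= E - 2 has E - i >= 2 and is therefore a multiple of 4, so it cannot affect the result modulo 4:

```latex
x = M \cdot 2^{E}, \quad M \in \mathbb{Z}, \qquad
\frac{2}{\pi} = \sum_{i \ge 1} b_i \, 2^{-i}, \quad b_i \in \{0, 1\}

x \cdot \frac{2}{\pi} = \sum_{i \ge 1} M \, b_i \, 2^{E - i}
  \equiv \sum_{i \ge \max(1,\, E - 1)} M \, b_i \, 2^{E - i} \pmod{4}

x \cdot \frac{2}{\pi} = 4n + q + f, \quad q \in \{0, 1, 2, 3\}, \; f \in [0, 1)
  \quad\Longrightarrow\quad x = 2\pi n + \frac{\pi}{2}\,(q + f)
```

Only the quadrant q and the fraction f are needed for sin and cos. A large |x| therefore changes which window of a stored table of 2/pi bits is read, while the number of bits used stays bounded.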

Differential Revision: https://reviews.llvm.org/D104817
2021-08-23 05:18:41 +00:00
Michael Jones
c120edc7b3 [libc][nfc] move ctype_utils and FPUtils to __support
Some ctype functions are called from other libc functions (e.g. isspace
is used in atoi). By moving ctype_utils.h to __support it becomes easier
to include just the implementations of these functions. For these
reasons the implementation for isspace was moved into
ctype_utils as well.

FPUtils was moved to simplify the build order, and to clarify which
files are a part of the actual libc.

Many files were modified to accommodate these changes, mostly changing
the #include paths.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D107600
2021-08-06 17:29:41 +00:00
Siva Chandra Reddy
a58b2827fe [libc] Add hardware implementations of x86_64 sqrt functions. 2021-06-14 21:25:37 +00:00
Tue Ly
4e5f8b4d8d [libc] Add implementation of expm1f.
Use expm1f(x) = exp(x) - 1 for |x| > ln(2).
For |x| <= ln(2), divide the range into 3 subintervals: [-ln2, -1/8], [-1/8, 1/8], [1/8, ln2]
and use a degree-6 polynomial approximation generated by Sollya's fpminmax for each interval.
Errors are < 1.5 ULPs when fma is used to evaluate the polynomials.

Differential Revision: https://reviews.llvm.org/D101134
2021-06-10 14:58:34 -04:00
Siva Chandra Reddy
7deb5ef44f [libc][NFC] Instead of erroring, skip math targets with missing implementations.
Fixes the AArch64 bot.
2021-05-13 19:22:11 +00:00
Siva Chandra Reddy
861dc75906 [libc] Add x86_64 implementations of double precision cos, sin and tan.
The implementations use the x86_64 FPU instructions. These instructions
are extremely slow compared to a polynomial based software
implementation. Also, their accuracy falls drastically once the input
goes beyond 2PI. To improve both the speed and accuracy, we will be
taking the following approach going forward:
1. As a follow-up to this CL, we will implement a range reduction algorithm
which will expand the accuracy to the entire double precision range.
2. After that, we will replace the HW instructions with a polynomial
implementation to improve the run time.

After step 2, the implementations will be accurate, performant and target
architecture independent.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D102384
2021-05-13 19:02:00 +00:00