Commit Graph

72 Commits

Author SHA1 Message Date
Michael Jones
cfbcbc8f88 [libc] fix MPFR rounding problems in fuzz test
The accuracy for the MPFR numbers in the strtofloat fuzz test was set
too high, causing rounding issues when rounding to a smaller final
result.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D154150
2023-07-05 10:53:40 -07:00
Tue Ly
f320fefc4a [libc][math] Implement erff function correctly rounded to all rounding modes.
Implement correctly rounded `erff` functions.

For `x >= 4`, `erff(x) = 1` for `FE_TONEAREST` or `FE_UPWARD`, `0x1.ffffep-1` for `FE_DOWNWARD` or `FE_TOWARDZERO`.

For `0 <= x < 4`, we divide into 32 sub-intervals of length `1/8`, and use a degree-15 odd polynomial to approximate `erff(x)` in each sub-interval:
```
  erff(x) ~ x * (c0 + c1 * x^2 + c2 * x^4 + ... + c7 * x^14).
```

For `x < 0`, we can use the same formula as above, since the odd part is factored out.

Performance tested with `perf.sh` tool from the CORE-MATH project on AMD Ryzen 9 5900X:

Reciprocal throughput (clock cycles / op)
```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput --  with -march=native      (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 11.790 + 0.182 clc/call; Median-Min = 0.154 clc/call; Max = 12.255 clc/call;
-- CORE-MATH reciprocal throughput --  with -march=x86-64-v2      (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 14.205 + 0.151 clc/call; Median-Min = 0.159 clc/call; Max = 15.893 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 45.519 + 0.445 clc/call; Median-Min = 0.552 clc/call; Max = 46.345 clc/call;

-- LIBC reciprocal throughput --  with -mavx2 -mfma     (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 9.595 + 0.214 clc/call; Median-Min = 0.220 clc/call; Max = 9.887 clc/call;
-- LIBC reciprocal throughput --  with -msse4.2     (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 10.223 + 0.190 clc/call; Median-Min = 0.222 clc/call; Max = 10.474 clc/call;
```

and latency (clock cycles / op):
```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency --  with -march=native      (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 38.566 + 0.391 clc/call; Median-Min = 0.503 clc/call; Max = 39.170 clc/call;
-- CORE-MATH latency --  with -march=x86-64-v2      (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 43.223 + 0.667 clc/call; Median-Min = 0.680 clc/call; Max = 43.913 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 111.613 + 1.267 clc/call; Median-Min = 1.696 clc/call; Max = 113.444 clc/call;

-- LIBC latency --  with -mavx2 -mfma     (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 40.138 + 0.410 clc/call; Median-Min = 0.536 clc/call; Max = 40.729 clc/call;
-- LIBC latency --  with -msse4.2     (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 44.858 + 0.872 clc/call; Median-Min = 0.814 clc/call; Max = 46.019 clc/call;
```

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D153683
2023-06-28 13:58:37 -04:00
Tue Ly
37458f6693 [libc][math] Move str method from FPBits class to testing utils.
str method of FPBits class is only used for pretty printing its objects
in tests.  It brings cpp::string dependency to FPBits class, which is not ideal
for embedded use case.  We move str method to a free function in test utils and
remove this dependency of FPBits class.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D152607
2023-06-10 02:50:58 -04:00
Michael Jones
ae3b59e623 [libc] Use MPFR for strtofloat fuzzing
The previous string to float tests didn't check correctness, but due to
the atof differential test proving unreliable the strtofloat fuzz test
has been changed to use MPFR for correctness checking. Some minor bugs
have been found and fixed as well.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D150905
2023-05-22 11:04:53 -07:00
Siva Chandra Reddy
00bd8e9011 [libc] Add a str() method to FPBits which returns a string representation.
Unit tests for the str() method have also been added.

Previously, a separate test only helper function was being used by the
test matchers which has regressed over many cleanups. Moreover, being a
test only utility, it was not tested separately (and hence the
regression).

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D150906
2023-05-19 06:20:41 +00:00
Siva Chandra Reddy
dcf296b541 [libc][NFC] Remove the StreamWrapper class and use the new test logger.
Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D148452
2023-04-17 15:48:18 +00:00
Siva Chandra Reddy
af1315c28f [libc][NFC] Move UnitTest and IntegrationTest to the 'test' directory.
This part of the effort to make all test related pieces into the `test`
directory. This helps is excluding test related pieces in a straight
forward manner if LLVM_INCLUDE_TESTS is OFF. Future patches will also move
the MPFR wrapper and testutils into the 'test' directory.
2023-02-07 19:45:51 +00:00
Tue Ly
9b30f6b6d7 [libc][math] Implement acoshf function correctly rounded to all rounding modes.
Implement acoshf function correctly rounded to all rounding modes.

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D142781
2023-02-01 11:35:15 -05:00
Tue Ly
46b15fd19e [libc][math] Implement asinhf function correctly rounded for all rounding modes.
Implement asinhf function correctly rounded for all rounding modes.

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D142681
2023-01-27 11:12:27 -05:00
Tue Ly
a752460d73 [libc][math] Implement exp10f function correctly rounded to all rounding modes.
Implement exp10f function correctly rounded to all rounding modes.

Algorithm: perform range reduction to reduce
```
  10^x = 2^(hi + mid) * 10^lo
```
where:
```
  hi is an integer,
  0 <= mid * 2^5 < 2^5
  -log10(2) / 2^6 <= lo <= log10(2) / 2^6
```
Then `2^mid` is stored in a table of 32 entries and the product `2^hi * 2^mid` is
performed by adding `hi` into the exponent field of `2^mid`.
`10^lo` is then approximated by a degree-5 minimax polynomials generated by Sollya with:
```
  > P = fpminimax((10^x - 1)/x, 4, [|D...|], [-log10(2)/64. log10(2)/64]);
```
Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp10f
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 10.215
System LIBC reciprocal throughput : 7.944

LIBC reciprocal throughput        : 38.538
LIBC reciprocal throughput        : 12.175   (with `-msse4.2` flag)
LIBC reciprocal throughput        : 9.862    (with `-mfma` flag)

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh exp10f --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 40.744
System LIBC latency : 37.546

BEFORE
LIBC latency        : 48.989
LIBC latency        : 44.486   (with `-msse4.2` flag)
LIBC latency        : 40.221   (with `-mfma` flag)
```
This patch relies on https://reviews.llvm.org/D134002

Reviewed By: orex, zimmermann6

Differential Revision: https://reviews.llvm.org/D134104
2022-09-19 10:01:40 -04:00
Tue Ly
463dcc8749 [libc][math] Implement acosf function correctly rounded for all rounding modes.
Implement acosf function correctly rounded for all rounding modes.

We perform range reduction as follows:

- When `|x| < 2^(-10)`, we use cubic Taylor polynomial:
```
  acos(x) = pi/2 - asin(x) ~ pi/2 - x - x^3 / 6.
```
- When `2^(-10) <= |x| <= 0.5`, we use the same approximation that is used for `asinf(x)` when `|x| <= 0.5`:
```
  acos(x) = pi/2 - asin(x) ~ pi/2 - x - x^3 * P(x^2).
```
- When `0.5 < x <= 1`, we use the double angle formula: `cos(2y) = 1 - 2 * sin^2 (y)` to reduce to:
```
  acos(x) = 2 * asin( sqrt( (1 - x)/2 ) )
```
- When `-1 <= x < -0.5`, we reduce to the positive case above using the formula:
```
  acos(x) = pi - acos(-x)
```

Performance benchmark using perf tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh acosf
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH reciprocal throughput   : 28.613
System LIBC reciprocal throughput : 29.204
LIBC reciprocal throughput        : 24.271

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh asinf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 55.554
System LIBC latency : 76.879
LIBC latency        : 62.118
```

Reviewed By: orex, zimmermann6

Differential Revision: https://reviews.llvm.org/D133550
2022-09-09 09:55:30 -04:00
Tue Ly
e2f065c2a3 [libc][math] Implement asinf function correctly rounded for all rounding modes.
Implement asinf function correctly rounded for all rounding modes.

For `|x| <= 0.5`, we approximate `asin(x)` by
```
  asin(x) = x * P(x^2)
```
where `P(X^2) = Q(X)` is a degree-20 minimax even polynomial approximating
`asin(x)/x` on `[0, 0.5]` generated by Sollya with:
```
  > Q = fpminimax(asin(x)/x, [|0, 2, 4, 6, 8, 10, 12, 14, 16, 18, 20|],
                 [|1, D...|], [0, 0.5]);
```

When `|x| > 0.5`, we perform range reduction as follow:
Assume further that `0.5 < x <= 1`, and let:
```
  y = asin(x)
```
We will use the double angle formula:
```
  cos(2X) = 1 - 2 sin^2(X)
```
and the complement angle identity:
```
  x = sin(y) = cos(pi/2 - y)
              = 1 - 2 sin^2 (pi/4 - y/2)
```
So:
```
  sin(pi/4 - y/2) = sqrt( (1 - x)/2 )
```
And hence:
```
  pi/4 - y/2 = asin( sqrt( (1 - x)/2 ) )
```
Equivalently:
```
  asin(x) = y = pi/2 - 2 * asin( sqrt( (1 - x)/2 ) )
```
Let `u = (1 - x)/2`, then
```
  asin(x) = pi/2 - 2 * asin(u)
```
Moreover, since `0.5 < x <= 1`,
```
  0 <= u < 1/4, and 0 <= sqrt(u) < 0.5.
```
And hence we can reuse the same polynomial approximation of `asin(x)` when
`|x| <= 0.5`:
```
  asin(x) = pi/2 - 2 * u * P(u^2).
```

Performance benchmark using `perf` tool from the CORE-MATH project on Ryzen 1700:
```
$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh asinf
CORE-MATH reciprocal throughput   : 23.418
System LIBC reciprocal throughput : 27.310
LIBC reciprocal throughput        : 22.741

$ CORE_MATH_PERF_MODE="rdtsc" ./perf.sh asinf --latency
GNU libc version: 2.35
GNU libc release: stable
CORE-MATH latency   : 58.884
System LIBC latency : 62.055
LIBC latency        : 62.037
```

Reviewed By: orex, zimmermann6

Differential Revision: https://reviews.llvm.org/D133400
2022-09-07 19:27:47 -04:00
Kirill Okhotnikov
77e1d9beed [libc][math] Added atanf function.
Performance by core-math (core-math/glibc 2.31/current llvm-14):
28.879/20.843/20.15

Differential Revision: https://reviews.llvm.org/D132842
2022-08-30 22:39:54 +02:00
Kirill Okhotnikov
6c1fc7e430 [libc][math] Added atanhf function.
Performance by core-math (core-math/glibc 2.31/current llvm-14):
10.845/43.174/13.467

The review is done on top of D132809.

Differential Revision: https://reviews.llvm.org/D132811
2022-08-30 22:39:54 +02:00
Guillaume Chatelet
1e5b3ce707 [reland][NFC][libc] standardize string_view 2022-08-23 12:40:49 +00:00
Guillaume Chatelet
eebe9b2964 Revert "[reland][NFC][libc] standardize string_view"
This reverts commit df99774ef7.
2022-08-23 12:40:48 +00:00
Guillaume Chatelet
df99774ef7 [reland][NFC][libc] standardize string_view 2022-08-23 12:10:28 +00:00
Guillaume Chatelet
54cfe5f778 Revert "[reland][NFC][libc] standardize string_view"
This reverts commit 522d29a6a7.
2022-08-23 11:50:56 +00:00
Guillaume Chatelet
522d29a6a7 [reland][NFC][libc] standardize string_view 2022-08-23 11:48:53 +00:00
Guillaume Chatelet
0df7e1b0e5 Revert "[reland][NFC][libc] standardize string_view"
This reverts commit 187099da1c.
2022-08-23 11:00:22 +00:00
Guillaume Chatelet
187099da1c [reland][NFC][libc] standardize string_view 2022-08-23 10:55:57 +00:00
Guillaume Chatelet
aa59c9810a [libc][NFC] Use STL case for string_view 2022-08-22 15:25:14 +00:00
Kirill Okhotnikov
5ef987c985 [libc][math] Added tanhf function.
Correct rounding function. Performance ~2x faster than glibc analog.

Performance (llvm 12 intel):
```
CORE_MATH_PERF_MODE=rdtsc PERF_ARGS='' ./perf.sh tanhf
GNU libc version: 2.31
GNU libc release: stable
13.279
37.492
18.145
CORE_MATH_PERF_MODE=rdtsc PERF_ARGS='--latency' ./perf.sh tanhf
GNU libc version: 2.31
GNU libc release: stable
40.658
109.582
66.568
```

Differential Revision: https://reviews.llvm.org/D130780
2022-08-01 22:43:00 +02:00
Kirill Okhotnikov
a7f55f0805 [libc][math] Added sinhf function.
Differential Revision: https://reviews.llvm.org/D129278
2022-07-29 17:20:53 +02:00
Kirill Okhotnikov
fcb9d7e2cf [libc][math] Added coshf function.
Differential Revision: https://reviews.llvm.org/D129275
2022-07-29 16:57:28 +02:00
Guillaume Chatelet
f72261508a [libc][NFC] Use STL case for type_traits
Migrating all private STL code to the standard STL case but keeping it under the CPP namespace to avoid confusion. Starting with the type_traits header.

Differential Revision: https://reviews.llvm.org/D130727
2022-07-29 09:57:03 +00:00
Alex Brachet
04c681d195 [libc] Specify rounding mode for strto[f|d] tests
The specified rounding mode will be used and restored
to what it was before the test ran.

Additionally, it moves ForceRoundingMode and RoundingMode
out of MPFRUtils to be used in more places.

Differential Revision: https://reviews.llvm.org/D129685
2022-07-13 20:20:30 +00:00
Kirill Okhotnikov
b8e8012aa2 [libc][math] fmod/fmodf implementation.
This is a implementation of find remainder fmod function from standard libm.
The underline algorithm is developed by myself, but probably it was first
invented before.
Some features of the implementation:
1. The code is written on more-or-less modern C++.
2. One general implementation for both float and double precision numbers.
3. Spitted platform/architecture dependent and independent code and tests.
4. Tests covers 100% of the code for both float and double numbers. Tests cases with NaN/Inf etc is copied from glibc.
5. The new implementation in general 2-4 times faster for “regular” x,y values. It can be 20 times faster for x/y huge value, but can also be 2 times slower for double denormalized range (according to perf tests provided).
6. Two different implementation of division loop are provided. In some platforms division can be very time consuming operation. Depend on platform it can be 3-10 times slower than multiplication.

Performance tests:

The test is based on core-math project (https://gitlab.inria.fr/core-math/core-math). By Tue Ly suggestion I took hypot function and use it as template for fmod. Preserving all test cases.

`./check.sh <--special|--worst> fmodf` passed.
`CORE_MATH_PERF_MODE=rdtsc ./perf.sh fmodf` results are

```
GNU libc version: 2.35
GNU libc release: stable
21.166 <-- FPU
51.031 <-- current glibc
37.659 <-- this fmod version.
```
2022-06-24 23:09:14 +02:00
Guillaume Chatelet
c28a522fc7 [libc][NFC] moving template specialization outside class declaration
This is necessary to get llvm-libc compile with GCC.
This patch is extracted from D119002.

Differential Revision: https://reviews.llvm.org/D119142
2022-02-08 10:35:44 +00:00
Tue Ly
9e7688c71e [libc] Implement log1pf correctly rounded to all rounding modes.
Implement log1pf correctly rounded to all rounding modes relying on logf implementation for exponent > 2^(-8).

Reviewed By: sivachandra, zimmermann6

Differential Revision: https://reviews.llvm.org/D118962
2022-02-07 16:17:18 -05:00
Tue Ly
e581841e8c [libc] Implement log10f correctly rounded for all rounding modes.
Based on RLIBM implementation similar to logf and log2f.  Most of the exceptional inputs are the exact powers of 10.

Reviewed By: sivachandra, zimmermann6, santoshn, jpl169

Differential Revision: https://reviews.llvm.org/D118093
2022-01-25 10:33:39 -05:00
Tue Ly
63d2df003e [libc] Implement correctly rounded log2f based on RLIBM library.
Implement log2f based on RLIBM library correctly rounded for all rounding modes.

Reviewed By: sivachandra, michaelrj, santoshn, jpl169, zimmermann6

Differential Revision: https://reviews.llvm.org/D115828
2022-01-14 12:40:49 -05:00
Tue Ly
c386d6eb2d [libc] Fix precision constants for long double in MPFRUtils.cpp. 2022-01-13 21:33:05 -05:00
Tue Ly
8cd81274ff [libc] Add multithreading support for exhaustive testing and MPFRUtils.
Add threading support for exhaustive testing and MPFRUtils.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D117028
2022-01-13 13:46:14 -05:00
Tue Ly
cce6507767 [libc] Add rounding mode support for MPFR testing macros.
Add an extra argument for rounding mode to EXPECT_MPFR_MATCH and ASSERT_MPFR_MATCH macros.

Reviewed By: sivachandra, michaelrj

Differential Revision: https://reviews.llvm.org/D116777
2022-01-13 13:28:50 -05:00
Siva Chandra Reddy
60509623c4 [libc][obvious] Fix style of MPFRWrapper. 2021-12-23 23:19:42 +00:00
Tue Ly
d08a801b5f [libc] Implement correctly rounded logf based on RLIBM library.
Implement correctly rounded logf based on RLIBM library: https://people.cs.rutgers.edu/~sn349/rlibm/.

Reviewed By: sivachandra, santoshn, jpl169, zimmermann6

Differential Revision: https://reviews.llvm.org/D115408
2021-12-16 13:43:15 -05:00
Michael Jones
1c92911e9e [libc] apply new lint rules
This patch applies the lint rules described in the previous patch. There
was also a significant amount of effort put into manually fixing things,
since all of the templated functions, or structs defined in /spec, were
not updated and had to be handled manually.

Reviewed By: sivachandra, lntue

Differential Revision: https://reviews.llvm.org/D114302
2021-12-07 10:49:47 -08:00
Tue Ly
32568fc95e [libc] Fix a bug in MPFRUtils making ULP values off by 2^(-mantissaWidth).
Fix a bug in MPFRUtils making ULP values off by 2^(-mantissaWidth) and incorrect eps for denormal numbers.

Differential Revision: https://reviews.llvm.org/D114878
2021-12-02 09:07:46 -05:00
Guillaume Chatelet
0aea170b97 [libc] Add more robust compile time architecture detection
We may want to restrict the detected platforms to only `x86_64` and `aarch64`.
There are still custom detection in api.td but I don't think we can handle these:
 - config/linux/api.td:205
 - config/linux/api.td:199

Differential Revision: https://reviews.llvm.org/D112818
2021-11-02 11:00:33 +00:00
Guillaume Chatelet
fe953b15cf Revert "[libc] Add more robust compile time architecture detection"
This reverts commit a72e249986.
2021-10-29 20:25:55 +00:00
Guillaume Chatelet
a72e249986 [libc] Add more robust compile time architecture detection
We may want to restrict the detected platforms to only `x86_64` and `aarch64`.
There are still custom detection in api.td but I don't think we can handle these:
 - config/linux/api.td:205
 - config/linux/api.td:199

Differential Revision: https://reviews.llvm.org/D112818
2021-10-29 20:15:12 +00:00
Siva Chandra Reddy
6c3f53c7ba [libc][NFC] Move test related pieces from FPUtil to util/UnitTest.
Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D112673
2021-10-29 15:37:30 +00:00
Siva Chandra Reddy
f362aea42d [libc][NFC] Move utils/CPP to src/__support/CPP.
The idea is to move all pieces related to the actual libc sources to the
"src" directory. This allows downstream users to ship and build just the
"src" directory.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D112653
2021-10-28 15:50:00 +00:00
Siva Chandra Reddy
ca6b354229 [libc] Add range reduction functions based on Paine and Hanek algorithm.
These functions will be used in a future patch to implement
trigonometric functions. Unit tests have been added but to the
libc-long-running-tests suite. The unit tests long running because we
compare against MPFR computations performed at 1280 bits of precision.

Some cleanups or elimination of repeated patterns can be done as follow
up changes.

Differential Revision: https://reviews.llvm.org/D104817
2021-08-23 05:18:41 +00:00
Michael Jones
c120edc7b3 [libc][nfc] move ctype_utils and FPUtils to __support
Some ctype functions are called from other libc functions (e.g. isspace
is used in atoi). By moving ctype_utils.h to __support it becomes easier
to include just the implementations of these functions. For these
reasons the implementation for isspace was moved into
ctype_utils as well.

FPUtils was moved to simplify the build order, and to clarify which
files are a part of the actual libc.

Many files were modified to accomodate these changes, mostly changing
the #include paths.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D107600
2021-08-06 17:29:41 +00:00
Siva Chandra Reddy
dba74c6817 [libc] Make ULP error reflect the bit distance more closely.
Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D105334
2021-07-02 16:56:01 +00:00
Siva Chandra Reddy
d5700bb694 [libc] Calculate ulp error after rounding MPFR result to the result type.
Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D104615
2021-06-23 20:29:46 +00:00
Tue Ly
4e5f8b4d8d [libc] Add implementation of expm1f.
Use expm1f(x) = exp(x) - 1 for |x| > ln(2).
For |x| <= ln(2), divide it into 3 subintervals: [-ln2, -1/8], [-1/8, 1/8], [1/8, ln2]
and use a degree-6 polynomial approximation generated by Sollya's fpminmax for each interval.
Errors < 1.5 ULPs when we use fma to evaluate the polynomials.

Differential Revision: https://reviews.llvm.org/D101134
2021-06-10 14:58:34 -04:00
Siva Chandra Reddy
861dc75906 [libc] Add x86_64 implementations of double precision cos, sin and tan.
The implementations use the x86_64 FPU instructions. These instructions
are extremely slow compared to a polynomial based software
implementation. Also, their accuracy falls drastically once the input
goes beyond 2PI. To improve both the speed and accuracy, we will be
taking the following approach going forward:
1. As a follow up to this CL, we will implement a range reduction algorithm
which will expand the accuracy to the entire double precision range.
2. After that, we will replace the HW instructions with a polynomial
implementation to improve the run time.

After step 2, the implementations will be accurate, performant and target
architecture independent.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D102384
2021-05-13 19:02:00 +00:00