1884 Commits

Author SHA1 Message Date
Joseph Huber
8aad5012cc [libc][Docs] Add support for the printing functions 2023-06-06 14:33:08 -05:00
Joseph Huber
27a80fc946 [libc] Replace use of asm in the GPU code with LIBC_INLINE_ASM
We should more consistently use inline assembly using the LIBC wrappers.
It's much safer to mark all of these volatile as well.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D152294
2023-06-06 14:24:52 -05:00
Tue Ly
b95ed8b6d9 [libc] Remove operator T from cpp::expected.
The libc's equivalent of std::expected has a non-standard and
non-explicit operator T - https://github.com/llvm/llvm-project/issues/62738

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D152270
2023-06-06 13:57:44 -04:00
Aiden Grossman
14a06b806e [CMake][libc] Don't put archive in build/lib/<target triple> by default
ea8f4b9841 broke some build configurations
because it was enabled by default and some people are using a just built
libc/clang/LLVM to work on other projects where having a just built LLVM
libc in one of Clang's default include directories can make things
unusable.

Differential Revision: https://reviews.llvm.org/D152190
2023-06-06 00:43:11 +00:00
Joseph Huber
e6a350df10 [libc] Replace the PRINT_TO_STDERR opcode for RPC printing.
A previous patch added general support for printing via the RPC
interface. We should consolidate this functionality and get rid of the
old opcode that was used for simple testing.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D152211
2023-06-05 19:28:30 -05:00
Joseph Huber
a59e1712fa [libc][obvious] Fix conditional when CUDA is not found
If CUDA is not found this string will expand into nothing. We need to
surround it with a string otherwise it will cause build failures.

Differential Revision: https://reviews.llvm.org/D152209
2023-06-05 18:51:23 -05:00
Joseph Huber
e6c401b5e8 [libc] Add initial support for 'puts' and 'fputs' to the GPU
This patch adds the initial support required for basic printing in
`stdio.h` via `puts` and `fputs`. This is done using the existing LLVM C
library `File` API. In this sense we can think of the RPC interface as
our system call to dump the character string to the file. We carry a
`uintptr_t` reference as our native "file descriptor" as it will be used
as an opaque reference to the host's version once functions like
`fopen` are supported.

For some unknown reason the declaration of the `StdIn` variable causes
both the AMDGPU and NVPTX backends to crash if I use the `READ` flag.
This is not used currently as we only support output now, but it needs
to be fixed.

Reviewed By: sivachandra, lntue

Differential Revision: https://reviews.llvm.org/D151282
2023-06-05 17:56:55 -05:00
Joseph Huber
a621308881 [libc] Implement basic malloc and free support on the GPU
This patch adds support for the `malloc` and `free` functions. These
currently aren't implemented in-tree so we first add the interface
files.

This patch provides the most basic support for a true `malloc` and
`free` by using the RPC interface. This is functional, but in the future
we will want to implement a more intelligent system and primarily use
the RPC interface more as a `brk()` or `sbrk()` interface only called
when absolutely necessary. We will need to design an intelligent
allocator in the future.

The semantics of these memory allocations will need to be checked. I am
somewhat iffy on the details. I've heard that HSA can allocate
asynchronously which seems to work with my tests at least. CUDA uses an
implicit synchronization scheme so we need to use an explicitly separate
stream from the one launching the kernel or the default stream. I will
need to test the NVPTX case.

I would appreciate it if anyone more experienced with the implementation details
here could chime in for the HSA and CUDA cases.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D151735
2023-06-05 17:56:53 -05:00
Guillaume Chatelet
e49a608511 Revert D148717 "[libc] Improve memcmp latency and codegen"
This reverts commit 9ec6ebd3ce.

The patch broke the RISCV and aarch64 buildbots.
2023-06-05 09:50:30 +00:00
Guillaume Chatelet
9ec6ebd3ce [libc] Improve memcmp latency and codegen
This is based on ideas from @nafi to:
 - use a branchless version of 'cmp' for 'uint32_t',
 - completely resolve the lexicographic comparison through vector
   operations when wide types are available. We also get rid of byte
   reloads and serializing '__builtin_ctzll'.
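
As a rough illustration of the first idea, a three-way comparison of two `uint32_t` values can be written without branches. This is a minimal sketch of one common idiom, not necessarily the form used in the patch; the name `cmp_u32_branchless` is hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper (not from the patch): three-way compare of two
// unsigned 32-bit values. Each comparison yields 0 or 1, so the result
// is -1, 0, or +1 and no conditional jump is required.
inline int cmp_u32_branchless(std::uint32_t a, std::uint32_t b) {
  return static_cast<int>(a > b) - static_cast<int>(a < b);
}
```

A `memcmp`-style caller only needs the sign of the result, so any implementation that preserves the sign would serve equally well.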

I did not include the suggestion to replace comparisons of 'uint16_t'
with two 'uint8_t' as it did not seem to help the codegen. This can
be revisited in subsequent patches.

The code has been rewritten to reduce nested function calls, making the
job of the inliner easier and preventing harmful code duplication.

Reviewed By: nafi3000

Differential Revision: https://reviews.llvm.org/D148717
2023-06-05 09:46:05 +00:00
Aiden Grossman
ea8f4b9841 [libc][CMake] Place archives in build/lib/<target-triple>
This patch moves the location of libllvmlibc.a within the build tree to
within ./lib/<target triple>. This more closely matches the behavior of
other runtime builds and allows for clang in the same build tree to
automatically be able to link against llvmlibc since this path is by
default included by the driver.

Also removes the LIBC_BINARY_DIR CMake flag since it isn't used anywhere
in the tree (based on a quick grep).

Reviewed By: phosek

Differential Revision: https://reviews.llvm.org/D151624
2023-06-03 22:40:03 +00:00
Tue Ly
5a4e344bd9 [libc][NFC] Add LIBC_INLINE and attribute.h header includes to targets' FMA.h.
Targets' FMA.h headers are missing LIBC_INLINE and attributes.h header.

Reviewed By: brooksmoses

Differential Revision: https://reviews.llvm.org/D152024
2023-06-02 21:15:58 -04:00
Joseph Huber
48bb7bb868 [libc] Disable the string_to_float test on NVPTX
This test began failing after recent changes. Disable it for now.

Differential Revision: https://reviews.llvm.org/D152032
2023-06-02 15:56:39 -05:00
Joseph Huber
cfde5f2d89 [libc] Implement 'errno' on the GPU as a global integer internally
The C standard asserts that the `errno` value is an l-value thread local
integer. We cannot provide a generic thread local integer on the GPU
currently without some workarounds. Previously, we worked around this by
implementing the `errno` value as a special consumer class that made all
the writes disappear. However, this is problematic for internal tests.
Currently there are build failures because of this handling and it's
only likely to cause more problems the more we do this.

This patch instead makes the internal target used for testing export the
`errno` value as a simple global integer. This allows us to use and test
the `errno` interface correctly assuming we run with a single thread.
Because this is only used for the non-exported target we still do not
provide this feature in the version that users will use so we do not
need to worry about it being incorrect in general.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D152015
2023-06-02 14:16:24 -05:00
Tue Ly
f9753ef189 [libc][Obvious] Fix a typo in setting FMA control option for RISCV64. 2023-06-02 11:15:29 -04:00
Michael Jones
722832e6d7 [libc] Add strtoint32 and strtoint64 tests
There were regressions in the testing framework due to none of the
functioning buildbots having a 32 bit long. This allowed the 32 bit
version of the strtointeger function to go untested. This patch adds
tests for strtoint32 and strtoint64, which are internal testing
functions that use constant integer sizes. It also fixes the tests to
properly handle these situations.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D151935
2023-06-01 15:06:46 -07:00
Guillaume Chatelet
2697ffd039 [libc] Reduce math tests runtime further
Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D151875
2023-06-01 12:43:18 +00:00
Guillaume Chatelet
ae5c472410 [libc] Reduce math tests runtime
Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D151798
2023-06-01 05:01:56 +00:00
Tue Ly
cfc5c6cb8d [libc][docs] Update implementation status table for Date and Time Functions.
Update implementation status table for Date and Time Functions to include different targets.

Reviewed By: jeffbailey

Differential Revision: https://reviews.llvm.org/D151809
2023-05-31 15:09:06 -04:00
Guillaume Chatelet
c76a3e795e [libc][NFC] Fixing various typos 2023-05-31 12:11:09 +00:00
Tue Ly
e557b8a142 [libc][RISCV] Add log, log2, log1p, log10 for RISC-V64 entrypoints.
Add log, log2, log1p, log10 RISCV64 entrypoints.

Reviewed By: michaelrj, sivachandra

Differential Revision: https://reviews.llvm.org/D151674
2023-05-30 14:18:19 -04:00
Joseph Huber
1ef0bafc4f [libc][NFC] Move the Linux file implementation to a subdirectory
This patch simply moves the special handling for `linux` files to a
subdirectory. This is done to make it easier in the future to extend
this support to targets (like the GPU) that will have different
dependencies.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D151231
2023-05-30 06:49:21 -05:00
Mark de Wever
cbaa3597aa Reland "[CMake] Bumps minimum version to 3.20.0.
This reverts commit d763c6e5e2.

Adds the patch by @hans from
https://github.com/llvm/llvm-project/issues/62719
This patch fixes the Windows build.

d763c6e5e2 reverted the reviews

D144509 [CMake] Bumps minimum version to 3.20.0.

This partly undoes D137724.

This change has been discussed on discourse
https://discourse.llvm.org/t/rfc-upgrading-llvms-minimum-required-cmake-version/66193

Note this does not remove work-arounds for older CMake versions, that
will be done in followup patches.

D150532 [OpenMP] Compile assembly files as ASM, not C

Since CMake 3.20, CMake explicitly passes "-x c" (or equivalent)
when compiling a file which has been set as having the language
C. This behaviour change only takes place if "cmake_minimum_required"
is set to 3.20 or newer, or if the policy CMP0119 is set to new.

Attempting to compile assembly files with "-x c" fails, however
this is worked around in many cases, as OpenMP overrides this with
"-x assembler-with-cpp", however this is only added for non-Windows
targets.

Thus, after increasing cmake_minimum_required to 3.20, this breaks
compiling the GNU assembly for Windows targets; the GNU assembly is
used for ARM and AArch64 Windows targets when building with Clang.
This patch unbreaks that.

D150688 [cmake] Set CMP0091 to fix Windows builds after the cmake_minimum_required bump

The build uses other mechanisms to select the runtime.

Fixes #62719

Reviewed By: #libc, Mordante

Differential Revision: https://reviews.llvm.org/D151344
2023-05-27 12:51:21 +02:00
Krasimir Georgiev
07f49bf475 [libc] Adapt includes after 25174976e1 2023-05-26 14:25:50 +00:00
Siva Chandra Reddy
4f1fe19df3 [libc] Make ErrnoSetterMatcher handle logging floating point values.
Along the way, couple of additional things have been done:

1. Move `ErrnoSetterMatcher.h` to `test/UnitTest` as all other matchers live
   there now.
2. `ErrnoSetterMatcher` ignores matching `errno` on GPUs.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D151129
2023-05-26 06:24:51 +00:00
Joseph Huber
4311246a3a Revert "[libc] Enable hermetic floating point tests"
This passed locally but unfortunately it seems some tests are not ready
to be made hermetic. Revert for now until we can investigate
specifically which tests are failing and mark those as `UNIT_TEST_ONLY`.

This reverts commit 417ea79e79.
2023-05-25 19:15:02 -05:00
Joseph Huber
417ea79e79 [libc] Enable hermetic floating point tests
This patch enables us to run the floating point tests as hermetic.
Importantly we now use the internal versions of the `fesetround` and
`fegetround` functions.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D151123
2023-05-25 19:08:44 -05:00
Tue Ly
0aa7ea4e22 [libc][darwin] Add OSUtil for darwin arm64 target so that unit tests can be run.
Currently unit tests cannot be run on macOS due to missing OSUtil.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D151377
2023-05-25 19:25:24 -04:00
Tue Ly
0bda541829 [libc][doc] Update math function status page to show more targets.
Show availability of math functions on each target.

Reviewed By: jeffbailey

Differential Revision: https://reviews.llvm.org/D151489
2023-05-25 19:24:33 -04:00
Roland McGrath
65c78933ae [libc] Support LIBC_COPT_USE_C_ASSERT build flag
In this mode, LIBC_ASSERT is just standard C assert.

Reviewed By: abrachet

Differential Revision: https://reviews.llvm.org/D151498
2023-05-25 14:10:33 -07:00
Roland McGrath
1d4e8f0ea6 [libc] Fix compilation issues in memory_check_utils.h
Strict warnings require explicit static_cast to counteract
default widening of types narrower than int.

Functions in header files should have vague linkage (inline
keyword), not internal linkage (static) or external linkage
(no inline keyword) even for template functions.  Note these
don't use the LIBC_INLINE macro since this is only for test code.
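
Both points can be shown in a few lines. This is a minimal sketch, not code from the patch, and the helper name `wrapping_add_u8` is hypothetical.

```cpp
#include <cstdint>
#include <type_traits>

// Operands narrower than int are promoted to int before arithmetic, so the
// sum of two uint8_t values has type int rather than uint8_t.
static_assert(
    std::is_same<decltype(std::uint8_t{1} + std::uint8_t{1}), int>::value,
    "uint8_t + uint8_t is computed as int");

// Hypothetical header-style helper: `inline` gives it vague linkage, and the
// explicit static_cast narrows the int result back to uint8_t so that strict
// conversion warnings stay quiet.
inline std::uint8_t wrapping_add_u8(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a + b);
}
```

Without the cast, the narrowing from `int` to `uint8_t` in the return statement is implicit, which is what strict warning settings flag.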

Reviewed By: abrachet

Differential Revision: https://reviews.llvm.org/D151494
2023-05-25 14:09:14 -07:00
Siva Chandra Reddy
daeee56798 [libc] Add macro LIBC_THREAD_LOCAL.
It resolves to thread_local on all platforms except for the GPUs, on which
it resolves to nothing. The use of thread_local in the source code has been
replaced with the new macro.
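
A macro of this kind might look roughly like the following. This is a sketch under assumptions: `SKETCH_THREAD_LOCAL` is a hypothetical stand-in for `LIBC_THREAD_LOCAL`, and the target checks shown here (Clang's `__AMDGPU__` and `__NVPTX__` predefined macros) may differ from what the patch uses.

```cpp
// Hypothetical stand-in for LIBC_THREAD_LOCAL; the real macro and its exact
// target detection live in the libc sources and may differ.
#if defined(__AMDGPU__) || defined(__NVPTX__)
#define SKETCH_THREAD_LOCAL
#else
#define SKETCH_THREAD_LOCAL thread_local
#endif

// One counter per thread on CPU targets; a single shared global on GPUs.
SKETCH_THREAD_LOCAL int sketch_counter = 0;

// Increments the counter visible to the calling thread and returns it.
inline int bump_sketch_counter() { return ++sketch_counter; }
```

On GPU targets the variable becomes an ordinary global, so code using it is only correct there if a single thread touches it.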

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D151486
2023-05-25 19:53:52 +00:00
Guillaume Chatelet
298843cd66 [libc][test] Drastically reduce mem test runtime
Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D151450
2023-05-25 15:05:21 +00:00
Tobias Hieta
f98ee40f4b [NFC][Py Reformat] Reformat python files in the rest of the dirs
This is an ongoing series of commits that are reformatting our
Python code. This catches the last of the python files to
reformat. Since they were so few, I bunched them together.

Reformatting is done with `black`.

If you end up having problems merging this commit because you
have made changes to a python file, the best way to handle that
is to run git checkout --ours <yourfile> and then reformat it
with black.

If you run into any problems, post to discourse about it and
we will try to help.

RFC Thread below:

https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style

Reviewed By: jhenderson, #libc, Mordante, sivachandra

Differential Revision: https://reviews.llvm.org/D150784
2023-05-25 11:17:05 +02:00
Siva Chandra Reddy
99a493e3e3 [libc] Make hermetic test depend on the unit test if it exists.
We want to do this so that build systems like ninja don't end up running
the hermetic and unit tests in parallel. Running in parallel can cause
problems for tests which read/write disk files as the hermetic and unit
tests can end up stepping on each other.

Reviewed By: jhuber6

Differential Revision: https://reviews.llvm.org/D151291
2023-05-25 06:27:00 +00:00
Siva Chandra Reddy
25174976e1 [libc] Rearrange error and signal tables.
This is largely a cosmetic change done with a few goals:
1. Reduce the conditionals in picking the correct set of tables for the
   platform.
2. Avoid exposing, for example Linux errors, when building for non-Linux
   platforms. This also prevents build failures when Linux errors are not
   defined on the target non-Linux platform.
3. Some "_table" suffixes have been removed to avoid repeated
   occurrence of "table" like "tables/linux_error_table.h".

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D151367
2023-05-25 05:51:32 +00:00
Guillaume Chatelet
bb4f88f9b9 [libc] simplify test for getrandom
`getrandom` is implemented as a syscall.
We don't want to test the Linux implementation of the syscall. We just want to verify that it reacts as expected to sensible values.

Runtime before
```
[ RUN      ] LlvmLibcGetRandomTest.InvalidFlag
[       OK ] LlvmLibcGetRandomTest.InvalidFlag (took 0 ms)
[ RUN      ] LlvmLibcGetRandomTest.InvalidBuffer
[       OK ] LlvmLibcGetRandomTest.InvalidBuffer (took 0 ms)
[ RUN      ] LlvmLibcGetRandomTest.ReturnsSize
[       OK ] LlvmLibcGetRandomTest.ReturnsSize (took 83 ms)
[ RUN      ] LlvmLibcGetRandomTest.PiEstimation
[       OK ] LlvmLibcGetRandomTest.PiEstimation (took 9882 ms)
```

Runtime after
```
[ RUN      ] LlvmLibcGetRandomTest.InvalidFlag
[       OK ] LlvmLibcGetRandomTest.InvalidFlag (took 0 ms)
[ RUN      ] LlvmLibcGetRandomTest.InvalidBuffer
[       OK ] LlvmLibcGetRandomTest.InvalidBuffer (took 0 ms)
[ RUN      ] LlvmLibcGetRandomTest.ReturnsSize
[       OK ] LlvmLibcGetRandomTest.ReturnsSize (took 0 ms)
[ RUN      ] LlvmLibcGetRandomTest.CheckValue
[       OK ] LlvmLibcGetRandomTest.CheckValue (took 0 ms)
```

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D151336
2023-05-24 15:07:51 +00:00
Tue Ly
8c6b83dcfd [libc] Reduce the sizes of some math tests that take longest time.
Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D151256
2023-05-24 08:55:51 -04:00
Tue Ly
a2ac3678cd [libc][bazel] Add log, log2, log10, log1p to bazel layout.
Add log, log2, log10, log1p and their unit tests to bazel layout.

Reviewed By: gchatelet

Differential Revision: https://reviews.llvm.org/D151252
2023-05-24 07:43:58 -04:00
Tue Ly
7cbcc581a5 [libc] Change UInt integer conversion operators to use standard types.
This fixes an issue with missing `unsigned long` conversion on macOS.

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D151234
2023-05-23 14:12:46 -04:00
Joseph Huber
99c9515b37 [libc][obvious] Correctly hoist mask out of the loop
Summary:
This was accidentally dropped from a previous patch following a rebase.
Fix it to where it's consistent.

Differential Revision: https://reviews.llvm.org/D151232
2023-05-23 12:21:10 -05:00
Joseph Huber
e826762a08 [libc] More efficiently send bytes via send_n and recv_n
Currently we have the `send_n` and `recv_n` routines to stream data,
such as a string to print, to the other side. The first operation is to
send the size so the other side knows the number of bytes to receive.
However, this wasted 56 bytes that could've been sent. This meant that
small values, like the arguments to a function to call on the host for
example, needed to perform an extra send. This patch sends the first 56
bytes in the first packet and continues if necessary.
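
The saving can be seen by counting packets. The sketch below assumes a 64-byte packet whose first 8 bytes carry the size, which is consistent with the 56 bytes mentioned above but is not stated outright in the message; all names are hypothetical.

```cpp
#include <cstddef>

// Assumed layout (not confirmed by the commit message): 64-byte packets,
// with an 8-byte size field in the first packet.
constexpr std::size_t kPacketBytes = 64;
constexpr std::size_t kSizeFieldBytes = 8;
constexpr std::size_t kInlineBytes = kPacketBytes - kSizeFieldBytes;  // 56

// Old scheme: the first packet carries only the size, so every payload
// needs at least one additional packet.
constexpr std::size_t packets_size_only_first(std::size_t n) {
  return 1 + (n + kPacketBytes - 1) / kPacketBytes;
}

// New scheme: the first packet carries the size plus up to 56 payload bytes;
// further packets are sent only if data remains.
constexpr std::size_t packets_payload_in_first(std::size_t n) {
  return n <= kInlineBytes
             ? 1
             : 1 + (n - kInlineBytes + kPacketBytes - 1) / kPacketBytes;
}
```

Under these assumptions a 16-byte argument block takes two packets in the old scheme and one in the new scheme.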

Depends on D150992

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D151041
2023-05-23 10:59:47 -05:00
Joseph Huber
29d3da3b86 [libc] Fix the send_n and recv_n utilities under divergent lanes
We provide the `send_n` and `recv_n` utilities as a generic way to
stream data between both sides of the process. This was previously
tested and performed as expected when using a string of constant size.
However, when the size was allowed to diverge between the threads in the
warp or wavefront this could deadlock. This did not occur on NVPTX
because of the use of the explicit warp sync. However, on AMD one of the
work items in the wavefront could continue executing and hit the next
`recv` call before the other threads, then we would deadlock as we
violated the RPC invariants.

This patch replaces the for loop with a thread ballot. This will cause
every thread in the warp or wavefront to continue executing the loop
until all of them can exit. This acts as a more explicit wavefront sync.
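
The effect of the ballot can be simulated on the host. This is only a model of the control flow, not GPU code or the patch itself: `ballot` and `uniform_iterations` are hypothetical names, each vector element stands for one lane, and the 64-byte chunk size in the usage note is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side model of a warp/wavefront vote: bit `lane` is set when that lane
// still has bytes left to transfer at the given offset.
inline std::uint64_t ballot(const std::vector<std::size_t> &sizes,
                            std::size_t offset) {
  std::uint64_t mask = 0;
  for (std::size_t lane = 0; lane < sizes.size(); ++lane)
    if (offset < sizes[lane])
      mask |= std::uint64_t{1} << lane;
  return mask;
}

// Number of loop iterations that every lane executes when the loop condition
// is the ballot result rather than each lane's own size.
inline std::size_t uniform_iterations(const std::vector<std::size_t> &sizes,
                                      std::size_t chunk) {
  assert(sizes.size() <= 64 && chunk > 0);
  std::size_t iterations = 0;
  for (std::size_t offset = 0; ballot(sizes, offset) != 0; offset += chunk)
    ++iterations;
  return iterations;
}
```

With lane sizes of 10, 200, and 64 bytes and a 64-byte chunk, all three lanes run four iterations, so none of them reaches the next `recv` ahead of the others.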

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D150992
2023-05-23 10:59:47 -05:00
Tue Ly
b91e78da37 [libc][math] Implement double precision log1p correctly rounded to all rounding modes.
Implement double precision log1p function correctly rounded to all
rounding modes.

**Performance**

  - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%.
  - Benchmarks with `./perf.sh` tool from the CORE-MATH project, unit is (CPU clocks / call).
  - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log1p
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 39.792 + 1.011 clc/call; Median-Min = 0.940 clc/call; Max = 41.373 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 87.285 + 1.135 clc/call; Median-Min = 1.299 clc/call; Max = 89.715 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 20.666 + 0.123 clc/call; Median-Min = 0.125 clc/call; Max = 20.828 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.928 + 0.771 clc/call; Median-Min = 0.725 clc/call; Max = 22.767 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 31.461 + 0.528 clc/call; Median-Min = 0.602 clc/call; Max = 36.809 clc/call;

```
  - Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log1p --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 77.875 + 0.062 clc/call; Median-Min = 0.051 clc/call; Max = 78.003 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 101.958 + 1.202 clc/call; Median-Min = 1.325 clc/call; Max = 104.452 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 60.581 + 1.443 clc/call; Median-Min = 1.611 clc/call; Max = 62.285 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.817 + 1.108 clc/call; Median-Min = 1.300 clc/call; Max = 50.282 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 61.121 + 0.599 clc/call; Median-Min = 0.761 clc/call; Max = 62.020 clc/call;
```
  - Accurate pass latency:
```
$ ./perf.sh log1p --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
760.444

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
827.880

-- LIBC latency -- with FMA
711.837

-- LIBC latency -- without FMA
764.317
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D151049
2023-05-23 11:04:04 -04:00
Tue Ly
111d274841 [libc][math] Implement double precision log2 function correctly rounded to all rounding modes.
Implement double precision log2 function correctly rounded to all
rounding modes.

See https://reviews.llvm.org/D150014 for a more detailed description of the algorithm.

**Performance**

  - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.91%.

  - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 15.458 + 0.204 clc/call; Median-Min = 0.224 clc/call; Max = 15.867 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 23.711 + 0.524 clc/call; Median-Min = 0.443 clc/call; Max = 25.307 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 14.807 + 0.199 clc/call; Median-Min = 0.211 clc/call; Max = 15.137 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.666 + 0.274 clc/call; Median-Min = 0.298 clc/call; Max = 18.531 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 26.534 + 0.418 clc/call; Median-Min = 0.462 clc/call; Max = 27.327 clc/call;

```
  - Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log2 --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 46.048 + 1.643 clc/call; Median-Min = 1.694 clc/call; Max = 48.018 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 62.333 + 0.138 clc/call; Median-Min = 0.119 clc/call; Max = 62.583 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 45.206 + 1.503 clc/call; Median-Min = 1.467 clc/call; Max = 47.229 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 43.042 + 0.454 clc/call; Median-Min = 0.484 clc/call; Max = 43.912 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 57.016 + 1.636 clc/call; Median-Min = 1.655 clc/call; Max = 58.816 clc/call;
```
  - Accurate pass latency:
```
$ ./perf.sh log2 --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
177.632

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
231.332

-- LIBC latency -- with FMA
459.751

-- LIBC latency -- without FMA
463.850
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D150374
2023-05-23 10:49:30 -04:00
Tue Ly
a68bbf42fa [libc][math] Implement double precision log function correctly rounded to all rounding modes.
Implement double precision log function correctly rounded to all
rounding modes.

See https://reviews.llvm.org/D150014 for a more detailed description of the algorithm.

**Performance**

  - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%.

  - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 17.465 + 0.596 clc/call; Median-Min = 0.602 clc/call; Max = 18.389 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 54.961 + 2.606 clc/call; Median-Min = 2.180 clc/call; Max = 59.583 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 12.608 + 0.276 clc/call; Median-Min = 0.359 clc/call; Max = 13.147 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.952 + 0.468 clc/call; Median-Min = 0.602 clc/call; Max = 21.881 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 18.569 + 0.552 clc/call; Median-Min = 0.601 clc/call; Max = 19.259 clc/call;

```
  - Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.431 + 0.699 clc/call; Median-Min = 0.073 clc/call; Max = 51.269 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 64.865 + 3.235 clc/call; Median-Min = 3.475 clc/call; Max = 71.788 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 42.151 + 2.090 clc/call; Median-Min = 2.270 clc/call; Max = 44.773 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 35.266 + 0.479 clc/call; Median-Min = 0.373 clc/call; Max = 36.798 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 48.518 + 0.484 clc/call; Median-Min = 0.500 clc/call; Max = 49.896 clc/call;
```
  - Accurate pass latency:
```
$ ./perf.sh log --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
598.306

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
632.925

-- LIBC latency -- with FMA
455.632

-- LIBC latency -- without FMA
488.564
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D150131
2023-05-23 10:35:15 -04:00
Joseph Huber
ad00a3db4d [libc][AMDGPU] Disable the AMDGPU backend's ctor/dtor lowering for libc
The AMDGPU backend has a built-in pass to lower constructors. We do this
manually in the `start.cpp` implementation so we can disable this to
keep the binaries smaller.

Differential Revision: https://reviews.llvm.org/D151213
2023-05-23 09:20:41 -05:00
Tue Ly
a0c92a3817 [libc][math] Make log10 correctly rounded for non-FMA targets and improve its performance.
Make log10 correctly rounded for non-FMA targets and improve its
performance.

Implemented fast pass and accurate pass:

**Fast Pass**:

  - Range reduction step 0: Extract exponent and mantissa
```
  x = 2^(e_x) * m_x
```
  - Range reduction step 1: Use lookup tables of size 2^7 = 128 to reduce the argument to:
```
   -2^-8 <= v = r * m_x - 1 < 2^-7
  where r = 2^-8 * ceil( 2^8 * (1 - 2^-8) / (1 + k * 2^-7) )
  and k = trunc( (m_x - 1) * 2^7 )
```
  - Polynomial approximation: approximate `log(1 + v)` by a degree-7 polynomial generated by Sollya with:
```
 > P = fpminimax((log(1 + x) - x)/x^2, 5, [|D...|], [-2^-8, 2^-7]);
```
  - Combine the results:
```
  log10(x) ~ ( e_x * log(2) - log(r) + v + v^2 * P(v) ) * log10(e)
```
  - Perform additive Ziv's test with errors bounded by `P_ERR * v^2`.  Return the result if Ziv's test passed.
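
The first reduction step and the table index can be sketched with standard library calls. This is an illustration under assumptions, not the patch's implementation: it handles only positive, finite, normal inputs, the names are hypothetical, and `std::log(m_x)` stands in for the lookup table plus polynomial.

```cpp
#include <cmath>

// Hypothetical container for the outputs of range reduction step 0 and the
// table index used by step 1.
struct Reduced {
  int e_x;     // exponent, so that x = 2^e_x * m_x
  double m_x;  // mantissa in [1, 2)
  int k;       // table index, trunc((m_x - 1) * 2^7), in [0, 127]
};

// Assumes x is positive, finite, and normal.
inline Reduced reduce_step0(double x) {
  int exp = 0;
  const double frac = std::frexp(x, &exp);  // x = frac * 2^exp, frac in [0.5, 1)
  const double m_x = 2.0 * frac;            // rescale the mantissa to [1, 2)
  const int k = static_cast<int>((m_x - 1.0) * 128.0);
  return Reduced{exp - 1, m_x, k};
}

// Checks the identity behind the "combine" step:
// log10(x) = (e_x * log(2) + log(m_x)) * log10(e).
inline double log10_via_reduction(double x) {
  const Reduced r = reduce_step0(x);
  const double log10_e = 1.0 / std::log(10.0);
  return (r.e_x * std::log(2.0) + std::log(r.m_x)) * log10_e;
}
```

For `x = 10.0` this gives `e_x = 3`, `m_x = 1.25`, and `k = 32`.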

**Accurate Pass**:

  - Take `e_x`, `v`, and the lookup table index from the range reduction step of fast pass.
  - Perform 3 more range reduction steps:
    - Range reduction step 2: Use look-up tables of size 193 to reduce the argument to `[-0x1.3ffcp-15, 0x1.3e3dp-15]`
```
   v2 = r2 * (1 + v) - 1 = (1 + s2) * (1 + v) - 1 = s2 + v + s2 * v
  where r2 = 2^-16 * round ( 2^16 / (1 + k * 2^-14) )
  and k = trunc( v * 2^14 + 0.5 ).
```
    - Range reduction step 3: Use look-up tables of size 161 to reduce the argument to `[-0x1.01928p-22 , 0x1p-22]`
```
   v3 = r3 * (1 + v2) - 1 = (1 + s3) * (1 + v2) - 1 = s3 + v2 + s3 * v2
  where r3 = 2^-21 * round ( 2^21 / (1 + k * 2^-21) )
  and k = trunc( v * 2^21 + 0.5 ).
```
    - Range reduction step 4: Use look-up tables of size 130 to reduce the argument to `[-0x1.0002143p-29 , 0x1p-29]`
```
   v4 = r4 * (1 + v3) - 1 = (1 + s4) * (1 + v3) - 1 = s4 + v3 + s4 * v3
  where r4 = 2^-28 * round ( 2^28 / (1 + k * 2^-28) )
  and k = trunc( v * 2^28 + 0.5 ).
```
  - Polynomial approximation: approximate `log10(1 + v4)` by a degree-4 minimax polynomial generated by Sollya with:
```
  > P = fpminimax(log10(1 + x)/x, 3, [|128...|], [-0x1.0002143p-29 , 0x1p-29]);
```
  - Combine the results:
```
  log10(x) ~ e_x * log10(2) - log10(r) - log10(r2) - log10(r3) - log10(r4) + v4 * P(v4)
```
  - The combined results are computed using floating points of 128-bit precision.

**Performance**

  - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.92%.

  - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log10
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 20.402 + 0.589 clc/call; Median-Min = 0.277 clc/call; Max = 22.752 clc/call;

-- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 75.797 + 3.317 clc/call; Median-Min = 3.407 clc/call; Max = 79.371 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 22.668 + 0.184 clc/call; Median-Min = 0.181 clc/call; Max = 23.205 clc/call;

-- LIBC reciprocal throughput -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 25.977 + 0.183 clc/call; Median-Min = 0.138 clc/call; Max = 26.283 clc/call;

-- LIBC reciprocal throughput -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 22.140 + 0.980 clc/call; Median-Min = 0.853 clc/call; Max = 23.790 clc/call;

```
  - Latency from CORE-MATH's perf tool on Ryzen 5900X:
```
$ ./perf.sh log10 --latency
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 54.613 + 0.357 clc/call; Median-Min = 0.287 clc/call; Max = 55.701 clc/call;

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
[####################] 100 %
Ntrial = 20 ; Min = 79.681 + 0.482 clc/call; Median-Min = 0.294 clc/call; Max = 81.604 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 61.532 + 0.208 clc/call; Median-Min = 0.199 clc/call; Max = 62.256 clc/call;

-- LIBC latency -- with FMA
[####################] 100 %
Ntrial = 20 ; Min = 41.510 + 0.205 clc/call; Median-Min = 0.244 clc/call; Max = 41.867 clc/call;

-- LIBC latency -- without FMA
[####################] 100 %
Ntrial = 20 ; Min = 55.669 + 0.240 clc/call; Median-Min = 0.280 clc/call; Max = 56.056 clc/call;
```
  - Accurate pass latency:
```
$ ./perf.sh log10 --latency --simple_stat
GNU libc version: 2.35
GNU libc release: stable

-- CORE-MATH latency -- with FMA
640.688

-- CORE-MATH latency -- without FMA (-march=x86-64-v2)
667.354

-- LIBC latency -- with FMA
495.593

-- LIBC latency -- without FMA
504.143
```

Reviewed By: zimmermann6

Differential Revision: https://reviews.llvm.org/D150014
2023-05-23 10:18:23 -04:00
Guillaume Chatelet
04e066df5e [libc] Display unit test runtime for hosted environments
With more tests added to LLVM libc each week we want to keep track of unittest's runtime, especially for low end build bots.

The top offenders can be tracked with a bit of scripting (spoiler alert: the mem function sweep tests are among them)
```
ninja check-libc | grep "ms)" | awk '{print $(NF-1),$0}' | sort -nr | cut -f2- -d' '
```

Unfortunately this doesn't work for hermetic tests since `clock` is unavailable.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D151097
2023-05-23 09:23:12 +00:00
Kazu Hirata
9a515d8142 [libc] Fix typos in documentation 2023-05-22 23:27:59 -07:00