clang-p2996

Author	SHA1	Message	Date
Joseph Huber	8aad5012cc	[libc][Docs] Add support for the printing functions	2023-06-06 14:33:08 -05:00
Joseph Huber	27a80fc946	[libc] Replace use of `asm` in the GPU code with LIBC_INLINE_ASM We should more consistently use inline assembly using the LIBC wrappers. It's much safer to mark all of these volatile as well. Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D152294	2023-06-06 14:24:52 -05:00
Tue Ly	b95ed8b6d9	[libc] Remove operator T from cpp::expected. The libc's equivalent of std::expected has a non-standard and non-explicit operator T - https://github.com/llvm/llvm-project/issues/62738 Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D152270	2023-06-06 13:57:44 -04:00
Aiden Grossman	14a06b806e	[CMake][libc] Don't put archive in build/lib/<target triple> by default `ea8f4b9841` broke some build configurations because it was enabled by default and some people are using a just built libc/clang/LLVM to work on other projects where having a just built LLVM libc in one of Clang's default include directories can make things unusable. Differential Revision: https://reviews.llvm.org/D152190	2023-06-06 00:43:11 +00:00
Joseph Huber	e6a350df10	[libc] Replace the `PRINT_TO_STDERR` opcode for RPC printing. A previous patch added general support for printing via the RPC interface. We should consolidate this functionality and get rid of the old opcode that was used for simple testing. Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D152211	2023-06-05 19:28:30 -05:00
Joseph Huber	a59e1712fa	[libc][obvious] Fix conditional when CUDA is not found If CUDA is not found this string will expand into nothing. We need to surround it with a string otherwise it will cause build failures. Differential Revision: https://reviews.llvm.org/D152209	2023-06-05 18:51:23 -05:00
Joseph Huber	e6c401b5e8	[libc] Add initial support for 'puts' and 'fputs' to the GPU This patch adds the initial support required to support basic printing in `stdio.h` via `puts` and `fputs`. This is done using the existing LLVM C library `File` API. In this sense we can think of the RPC interface as our system call to dump the character string to the file. We carry a `uintptr_t` reference as our native "file descriptor" as it will be used as an opaque reference to the host's version once functions like `fopen` are supported. For some unknown reason the declaration of the `StdIn` variable causes both the AMDGPU and NVPTX backends to crash if I use the `READ` flag. This is not used currently as we only support output now, but it needs to be fixed. Reviewed By: sivachandra, lntue Differential Revision: https://reviews.llvm.org/D151282	2023-06-05 17:56:55 -05:00
Joseph Huber	a621308881	[libc] Implement basic `malloc` and `free` support on the GPU This patch adds support for the `malloc` and `free` functions. These currently aren't implemented in-tree so we first add the interface files. This patch provides the most basic support for a true `malloc` and `free` by using the RPC interface. This is functional, but in the future we will want to implement a more intelligent system and primarily use the RPC interface more as a `brk()` or `sbrk()` interface only called when absolutely necessary. We will need to design an intelligent allocator in the future. The semantics of these memory allocations will need to be checked. I am somewhat iffy on the details. I've heard that HSA can allocate asynchronously which seems to work with my tests at least. CUDA uses an implicit synchronization scheme so we need to use an explicitly separate stream from the one launching the kernel or the default stream. I will need to test the NVPTX case. I would appreciate if anyone more experienced with the implementation details here could chime in for the HSA and CUDA cases. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D151735	2023-06-05 17:56:53 -05:00
Guillaume Chatelet	e49a608511	Revert D148717 "[libc] Improve memcmp latency and codegen" This reverts commit `9ec6ebd3ce`. The patch broke RISCV and aarch64 buildbots.	2023-06-05 09:50:30 +00:00
Guillaume Chatelet	9ec6ebd3ce	[libc] Improve memcmp latency and codegen This is based on ideas from @nafi to: - use a branchless version of 'cmp' for 'uint32_t', - completely resolve the lexicographic comparison through vector operations when wide types are available. We also get rid of byte reloads and serializing '__builtin_ctzll'. I did not include the suggestion to replace comparisons of 'uint16_t' with two 'uint8_t' as it did not seem to help the codegen. This can be revisited in subsequent patches. The code has been rewritten to reduce nested function calls, making the job of the inliner easier and preventing harmful code duplication. Reviewed By: nafi3000 Differential Revision: https://reviews.llvm.org/D148717	2023-06-05 09:46:05 +00:00
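The memcmp patch above (reverted in `e49a608511`) mentions a branchless `cmp` for `uint32_t`. The following is a minimal sketch of that idea, not the patch's code: it uses the common `(a > b) - (a < b)` idiom, and the helpers `cmp_u32`, `load_be32`, and `memcmp4` are hypothetical names. Loading bytes in big-endian order is what makes integer order match lexicographic byte order.

```cpp
#include <cstdint>

// Branchless three-way comparison returning -1, 0 or 1. Each relational
// operator yields 0 or 1, so no conditional jump is needed.
constexpr int cmp_u32(std::uint32_t a, std::uint32_t b) {
  return static_cast<int>(a > b) - static_cast<int>(a < b);
}

// Hypothetical helper: read 4 bytes as a big-endian integer so that the
// first byte is the most significant (lexicographic order == integer order).
// Bytes are treated as unsigned, as memcmp requires.
constexpr std::uint32_t load_be32(const char *p) {
  return (std::uint32_t{static_cast<unsigned char>(p[0])} << 24) |
         (std::uint32_t{static_cast<unsigned char>(p[1])} << 16) |
         (std::uint32_t{static_cast<unsigned char>(p[2])} << 8) |
         std::uint32_t{static_cast<unsigned char>(p[3])};
}

// Hypothetical 4-byte memcmp building block: sign matches memcmp's result.
constexpr int memcmp4(const char *a, const char *b) {
  return cmp_u32(load_be32(a), load_be32(b));
}
```

On little-endian targets a real implementation would typically load the word and byte-swap instead of assembling it from shifts; the shifts here just keep the sketch portable and `constexpr`.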
Aiden Grossman	ea8f4b9841	[libc][CMake] Place archives in build/lib/<target-triple> This patch moves the location of libllvmlibc.a within the build tree to within ./lib/<target triple>. This more closely matches the behavior of other runtime builds and allows for clang in the same build tree to automatically be able to link against llvmlibc since this path is by default included by the driver. Also removes the LIBC_BINARY_DIR CMake flag since it isn't used anywhere in the tree (based on a quick grep). Reviewed By: phosek Differential Revision: https://reviews.llvm.org/D151624	2023-06-03 22:40:03 +00:00
Tue Ly	5a4e344bd9	[libc][NFC] Add LIBC_INLINE and attribute.h header includes to targets' FMA.h. Targets' FMA.h headers are missing LIBC_INLINE and attributes.h header. Reviewed By: brooksmoses Differential Revision: https://reviews.llvm.org/D152024	2023-06-02 21:15:58 -04:00
Joseph Huber	48bb7bb868	[libc] Disable the string_to_float test on NVPTX This test began failing after recent changes. Disable it for now. Differential Revision: https://reviews.llvm.org/D152032	2023-06-02 15:56:39 -05:00
Joseph Huber	cfde5f2d89	[libc] Implement 'errno' on the GPU as a global integer internally The C standard asserts that the `errno` value is an l-value thread local integer. We cannot provide a generic thread local integer on the GPU currently without some workarounds. Previously, we worked around this by implementing the `errno` value as a special consumer class that made all the writes disappear. However, this is problematic for internal tests. Currently there are build failures because of this handling and it's only likely to cause more problems the more we do this. This patch instead makes the internal target used for testing export the `errno` value as a simple global integer. This allows us to use and test the `errno` interface correctly assuming we run with a single thread. Because this is only used for the non-exported target we still do not provide this feature in the version that users will use so we do not need to worry about it being incorrect in general. Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D152015	2023-06-02 14:16:24 -05:00
Tue Ly	f9753ef189	[libc][Obvious] Fix a typo in setting FMA control option for RISCV64.	2023-06-02 11:15:29 -04:00
Michael Jones	722832e6d7	[libc] Add strtoint32 and strtoint64 tests There were regressions in the testing framework due to none of the functioning buildbots having a 32 bit long. This allowed the 32 bit version of the strtointeger function to go untested. This patch adds tests for strtoint32 and strtoint64, which are internal testing functions that use constant integer sizes. It also fixes the tests to properly handle these situations. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D151935	2023-06-01 15:06:46 -07:00
Guillaume Chatelet	2697ffd039	[libc] Reduce math tests runtime further Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D151875	2023-06-01 12:43:18 +00:00
Guillaume Chatelet	ae5c472410	[libc] Reduce math tests runtime Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D151798	2023-06-01 05:01:56 +00:00
Tue Ly	cfc5c6cb8d	[libc][docs] Update implementation status table for Date and Time Functions. Update implementation status table for Date and Time Functions to include different targets. Reviewed By: jeffbailey Differential Revision: https://reviews.llvm.org/D151809	2023-05-31 15:09:06 -04:00
Guillaume Chatelet	c76a3e795e	[libc][NFC] Fixing various typos	2023-05-31 12:11:09 +00:00
Tue Ly	e557b8a142	[libc][RISCV] Add log, log2, log1p, log10 for RISC-V64 entrypoints. Add log, log2, log1p, log10 RISCV64 entrypoints. Reviewed By: michaelrj, sivachandra Differential Revision: https://reviews.llvm.org/D151674	2023-05-30 14:18:19 -04:00
Joseph Huber	1ef0bafc4f	[libc][NFC] Move the Linux file implementation to a subdirectory This patch simply moves the special handling for `linux` files to a subdirectory. This is done to make it easier in the future to extend this support to targets (like the GPU) that will have different dependencies. Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D151231	2023-05-30 06:49:21 -05:00
Mark de Wever	cbaa3597aa	Reland "[CMake] Bumps minimum version to 3.20.0. This reverts commit `d763c6e5e2`. Adds the patch by @hans from https://github.com/llvm/llvm-project/issues/62719 This patch fixes the Windows build. `d763c6e5e2` reverted the reviews D144509 [CMake] Bumps minimum version to 3.20.0. This partly undoes D137724. This change has been discussed on discourse https://discourse.llvm.org/t/rfc-upgrading-llvms-minimum-required-cmake-version/66193 Note this does not remove work-arounds for older CMake versions, that will be done in followup patches. D150532 [OpenMP] Compile assembly files as ASM, not C Since CMake 3.20, CMake explicitly passes "-x c" (or equivalent) when compiling a file which has been set as having the language C. This behaviour change only takes place if "cmake_minimum_required" is set to 3.20 or newer, or if the policy CMP0119 is set to new. Attempting to compile assembly files with "-x c" fails, however this is worked around in many cases, as OpenMP overrides this with "-x assembler-with-cpp", however this is only added for non-Windows targets. Thus, after increasing cmake_minimum_required to 3.20, this breaks compiling the GNU assembly for Windows targets; the GNU assembly is used for ARM and AArch64 Windows targets when building with Clang. This patch unbreaks that. D150688 [cmake] Set CMP0091 to fix Windows builds after the cmake_minimum_required bump The build uses other mechanisms to select the runtime. Fixes #62719 Reviewed By: #libc, Mordante Differential Revision: https://reviews.llvm.org/D151344	2023-05-27 12:51:21 +02:00
Krasimir Georgiev	07f49bf475	[libc] Adapt includes after `25174976e1`	2023-05-26 14:25:50 +00:00
Siva Chandra Reddy	4f1fe19df3	[libc] Make ErrnoSetterMatcher handle logging floating point values. Along the way, couple of additional things have been done: 1. Move `ErrnoSetterMatcher.h` to `test/UnitTest` as all other matchers live there now. 2. `ErrnoSetterMatcher` ignores matching `errno` on GPUs. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D151129	2023-05-26 06:24:51 +00:00
Joseph Huber	4311246a3a	Revert "[libc] Enable hermetic floating point tests" This passed locally but unfortunately it seems some tests are not ready to be made hermetic. Revert for now until we can investigate specifically which tests are failing and mark those as `UNIT_TEST_ONLY`. This reverts commit `417ea79e79`.	2023-05-25 19:15:02 -05:00
Joseph Huber	417ea79e79	[libc] Enable hermetic floating point tests This patch enables us to run the floating point tests as hermetic. Importantly we now use the internal versions of the `fesetround` and `fegetround` functions. Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D151123	2023-05-25 19:08:44 -05:00
Tue Ly	0aa7ea4e22	[libc][darwin] Add OSUtil for darwin arm64 target so that unit tests can be run. Currently unit tests cannot be run on macOS due to missing OSUtil. Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D151377	2023-05-25 19:25:24 -04:00
Tue Ly	0bda541829	[libc][doc] Update math function status page to show more targets. Show availability of math functions on each target. Reviewed By: jeffbailey Differential Revision: https://reviews.llvm.org/D151489	2023-05-25 19:24:33 -04:00
Roland McGrath	65c78933ae	[libc] Support LIBC_COPT_USE_C_ASSERT build flag In this mode, LIBC_ASSERT is just standard C assert. Reviewed By: abrachet Differential Revision: https://reviews.llvm.org/D151498	2023-05-25 14:10:33 -07:00
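In the `LIBC_COPT_USE_C_ASSERT` mode described above, `LIBC_ASSERT` simply forwards to the standard C `assert`. A minimal sketch of such a compile-time switch follows, assuming the flag is passed as `-DLIBC_COPT_USE_C_ASSERT`. The fallback branch (`__builtin_trap()`, a GCC/Clang builtin) and the `checked_div` caller are placeholders, not LLVM libc's actual definitions.

```cpp
#include <cassert>

#ifdef LIBC_COPT_USE_C_ASSERT
// Build flag set: LIBC_ASSERT is just standard C assert (honours NDEBUG).
#define LIBC_ASSERT(COND) assert(COND)
#else
// Placeholder for a library-specific check: trap when the condition fails.
#define LIBC_ASSERT(COND) ((COND) ? (void)0 : __builtin_trap())
#endif

// Hypothetical caller, only here to show the macro in use.
inline int checked_div(int a, int b) {
  LIBC_ASSERT(b != 0);
  return a / b;
}
```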
Roland McGrath	1d4e8f0ea6	[libc] Fix compilation issues in memory_check_utils.h Strict warnings require explicit static_cast to counteract default widening of types narrower than int. Functions in header files should have vague linkage (inline keyword), not internal linkage (static) or external linkage (no inline keyword) even for template functions. Note these don't use the LIBC_INLINE macro since this is only for test code. Reviewed By: abrachet Differential Revision: https://reviews.llvm.org/D151494	2023-05-25 14:09:14 -07:00
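The two rules in this commit can be illustrated with a hypothetical header helper (`wrapping_add_u8` is not from the source). Operands narrower than `int` are promoted to `int` before `+`, so storing the sum back into a `uint8_t` is an implicit narrowing that strict warnings flag; the explicit `static_cast` makes the truncation intentional. Marking the header function `inline` (which `constexpr` also implies) gives it vague linkage, so every translation unit can include it without duplicate-symbol errors or per-file `static` copies.

```cpp
#include <cstdint>

// Hypothetical header-only test helper.
// `a + b` is computed as int (integer promotion); the cast truncates the
// result back to 8 bits explicitly, so it wraps modulo 256.
inline constexpr std::uint8_t wrapping_add_u8(std::uint8_t a, std::uint8_t b) {
  return static_cast<std::uint8_t>(a + b);
}
```

For example, `wrapping_add_u8(200, 100)` computes the `int` value 300 and truncates it to 44.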
Siva Chandra Reddy	daeee56798	[libc] Add macro LIBC_THREAD_LOCAL. It resolves to thread_local on all platforms except for the GPUs on which it resolves to nothing. The use of thread_local in the source code has been replaced with the new macro. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D151486	2023-05-25 19:53:52 +00:00
Guillaume Chatelet	298843cd66	[libc][test] Drastically reduce mem test runtime Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D151450	2023-05-25 15:05:21 +00:00
Tobias Hieta	f98ee40f4b	[NFC][Py Reformat] Reformat python files in the rest of the dirs This is an ongoing series of commits that are reformatting our Python code. This catches the last of the python files to reformat. Since they were so few I bunched them together. Reformatting is done with `black`. If you end up having problems merging this commit because you have made changes to a python file, the best way to handle that is to run git checkout --ours <yourfile> and then reformat it with black. If you run into any problems, post to discourse about it and we will try to help. RFC Thread below: https://discourse.llvm.org/t/rfc-document-and-standardize-python-code-style Reviewed By: jhenderson, #libc, Mordante, sivachandra Differential Revision: https://reviews.llvm.org/D150784	2023-05-25 11:17:05 +02:00
Siva Chandra Reddy	99a493e3e3	[libc] Make hermetic test depend on the unit test if it exists. We want to do this so that build system like ninja don't end up running the hermetic and unit tests in parallel. Running in parallel can cause problems for tests which read/write disk files as the hermetic and unit tests can end up stepping on each other. Reviewed By: jhuber6 Differential Revision: https://reviews.llvm.org/D151291	2023-05-25 06:27:00 +00:00
Siva Chandra Reddy	25174976e1	[libc] Rearrange error and signal tables. This is largely a cosmetic change done with a few goals: 1. Reduce the conditionals in picking the correct set of tables for the platform. 2. Avoid exposing, for example Linux errors, when building for non-Linux platforms. This also prevents build failures when Linux errors are not defined on the target non-Linux platform. 3. Some "_table" suffixes have been removed to avoid repeated occurrence of "table" like "tables/linux_error_table.h". Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D151367	2023-05-25 05:51:32 +00:00
Guillaume Chatelet	bb4f88f9b9	[libc] simplify test for getrandom `getrandom` is implemented as a syscall. We don't want to test linux implementation of the syscall. We just want to verify that it reacts as expected to sensible values. Runtime before ``` [ RUN ] LlvmLibcGetRandomTest.InvalidFlag [ OK ] LlvmLibcGetRandomTest.InvalidFlag (took 0 ms) [ RUN ] LlvmLibcGetRandomTest.InvalidBuffer [ OK ] LlvmLibcGetRandomTest.InvalidBuffer (took 0 ms) [ RUN ] LlvmLibcGetRandomTest.ReturnsSize [ OK ] LlvmLibcGetRandomTest.ReturnsSize (took 83 ms) [ RUN ] LlvmLibcGetRandomTest.PiEstimation [ OK ] LlvmLibcGetRandomTest.PiEstimation (took 9882 ms) ``` Runtime after ``` [ RUN ] LlvmLibcGetRandomTest.InvalidFlag [ OK ] LlvmLibcGetRandomTest.InvalidFlag (took 0 ms) [ RUN ] LlvmLibcGetRandomTest.InvalidBuffer [ OK ] LlvmLibcGetRandomTest.InvalidBuffer (took 0 ms) [ RUN ] LlvmLibcGetRandomTest.ReturnsSize [ OK ] LlvmLibcGetRandomTest.ReturnsSize (took 0 ms) [ RUN ] LlvmLibcGetRandomTest.CheckValue [ OK ] LlvmLibcGetRandomTest.CheckValue (took 0 ms) ``` Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D151336	2023-05-24 15:07:51 +00:00
Tue Ly	8c6b83dcfd	[libc] Reduce the sizes of some math tests that take longest time. Reviewed By: gchatelet Differential Revision: https://reviews.llvm.org/D151256	2023-05-24 08:55:51 -04:00
Tue Ly	a2ac3678cd	[libc][bazel] Add log, log2, log10, log1p to bazel layout. Add log, log2, log10, log1p and their unit tests to bazel layout. Reviewed By: gchatelet Differential Revision: https://reviews.llvm.org/D151252	2023-05-24 07:43:58 -04:00
Tue Ly	7cbcc581a5	[libc] Change UInt integer conversion operators to use standard types. This fixes an issue with missing `unsigned long` conversion on macOS. Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D151234	2023-05-23 14:12:46 -04:00
Joseph Huber	99c9515b37	[libc][obvious] Correctly hoist mask out of the loop Summary: This was accidentally dropped from a previous patch following a rebase. Fix it to where it's consistent. Differential Revision: https://reviews.llvm.org/D151232	2023-05-23 12:21:10 -05:00
Joseph Huber	e826762a08	[libc] More efficiently send bytes via `send_n` and `recv_n` Currently we have the `send_n` and `recv_n` routines to stream data, such as a string to print, to the other side. The first operation is to send the size so the other side knows the number of bytes to receive. However, this wasted 56 bytes that could've been sent. This meant that small values, like the arguments to a function to call on the host for example, needed to perform an extra send. This patch sends the first 56 bytes in the first packet and continues if necessary. Depends on D150992 Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D151041	2023-05-23 10:59:47 -05:00
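A host-side sketch of the packing scheme in `e826762a08`, assuming a 64-byte packet buffer: an 8-byte size header is what leaves the 56 bytes mentioned in the message. `pack`, `unpack`, and `Packet` are hypothetical names; the real `send_n`/`recv_n` stream through the shared RPC buffers rather than building a `std::vector`.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed layout: 64-byte packets. The first carries an 8-byte size header
// plus the first 56 data bytes; later packets carry up to 64 data bytes.
constexpr std::size_t kPacketBytes = 64;
constexpr std::size_t kHeaderBytes = sizeof(std::uint64_t);
constexpr std::size_t kFirstDataBytes = kPacketBytes - kHeaderBytes; // 56

using Packet = std::array<unsigned char, kPacketBytes>;

// Sender side: size plus as much data as fits go into the first packet.
std::vector<Packet> pack(const void *src, std::size_t size) {
  const auto *bytes = static_cast<const unsigned char *>(src);
  std::vector<Packet> packets(1, Packet{});
  const std::uint64_t header = size;
  std::memcpy(packets[0].data(), &header, kHeaderBytes);
  const std::size_t first = std::min(size, kFirstDataBytes);
  if (first != 0)
    std::memcpy(packets[0].data() + kHeaderBytes, bytes, first);
  // Only data beyond the first 56 bytes needs extra packets.
  for (std::size_t off = first; off < size; off += kPacketBytes) {
    Packet p{};
    std::memcpy(p.data(), bytes + off, std::min(size - off, kPacketBytes));
    packets.push_back(p);
  }
  return packets;
}

// Receiver side: read the size from the header, then copy the data back.
std::vector<unsigned char> unpack(const std::vector<Packet> &packets) {
  assert(!packets.empty());
  std::uint64_t header = 0;
  std::memcpy(&header, packets[0].data(), kHeaderBytes);
  const std::size_t size = static_cast<std::size_t>(header);
  std::vector<unsigned char> out(size);
  const std::size_t first = std::min(size, kFirstDataBytes);
  if (first != 0)
    std::memcpy(out.data(), packets[0].data() + kHeaderBytes, first);
  std::size_t idx = 1;
  for (std::size_t off = first; off < size; off += kPacketBytes, ++idx) {
    assert(idx < packets.size());
    std::memcpy(out.data() + off, packets[idx].data(),
                std::min(size - off, kPacketBytes));
  }
  return out;
}
```

Under this layout, any payload of up to 56 bytes fits in a single packet, which is the "extra send" the commit removes for small values.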
Joseph Huber	29d3da3b86	[libc] Fix the `send_n` and `recv_n` utilities under divergent lanes We provide the `send_n` and `recv_n` utilities as a generic way to stream data between both sides of the process. This was previously tested and performed as expected when using a string of constant size. However, when the size was allowed to diverge between the threads in the warp or wavefront this could deadlock. This did not occur on NVPTX because of the use of the explicit warp sync. However, on AMD one of the work items in the wavefront could continue executing and hit the next `recv` call before the other threads, then we would deadlock as we violated the RPC invariants. This patch replaces the for loop with a thread ballot. This will cause every thread in the warp or wavefront to continue executing the loop until all of them can exit. This acts as a more explicit wavefront sync. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D150992	2023-05-23 10:59:47 -05:00
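A host-side model of the ballot-driven loop from `29d3da3b86`. The lanes are simulated with vectors rather than real GPU threads, and `ballot` and `lockstep_rounds` are hypothetical names. The point it shows: the loop condition is a vote across all lanes, so every lane runs the same number of rounds (as many as the lane with the most data needs) and no lane can run ahead to the next `recv`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Models one warp/wavefront of at most 64 lanes. ballot() mimics a GPU vote:
// bit i of the result is set when lane i's predicate is true.
std::uint64_t ballot(const std::vector<bool> &pred) {
  std::uint64_t mask = 0;
  for (std::size_t i = 0; i < pred.size(); ++i)
    if (pred[i])
      mask |= std::uint64_t{1} << i;
  return mask;
}

// Lane i streams sizes[i] bytes, `chunk` bytes per round. All lanes keep
// iterating while *any* lane still has data, so they leave the loop together.
// Returns the number of rounds executed.
std::size_t lockstep_rounds(const std::vector<std::uint64_t> &sizes,
                            std::uint64_t chunk) {
  assert(sizes.size() <= 64 && chunk > 0);
  std::vector<std::uint64_t> sent(sizes.size(), 0);
  std::size_t rounds = 0;
  for (;;) {
    std::vector<bool> active(sizes.size());
    for (std::size_t i = 0; i < sizes.size(); ++i)
      active[i] = sent[i] < sizes[i];
    if (ballot(active) == 0)
      break; // every lane is done: all exit in the same round
    for (std::size_t i = 0; i < sizes.size(); ++i)
      if (active[i])
        sent[i] += chunk; // finished lanes stay in the loop but send nothing
    ++rounds;
  }
  return rounds;
}
```

With per-lane sizes of 10, 200, and 0 bytes and 56-byte chunks, all three lanes run four rounds, because the 200-byte lane needs four.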
Tue Ly	b91e78da37	[libc][math] Implement double precision log1p correctly rounded to all rounding modes. Implement double precision log1p function correctly rounded to all rounding modes. Performance - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%. - Benchmarks with `./perf.sh` tool from the CORE-MATH project, unit is (CPU clocks / call). - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log1p GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 39.792 + 1.011 clc/call; Median-Min = 0.940 clc/call; Max = 41.373 clc/call; -- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 87.285 + 1.135 clc/call; Median-Min = 1.299 clc/call; Max = 89.715 clc/call; -- System LIBC reciprocal throughput -- [####################] 100 % Ntrial = 20 ; Min = 20.666 + 0.123 clc/call; Median-Min = 0.125 clc/call; Max = 20.828 clc/call; -- LIBC reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 20.928 + 0.771 clc/call; Median-Min = 0.725 clc/call; Max = 22.767 clc/call; -- LIBC reciprocal throughput -- without FMA [####################] 100 % Ntrial = 20 ; Min = 31.461 + 0.528 clc/call; Median-Min = 0.602 clc/call; Max = 36.809 clc/call; ``` - Latency from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log1p --latency GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 77.875 + 0.062 clc/call; Median-Min = 0.051 clc/call; Max = 78.003 clc/call; -- CORE-MATH latency -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 101.958 + 1.202 clc/call; Median-Min = 1.325 clc/call; Max = 104.452 clc/call; -- System LIBC latency -- [####################] 100 % Ntrial = 20 ; Min = 60.581 + 1.443 clc/call; Median-Min = 1.611 clc/call; Max = 62.285 clc/call; -- LIBC latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 48.817 + 1.108 clc/call; Median-Min = 1.300 clc/call; Max = 50.282 clc/call; -- LIBC latency -- without FMA [####################] 100 % Ntrial = 20 ; Min = 61.121 + 0.599 clc/call; Median-Min = 0.761 clc/call; Max = 62.020 clc/call; ``` - Accurate pass latency: ``` $ ./perf.sh log1p --latency --simple_stat GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA 760.444 -- CORE-MATH latency -- without FMA (-march=x86-64-v2) 827.880 -- LIBC latency -- with FMA 711.837 -- LIBC latency -- without FMA 764.317 ``` Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D151049	2023-05-23 11:04:04 -04:00
Tue Ly	111d274841	[libc][math] Implement double precision log2 function correctly rounded to all rounding modes. Implement double precision log2 function correctly rounded to all rounding modes. See https://reviews.llvm.org/D150014 for a more detailed description of the algorithm. Performance - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.91%. - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log2 GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 15.458 + 0.204 clc/call; Median-Min = 0.224 clc/call; Max = 15.867 clc/call; -- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 23.711 + 0.524 clc/call; Median-Min = 0.443 clc/call; Max = 25.307 clc/call; -- System LIBC reciprocal throughput -- [####################] 100 % Ntrial = 20 ; Min = 14.807 + 0.199 clc/call; Median-Min = 0.211 clc/call; Max = 15.137 clc/call; -- LIBC reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 17.666 + 0.274 clc/call; Median-Min = 0.298 clc/call; Max = 18.531 clc/call; -- LIBC reciprocal throughput -- without FMA [####################] 100 % Ntrial = 20 ; Min = 26.534 + 0.418 clc/call; Median-Min = 0.462 clc/call; Max = 27.327 clc/call; ``` - Latency from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log2 --latency GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 46.048 + 1.643 clc/call; Median-Min = 1.694 clc/call; Max = 48.018 clc/call; -- CORE-MATH latency -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 62.333 + 0.138 clc/call; Median-Min = 0.119 clc/call; Max = 62.583 clc/call; -- System LIBC latency -- [####################] 100 % Ntrial = 20 ; Min = 45.206 + 1.503 clc/call; Median-Min = 1.467 clc/call; Max = 47.229 clc/call; -- LIBC latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 43.042 + 0.454 clc/call; Median-Min = 0.484 clc/call; Max = 43.912 clc/call; -- LIBC latency -- without FMA [####################] 100 % Ntrial = 20 ; Min = 57.016 + 1.636 clc/call; Median-Min = 1.655 clc/call; Max = 58.816 clc/call; ``` - Accurate pass latency: ``` $ ./perf.sh log2 --latency --simple_stat GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA 177.632 -- CORE-MATH latency -- without FMA (-march=x86-64-v2) 231.332 -- LIBC latency -- with FMA 459.751 -- LIBC latency -- without FMA 463.850 ``` Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D150374	2023-05-23 10:49:30 -04:00
Tue Ly	a68bbf42fa	[libc][math] Implement double precision log function correctly rounded to all rounding modes. Implement double precision log function correctly rounded to all rounding modes. See https://reviews.llvm.org/D150014 for a more detailed description of the algorithm. Performance - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.93%. - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 17.465 + 0.596 clc/call; Median-Min = 0.602 clc/call; Max = 18.389 clc/call; -- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 54.961 + 2.606 clc/call; Median-Min = 2.180 clc/call; Max = 59.583 clc/call; -- System LIBC reciprocal throughput -- [####################] 100 % Ntrial = 20 ; Min = 12.608 + 0.276 clc/call; Median-Min = 0.359 clc/call; Max = 13.147 clc/call; -- LIBC reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 20.952 + 0.468 clc/call; Median-Min = 0.602 clc/call; Max = 21.881 clc/call; -- LIBC reciprocal throughput -- without FMA [####################] 100 % Ntrial = 20 ; Min = 18.569 + 0.552 clc/call; Median-Min = 0.601 clc/call; Max = 19.259 clc/call; ``` - Latency from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log --latency GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 48.431 + 0.699 clc/call; Median-Min = 0.073 clc/call; Max = 51.269 clc/call; -- CORE-MATH latency -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 64.865 + 3.235 clc/call; Median-Min = 3.475 clc/call; Max = 71.788 clc/call; -- System LIBC latency -- [####################] 100 % Ntrial = 20 ; Min = 42.151 + 2.090 clc/call; Median-Min = 2.270 clc/call; Max = 44.773 clc/call; -- LIBC latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 35.266 + 0.479 clc/call; Median-Min = 0.373 clc/call; Max = 36.798 clc/call; -- LIBC latency -- without FMA [####################] 100 % Ntrial = 20 ; Min = 48.518 + 0.484 clc/call; Median-Min = 0.500 clc/call; Max = 49.896 clc/call; ``` - Accurate pass latency: ``` $ ./perf.sh log --latency --simple_stat GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA 598.306 -- CORE-MATH latency -- without FMA (-march=x86-64-v2) 632.925 -- LIBC latency -- with FMA 455.632 -- LIBC latency -- without FMA 488.564 ``` Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D150131	2023-05-23 10:35:15 -04:00
Joseph Huber	ad00a3db4d	[libc][AMDGPU] Disable the AMDGPU backend's ctor/dtor lowering for libc The AMDGPU backend has a built-in pass to lower constructors. We do this manually in the `start.cpp` implementation so we can disable this to keep the binaries smaller. Differential Revision: https://reviews.llvm.org/D151213	2023-05-23 09:20:41 -05:00
Tue Ly	a0c92a3817	[libc][math] Make log10 correctly rounded for non-FMA targets and improve its performance. Make log10 correctly rounded for non-FMA targets and improve its performance. Implemented fast pass and accurate pass: Fast Pass: - Range reduction step 0: Extract exponent and mantissa ``` x = 2^(e_x) * m_x ``` - Range reduction step 1: Use lookup tables of size 2^7 = 128 to reduce the argument to: ``` -2^-8 <= v = r * m_x - 1 < 2^-7 where r = 2^-8 * ceil( 2^8 * (1 - 2^-8) / (1 + k * 2^-7) ) and k = trunc( (m_x - 1) * 2^7 ) ``` - Polynomial approximation: approximate `log(1 + v)` by a degree-7 polynomial generated by Sollya with: ``` > P = fpminimax((log(1 + x) - x)/x^2, 5, [|D...|], [-2^-8, 2^-7]); ``` - Combine the results: ``` log10(x) ~ ( e_x * log(2) - log(r) + v + v^2 * P(v) ) * log10(e) ``` - Perform additive Ziv's test with errors bounded by `P_ERR * v^2`. Return the result if Ziv's test passed. Accurate Pass: - Take `e_x`, `v`, and the lookup table index from the range reduction step of fast pass. - Perform 3 more range reduction steps: - Range reduction step 2: Use look-up tables of size 193 to reduce the argument to `[-0x1.3ffcp-15, 0x1.3e3dp-15]` ``` v2 = r2 * (1 + v) - 1 = (1 + s2) * (1 + v) - 1 = s2 + v + s2 * v where r2 = 2^-16 * round ( 2^16 / (1 + k * 2^-14) ) and k = trunc( v * 2^14 + 0.5 ). ``` - Range reduction step 3: Use look-up tables of size 161 to reduce the argument to `[-0x1.01928p-22 , 0x1p-22]` ``` v3 = r3 * (1 + v2) - 1 = (1 + s3) * (1 + v2) - 1 = s3 + v2 + s3 * v2 where r3 = 2^-21 * round ( 2^21 / (1 + k * 2^-21) ) and k = trunc( v * 2^21 + 0.5 ). ``` - Range reduction step 4: Use look-up tables of size 130 to reduce the argument to `[-0x1.0002143p-29 , 0x1p-29]` ``` v4 = r4 * (1 + v3) - 1 = (1 + s4) * (1 + v3) - 1 = s4 + v3 + s4 * v3 where r4 = 2^-28 * round ( 2^28 / (1 + k * 2^-28) ) and k = trunc( v * 2^28 + 0.5 ). ``` - Polynomial approximation: approximate `log10(1 + v4)` by a degree-4 minimax polynomial generated by Sollya with: ``` > P = fpminimax(log10(1 + x)/x, 3, [|128...|], [-0x1.0002143p-29 , 0x1p-29]); ``` - Combine the results: ``` log10(x) ~ e_x * log10(2) - log10(r) - log10(r2) - log10(r3) - log10(r4) + v * P(v) ``` - The combined results are computed using floating points of 128-bit precision. Performance - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.92%. - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log10 GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 20.402 + 0.589 clc/call; Median-Min = 0.277 clc/call; Max = 22.752 clc/call; -- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 75.797 + 3.317 clc/call; Median-Min = 3.407 clc/call; Max = 79.371 clc/call; -- System LIBC reciprocal throughput -- [####################] 100 % Ntrial = 20 ; Min = 22.668 + 0.184 clc/call; Median-Min = 0.181 clc/call; Max = 23.205 clc/call; -- LIBC reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 25.977 + 0.183 clc/call; Median-Min = 0.138 clc/call; Max = 26.283 clc/call; -- LIBC reciprocal throughput -- without FMA [####################] 100 % Ntrial = 20 ; Min = 22.140 + 0.980 clc/call; Median-Min = 0.853 clc/call; Max = 23.790 clc/call; ``` - Latency from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log10 --latency GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 54.613 + 0.357 clc/call; Median-Min = 0.287 clc/call; Max = 55.701 clc/call; -- CORE-MATH latency -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 79.681 + 0.482 clc/call; Median-Min = 0.294 clc/call; Max = 81.604 clc/call; -- System LIBC latency -- [####################] 100 % Ntrial = 20 ; Min = 61.532 + 0.208 clc/call; Median-Min = 0.199 clc/call; Max = 62.256 clc/call; -- LIBC latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 41.510 + 0.205 clc/call; Median-Min = 0.244 clc/call; Max = 41.867 clc/call; -- LIBC latency -- without FMA [####################] 100 % Ntrial = 20 ; Min = 55.669 + 0.240 clc/call; Median-Min = 0.280 clc/call; Max = 56.056 clc/call; ``` - Accurate pass latency: ``` $ ./perf.sh log10 --latency --simple_stat GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA 640.688 -- CORE-MATH latency -- without FMA (-march=x86-64-v2) 667.354 -- LIBC latency -- with FMA 495.593 -- LIBC latency -- without FMA 504.143 ``` Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D150014	2023-05-23 10:18:23 -04:00
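The fast-pass range reduction (steps 0 and 1) and the final combination from the log10 commit can be sketched in C++. This is not the libc implementation. It computes `r` and `log(r)` on the fly instead of reading the 128-entry tables, uses `std::log1p` in place of `v + v^2 * P(v)`, and has no Ziv test or accurate pass. `log10_fast_pass_sketch` is a hypothetical name. The reduction itself is exact algebra: since `r * m_x = 1 + v`, it follows that `log(x) = e_x * log(2) - log(r) + log(1 + v)`.

```cpp
#include <cassert>
#include <cmath>

// Sketch only: structure of the fast pass, not its accuracy guarantees.
double log10_fast_pass_sketch(double x) {
  assert(std::isnormal(x) && x > 0.0);
  // Step 0: x = 2^e_x * m_x with m_x in [1, 2) (frexp returns [0.5, 1)).
  int e = 0;
  const double m_x = 2.0 * std::frexp(x, &e);
  const int e_x = e - 1;
  // Step 1: k = trunc((m_x - 1) * 2^7), so 0 <= k <= 127, and
  // r = 2^-8 * ceil(2^8 * (1 - 2^-8) / (1 + k * 2^-7)).
  const int k = static_cast<int>((m_x - 1.0) * 128.0);
  const double r = std::ldexp(
      std::ceil(256.0 * (1.0 - 1.0 / 256.0) / (1.0 + k / 128.0)), -8);
  // v = r * m_x - 1 is small; fma evaluates it with a single rounding.
  const double v = std::fma(r, m_x, -1.0);
  // Combine: (e_x * log(2) - log(r) + log(1 + v)) * log10(e).
  const double kLn2 = 0.69314718055994530942;
  const double kLog10E = 0.43429448190325182765;
  return (e_x * kLn2 - std::log(r) + std::log1p(v)) * kLog10E;
}
```

The table lookup matters in the real code because `log(r)` for each of the 128 possible `r` values can be precomputed in extra precision, leaving only the short polynomial in `v` to evaluate at run time.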
Guillaume Chatelet	04e066df5e	[libc] Display unit test runtime for hosted environments With more tests added to LLVM libc each week we want to keep track of unittest's runtime, especially for low end build bots. Top offenders can be tracked with a bit of scripting (spoiler alert, mem function sweep tests are in the top ones) ``` ninja check-libc | grep "ms)" | awk '{print $(NF-1),$0}' | sort -nr | cut -f2- -d' ' ``` Unfortunately this doesn't work for hermetic tests since `clock` is unavailable. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D151097	2023-05-23 09:23:12 +00:00
Kazu Hirata	9a515d8142	[libc] Fix typos in documentation	2023-05-22 23:27:59 -07:00

1884 Commits