clang-p2996

Author	SHA1	Message	Date
lntue	d0a0de50f7	[libc] Fix implicit conversion warnings in tests. (#131362 )	2025-03-14 14:24:11 -04:00
Vinay Deshmukh	257e483715	[libc] Add `-Wno-sign-conversion` & re-attempt `-Wconversion` (#129811 ) Relates to https://github.com/llvm/llvm-project/issues/119281#issuecomment-2699470459	2025-03-10 11:57:09 -04:00
Augie Fackler	da61b0ddc5	Revert "[libc] Enable -Wconversion for tests. (#127523 )" This reverts commit `1e6e845d49` because it changed the 1st parameter of adjust() to be unsigned, but libc itself calls adjust() with a negative argument in align_backward() in op_generic.h.	2025-03-05 16:42:40 -05:00
Vinay Deshmukh	1e6e845d49	[libc] Enable -Wconversion for tests. (#127523 ) Relates to: #119281	2025-03-04 10:24:35 -05:00
Nick Desaulniers	431ea2d076	[libc] move bcmp, bzero, bcopy, index, rindex, strcasecmp, strncasecmp to strings.h (#118899 ) docgen relies on the convention that we have a file foo.cpp in libc/src/\<header\>/. Because the above functions weren't in libc/src/strings/ but rather libc/src/string/, docgen could not find that we had implemented these. Rather than add special carve outs to docgen, let's fix up our sources for these 7 functions to stick with the existing conventions the rest of the codebase follows. Link: #118860 Fixes: #118875	2024-12-10 08:58:45 -08:00
Michael Jones	a0c4f854ca	[libc] Change ctype to be encoding independent (#110574 ) The previous implementation of the ctype functions assumed ASCII. This patch changes to a switch/case implementation that looks odd, but actually is easier for the compiler to understand and optimize.	2024-12-03 12:36:04 -08:00
Job Henandez Lara	33bdb53d86	[libc] Remove the #include <stdlib.h> header (#114453 )	2024-11-01 21:49:57 -07:00
George Burgess IV	50c44478fe	[libc] fix behavior of strrchr(x, '\0') (#112620 ) `strrchr("foo", '\0')` is defined to point to the end of `foo`, rather than returning NULL. This wasn't caught by tests, since llvm-libc's `ASSERT_STREQ(nullptr, "");` is not an assertion error. While I'm here, refactor the test slightly to check for NULL more specifically. I considered adding fancier `ASSERT`s (and changing the semantics of `ASSERT_STREQ`), but opted for a more local fix by fair dice roll.	2024-10-30 15:08:03 -07:00
George Burgess IV	b03c8c4fdd	libc: strlcpy/strlcat shouldn't bzero the rest of `buf` (#114259 ) When running Bionic's testsuite over llvm-libc, tests broke because e.g., ``` const char *str = "abc"; char buf[7]{"111111"}; strlcpy(buf, str, 7); ASSERT_EQ(buf, {'1', '1', '1', '\0', '\0', '\0', '\0'}); ``` On my machine (Debian w/ glibc and clang-16), a `printf` loop over `buf` gets unrolled into a series of const `printf` at compile-time: ``` printf("%d\n", '1'); printf("%d\n", '1'); printf("%d\n", '1'); printf("%d\n", 0); printf("%d\n", '1'); printf("%d\n", '1'); printf("%d\n", 0); ``` Seems best to match existing precedent here.	2024-10-30 12:28:32 -07:00
Joseph Huber	cbfcea1fc2	[libc] Temporarily disable strerror test on NVPTX Summary: This is failing on the NVPTX buildbot, https://lab.llvm.org/buildbot/#/builders/69/builds/6997/. I cannot reproduce it locally so I'm disabling it temporarily so the bot is green.	2024-10-10 20:20:05 -05:00
Sirui Mu	ded080152a	[libc] Add osutils for Windows and make libc and its tests build on Windows target (#104676 ) This PR first adds osutils for Windows, and changes some libc code to make libc and its tests build on the Windows target. It then temporarily disables some libc tests that are currently problematic on Windows. Specifically, the changes besides the addition of osutils include: - Macro `LIBC_TYPES_HAS_FLOAT16` is disabled on Windows. `clang-cl` generates calls to functions in `compiler-rt` to handle float16 arithmetic and these functions are currently not linked in on Windows. - Macro `LIBC_TYPES_HAS_INT128` is disabled on Windows. - The invocation to `::aligned_malloc` is changed to an invocation to `::_aligned_malloc`. - The following unit tests are temporarily disabled because they currently fail on Windows: - `test.src.__support.big_int_test` - `test.src.__support.arg_list_test` - `test.src.fenv.getenv_and_setenv_test` - Tests involving `__m128i`, `__m256i`, and `__m512i` in `test.src.string.memory_utils.op_tests.cpp` - `test_range_errors` in `libc/test/src/math/smoke/AddTest.h` and `libc/test/src/math/smoke/SubTest.h`	2024-09-11 23:41:32 -04:00
Joseph Huber	f7cee44ef2	[libc] Add `strerror` and `strerror_k` to the GPU (#99083 ) Summary: The GPU ignores `errno` primarily, but targets want these functions to be defined for certain C standard interfaces. This patch enables them and makes the test function on non-Linux targets.	2024-07-16 16:17:01 -05:00
OverMighty	88f0dc48d6	[libc] Fix warnings emitted by GCC (#98751 ) Fixes #98709.	2024-07-15 17:20:53 +02:00
Petr Hosek	5ff3ff33ff	[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98597 ) This is a part of #97655.	2024-07-12 09:28:41 -07:00
Mehdi Amini	ce9035f5bd	Revert "[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration" (#98593 ) Reverts llvm/llvm-project#98075 bots are broken	2024-07-12 09:12:13 +02:00
Petr Hosek	3f30effe1b	[libc] Migrate to using LIBC_NAMESPACE_DECL for namespace declaration (#98075 ) This is a part of #97655.	2024-07-11 12:35:22 -07:00
Joseph Huber	398162ddbc	[libc] Fix typo in test message	2024-05-15 07:12:54 -05:00
Jan Patrick Lehr	ccbf908b08	[libc] Fix GPU test build error (#92235 ) This fixes a build error on the AMDGPU buildbot introduced in PR https://github.com/llvm/llvm-project/pull/92172	2024-05-15 07:06:40 -05:00
Guillaume Chatelet	292b300c51	[libc][bug] Fix out of bound write in memcpy w/ software prefetching (#90591 ) This patch adds tests for `memcpy` and `memset` making sure that we don't access buffers out of bounds. It relies on POSIX `mmap` / `mprotect` and works only when FULL_BUILD_MODE is disabled. The bug showed up while enabling software prefetching. `loop_and_tail_offset` is always running at least one iteration but in some configurations loop unrolled prefetching was actually needing only the tail operation and no loop iterations at all.	2024-05-14 13:55:24 +02:00
Guillaume Chatelet	07d7b9c255	[libc] Fix forward arm32 builtbot (#84794 ) Introduced by https://github.com/llvm/llvm-project/pull/83441.	2024-03-11 18:17:18 +01:00
Guillaume Chatelet	a84e66a92d	[libc] Provide `LIBC_TYPES_HAS_INT64` (#83441 ) Umbrella bug #83182	2024-03-09 09:43:07 +01:00
Schrodinger ZHU Yifan	57a337378f	[libc][c23] add memset_explicit (#83577 )	2024-03-07 14:57:35 -05:00
michaelrj-google	3eb1e6d8e9	[libc] Move libc_errno inside of LIBC_NAMESPACE (#80774 ) Having libc_errno outside of the namespace causes versioning issues when trying to link the tests against LLVM-libc. Most of this patch is just moving libc_errno inside the namespace in tests. This isn't necessary in the function implementations since those are already inside the namespace.	2024-02-06 10:36:05 +01:00
Guillaume Chatelet	73874f7a39	[libc][NFC] Use specific ASSERT macros to test errno (#79573 ) This patch provides specific test macros to deal with `errno`. This will help abstract away the differences between unit test and integration/hermetic tests in #79319. In one case we use `libc_errno` which is a struct, in the other case we deal directly with `errno`.	2024-01-26 15:18:22 +01:00
Guillaume Chatelet	9ca6e5bb86	[libc] Fix buggy AVX2 / AVX512 `memcmp` (#77081 ) Fixes #77080.	2024-01-11 11:45:37 +01:00
Nick Desaulniers	7c6b4be615	[libc] fix msan failure in mempcpy_test (#75532 ) Internal builds of the unittests with msan flagged mempcpy_test. ==6862==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x55e34d7d734a in length llvm-project/libc/src/__support/CPP/string_view.h:41:11 #1 0x55e34d7d734a in string_view llvm-project/libc/src/__support/CPP/string_view.h:71:24 #2 0x55e34d7d734a in __llvm_libc_9999_0_0_git::testing::Test::testStrEq(char const, char const, char const, char const, __llvm_libc_9999_0_0_git::testing::internal::Location) llvm-project/libc/test/UnitTest/LibcTest.cpp:284:13 #3 0x55e34d7d4e09 in LlvmLibcMempcpyTest_Simple::Run() llvm-project/libc/test/src/string/mempcpy_test.cpp:20:3 #4 0x55e34d7d6dff in __llvm_libc_9999_0_0_git::testing::Test::runTests(char const*) llvm-project/libc/test/UnitTest/LibcTest.cpp:133:8 #5 0x55e34d7d86e0 in main llvm-project/libc/test/UnitTest/LibcTestMain.cpp:21:10 SUMMARY: MemorySanitizer: use-of-uninitialized-value llvm-project/libc/src/__support/CPP/string_view.h:41:11 in length What's going on here is that mempcpy_test.cpp's Simple test is using ASSERT_STREQ with a partially initialized char array. ASSERT_STREQ calls Test::testStrEq which constructs a cpp:string_view. That constructor calls the private method cpp::string_view::length. When built with msan, the loop is transformed into multi-byte access, which then fails upon access. I took a look at libc++'s __constexpr_strlen which just calls __builtin_strlen(). Replacing the implementation of cpp::string_view::length with a call to __builtin_strlen() may still result in out of bounds access when the test is built with msan. It's not safe to use ASSERT_STREQ with a partially initialized array. Initialize the whole array so that the test passes.	2023-12-14 13:39:33 -08:00
Guillaume Chatelet	1d89478830	[reland][libc][NFC] Remove __support/bit.h and use __support/CPP/bit.h instead (#73939 ) (#74446 ) Same as #73939 but also fix `libc/src/string/memory_utils/op_aarch64.h` that was still using `deferred_static_assert`.	2023-12-05 11:35:13 +01:00
Guillaume Chatelet	de7fdc5b54	Revert "[libc][NFC] Remove __support/bit.h and use __support/CPP/bit.h instead" (#74444 ) Reverts llvm/llvm-project#73939 This broke libc-aarch64-ubuntu build bot https://lab.llvm.org/buildbot/#/builders/138/builds/56186	2023-12-05 11:25:39 +01:00
Guillaume Chatelet	b140948850	[libc][NFC] Remove __support/bit.h and use __support/CPP/bit.h instead (#73939 )	2023-12-05 11:21:07 +01:00
Joseph Huber	4cb6c1c7cb	[libc] Enable missing memory tests on the GPU (#68111 ) Summary: There were a few tests that weren't enabled on the GPU. This is because the logic caused them to be skipped as we don't use CPU featured on the host. This also disables the logic making multiple versions of the memory functions.	2023-10-06 08:27:36 -05:00
Siva Chandra	aecb58005c	[libc][NFC] Remove an inappropriate -ffreestanding arg to memory_utils test. (#67435 )	2023-09-26 08:04:08 -07:00
Guillaume Chatelet	b6bc9d72f6	[libc] Mass replace enclosing namespace (#67032 ) This is step 4 of https://discourse.llvm.org/t/rfc-customizable-namespace-to-allow-testing-the-libc-when-the-system-libc-is-also-llvms-libc/73079	2023-09-26 11:45:04 +02:00
Guillaume Chatelet	1c814c99aa	[libc] Improve memcmp latency and codegen This is based on ideas from @nafi to: - use a branchless version of 'cmp' for 'uint32_t', - completely resolve the lexicographic comparison through vector operations when wide types are available. We also get rid of byte reloads and serializing '__builtin_ctzll'. I did not include the suggestion to replace comparisons of 'uint16_t' with two 'uint8_t' as it did not seem to help the codegen. This can be revisited in sub-sequent patches. The code been rewritten to reduce nested function calls, making the job of the inliner easier and preventing harmful code duplication. Reviewed By: nafi3000 Differential Revision: https://reviews.llvm.org/D148717	2023-06-30 13:00:58 +00:00
Guillaume Chatelet	bd1cba9f4f	Revert D148717 "[libc] Improve memcmp latency and codegen" Once integrated in our codebase the patch triggered a bunch of failing tests. We do not yet understand where the bug is but we revert it to move forward with integration. This reverts commit `5e32765c15`.	2023-06-21 12:37:14 +00:00
Guillaume Chatelet	9902fc8dad	[libc] Enable custom logging in LibcTest This patch mimics the behavior of Google Test and allow users to log custom messages after all flavors of ASSERT_ / EXPECT_. Reviewed By: sivachandra, lntue Differential Revision: https://reviews.llvm.org/D152630	2023-06-14 13:37:50 +00:00
Guillaume Chatelet	bdb07c98c4	Revert D152630 "[libc] Enable custom logging in LibcTest" Failing buildbot https://lab.llvm.org/buildbot/#/builders/73/builds/49707 This reverts commit `9a7b4c9348`.	2023-06-14 10:31:49 +00:00
Guillaume Chatelet	9a7b4c9348	[libc] Enable custom logging in LibcTest This patch mimics the behavior of Google Test and allow users to log custom messages after all flavors of ASSERT_ / EXPECT_. Reviewed By: sivachandra, lntue Differential Revision: https://reviews.llvm.org/D152630	2023-06-14 10:26:18 +00:00
Guillaume Chatelet	2cfae7cdf4	[libc] Dispatch memmove to memcpy when buffers are disjoint Most of the time `memmove` is called on buffers that are disjoint, in that case we can use `memcpy` which is faster. The additional test is branchless on x86, aarch64 and RISCV with the zbb extension (bitmanip). On x86 this patch adds a latency of 2 to 3 cycles. Before ``` -------------------------------------------------------------------------------- Benchmark Time CPU Iterations UserCounters... -------------------------------------------------------------------------------- BM_Memmove/0/0_median 5.00 ns 5.00 ns 10 bytes_per_cycle=1.25477/s bytes_per_second=2.62933G/s items_per_second=199.87M/s __llvm_libc::memmove,memmove Google A BM_Memmove/1/0_median 6.21 ns 6.21 ns 10 bytes_per_cycle=3.22173/s bytes_per_second=6.75106G/s items_per_second=160.955M/s __llvm_libc::memmove,memmove Google B BM_Memmove/2/0_median 8.09 ns 8.09 ns 10 bytes_per_cycle=5.31462/s bytes_per_second=11.1366G/s items_per_second=123.603M/s __llvm_libc::memmove,memmove Google D BM_Memmove/3/0_median 5.95 ns 5.95 ns 10 bytes_per_cycle=2.71865/s bytes_per_second=5.69687G/s items_per_second=167.967M/s __llvm_libc::memmove,memmove Google L BM_Memmove/4/0_median 5.63 ns 5.63 ns 10 bytes_per_cycle=2.28294/s bytes_per_second=4.78383G/s items_per_second=177.615M/s __llvm_libc::memmove,memmove Google M BM_Memmove/5/0_median 5.68 ns 5.68 ns 10 bytes_per_cycle=2.16798/s bytes_per_second=4.54295G/s items_per_second=176.015M/s __llvm_libc::memmove,memmove Google Q BM_Memmove/6/0_median 7.46 ns 7.46 ns 10 bytes_per_cycle=3.97619/s bytes_per_second=8.332G/s items_per_second=134.044M/s __llvm_libc::memmove,memmove Google S BM_Memmove/7/0_median 5.40 ns 5.40 ns 10 bytes_per_cycle=1.79695/s bytes_per_second=3.76546G/s items_per_second=185.211M/s __llvm_libc::memmove,memmove Google U BM_Memmove/8/0_median 5.62 ns 5.62 ns 10 bytes_per_cycle=3.18747/s bytes_per_second=6.67927G/s items_per_second=177.983M/s __llvm_libc::memmove,memmove Google W BM_Memmove/9/0_median 101 ns 101 ns 10 bytes_per_cycle=9.77359/s bytes_per_second=20.4803G/s items_per_second=9.9333M/s __llvm_libc::memmove,uniform 384 to 4096 ``` After ``` BM_Memmove/0/0_median 3.57 ns 3.57 ns 10 bytes_per_cycle=1.71375/s bytes_per_second=3.59112G/s items_per_second=280.411M/s __llvm_libc::memmove,memmove Google A BM_Memmove/1/0_median 4.52 ns 4.52 ns 10 bytes_per_cycle=4.47557/s bytes_per_second=9.37843G/s items_per_second=221.427M/s __llvm_libc::memmove,memmove Google B BM_Memmove/2/0_median 5.70 ns 5.70 ns 10 bytes_per_cycle=7.37396/s bytes_per_second=15.4519G/s items_per_second=175.399M/s __llvm_libc::memmove,memmove Google D BM_Memmove/3/0_median 4.47 ns 4.47 ns 10 bytes_per_cycle=3.4148/s bytes_per_second=7.15563G/s items_per_second=223.743M/s __llvm_libc::memmove,memmove Google L BM_Memmove/4/0_median 4.53 ns 4.53 ns 10 bytes_per_cycle=2.86071/s bytes_per_second=5.99454G/s items_per_second=220.69M/s __llvm_libc::memmove,memmove Google M BM_Memmove/5/0_median 4.19 ns 4.19 ns 10 bytes_per_cycle=2.5484/s bytes_per_second=5.3401G/s items_per_second=238.924M/s __llvm_libc::memmove,memmove Google Q BM_Memmove/6/0_median 5.02 ns 5.02 ns 10 bytes_per_cycle=5.94164/s bytes_per_second=12.4505G/s items_per_second=199.14M/s __llvm_libc::memmove,memmove Google S BM_Memmove/7/0_median 4.03 ns 4.03 ns 10 bytes_per_cycle=2.47028/s bytes_per_second=5.17641G/s items_per_second=247.906M/s __llvm_libc::memmove,memmove Google U BM_Memmove/8/0_median 4.70 ns 4.70 ns 10 bytes_per_cycle=3.84975/s bytes_per_second=8.06706G/s items_per_second=212.72M/s __llvm_libc::memmove,memmove Google W BM_Memmove/9/0_median 90.7 ns 90.7 ns 10 bytes_per_cycle=10.8681/s bytes_per_second=22.7739G/s items_per_second=11.02M/s __llvm_libc::memmove,uniform 384 to 4096 ``` Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D152811	2023-06-14 08:29:15 +00:00
Guillaume Chatelet	5e32765c15	[libc] Improve memcmp latency and codegen This is based on ideas from @nafi to: - use a branchless version of 'cmp' for 'uint32_t', - completely resolve the lexicographic comparison through vector operations when wide types are available. We also get rid of byte reloads and serializing '__builtin_ctzll'. I did not include the suggestion to replace comparisons of 'uint16_t' with two 'uint8_t' as it did not seem to help the codegen. This can be revisited in sub-sequent patches. The code been rewritten to reduce nested function calls, making the job of the inliner easier and preventing harmful code duplication. Reviewed By: nafi3000 Differential Revision: https://reviews.llvm.org/D148717	2023-06-12 13:47:16 +00:00
Guillaume Chatelet	1ec995cc1c	Revert D148717 "[libc] Improve memcmp latency and codegen" This broke aarch64 debug buildbot https://lab.llvm.org/buildbot/#/builders/223/builds/21703 This reverts commit `bd4f978754`.	2023-06-12 08:32:00 +00:00
Guillaume Chatelet	bd4f978754	[libc] Improve memcmp latency and codegen This is based on ideas from @nafi to: - use a branchless version of 'cmp' for 'uint32_t', - completely resolve the lexicographic comparison through vector operations when wide types are available. We also get rid of byte reloads and serializing '__builtin_ctzll'. I did not include the suggestion to replace comparisons of 'uint16_t' with two 'uint8_t' as it did not seem to help the codegen. This can be revisited in sub-sequent patches. The code been rewritten to reduce nested function calls, making the job of the inliner easier and preventing harmful code duplication. Reviewed By: nafi3000 Differential Revision: https://reviews.llvm.org/D148717	2023-06-12 07:56:23 +00:00
Guillaume Chatelet	e49a608511	Revert D148717 "[libc] Improve memcmp latency and codegen" This reverts commit `9ec6ebd3ce`. The patch broke RISCV and aarch64 builtbots.	2023-06-05 09:50:30 +00:00
Guillaume Chatelet	9ec6ebd3ce	[libc] Improve memcmp latency and codegen This is based on ideas from @nafi to: - use a branchless version of 'cmp' for 'uint32_t', - completely resolve the lexicographic comparison through vector operations when wide types are available. We also get rid of byte reloads and serializing '__builtin_ctzll'. I did not include the suggestion to replace comparisons of 'uint16_t' with two 'uint8_t' as it did not seem to help the codegen. This can be revisited in sub-sequent patches. The code been rewritten to reduce nested function calls, making the job of the inliner easier and preventing harmful code duplication. Reviewed By: nafi3000 Differential Revision: https://reviews.llvm.org/D148717	2023-06-05 09:46:05 +00:00
Guillaume Chatelet	c76a3e795e	[libc][NFC] Fixing various typos	2023-05-31 12:11:09 +00:00
Roland McGrath	1d4e8f0ea6	[libc] Fix compilation issues in memory_check_utils.h Strict warnings require explicit static_cast to counteract default widening of types narrower than int. Functions in header files should have vague linkage (inline keyword), not internal linkage (static) or external linkage (no inline keyword) even for template functions. Note these don't use the LIBC_INLINE macro since this is only for test code. Reviewed By: abrachet Differential Revision: https://reviews.llvm.org/D151494	2023-05-25 14:09:14 -07:00
Guillaume Chatelet	298843cd66	[libc][test] Drastically reduce mem test runtime Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D151450	2023-05-25 15:05:21 +00:00
Tue Ly	111d274841	[libc][math] Implement double precision log2 function correctly rounded to all rounding modes. Implement double precision log2 function correctly rounded to all rounding modes. See https://reviews.llvm.org/D150014 for a more detail description of the algorithm. Performance - For `0.5 <= x <= 2`, the fast pass hitting rate is about 99.91%. - Reciprocal throughput from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log2 GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 15.458 + 0.204 clc/call; Median-Min = 0.224 clc/call; Max = 15.867 clc/call; -- CORE-MATH reciprocal throughput -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 23.711 + 0.524 clc/call; Median-Min = 0.443 clc/call; Max = 25.307 clc/call; -- System LIBC reciprocal throughput -- [####################] 100 % Ntrial = 20 ; Min = 14.807 + 0.199 clc/call; Median-Min = 0.211 clc/call; Max = 15.137 clc/call; -- LIBC reciprocal throughput -- with FMA [####################] 100 % Ntrial = 20 ; Min = 17.666 + 0.274 clc/call; Median-Min = 0.298 clc/call; Max = 18.531 clc/call; -- LIBC reciprocal throughput -- without FMA [####################] 100 % Ntrial = 20 ; Min = 26.534 + 0.418 clc/call; Median-Min = 0.462 clc/call; Max = 27.327 clc/call; ``` - Latency from CORE-MATH's perf tool on Ryzen 5900X: ``` $ ./perf.sh log2 --latency GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 46.048 + 1.643 clc/call; Median-Min = 1.694 clc/call; Max = 48.018 clc/call; -- CORE-MATH latency -- without FMA (-march=x86-64-v2) [####################] 100 % Ntrial = 20 ; Min = 62.333 + 0.138 clc/call; Median-Min = 0.119 clc/call; Max = 62.583 clc/call; -- System LIBC latency -- [####################] 100 % Ntrial = 20 ; Min = 45.206 + 1.503 clc/call; Median-Min = 1.467 clc/call; Max = 47.229 clc/call; -- LIBC latency -- with FMA [####################] 100 % Ntrial = 20 ; Min = 43.042 + 0.454 clc/call; Median-Min = 0.484 clc/call; Max = 43.912 clc/call; -- LIBC latency -- without FMA [####################] 100 % Ntrial = 20 ; Min = 57.016 + 1.636 clc/call; Median-Min = 1.655 clc/call; Max = 58.816 clc/call; ``` - Accurate pass latency: ``` $ ./perf.sh log2 --latency --simple_stat GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with FMA 177.632 -- CORE-MATH latency -- without FMA (-march=x86-64-v2) 231.332 -- LIBC latency -- with FMA 459.751 -- LIBC latency -- without FMA 463.850 ``` Reviewed By: zimmermann6 Differential Revision: https://reviews.llvm.org/D150374	2023-05-23 10:49:30 -04:00
Guillaume Chatelet	f4a3549250	[libc] Add optimized memcpy for RISCV This patch adds two versions of memcpy optimized for architectures where unaligned accesses are either illegal or extremely slow. It is currently enabled for RISCV 64 and RISCV 32 but it could be used for ARM 32 architectures as well. Here is the before / after output of `libc.benchmarks.memory_functions.opt_host --benchmark_filter=BM_Memcpy` on a quad core Linux starfive RISCV 64 board running at 1.5GHz. Before: ``` Run on (4 X 1500 MHz CPU s) CPU Caches: L1 Instruction 32 KiB (x4) L1 Data 32 KiB (x4) L2 Unified 2048 KiB (x1) ------------------------------------------------------------------------ Benchmark Time CPU Iterations UserCounters... ------------------------------------------------------------------------ BM_Memcpy/0/0 474 ns 474 ns 1483776 bytes_per_cycle=0.243492/s bytes_per_second=348.318M/s items_per_second=2.11097M/s __llvm_libc::memcpy,memcpy Google A BM_Memcpy/1/0 210 ns 209 ns 3649536 bytes_per_cycle=0.233819/s bytes_per_second=334.481M/s items_per_second=4.77519M/s __llvm_libc::memcpy,memcpy Google B BM_Memcpy/2/0 1814 ns 1814 ns 396288 bytes_per_cycle=0.247899/s bytes_per_second=354.622M/s items_per_second=551.402k/s __llvm_libc::memcpy,memcpy Google D BM_Memcpy/3/0 89.3 ns 89.2 ns 7459840 bytes_per_cycle=0.217415/s bytes_per_second=311.014M/s items_per_second=11.2071M/s __llvm_libc::memcpy,memcpy Google L BM_Memcpy/4/0 134 ns 134 ns 3815424 bytes_per_cycle=0.226584/s bytes_per_second=324.131M/s items_per_second=7.44567M/s __llvm_libc::memcpy,memcpy Google M BM_Memcpy/5/0 52.8 ns 52.6 ns 11001856 bytes_per_cycle=0.194893/s bytes_per_second=278.797M/s items_per_second=19.0284M/s __llvm_libc::memcpy,memcpy Google Q BM_Memcpy/6/0 180 ns 180 ns 4101120 bytes_per_cycle=0.231884/s bytes_per_second=331.713M/s items_per_second=5.55957M/s __llvm_libc::memcpy,memcpy Google S BM_Memcpy/7/0 195 ns 195 ns 3906560 bytes_per_cycle=0.232951/s bytes_per_second=333.239M/s items_per_second=5.1217M/s __llvm_libc::memcpy,memcpy Google U BM_Memcpy/8/0 152 ns 152 ns 4789248 bytes_per_cycle=0.227507/s bytes_per_second=325.452M/s items_per_second=6.58187M/s __llvm_libc::memcpy,memcpy Google W BM_Memcpy/9/0 6036 ns 6033 ns 118784 bytes_per_cycle=0.249158/s bytes_per_second=356.423M/s items_per_second=165.75k/s __llvm_libc::memcpy,uniform 384 to 4096 ``` After: ``` BM_Memcpy/0/0 126 ns 126 ns 5770240 bytes_per_cycle=1.04707/s bytes_per_second=1.46273G/s items_per_second=7.9385M/s __llvm_libc::memcpy,memcpy Google A BM_Memcpy/1/0 75.1 ns 75.0 ns 10204160 bytes_per_cycle=0.691143/s bytes_per_second=988.687M/s items_per_second=13.3289M/s __llvm_libc::memcpy,memcpy Google B BM_Memcpy/2/0 333 ns 333 ns 2174976 bytes_per_cycle=1.39297/s bytes_per_second=1.94596G/s items_per_second=3.00002M/s __llvm_libc::memcpy,memcpy Google D BM_Memcpy/3/0 49.6 ns 49.5 ns 16092160 bytes_per_cycle=0.710161/s bytes_per_second=1015.89M/s items_per_second=20.1844M/s __llvm_libc::memcpy,memcpy Google L BM_Memcpy/4/0 57.7 ns 57.7 ns 11213824 bytes_per_cycle=0.561557/s bytes_per_second=803.314M/s items_per_second=17.3228M/s __llvm_libc::memcpy,memcpy Google M BM_Memcpy/5/0 48.0 ns 47.9 ns 16437248 bytes_per_cycle=0.346708/s bytes_per_second=495.97M/s items_per_second=20.8571M/s __llvm_libc::memcpy,memcpy Google Q BM_Memcpy/6/0 67.5 ns 67.5 ns 10616832 bytes_per_cycle=0.614173/s bytes_per_second=878.582M/s items_per_second=14.8142M/s __llvm_libc::memcpy,memcpy Google S BM_Memcpy/7/0 84.7 ns 84.6 ns 10480640 bytes_per_cycle=0.819077/s bytes_per_second=1.14424G/s items_per_second=11.8174M/s __llvm_libc::memcpy,memcpy Google U BM_Memcpy/8/0 61.7 ns 61.6 ns 11191296 bytes_per_cycle=0.550078/s bytes_per_second=786.893M/s items_per_second=16.2279M/s __llvm_libc::memcpy,memcpy Google W BM_Memcpy/9/0 981 ns 981 ns 703488 bytes_per_cycle=1.52333/s bytes_per_second=2.12807G/s items_per_second=1019.81k/s __llvm_libc::memcpy,uniform 384 to 4096 ``` It is not as good as glibc for now so there's room for improvement. I suspect a path pumping 16 bytes at once given the doubled numbers for large copies. ``` BM_Memcpy/0/1 146 ns 82.5 ns 8576000 bytes_per_cycle=1.35236/s bytes_per_second=1.88922G/s items_per_second=12.1169M/s glibc memcpy,memcpy Google A BM_Memcpy/1/1 112 ns 63.7 ns 10634240 bytes_per_cycle=0.628018/s bytes_per_second=898.387M/s items_per_second=15.702M/s glibc memcpy,memcpy Google B BM_Memcpy/2/1 315 ns 180 ns 4079616 bytes_per_cycle=2.65229/s bytes_per_second=3.7052G/s items_per_second=5.54764M/s glibc memcpy,memcpy Google D BM_Memcpy/3/1 85.3 ns 43.1 ns 15854592 bytes_per_cycle=0.774164/s bytes_per_second=1107.45M/s items_per_second=23.2249M/s glibc memcpy,memcpy Google L BM_Memcpy/4/1 105 ns 54.3 ns 13427712 bytes_per_cycle=0.7793/s bytes_per_second=1114.8M/s items_per_second=18.4109M/s glibc memcpy,memcpy Google M BM_Memcpy/5/1 77.1 ns 43.2 ns 16476160 bytes_per_cycle=0.279808/s bytes_per_second=400.269M/s items_per_second=23.1428M/s glibc memcpy,memcpy Google Q BM_Memcpy/6/1 112 ns 62.7 ns 11236352 bytes_per_cycle=0.676078/s bytes_per_second=967.137M/s items_per_second=15.9387M/s glibc memcpy,memcpy Google S BM_Memcpy/7/1 131 ns 65.5 ns 11751424 bytes_per_cycle=0.965616/s bytes_per_second=1.34895G/s items_per_second=15.2762M/s glibc memcpy,memcpy Google U BM_Memcpy/8/1 104 ns 55.0 ns 12314624 bytes_per_cycle=0.583336/s bytes_per_second=834.468M/s items_per_second=18.1937M/s glibc memcpy,memcpy Google W BM_Memcpy/9/1 932 ns 466 ns 1480704 bytes_per_cycle=3.17342/s bytes_per_second=4.43321G/s items_per_second=2.14679M/s glibc memcpy,uniform 384 to 4096 ``` Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D150202	2023-05-10 08:42:07 +00:00
Joseph Huber	632fa3798c	[libc] Enable running libc unit tests on AMDGPU The previous patches added the necessary support for global constructors used to register tests. This patch enables the AMDGPU target to build and run the unit tests on the GPU. Currently this only tests the `ctype` tests, but adding more should be straightforward from here on. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D149517	2023-05-04 06:32:52 -05:00
Joseph Huber	644b63bd31	[libc] Add 'UNIT_TEST_ONLY' and 'HERMETIC_TEST_ONLY' to 'add_libc_test' The `add_libc_test` option allows us to enable both kinds of tests in a single option. However, some tests cannot be made hermetic with the current approach. Such as tests that rely on system utilities or libraries. This patch adds two options `UNIT_TEST_ONLY` and `HERMETIC_TEST_ONLY` to offer more fine-grained control over which version gets built. This makes it explicit which version a test supports and why. Depends on D149662 Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D149691	2023-05-02 18:51:12 -05:00

1 2 3 4 5

220 Commits