clang-p2996

Author	SHA1	Message	Date
Jay Foad	92542f2a40	[AMDGPU] Add targets gfx1150 and gfx1151 This is the target definition only. Currently they are treated the same as GFX 11.0.x. Differential Revision: https://reviews.llvm.org/D155429	2023-07-17 13:06:12 +01:00
Guillaume Chatelet	b38dda74fa	[libc][NFC] Split memcmp implementations per platform This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif. Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D155181	2023-07-17 11:35:31 +00:00
Guillaume Chatelet	83f3920854	[libc][NFC] Split memset implementations per platform This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif. Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D155174	2023-07-17 11:12:19 +00:00
Guillaume Chatelet	8cc440b3e7	[libc][NFC] Split memcpy implementations per platform This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif. Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D155099	2023-07-13 10:30:38 +00:00
Guillaume Chatelet	1c4e4e03bd	[libc][NFC] Split bcmp implementations per platform This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif. Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D155076	2023-07-13 10:19:00 +00:00
Dominic Chen	50414422ac	[libc][math] Fix floating-point test support on x86_64 Apple machines Provide platform-specific x87 FPU definitions and operations Differential Revision: https://reviews.llvm.org/D153823	2023-07-12 00:38:45 -07:00
Joseph Huber	a608076726	[libc][Obvious] Check if the state hasn't already been destroyed on shutdown This ensures that if someone calls the `rpc_shutdown` method multiple times it will not segfault and gracefully continue. This was causing problems in the OpenMP usage. This could point to other issues, but for now this is a safe fix. Differential Revision: https://reviews.llvm.org/D155005	2023-07-11 14:35:38 -05:00
Michael Jones	2cb4731902	[libc] adjust strtofloat precision for subnormals Subnormal floating point numbers have a lower effective precision than normal floating point numbers. This can cause issues for the fuzz test since the MPFR floats have a constant precision regardless of the exponent, and the precision must match exactly or else create rounding errors. To solve this problem, the precision of the MPFR floats is dynamically calculated. Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D154909	2023-07-11 11:27:19 -07:00
Joseph Huber	a4f553fcde	[libc] Fix using the `libcgpu.a` for NVPTX in non-LTO builds CUDA requires a PTX feature to be compiled generally, because the `libcgpu.a` archive contains LLVM-IR we need to have one present to compile it. Currently, the wrapper fatbinary format we use to incorporate these into single-source offloading languages has a special option to provide this. Since this was not present in the builds, if the user did not specify it via `-foffload-lto` it would not compile from CUDA or OpenMP due to the missing PTX features. Fix this by passing it to the packager invocation. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D154864	2023-07-10 13:54:47 -05:00
Joseph Huber	b454e7aa7c	[libc] Remove GPU string functions incompatible with C++ These functions have definitions differing between C and C++. GNU respects the C++ definitions while the LLVM libc does not. This causes many bugs and the current hack creates other issues. Rather than hack around this I'd rather temporarily disable these than regress with the integration into other offloading languages. We lose test support for them but we should be able to re-enable these once the `libc` headers provide these correctly. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D154850	2023-07-10 10:40:10 -05:00
Petr Hosek	0ab14951db	[NFC][libc] Use the new style includes for tests This was accidentally omitted from D154746.	2023-07-10 07:42:13 +00:00
Petr Hosek	36c15be20b	[libc] Use LIBC_INCLUDE_DIR in CMake rules D152592 introduced LIBC_INCLUDE_DIR for the location of the include directory, use it in relevant CMake rules. Differential Revision: https://reviews.llvm.org/D154278	2023-07-10 07:32:24 +00:00
Guillaume Chatelet	bfd94882f2	[libc][NFC] Move aligned access implementations to separate header Follow up on https://reviews.llvm.org/D154770 Differential Revision: https://reviews.llvm.org/D154800	2023-07-09 22:17:05 +00:00
Guillaume Chatelet	dbaa5838c1	[libc][NFC] Move memfunction's byte per byte implementations to a separate header There will be subsequent patches to move things around and make the file layout more principled. Differential Revision: https://reviews.llvm.org/D154770	2023-07-09 07:21:58 +00:00
Petr Hosek	fb149e4beb	[libc] Use the new style includes for tests This is a follow up to D154529 covering tests. Differential Revision: https://reviews.llvm.org/D154746	2023-07-08 05:15:44 +00:00
Petr Hosek	9654bc3960	Revert "[libc] Set include directories for the str_to_float test" This reverts commit `147c0640a3` since it broke GPU builds.	2023-07-07 21:25:23 +00:00
Joseph Huber	2a65d0388c	[libc] Add support for creating wrapper headers for offloading in clang This is an alternate approach to the patches proposed in D153897 and D153794. Rather than exporting a single header that can be included on the GPU in all circumstances, this patch chooses to instead generate a separate set of headers that only provides the declarations. This can then be used by external tooling to set up what's on the GPU. This leaves room for header hacks for offloading languages without needing to worry about the `libc` implementation. Currently this generates a set of headers that only contain the declarations. These will then be installed to a new clang resource directory called `llvm_libc_wrappers/` which will house the shim code. We can then automatically include this from `clang` when offloading to wrap around the headers while specifying what's on the GPU. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D154036	2023-07-07 16:02:33 -05:00
Petr Hosek	bf171aaa7a	Revert "[libc] Use LIBC_INCLUDE_DIR in CMake rules" This reverts commit `6e821f0b3a` since it broke the libc-aarch64-ubuntu-fullbuild-dbg bot.	2023-07-07 20:52:54 +00:00
Petr Hosek	6e821f0b3a	[libc] Use LIBC_INCLUDE_DIR in CMake rules D152592 introduced LIBC_INCLUDE_DIR for the location of the include directory, use it in relevant CMake rules. Differential Revision: https://reviews.llvm.org/D154278	2023-07-07 20:42:25 +00:00
Petr Hosek	147c0640a3	[libc] Set include directories for the str_to_float test This test uses libc headers and need to explicitly include them. Differential Revision: https://reviews.llvm.org/D154277	2023-07-07 20:33:54 +00:00
Joseph Huber	691dc2d10d	[Libomptarget] Begin implementing support for RPC services This patch adds the initial support for running an RPC server in libomptarget to handle host services. We interface with the library provided by the `libc` project to stand up a basic server. We introduce a new type that is controlled by the plugin and has each device initialize its interface. We then run a basic server to check the RPC buffer. This patch does not fully implement the interface. In the future each plugin will want to define special handlers via the interface to support things like malloc or H2D copies coming from RPC. We will also want to allow the plugin to specify the number of ports. This is currently capped in the implementation but will be adjusted soon. Right now running the server is handled by whatever thread ends up doing the waiting. This is probably not a completely sound solution but I am not overly familiar with the behaviour of OpenMP tasks and what would be required here. This works okay with synchronous regions, and somewhat fine with `nowait` regions, but I've observed some weird behavior when one of those regions calls `exit`. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D154312	2023-07-07 12:36:46 -05:00
Joseph Huber	c012eb79e2	[libc] Enable aliasing on AMDGPU targets AMDGPU supports aliases now, so we can drop this case and leave it only for the NVPTX target. Unfortunately it's unlikely that NVPTX will be able to support this in the future due to their PTX language being very limited. Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D154704	2023-07-07 11:49:16 -05:00
Guillaume Chatelet	cb1468d3cb	[libc] Adding a version of memcpy w/ software prefetching For machines with a lot of cores, hardware prefetchers can saturate the memory bus when utilization is high. In this case it is desirable to turn off the hardware prefetcher completely. This has a big impact on the performance of memory functions such as `memcpy` that rely on the fact that the next cache line will be readily available. This patch adds the 'LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING' compile time option that generates a version of memcpy with software prefetching. While not fully restoring the original performances it mitigates the impact to an acceptable level. Reviewed By: rtenneti Differential Revision: https://reviews.llvm.org/D154494	2023-07-07 10:37:32 +00:00
Joseph Huber	6ca6cdb23e	Revert "[libc] Add support for creating wrapper headers for offloading in clang" This reverts commit `a4a26374aa`. This was causing some problems with the CPU build and CUDA buildbot. Revert until I can figure out what those issues are and fix them. I believe it is just some CMake.	2023-07-06 18:26:41 -05:00
Joseph Huber	a4a26374aa	[libc] Add support for creating wrapper headers for offloading in clang This is an alternate approach to the patches proposed in D153897 and D153794. Rather than exporting a single header that can be included on the GPU in all circumstances, this patch chooses to instead generate a separate set of headers that only provides the declarations. This can then be used by external tooling to set up what's on the GPU. This leaves room for header hacks for offloading languages without needing to worry about the `libc` implementation. Currently this generates a set of headers that only contain the declarations. These will then be installed to a new clang resource directory called `llvm_libc_wrappers/` which will house the shim code. We can then automatically include this from `clang` when offloading to wrap around the headers while specifying what's on the GPU. Reviewed By: jdoerfert, JonChesterfield Differential Revision: https://reviews.llvm.org/D154036	2023-07-06 18:10:49 -05:00
Joseph Huber	c850ea1498	[libc] Support fopen / fclose on the GPU This patch adds the necessary support for the fopen and fclose functions to work on the GPU via RPC. I added a new test that enables testing this with the minimal features we have on the GPU. I will update it once we have `fread` and `fwrite` to actually check the outputted strings. For now I just relied on checking manually via the output temp file. Reviewed By: JonChesterfield, sivachandra Differential Revision: https://reviews.llvm.org/D154519	2023-07-05 18:31:58 -05:00
Joseph Huber	7e88e26d38	[libc] Add GPU support for the 'inttypes.h' functions Another low hanging fruit we can put on the GPU, this ports the tests over to the hermetic framework so we can run them on the GPU. Reviewed By: sivachandra Differential Revision: https://reviews.llvm.org/D154540	2023-07-05 17:47:10 -05:00
Joseph Huber	515bd1c9b8	[libc][Obvious] Fix timing on AMDGPU not being initialized Summary: Reviewer requested that this routine not be a macro, however that means that it was not being initialized as the static initializer was done before the memcpy from the device. Fix this so we can get timing information.	2023-07-05 16:08:37 -05:00
Joseph Huber	80504b06ad	[libc][Obvious] Fix bad macro check on NVPTX tests Summary: I forgot to add the `defined()` check on NVPTX.	2023-07-05 15:54:12 -05:00
Joseph Huber	5db39796bf	[libc] Support timing information in libc tests This patch adds the necessary support to provide timing information in `libc` tests. This is useful for determining which tests take what amount of time. We also can use this as a test basis for providing more fine-grained timing when implementing things on the GPU. The main difficulty with this is the fact that the AMDGPU fixed frequency clock operates at an unknown frequency. We need to read this on a per-card basis from the driver and then copy it in. NVPTX on the other hand has a fixed clock at a resolution of 1ns. I have also increased the resolution of the print-outs as the majority of these are below a millisecond for me. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D154446	2023-07-05 14:27:08 -05:00
Michael Jones	cfbcbc8f88	[libc] fix MPFR rounding problems in fuzz test The accuracy for the MPFR numbers in the strtofloat fuzz test was set too high, causing rounding issues when rounding to a smaller final result. Reviewed By: lntue Differential Revision: https://reviews.llvm.org/D154150	2023-07-05 10:53:40 -07:00
Petr Hosek	8910cc2742	[libc] Use the new style includes We should be using the standard includes. Differential Revision: https://reviews.llvm.org/D154529	2023-07-05 17:51:41 +00:00
Petr Hosek	e1cb5924cb	Revert "[libc] Use LIBC_INCLUDE_DIR in CMake rules" This reverts commit `046deabd93` since it broke libc-aarch64-ubuntu-fullbuild-dbg.	2023-07-05 17:20:11 +00:00
Petr Hosek	046deabd93	[libc] Use LIBC_INCLUDE_DIR in CMake rules D152592 introduced LIBC_INCLUDE_DIR for the location of the include directory, use it in relevant CMake rules. Differential Revision: https://reviews.llvm.org/D154278	2023-07-05 17:16:19 +00:00
Petr Hosek	80368a104e	[libc] Check if the hermetic test target exists When crt1 isn't available, which is typical on baremetal, hermetic tests aren't created and the hermetic test target won't be available. Differential Revision: https://reviews.llvm.org/D154279	2023-07-05 17:09:01 +00:00
Siva Chandra	3db36d6a9b	[libc] Initialize the global pointer in riscv startup code. Reviewed By: mikhail.ramalho Differential Revision: https://reviews.llvm.org/D151539	2023-07-05 07:32:31 +00:00
Joseph Huber	f8cf210576	[libc] Remove flaky static assert from RPC interface Summary: This function is intended to only be used on the GPU as a shorthand. The static assert should only fire if it's called, but it seems that its presence can sometimes cause issues and other times not. Simply remove it as it's causing build problems.	2023-07-04 11:06:06 -05:00
Alfred Persson Forsberg	cae84d8acf	[libc] Correct usage of __unix__ and __linux__ Reviewed By: michaelrj, thesamesam Differential Revision: https://reviews.llvm.org/D153729	2023-07-03 01:08:15 +01:00
Petr Hosek	1c241bb791	[libc] Missing FEnvImpl.h dependency on math.h FEnvImpl.h includes math.h and so needs an explicit dependency. Differential Revision: https://reviews.llvm.org/D154044	2023-07-01 18:27:36 +00:00
Roland McGrath	5bf8efd269	[libc] Fix more inline definitions Fix a bunch more instances of incorrect use of the `static` keyword and missing use of LIBC_INLINE and LIBC_INLINE_VAR macros. Note that even forward declarations and generic template declarations must follow the prescribed patterns for libc code so that they match every definition, all template specializations. Reviewed By: Caslyn Differential Revision: https://reviews.llvm.org/D154260	2023-06-30 14:46:25 -07:00
Roland McGrath	dbd38b1219	[libc] Add missing cast in x86 big_endian_cmp_mask Implicit narrowing conversions from int to uint16_t get a compiler warning with the warning settings used in the Fuchsia build. Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D154256	2023-06-30 14:15:59 -07:00
Joseph Huber	df52a22b1b	[libc] Make the RPC server target always available This patch makes sure that we always build the RPC server. The proposed use for this is to begin integrating this server implementation into `libomptarget`. That requires that we build this server ahead of time when using a `LLVM_ENABLE_PROJECTS` build. Make a few tweaks to ensure that the GCC compiler which may be used for this build doesn't complain. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D154105	2023-06-30 11:30:57 -05:00
Joseph Huber	62f57bc9b0	[libc] Add other RPC callback methods to the RPC server This patch adds the other two methods to the server so the external users can use the interface through the obfuscated interface. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D154224	2023-06-30 11:29:37 -05:00
Guillaume Chatelet	1c814c99aa	[libc] Improve memcmp latency and codegen This is based on ideas from @nafi to: - use a branchless version of 'cmp' for 'uint32_t', - completely resolve the lexicographic comparison through vector operations when wide types are available. We also get rid of byte reloads and serializing '__builtin_ctzll'. I did not include the suggestion to replace comparisons of 'uint16_t' with two 'uint8_t' as it did not seem to help the codegen. This can be revisited in subsequent patches. The code has been rewritten to reduce nested function calls, making the job of the inliner easier and preventing harmful code duplication. Reviewed By: nafi3000 Differential Revision: https://reviews.llvm.org/D148717	2023-06-30 13:00:58 +00:00
Joseph Huber	b15ac1fd89	[libc] Enable the 'div' routines on the GPU This patch simply enables the `div`, `ldiv`, and `lldiv` functions on the GPU. This should be straightforward enough. Reviewed By: jdoerfert Differential Revision: https://reviews.llvm.org/D154143	2023-06-29 15:42:46 -05:00
Joseph Huber	667c10353e	[libc] Fix the implementation of exit on the GPU The RPC calls all have delays associated with them. Currently the `exit` function does an async send and immediately exits the GPU. This can have the effect that the RPC server never sees the exit call and we continue. This patch changes that to first sync with the server before continuing to perform its exit. There is still a hazard here, where the kernel can complete before the RPC call reads back its response, but this is simply multi-threaded hazards. This change ensures that the server will always exit some time after the GPU exits. Reviewed By: JonChesterfield Differential Revision: https://reviews.llvm.org/D154112	2023-06-29 13:22:23 -05:00
Guillaume Chatelet	177583c914	[libc][NFC] Use SIZE_MAX instead of size_t(-1)	2023-06-29 12:21:43 +00:00
Tue Ly	de19101e33	[libc][NFC] Set rounding mode for sincosf exhaustive test.	2023-06-28 20:30:54 -04:00
Tue Ly	f320fefc4a	[libc][math] Implement erff function correctly rounded to all rounding modes. Implement correctly rounded `erff` functions. For `x >= 4`, `erff(x) = 1` for `FE_TONEAREST` or `FE_UPWARD`, `0x1.ffffep-1` for `FE_DOWNWARD` or `FE_TOWARDZERO`. For `0 <= x < 4`, we divide into 32 sub-intervals of length `1/8`, and use a degree-15 odd polynomial to approximate `erff(x)` in each sub-interval: ``` erff(x) ~ x * (c0 + c1 * x^2 + c2 * x^4 + ... + c7 * x^14). ``` For `x < 0`, we can use the same formula as above, since the odd part is factored out. Performance tested with `perf.sh` tool from the CORE-MATH project on AMD Ryzen 9 5900X: Reciprocal throughput (clock cycles / op) ``` $ ./perf.sh erff --path2 GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH reciprocal throughput -- with -march=native (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 11.790 + 0.182 clc/call; Median-Min = 0.154 clc/call; Max = 12.255 clc/call; -- CORE-MATH reciprocal throughput -- with -march=x86-64-v2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 14.205 + 0.151 clc/call; Median-Min = 0.159 clc/call; Max = 15.893 clc/call; -- System LIBC reciprocal throughput -- [####################] 100 % Ntrial = 20 ; Min = 45.519 + 0.445 clc/call; Median-Min = 0.552 clc/call; Max = 46.345 clc/call; -- LIBC reciprocal throughput -- with -mavx2 -mfma (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 9.595 + 0.214 clc/call; Median-Min = 0.220 clc/call; Max = 9.887 clc/call; -- LIBC reciprocal throughput -- with -msse4.2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 10.223 + 0.190 clc/call; Median-Min = 0.222 clc/call; Max = 10.474 clc/call; ``` and latency (clock cycles / op): ``` $ ./perf.sh erff --path2 GNU libc version: 2.35 GNU libc release: stable -- CORE-MATH latency -- with -march=native (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 38.566 + 0.391 
clc/call; Median-Min = 0.503 clc/call; Max = 39.170 clc/call; -- CORE-MATH latency -- with -march=x86-64-v2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 43.223 + 0.667 clc/call; Median-Min = 0.680 clc/call; Max = 43.913 clc/call; -- System LIBC latency -- [####################] 100 % Ntrial = 20 ; Min = 111.613 + 1.267 clc/call; Median-Min = 1.696 clc/call; Max = 113.444 clc/call; -- LIBC latency -- with -mavx2 -mfma (with FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 40.138 + 0.410 clc/call; Median-Min = 0.536 clc/call; Max = 40.729 clc/call; -- LIBC latency -- with -msse4.2 (without FMA instructions) [####################] 100 % Ntrial = 20 ; Min = 44.858 + 0.872 clc/call; Median-Min = 0.814 clc/call; Max = 46.019 clc/call; ``` Reviewed By: michaelrj Differential Revision: https://reviews.llvm.org/D153683	2023-06-28 13:58:37 -04:00
Guillaume Chatelet	b3b54131d0	[libc][NFC] Separate avx/no-avx x86 memcpy implementations Reviewed By: courbet Differential Revision: https://reviews.llvm.org/D153958	2023-06-28 13:56:56 +00:00


2009 Commits