2009 Commits

Author SHA1 Message Date
Jay Foad
92542f2a40 [AMDGPU] Add targets gfx1150 and gfx1151
This is the target definition only. Currently they are treated the same
as GFX 11.0.x.

Differential Revision: https://reviews.llvm.org/D155429
2023-07-17 13:06:12 +01:00
Guillaume Chatelet
b38dda74fa [libc][NFC] Split memcmp implementations per platform
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D155181
2023-07-17 11:35:31 +00:00
Guillaume Chatelet
83f3920854 [libc][NFC] Split memset implementations per platform
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D155174
2023-07-17 11:12:19 +00:00
Guillaume Chatelet
8cc440b3e7 [libc][NFC] Split memcpy implementations per platform
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D155099
2023-07-13 10:30:38 +00:00
Guillaume Chatelet
1c4e4e03bd [libc][NFC] Split bcmp implementations per platform
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.

Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D155076
2023-07-13 10:19:00 +00:00
Dominic Chen
50414422ac [libc][math] Fix floating-point test support on x86_64 Apple machines
Provide platform-specific x87 FPU definitions and operations

Differential Revision: https://reviews.llvm.org/D153823
2023-07-12 00:38:45 -07:00
Joseph Huber
a608076726 [libc][Obvious] Check if the state hasn't already been destroyed on shutdown
This ensures that if someone calls the `rpc_shutdown` method multiple
times, it will not segfault and will instead continue gracefully. This was
causing problems in the OpenMP usage. The repeated call could point to
other issues, but for now this is a safe fix.
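
A minimal sketch of the guard described above, assuming a single global pointer to the server state. The names `ServerState`, `rpc_init_sketch`, `rpc_shutdown_sketch`, and `rpc_is_running_sketch` are hypothetical and do not reflect the actual `libc` RPC server API:

```cpp
#include <cassert>

// Hypothetical stand-in for the RPC server state owned by the host.
struct ServerState {
  unsigned num_ports;
};

// Null when no server is running (hypothetical global for illustration).
static ServerState *state = nullptr;

// Hypothetical init routine: returns false if a server already exists.
bool rpc_init_sketch(unsigned num_ports) {
  assert(num_ports > 0);
  if (state)
    return false;
  state = new ServerState{num_ports};
  return true;
}

// Hypothetical shutdown routine. The early return turns a repeated call
// into a no-op instead of a use-after-free or a double free.
void rpc_shutdown_sketch() {
  if (!state)
    return;
  delete state;
  state = nullptr;
}

bool rpc_is_running_sketch() { return state != nullptr; }
```

Calling `rpc_shutdown_sketch()` twice in a row is then harmless, which is the property the fix restores.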

Differential Revision: https://reviews.llvm.org/D155005
2023-07-11 14:35:38 -05:00
Michael Jones
2cb4731902 [libc] adjust strtofloat precision for subnormals
Subnormal floating point numbers have a lower effective precision than
normal floating point numbers. This can cause issues for the fuzz test
since the MPFR floats have a constant precision regardless of the
exponent, and the precision must match exactly or rounding errors
result. To solve this problem, the precision of the MPFR floats is now
calculated dynamically.
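
To illustrate why the precision varies, the sketch below computes how many significant bits a `float` has at a given magnitude: 24 in the normal range, and one fewer per binade below `FLT_MIN`. The function name is hypothetical, and this is not necessarily how the fuzz test computes it:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>

// Returns the number of significant bits a float has at the magnitude of x:
// 24 for the normal range, fewer for subnormals, because subnormals share
// one fixed exponent and lose leading mantissa bits.
int float_effective_precision(double x) {
  assert(x != 0.0 && std::isfinite(x));
  constexpr int kDigits = std::numeric_limits<float>::digits;       // 24
  constexpr int kMinExp = std::numeric_limits<float>::min_exponent; // -125
  int exp = 0;
  std::frexp(x, &exp); // |x| = f * 2^exp with f in [0.5, 1)
  // Normal range: exp >= kMinExp, full precision. Below it, lose one bit
  // per binade.
  int precision = kDigits + std::min(0, exp - kMinExp);
  return std::max(precision, 1); // below denorm_min, clamp to one bit
}
```

The returned bit count could then be used as the MPFR precision (for example with `mpfr_init2`) before rounding the reference value.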

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D154909
2023-07-11 11:27:19 -07:00
Joseph Huber
a4f553fcde [libc] Fix using the libcgpu.a for NVPTX in non-LTO builds
CUDA generally requires a PTX feature to be specified for compilation.
Because the `libcgpu.a` archive contains LLVM-IR, one must be present to
compile it. Currently, the wrapper fatbinary format we use to
incorporate these into single-source offloading languages has a special
option to provide this. Since this option was not passed in the builds,
if the user did not specify it via `-foffload-lto`, the archive would not
compile from CUDA or OpenMP due to the missing PTX feature. Fix this by
passing it to the packager invocation.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D154864
2023-07-10 13:54:47 -05:00
Joseph Huber
b454e7aa7c [libc] Remove GPU string functions incompatible with C++
These functions have definitions differing between C and C++. GNU
respects the C++ definitions while the LLVM libc does not. This causes
many bugs, and the current hack creates other issues. Rather than hack
around this, I'd rather temporarily disable these functions than regress
the integration into other offloading languages. We lose test support for
them, but we should be able to re-enable them once the `libc` headers
provide these correctly.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D154850
2023-07-10 10:40:10 -05:00
Petr Hosek
0ab14951db [NFC][libc] Use the new style includes for tests
This was accidentally omitted from D154746.
2023-07-10 07:42:13 +00:00
Petr Hosek
36c15be20b [libc] Use LIBC_INCLUDE_DIR in CMake rules
D152592 introduced LIBC_INCLUDE_DIR for the location of the include
directory; use it in the relevant CMake rules.

Differential Revision: https://reviews.llvm.org/D154278
2023-07-10 07:32:24 +00:00
Guillaume Chatelet
bfd94882f2 [libc][NFC] Move aligned access implementations to separate header
Follow up on https://reviews.llvm.org/D154770

Differential Revision: https://reviews.llvm.org/D154800
2023-07-09 22:17:05 +00:00
Guillaume Chatelet
dbaa5838c1 [libc][NFC] Move memfunction's byte per byte implementations to a separate header
There will be subsequent patches to move things around and make the file layout more principled.

Differential Revision: https://reviews.llvm.org/D154770
2023-07-09 07:21:58 +00:00
Petr Hosek
fb149e4beb [libc] Use the new style includes for tests
This is a follow up to D154529 covering tests.

Differential Revision: https://reviews.llvm.org/D154746
2023-07-08 05:15:44 +00:00
Petr Hosek
9654bc3960 Revert "[libc] Set include directories for the str_to_float test"
This reverts commit 147c0640a3 since
it broke GPU builds.
2023-07-07 21:25:23 +00:00
Joseph Huber
2a65d0388c [libc] Add support for creating wrapper headers for offloading in clang
This is an alternate approach to the patches proposed in D153897 and
D153794. Rather than exporting a single header that can be included on
the GPU in all circumstances, this patch chooses to instead generate a
separate set of headers that only provides the declarations. This can
then be used by external tooling to set up what's on the GPU. This
leaves room for header hacks for offloading languages without needing to
worry about the `libc` implementation.

Currently this generates a set of headers that only contain the
declarations. These will then be installed to a new clang resource
directory called `llvm_libc_wrappers/` which will house the shim code.
We can then automatically include this from `clang` when offloading to
wrap around the headers while specifying what's on the GPU.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D154036
2023-07-07 16:02:33 -05:00
Petr Hosek
bf171aaa7a Revert "[libc] Use LIBC_INCLUDE_DIR in CMake rules"
This reverts commit 6e821f0b3a since
it broke the libc-aarch64-ubuntu-fullbuild-dbg bot.
2023-07-07 20:52:54 +00:00
Petr Hosek
6e821f0b3a [libc] Use LIBC_INCLUDE_DIR in CMake rules
D152592 introduced LIBC_INCLUDE_DIR for the location of the include
directory; use it in the relevant CMake rules.

Differential Revision: https://reviews.llvm.org/D154278
2023-07-07 20:42:25 +00:00
Petr Hosek
147c0640a3 [libc] Set include directories for the str_to_float test
This test uses libc headers and needs to explicitly include them.

Differential Revision: https://reviews.llvm.org/D154277
2023-07-07 20:33:54 +00:00
Joseph Huber
691dc2d10d [Libomptarget] Begin implementing support for RPC services
This patch adds the initial support for running an RPC server in
libomptarget to handle host services. We interface with the library
provided by the `libc` project to stand up a basic server. We introduce
a new type that is controlled by the plugin and has each device
initialize its interface. We then run a basic server to check the RPC
buffer.

This patch does not fully implement the interface. In the future each
plugin will want to define special handlers via the interface to support
things like malloc or H2D copies coming from RPC. We will also want to
allow the plugin to specify the number of ports. This is currently
capped in the implementation but will be adjusted soon.

Right now running the server is handled by whatever thread ends up doing
the waiting. This is probably not a completely sound solution but I am
not overly familiar with the behaviour of OpenMP tasks and what would be
required here. This works okay with synchronous regions, and somewhat
fine with `nowait` regions, but I've observed some weird behavior when
one of those regions calls `exit`.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D154312
2023-07-07 12:36:46 -05:00
Joseph Huber
c012eb79e2 [libc] Enable aliasing on AMDGPU targets
AMDGPU supports aliases now, so we can drop this case and leave it only
for the NVPTX target. Unfortunately it's unlikely that NVPTX will be
able to support this in the future because the PTX language is very
limited.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D154704
2023-07-07 11:49:16 -05:00
Guillaume Chatelet
cb1468d3cb [libc] Adding a version of memcpy w/ software prefetching
For machines with a lot of cores, hardware prefetchers can saturate the memory bus when utilization is high.
In this case it is desirable to turn off the hardware prefetcher completely.
This has a big impact on the performance of memory functions such as `memcpy` that rely on the fact that the next cache line will be readily available.

This patch adds the 'LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING' compile time option that generates a version of memcpy with software prefetching. While it does not fully restore the original performance, it mitigates the impact to an acceptable level.
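
A rough sketch of software prefetching in a copy loop, not the code guarded by `LIBC_COPT_MEMCPY_X86_USE_SOFTWARE_PREFETCHING`. It assumes GCC or Clang (`__builtin_prefetch`), a 64-byte cache line, and an arbitrary prefetch distance of four lines; `memcpy_with_prefetch` is a hypothetical name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical tuning values; the real option's distances are not stated.
constexpr std::size_t kCacheLine = 64;
constexpr std::size_t kPrefetchDistance = 4 * kCacheLine;

// Copies n bytes one cache line at a time, explicitly prefetching source
// data kPrefetchDistance bytes ahead so the copy does not depend on the
// (possibly disabled) hardware prefetcher.
void *memcpy_with_prefetch(void *dst, const void *src, std::size_t n) {
  assert(n == 0 || (dst != nullptr && src != nullptr));
  auto *d = static_cast<unsigned char *>(dst);
  const auto *s = static_cast<const unsigned char *>(src);
  std::size_t offset = 0;
  for (; offset + kCacheLine <= n; offset += kCacheLine) {
    if (offset + kPrefetchDistance < n)
      __builtin_prefetch(s + offset + kPrefetchDistance, /*rw=*/0,
                         /*locality=*/3);
    std::memcpy(d + offset, s + offset, kCacheLine); // one full line
  }
  std::memcpy(d + offset, s + offset, n - offset); // remaining tail bytes
  return dst;
}
```

The prefetch distance trades off how early data is requested against how many in-flight lines occupy the cache; a real implementation would tune it per microarchitecture.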

Reviewed By: rtenneti

Differential Revision: https://reviews.llvm.org/D154494
2023-07-07 10:37:32 +00:00
Joseph Huber
6ca6cdb23e Revert "[libc] Add support for creating wrapper headers for offloading in clang"
This reverts commit a4a26374aa.

This was causing some problems with the CPU build and CUDA buildbot.
Revert until I can figure out what those issues are and fix them. I
believe it is just some CMake.
2023-07-06 18:26:41 -05:00
Joseph Huber
a4a26374aa [libc] Add support for creating wrapper headers for offloading in clang
This is an alternate approach to the patches proposed in D153897 and
D153794. Rather than exporting a single header that can be included on
the GPU in all circumstances, this patch chooses to instead generate a
separate set of headers that only provides the declarations. This can
then be used by external tooling to set up what's on the GPU. This
leaves room for header hacks for offloading languages without needing to
worry about the `libc` implementation.

Currently this generates a set of headers that only contain the
declarations. These will then be installed to a new clang resource
directory called `llvm_libc_wrappers/` which will house the shim code.
We can then automatically include this from `clang` when offloading to
wrap around the headers while specifying what's on the GPU.

Reviewed By: jdoerfert, JonChesterfield

Differential Revision: https://reviews.llvm.org/D154036
2023-07-06 18:10:49 -05:00
Joseph Huber
c850ea1498 [libc] Support fopen / fclose on the GPU
This patch adds the necessary support for the fopen and fclose functions
to work on the GPU via RPC. I added a new test that enables testing this
with the minimal features we have on the GPU. I will update it once we
have `fread` and `fwrite` to actually check the outputted strings. For
now I just relied on manually checking the output temp file.

Reviewed By: JonChesterfield, sivachandra

Differential Revision: https://reviews.llvm.org/D154519
2023-07-05 18:31:58 -05:00
Joseph Huber
7e88e26d38 [libc] Add GPU support for the 'inttypes.h' functions
This is another low-hanging fruit we can put on the GPU. This patch ports
the tests over to the hermetic framework so we can run them on the GPU.

Reviewed By: sivachandra

Differential Revision: https://reviews.llvm.org/D154540
2023-07-05 17:47:10 -05:00
Joseph Huber
515bd1c9b8 [libc][Obvious] Fix timing on AMDGPU not being initialized
Summary:
Reviewer requested that this routine not be a macro; however, that meant
it was not being initialized, because the static initializer ran before
the memcpy from the device. Fix this so we can get timing information.
2023-07-05 16:08:37 -05:00
Joseph Huber
80504b06ad [libc][Obvious] Fix bad macro check on NVPTX tests
Summary:
I forgot to add the `defined()` check on NVPTX.
2023-07-05 15:54:12 -05:00
Joseph Huber
5db39796bf [libc] Support timing information in libc tests
This patch adds the necessary support to provide timing information in
`libc` tests. This is useful for determining how long each test takes.
We can also use this as a test basis for providing more
fine-grained timing when implementing things on the GPU.

The main difficulty with this is the fact that the AMDGPU fixed
frequency clock operates at an unknown frequency. We need to read this
on a per-card basis from the driver and then copy it in. NVPTX on the
other hand has a fixed clock at a resolution of 1ns. I have also
increased the resolution of the print-outs as the majority of these are
below a millisecond for me.
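
Converting raw counter ticks to nanoseconds needs the clock frequency, which is why the AMDGPU value has to be read from the driver. A minimal sketch, assuming the frequency is available as a `uint64_t` in hertz; `ticks_to_ns` is a hypothetical helper, not the test framework's function:

```cpp
#include <cassert>
#include <cstdint>

// Converts a tick delta from a fixed-frequency counter into nanoseconds.
// Splitting into whole seconds and a remainder avoids overflowing
// ticks * 1e9 for long intervals; the remainder product stays in range as
// long as freq_hz is below roughly 18 GHz.
uint64_t ticks_to_ns(uint64_t ticks, uint64_t freq_hz) {
  assert(freq_hz != 0);
  constexpr uint64_t kNsPerSec = 1000000000ull;
  uint64_t whole_secs = ticks / freq_hz;
  uint64_t rem_ticks = ticks % freq_hz;
  return whole_secs * kNsPerSec + rem_ticks * kNsPerSec / freq_hz;
}
```

With a 1 GHz counter, one tick is one nanosecond, which matches the NVPTX case described above.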

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D154446
2023-07-05 14:27:08 -05:00
Michael Jones
cfbcbc8f88 [libc] fix MPFR rounding problems in fuzz test
The accuracy for the MPFR numbers in the strtofloat fuzz test was set
too high, causing rounding issues when rounding to a smaller final
result.

Reviewed By: lntue

Differential Revision: https://reviews.llvm.org/D154150
2023-07-05 10:53:40 -07:00
Petr Hosek
8910cc2742 [libc] Use the new style includes
We should be using the standard includes.

Differential Revision: https://reviews.llvm.org/D154529
2023-07-05 17:51:41 +00:00
Petr Hosek
e1cb5924cb Revert "[libc] Use LIBC_INCLUDE_DIR in CMake rules"
This reverts commit 046deabd93 since
it broke libc-aarch64-ubuntu-fullbuild-dbg.
2023-07-05 17:20:11 +00:00
Petr Hosek
046deabd93 [libc] Use LIBC_INCLUDE_DIR in CMake rules
D152592 introduced LIBC_INCLUDE_DIR for the location of the include
directory; use it in the relevant CMake rules.

Differential Revision: https://reviews.llvm.org/D154278
2023-07-05 17:16:19 +00:00
Petr Hosek
80368a104e [libc] Check if the hermetic test target exists
When crt1 isn't available, which is typical on baremetal, hermetic tests
aren't created and the hermetic test target won't be available.

Differential Revision: https://reviews.llvm.org/D154279
2023-07-05 17:09:01 +00:00
Siva Chandra
3db36d6a9b [libc] Initialize the global pointer in riscv startup code.
Reviewed By: mikhail.ramalho

Differential Revision: https://reviews.llvm.org/D151539
2023-07-05 07:32:31 +00:00
Joseph Huber
f8cf210576 [libc] Remove flaky static assert from RPC interface
Summary:
This function is intended to be used only on the GPU as a shorthand. The
static assert should only fire if it's called, but its presence seems to
cause issues in some builds and not in others. Simply remove it, as it's
causing build problems.
2023-07-04 11:06:06 -05:00
Alfred Persson Forsberg
cae84d8acf [libc] Correct usage of __unix__ and __linux__
Reviewed By: michaelrj, thesamesam

Differential Revision: https://reviews.llvm.org/D153729
2023-07-03 01:08:15 +01:00
Petr Hosek
1c241bb791 [libc] Missing FEnvImpl.h dependency on math.h
FEnvImpl.h includes math.h and so needs an explicit dependency.

Differential Revision: https://reviews.llvm.org/D154044
2023-07-01 18:27:36 +00:00
Roland McGrath
5bf8efd269 [libc] Fix more inline definitions
Fix a bunch more instances of incorrect use of the `static`
keyword and missing use of LIBC_INLINE and LIBC_INLINE_VAR
macros. Note that even forward declarations and generic template
declarations must follow the prescribed patterns for libc code so
that they match every definition, including all template specializations.

Reviewed By: Caslyn

Differential Revision: https://reviews.llvm.org/D154260
2023-06-30 14:46:25 -07:00
Roland McGrath
dbd38b1219 [libc] Add missing cast in x86 big_endian_cmp_mask
Implicit narrowing conversions from int to uint16_t
trigger a compiler warning under the warning settings used
in the Fuchsia build.
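
A minimal illustration of the warning and the fix, assuming the Fuchsia settings include something like `-Wconversion`. `movemask_like` is a hypothetical stand-in for an intrinsic that returns `int`, and `cmp_mask` is not the actual `big_endian_cmp_mask`:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an intrinsic that returns int even though only
// the low 16 bits carry information.
inline int movemask_like(uint32_t bits) {
  return static_cast<int>(bits & 0xFFFFu);
}

inline uint16_t cmp_mask(uint32_t bits) {
  // uint16_t m = movemask_like(bits);  // implicit int -> uint16_t narrowing,
  //                                    // reported under -Wconversion
  return static_cast<uint16_t>(movemask_like(bits)); // explicit: no warning
}
```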

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D154256
2023-06-30 14:15:59 -07:00
Joseph Huber
df52a22b1b [libc] Make the RPC server target always available
This patch makes sure that we always build the RPC server. The proposed
use for this is to begin integrating this server implementation into
`libomptarget`. That requires that we build this server ahead of time
when using a `LLVM_ENABLE_PROJECTS` build. Make a few tweaks to ensure
that the GCC compiler which may be used for this build doesn't complain.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D154105
2023-06-30 11:30:57 -05:00
Joseph Huber
62f57bc9b0 [libc] Add other RPC callback methods to the RPC server
This patch adds the other two methods to the server so that external
users can call them through the obfuscated interface.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D154224
2023-06-30 11:29:37 -05:00
Guillaume Chatelet
1c814c99aa [libc] Improve memcmp latency and codegen
This is based on ideas from @nafi to:
 - use a branchless version of 'cmp' for 'uint32_t',
 - completely resolve the lexicographic comparison through vector
   operations when wide types are available. We also get rid of byte
   reloads and serializing '__builtin_ctzll'.

I did not include the suggestion to replace comparisons of 'uint16_t'
with two 'uint8_t' as it did not seem to help the codegen. This can
be revisited in subsequent patches.

The code has been rewritten to reduce nested function calls, making the
job of the inliner easier and preventing harmful code duplication.
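
The two ideas above can be sketched for a 4-byte block: loading bytes in big-endian order makes integer comparison match `memcmp`'s lexicographic byte order, and a three-way result built from two boolean comparisons avoids branches. This is one common branchless formulation, not necessarily the one used in the patch; the helper names are hypothetical, and `__builtin_bswap32` / `__BYTE_ORDER__` assume GCC or Clang:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Branchless three-way comparison: -1, 0 or 1 without conditional jumps.
inline int cmp_u32(uint32_t a, uint32_t b) {
  return static_cast<int>(a > b) - static_cast<int>(a < b);
}

// Loads 4 bytes so that the first byte in memory is the most significant,
// making integer order equal to lexicographic byte order.
inline uint32_t load_be32(const unsigned char *p) {
  uint32_t v;
  std::memcpy(&v, p, sizeof(v)); // unaligned-safe load
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
  v = __builtin_bswap32(v);
#endif
  return v;
}

// Compares exactly 4 bytes with the same sign convention as memcmp.
int memcmp4(const void *lhs, const void *rhs) {
  assert(lhs != nullptr && rhs != nullptr);
  const auto *l = static_cast<const unsigned char *>(lhs);
  const auto *r = static_cast<const unsigned char *>(rhs);
  return cmp_u32(load_be32(l), load_be32(r));
}
```

A plain little-endian load would compare the last byte first, which is why the byte swap is needed before the integer comparison.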

Reviewed By: nafi3000

Differential Revision: https://reviews.llvm.org/D148717
2023-06-30 13:00:58 +00:00
Joseph Huber
b15ac1fd89 [libc] Enable the 'div' routines on the GPU
This patch simply enables the `div`, `ldiv`, and `lldiv` functions on
the GPU. This should be straightforward enough.

Reviewed By: jdoerfert

Differential Revision: https://reviews.llvm.org/D154143
2023-06-29 15:42:46 -05:00
Joseph Huber
667c10353e [libc] Fix the implementation of exit on the GPU
The RPC calls all have delays associated with them. Currently the `exit`
function does an async send and immediately exits the GPU. This can have
the effect that the RPC server never sees the exit call and we continue.
This patch changes that to first sync with the server before continuing
to perform its exit. A hazard remains where the kernel can complete
before the RPC call reads back its response, but that is an inherent
multi-threading hazard. This change ensures that the server *will*
always exit some time after the GPU exits.

Reviewed By: JonChesterfield

Differential Revision: https://reviews.llvm.org/D154112
2023-06-29 13:22:23 -05:00
Guillaume Chatelet
177583c914 [libc][NFC] Use SIZE_MAX instead of size_t(-1)
2023-06-29 12:21:43 +00:00
Tue Ly
de19101e33 [libc][NFC] Set rounding mode for sincosf exhaustive test.
2023-06-28 20:30:54 -04:00
Tue Ly
f320fefc4a [libc][math] Implement erff function correctly rounded to all rounding modes.
Implement correctly rounded `erff` functions.

For `x >= 4`, `erff(x) = 1` for `FE_TONEAREST` or `FE_UPWARD`, `0x1.ffffep-1` for `FE_DOWNWARD` or `FE_TOWARDZERO`.

For `0 <= x < 4`, we divide the range into 32 sub-intervals of length `1/8` and use a degree-15 odd polynomial to approximate `erff(x)` in each sub-interval:
```
  erff(x) ~ x * (c0 + c1 * x^2 + c2 * x^4 + ... + c7 * x^14).
```

For `x < 0`, we can use the same formula as above, since the odd part is factored out.
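
The structure described above can be sketched as follows. The coefficient table is left as a parameter because the actual coefficients are not given here; `Coeffs`, `erff_interval_index`, `odd_poly`, and `erff_saturated` are hypothetical names, and Horner's scheme in `x^2` is just one way to evaluate the polynomial:

```cpp
#include <array>
#include <cassert>
#include <cfenv>
#include <cmath>

// Coefficients c0..c7 of one sub-interval's odd polynomial.
using Coeffs = std::array<double, 8>;

// Index of the length-1/8 sub-interval containing |x|, for |x| < 4.
// Multiplying by 8 is exact in binary floating point, so no rounding
// can push a value into the wrong interval.
inline int erff_interval_index(float x) {
  float ax = std::fabs(x);
  assert(ax < 4.0f);
  return static_cast<int>(ax * 8.0f); // 0 .. 31
}

// Evaluates x * (c0 + c1*x^2 + ... + c7*x^14) with Horner's scheme in x^2.
// The polynomial is odd, so the same code serves x < 0.
inline double odd_poly(double x, const Coeffs &c) {
  double x2 = x * x;
  double p = c[7];
  for (int i = 6; i >= 0; --i)
    p = p * x2 + c[i]; // std::fma(p, x2, c[i]) on FMA hardware
  return x * p;
}

// Result for x >= 4, chosen by the current rounding mode as stated above.
inline float erff_saturated(float x) {
  assert(x >= 4.0f);
  int mode = std::fegetround();
  return (mode == FE_TONEAREST || mode == FE_UPWARD) ? 1.0f : 0x1.ffffep-1f;
}
```

A full implementation would select the coefficient set with `erff_interval_index` and then call `odd_poly`; correct rounding in all modes additionally requires error analysis that this sketch does not attempt.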

Performance tested with `perf.sh` tool from the CORE-MATH project on AMD Ryzen 9 5900X:

Reciprocal throughput (clock cycles / op)
```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH reciprocal throughput --  with -march=native      (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 11.790 + 0.182 clc/call; Median-Min = 0.154 clc/call; Max = 12.255 clc/call;
-- CORE-MATH reciprocal throughput --  with -march=x86-64-v2      (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 14.205 + 0.151 clc/call; Median-Min = 0.159 clc/call; Max = 15.893 clc/call;

-- System LIBC reciprocal throughput --
[####################] 100 %
Ntrial = 20 ; Min = 45.519 + 0.445 clc/call; Median-Min = 0.552 clc/call; Max = 46.345 clc/call;

-- LIBC reciprocal throughput --  with -mavx2 -mfma     (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 9.595 + 0.214 clc/call; Median-Min = 0.220 clc/call; Max = 9.887 clc/call;
-- LIBC reciprocal throughput --  with -msse4.2     (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 10.223 + 0.190 clc/call; Median-Min = 0.222 clc/call; Max = 10.474 clc/call;
```

and latency (clock cycles / op):
```
$ ./perf.sh erff --path2
GNU libc version: 2.35
GNU libc release: stable
-- CORE-MATH latency --  with -march=native      (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 38.566 + 0.391 clc/call; Median-Min = 0.503 clc/call; Max = 39.170 clc/call;
-- CORE-MATH latency --  with -march=x86-64-v2      (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 43.223 + 0.667 clc/call; Median-Min = 0.680 clc/call; Max = 43.913 clc/call;

-- System LIBC latency --
[####################] 100 %
Ntrial = 20 ; Min = 111.613 + 1.267 clc/call; Median-Min = 1.696 clc/call; Max = 113.444 clc/call;

-- LIBC latency --  with -mavx2 -mfma     (with FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 40.138 + 0.410 clc/call; Median-Min = 0.536 clc/call; Max = 40.729 clc/call;
-- LIBC latency --  with -msse4.2     (without FMA instructions)
[####################] 100 %
Ntrial = 20 ; Min = 44.858 + 0.872 clc/call; Median-Min = 0.814 clc/call; Max = 46.019 clc/call;
```

Reviewed By: michaelrj

Differential Revision: https://reviews.llvm.org/D153683
2023-06-28 13:58:37 -04:00
Guillaume Chatelet
b3b54131d0 [libc][NFC] Separate avx/no-avx x86 memcpy implementations
Reviewed By: courbet

Differential Revision: https://reviews.llvm.org/D153958
2023-06-28 13:56:56 +00:00