- Replace `move_byte_forward()` with `memcpy`. In `memcpy` implementation,
it copies bytes forward from beginning to end. Otherwise, `memmove` unit
tests will break.
- Make `memmove` unit tests work.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D109316
Since the precondition for loop is `size >= T::kSize` we always expect
at least one run of the loop. This patch transforms the for-loop into a
do/while-loop which saves at least one test.
We also add a second template parameter to allow the Tail operation to
differ from the loop operation.
Add strncmp as a function to strings.h. Also adds unit tests, and adds
strncmp as an entrypoint for all current platforms.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D106901
While working and testing my refactoring of multiple string functions in libc, I came across a bug that needs to be addressed in a patch on its own: src is checked for nullptr and assigned to *saveptr if it is nullptr. However, saveptr is initially nullptr when it comes to reentry. This could cause a problem if both saveptr and src are null; we need to do the check first and return nullptr if both are nullptr.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D106885
__builtin_ctzl takes an unsigned long argument which need not be 64-bit
long on all platforms. Using __builtin_ctzll, which takes an unsigned
long long argument, ensures that 64-bit values will be handled on a
wider range of platforms.
Without this change, the test corresponding to M512 fails in Windows.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D104897
Resubmission of D100646 now making sure that we handle cases were `__builtin_memcpy_inline` is not available.
Original commit message:
Each of these elementary operations can be assembled to support higher order constructs (Overlapping access, Loop, Aligned Loop).
The patch does not compile yet as it depends on other ones (D100571, D100631) but it allows to get the conversation started.
A self-contained version of this code is available at https://godbolt.org/z/e1x6xdaxM
Resubmission of D100646 now making sure that we handle cases were `__builtin_memcpy_inline` is not available.
Original commit message:
Each of these elementary operations can be assembled to support higher order constructs (Overlapping access, Loop, Aligned Loop).
The patch does not compile yet as it depends on other ones (D100571, D100631) but it allows to get the conversation started.
A self-contained version of this code is available at https://godbolt.org/z/e1x6xdaxM
Resubmission of D100646 now making sure that we handle cases were `__builtin_memcpy_inline` is not available.
Original commit message:
Each of these elementary operations can be assembled to support higher order constructs (Overlapping access, Loop, Aligned Loop).
The patch does not compile yet as it depends on other ones (D100571, D100631) but it allows to get the conversation started.
A self-contained version of this code is available at https://godbolt.org/z/e1x6xdaxM
Resubmission of D100646 now making sure that we handle cases were `__builtin_memcpy_inline` is not available.
Original commit message:
Each of these elementary operations can be assembled to support higher order constructs (Overlapping access, Loop, Aligned Loop).
The patch does not compile yet as it depends on other ones (D100571, D100631) but it allows to get the conversation started.
A self-contained version of this code is available at https://godbolt.org/z/e1x6xdaxM
Each of these elementary operations can be assembled to support higher order constructs (Overlapping access, Loop, Aligned Loop).
The patch does not compile yet as it depends on other ones (D100571, D100631) but it allows to get the conversation started.
Differential Revision: https://reviews.llvm.org/D100646
This is a roll forward of D101895 with two additional fixes:
Original Patch description:
> This is a follow up on D101524 which:
>
> - simplifies cpu features detection and usage,
> - flattens target dependent optimizations so it's obvious which implementations are generated,
> - provides an implementation targeting the host (march/mtune=native) for the mem* functions,
> - makes sure all implementations are unittested (provided the host can run them).
Additional fixes:
- Fix uninitialized ALL_CPU_FEATURES
- Use non pseudo microarch as it is only supported from Clang 12 on
Differential Revision: https://reviews.llvm.org/D102233
This reverts commit 541f107871 as the bots
are failing with unknown architecture "x86-64-v*". Will let the original
author decide on the right course of action to correct the problem and
reland.
This is a follow up on D101524 which:
- simplifies cpu features detection and usage,
- flattens target dependent optimizations so it's obvious which implementations are generated,
- provides an implementation targeting the host (march/mtune=native) for the mem* functions,
- makes sure all implementations are unittested (provided the host can run them),
- makes sure all implementations are benchmarkable (provided the host can run them).
Differential Revision: https://reviews.llvm.org/D101895
Current implementation defines LIBC_TARGET_MACHINE with the use of CMAKE_SYSTEM_PROCESSOR.
Unfortunately CMAKE_SYSTEM_PROCESSOR is OS dependent and can produce different results.
An evidence of this is the various matchers used to detect whether the architecture is x86.
This patch normalizes LIBC_TARGET_MACHINE and renames it LIBC_TARGET_ARCHITECTURE.
I've added many architectures but we may want to limit ourselves to x86 and ARM.
Differential Revision: https://reviews.llvm.org/D101524
Aligned copy used to be 'destination aligned' for x86 but this decision was reverted in D93457 where we noticed that it was better for ARM to be 'source aligned'.
More benchmarking confirmed that it can be up to 30% faster to align copy to destination for x86. This Patch offers both implementations and switches x86 back to destination aligned.
It also fixes alignment to 32 byte on x86.
Differential Revision: https://reviews.llvm.org/D101296
This is needed to prevent asan/msan instrumentation to redirect CopyBlock to `__asan_memcpy` (resp. `__msan_memcpy`).
These functions would then differ operation to `memcpy` which leads to reentrancy issues.
With this patch, `memcpy` is fully instrumented and covered by asan/msan.
If this turns out to be too expensive, instrumentation can be selectively or fully disabled through the use of the `__attribute__((no_sanitize(address, memory)))` annotation.
Differential Revision: https://reviews.llvm.org/D99598
We want to be able to build and test the string functions in contexts
like that of Fuchsia where LLVM pieces like tablegen are not available.
Since header generation uses tablegen, we are removing the dependency on
headergen here.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D97363
`ssize_t` is from POSIX and is not standard unfortunately.
Rewritting the code so it doesn't depend on it.
Differential Revision: https://reviews.llvm.org/D94760
- Adds LLVM_LIBC_IS_DEFINED macro to libc/src/__support/common.h
- Adds a few knobs to memcpy to help with experimentations:
- LLVM_LIBC_MEMCPY_X86_USE_ONLY_REPMOVSB replaces the implementation with a single call to rep;movsb
- LLVM_LIBC_MEMCPY_X86_USE_REPMOVSB_FROM_SIZE customizes where the usage of rep;movsb
Differential Revision: https://reviews.llvm.org/D94692
Use `memcpy` rather than copying bytes one by one, for there might be large
size structs to move.
Reviewed By: gchatelet, sivachandra
Differential Revision: https://reviews.llvm.org/D93195
Summary:
The new macro also inserts the C alias for the C++ implementations
without needing an objcopy based post processing step. The CMake
rules have been updated to reflect this. More CMake cleanup can be
taken up in future rounds and appropriate TODOs have been added for them.
Reviewers: mcgrathr, sivachandra
Subscribers:
We used to align destination buffer instead of source buffer for the loop of block copy.
This is a mistake.
Differential Revision: https://reviews.llvm.org/D93457
This switches all of the files in src/string, src/math, and
test/src/math from using relative paths (e.g. `#include “include/string.h”`)
to global paths (e.g. `#include <string.h>`) to make bringing up those
functions on other platforms, such as fuchsia, easier.
Reviewed By: sivachandra
Differential Revision: https://reviews.llvm.org/D91394