isComplete previously meant different things for different conversion
directions.
Refactored bytes_processed to bytes_stored which now consistently
increments on every push and decrements on pop making both directions
more consistent with each other
Updating x87 floating point register significantly affect the
performance of the functions.
All the floating point exception reads will merge the results from both
mxcsr and x87 registers anyway.
If you enable `LIBC_CONF_PRINTF_FLOAT_TO_STR_USE_FLOAT320` and use a
`%f` style printf format directive to print a nonzero number too small
to show up in the output digits, e.g. `printf("%.2f", 0.001)`, then the
output would be intermittently incorrect, because
`DyadicFloat::as_mantissa_type_rounded` would try to shift the 320-bit
mantissa right by more than 320 bits, invoking the 'undefined behavior'
clause commented in the `shift()` function in `big_int.h`.
There were already tests in the libc test suite exercising this case,
e.g. the subnormal tests in `LlvmLibcSPrintfTest.FloatDecimalConv` use
`%f` at the default precision of 6 decimal places on tiny numbers such
as 2^-1027. But because the behavior is undefined, they don't visibly
fail all the time, and in all previous test runs we'd tried with
USE_FLOAT320, they had got lucky.
The fix is simply to detect an out-of-range right shift before doing it,
and instead just set the output value to zero.
In #94078, `write_to_stdout` had not been fully implemented. However,
now that it has been implemented, to conform with the C standard
(7.23.6.3. The printf function, specifically point 2), we use `stdout`.
This issue is tracked in #94685.
- Also prefer `static constexpr`
- Made it explicit that we are writing to `stdout`
Removed strcmp, strlen, and memset calls from table.h and replaced them
with internal functions.
---------
Co-authored-by: Sriya Pratipati <sriyap@google.com>
The previous internal fcntl implementation modified errno directly, this
patch fixes that. This patch also moves open and close into OSUtil since
they are used in multiple places. There are more places that need
similar cleanup but only got comments in this patch to keep it
relatively reviewable.
Related to: https://github.com/llvm/llvm-project/issues/143937
Summary:
We need to set the bitfield memory to zero because the system does not
guarantee zeroed out memory. Even if fresh pages are zero, the system
allows re-use so we would need a `kfd` level API to skip this step.
Because we can't this patch updates the logic to perform the zero
initialization wave-parallel. This reduces the amount of time it takes
to allocate a fresh by up to a tenth.
This has the unfortunate side effect that the control flow is more
convoluted and we waste some extra registers, but it's worth it to
reduce the slab allocation latency.
After changing mbstate_t to mbstate we forgot to change the
character_converter files to reflect it.
Co-authored-by: Sriya Pratipati <sriyap@google.com>
The string_utils.h file previously included both memcpy and bzero. There
were no uses of bzero, and only one use of memcpy which was replaced
with __builtin_memcpy.
Also fix strsep which was broken by this change, fix a useless assert of
"sizeof(char) == sizeof(cpp::byte)", and update the bazel.
Summary:
This improves performance by reducing the amount of RMW operations we
need to do to a single slot. This improves repeated allocations without
much contention about ten percent.