This attribute is unsupported in GCC, so far it worked because before
GCC15 did not define this macros in _CHKFEAT_GCS in arm_acle.h [1]
With gcc15 compiler libunwind's check for this macros is succeeding and
it ends up enabling 'gcs' by using function attribute, this works with
clang but not with gcc.
We can see this in rust compiler bootstrap for aarch64/musl when system
uses gcc15, it ends up with these errors
Building libunwind.a for aarch64-poky-linux-musl
```
cargo:warning=/mnt/b/yoe/master/sources/poky/build/tmp/work/cortexa57-poky-linux-musl/rust/1.85.1/rustc-1.85.1-src/src/llvm-project/libunwind/src/UnwindLevel1.c:191:1: error: arch extension 'gcs' should be prefixed by '+' cargo:warning= 191 | unwind_phase2(unw_context_t *uc, unw_cursor_t *cursor, _Unwind_Exception *exception_object) {
cargo:warning= | ^~~~~~~~~~~~~
cargo:warning=/mnt/b/yoe/master/sources/poky/build/tmp/work/cortexa57-poky-linux-musl/rust/1.85.1/rustc-1.85.1-src/src/llvm-project/libunwind/src/UnwindLevel1.c:337:22: error: arch extension 'gcs' should be prefixed by '+'
cargo:warning= 337 | _Unwind_Stop_Fn stop, void *stop_parameter) {
cargo:warning= | ^~~~~~~~~~~~~~~
```
[1] https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=5a6af707f0af
Signed-off-by: Khem Raj <raj.khem@gmail.com>
AAPCS64 reserves any of X9-X15 for a compiler to choose to use for this
purpose, and says not to use X16 or X18 like GCC (and the previous
implementation) chose to use. The X18 register may need to get used by
the kernel in some circumstances, as specified by the platform ABI, so
it is generally an unwise choice. Simply choosing a different register
fixes the problem of this being broken on any platform that actually
follows the platform ABI (which is all of them except EABI, if I am
reading this linux kernel bug correctly
https://lkml2.uits.iu.edu/hypermail/linux/kernel/2001.2/01502.html). As
a side benefit, also generate slightly better code and avoids needing
the compiler-rt to be present. I did that by following the XCore
implementation instead of PPC (although in hindsight, following the
RISCV might have been slightly more readable). That X18 is wrong to use
for this purpose has been known for many years (e.g.
https://www.mail-archive.com/gcc@gcc.gnu.org/msg76934.html) and also
known that fixing this to use one of the correct registers is not an ABI
break, since this only appears inside of a translation unit. Some of the
other temporary registers (e.g. X9) are already reserved inside llvm for
internal use as a generic temporary register in the prologue before
saving registers, while X15 was already used in rare cases as a scratch
register in the prologue as well, so I felt that seemed the most logical
choice to choose here.
The canonical pattern for bitmasked mul is currently
```
%val = and %x, %bitMask // where %bitMask is some constant
%cmp = icmp eq %val, 0
%sel = select %cmp, 0, %C // where %C is some constant = C' * %bitMask
```
In certain cases, where we are combining multiple of these bitmasked
muls with common factors, we are able to optimize into and->mul (see
https://github.com/llvm/llvm-project/pull/135274 )
This optimization lends itself to further optimizations. This PR
addresses one of such optimizations.
In cases where we have
`or-disjoint ( mul(and (X, C1), D) , mul (and (X, C2), D))`
we can combine into
`mul( and (X, (C1 + C2)), D) `
provided C1 and C2 are disjoint.
Generalized proof: https://alive2.llvm.org/ce/z/MQYMui
Allows forwarding memset to memcpy for mismatching unknown sizes if
overread has undef contents. In that case we can refine the undef bytes
to the memset value.
Refs #140954 which laid some of the groundwork for this.
Reduce the direct use of libc_errno in stdio unit tests by adopting
ErrnoCheckingTest where appropriate.
Also removes the libc_errno.h inclusions from stdlib.h tests that were
accidentally added in d87eea35fa
Summary:
We need to set the bitfield memory to zero because the system does not
guarantee zeroed out memory. Even if fresh pages are zero, the system
allows re-use so we would need a `kfd` level API to skip this step.
Because we can't this patch updates the logic to perform the zero
initialization wave-parallel. This reduces the amount of time it takes
to allocate a fresh by up to a tenth.
This has the unfortunate side effect that the control flow is more
convoluted and we waste some extra registers, but it's worth it to
reduce the slab allocation latency.
The change adds folding for 4 vector intrinsics: `interleave2`,
`deinterleave2`, `vector_extract` and `vector_insert`. For the last 2
intrinsics the change does not use `ShuffleVector` fold mechanism as
it's much simpler to construct result vector explicitly.
Prior to this patch, generating test checks in place put the ATTR
definitions at the very top of the file, above the RUN lines and
autogenerated note. All CHECK lines should below the RUN lines and
autogenerated note.
This change ensures that the attribute definitions are emitted with the
rest of the CHECK lines.
---------
Co-authored-by: Michael Maitland <michaelmaitland@meta.com>
Prior to this PR, the script removed the already existing autogenerated
note if we came across a line that was equal to the note. But the
default note is multiple lines, so there would never be a match.
Instead, check to see if the current line is a substring of the
autogenerated note.
Co-authored-by: Michael Maitland <michaelmaitland@meta.com>
Changes `findCollapsingReassociation` to return nullopt in all cases
where source shape has `>=2` dynamic dims. `expand(collapse)` can
reshape to in any valid output shape but a collapse can only collapse
contiguous dimensions. When there are `>=2` dynamic dimensions it is
impossible to determine if it can be simplified to a collapse or if it
is preforming a more advanced reassociation.
This problem was uncovered by
https://github.com/llvm/llvm-project/pull/137963
---------
Signed-off-by: Ian Wood <ianwood2024@u.northwestern.edu>
It makes more sense for this functionality to be all in one place rather
than split up across two files—at least it caused me a bit of a headache
to try and find all places where we were actually forwarding the
diagnostic to the `DiagnosticConsumer`. Moreover, moving these functions
into `DiagnosticsEngine` simplifies the code quite a bit since we access
members of `DiagnosticsEngine` more frequently than those of
`DiagnosticIDs`. There was also a duplicated code snippet that I’ve
moved out into a new function.
After changing mbstate_t to mbstate we forgot to change the
character_converter files to reflect it.
Co-authored-by: Sriya Pratipati <sriyap@google.com>
When emitting the modules on which a module depends under the
-fhermetic-module-files options, eliminate duplicates by name rather
than by symbol addresses. This way, when a dependent module is in the
symbol table more than once due to the use of a nested hermetic module,
it doesn't get emitted multiple times to the new module file.
Directly check via GeneratedRTChecks if any checks have been added,
instead of needing to go through ILV. This simplifies the code and
enables further refactoring in follow-up patches.
This is a cli tool to that tests the conformance of LLVM's mustache
implementation against the public Mustache spec, hosted at
https://github.com/mustache/spec. This is a revised version of the
patches in #111487.
Co-authored-by: Peter Chou <peter.chou@mail.utoronto.ca>
This patch simplifies code by removing the values from
insert/try_emplace. Note that default values inserted by try_emplace
are immediately overrideen in all these cases.
The OpenACC specification for `acc loop` describe that a loop's
parallelism determination mode is either auto, independent, or seq. The
rules are as follows.
- As per OpenACC 3.3 standard section 2.9.6 independent clause: A loop
construct with no auto or seq clause is treated as if it has the
independent clause when it is an orphaned loop construct or its parent
compute construct is a parallel construct.
- As per OpenACC 3.3 standard section 2.9.7 auto clause: When the parent
compute construct is a kernels construct, a loop construct with no
independent or seq clause is treated as if it has the auto clause.
- Additionally, loops marked with gang, worker, or vector are not
guaranteed to be parallel. Specifically noted in 2.9.7 auto clause: If
not, or if it is unable to make a determination, it must treat the auto
clause as if it is a seq clause, and it must ignore any gang, worker, or
vector clauses on the loop construct.
The verifier for `acc.loop` was updated to enforce this marking because
the context in which a loop appears is not trivially determined once IR
transformations begin. For example, orphaned loops are implicitly
`independent`, but after inlining into an `acc.kernels` region they
would be implicitly considered `auto`. Thus now the verifier requires
that a frontend specifically generates acc dialect with this marking
since it knows the context.
This AutoConvert.h header frequently gets mislabeled as an unused
include because it is guarded by MVS internally and every usage is also
guarded. This refactors the change to remove this guard and instead make
these functions a noop on other non-z/OS platforms.
* Get rid of libc_errno assignments in str_to_* __support tests, since
those API have been migrated to return error in a struct instead.
* Migrate tests for atof and to strto* functions from <stdlib.h> and for
strdup from <string.h> to use ErrnoCheckingTest harness.
PR #143720 adds a requirement to the ACC dialect that every acc.loop
must have a seq, independent, or auto attribute for the 'default'
device_type. The standard has rules for how this can be intuited:
orphan/parallel/parallel loop: independent
kernels/kernels loop: auto
serial/serial loop: seq, unless there is a gang/worker/vector, at which
point it should be 'auto'.
This patch implements all of this rule as a 'cleanup' step on the IR
generation for combined/loop operations. Note that the test impact is
much less since I inadvertently have my 'operation' terminating curley
matching the end curley from 'attribute' instead of the front of the
line, so I've added sufficient tests to ensure I captured the above.
This builds upon #101101 from @jyu2-git, which used compiler-generated
mappers when mapping an array-section of structs with members that have
user-defined default mappers.
Now we do the same when mapping arrays of structs.