ArrayRef now has a new constructor that takes a parameter whose type
has data() and size() methods. Since the new constructor subsumes
another constructor that takes std::array, this patch removes that
constructor. Note that std::array also comes with data() and size()
methods.
The only problem is that ASTFileSignature in the clang frontend does
not work with the new ArrayRef constructor: it hides
std::array<uint8_t, 20>::size() behind a static data member named size,
so the data()/size() check in the new constructor no longer applies.
This patch therefore adds an implicit conversion operator from
ASTFileSignature to ArrayRef. Note that ASTFileSignature is
defined as:
struct ASTFileSignature : std::array<uint8_t, 20> {
using BaseT = std::array<uint8_t, 20>;
static constexpr size_t size = std::tuple_size<BaseT>::value;
:
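For illustration, here is a minimal, self-contained sketch of the new
constructor, the shadowing problem, and the conversion-operator fix;
SimpleArrayRef and ASTFileSignatureLike are illustrative stand-ins, not the
actual LLVM/clang classes.
```
#include <array>
#include <cstddef>
#include <cstdint>
#include <utility>

// Stand-in for llvm::ArrayRef with only the pieces relevant here.
template <typename T> class SimpleArrayRef {
  const T *Data = nullptr;
  size_t Length = 0;

public:
  // New-style constructor: accepts any container exposing data() and size(),
  // which covers std::array, so a dedicated std::array constructor is no
  // longer needed.
  template <typename C,
            typename = decltype(std::declval<const C &>().data()),
            typename = decltype(std::declval<const C &>().size())>
  SimpleArrayRef(const C &Container)
      : Data(Container.data()), Length(Container.size()) {}

  const T *data() const { return Data; }
  size_t size() const { return Length; }
};

struct ASTFileSignatureLike : std::array<uint8_t, 20> {
  using BaseT = std::array<uint8_t, 20>;

  // This static data member hides std::array::size(), so the generic
  // constructor above no longer applies to this type...
  static constexpr size_t size = std::tuple_size<BaseT>::value;

  // ...hence the implicit conversion operator, which goes through the
  // std::array base where data()/size() are still callable.
  operator SimpleArrayRef<uint8_t>() const {
    return SimpleArrayRef<uint8_t>(static_cast<const BaseT &>(*this));
  }
};

void takeRef(SimpleArrayRef<uint8_t>) {}

void example(const std::array<uint8_t, 20> &Arr,
             const ASTFileSignatureLike &Sig) {
  takeRef(Arr); // via the data()/size() constructor
  takeRef(Sig); // via the conversion operator
}
```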
This updates the DIL implementation to handle smart pointers (accessing
field members and dereferencing) in the same way the current 'frame
variable' implementation does. It also adds tests for handling smart
pointers, as well as some additional DIL tests.
There's long-standing behaviour on PlayStation of allowing user-supplied
library search paths (`-L`) to influence the lookup of CRT objects. This
seems to be a historical quirk that has persisted to the present day.
This usage of `-L` for CRT selection is deeply entrenched among users of
the PS5 toolchain. While this change is conceptually bothersome, it does
reflect what's shipped.
SIE tracker: TOOLCHAIN-17706
This patch lowers copyin (represented as acc.copyin/acc.delete), copyout
(acc.create/acc.copyout), and create (acc.create/acc.delete).
Additionally, this work uncovered a few problems with #144806, so those
are fixed as well.
The instructionWaitsForSGPRWrites function covers ALL SALU instructions,
including those, like s_waitcnt, that don't read from SGPRs. This results
in delay_alu instructions being removed in cases like VALU->SGPR->VALU,
which causes a performance regression. This change modifies the function
so that it also checks whether the instruction reads an SGPR.
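As a rough, hedged illustration of that check, here is a minimal sketch;
the types below are simple stand-ins, not the real AMDGPU machine
instruction and operand classes.
```
#include <vector>

// Illustrative stand-ins for a machine operand and instruction.
struct FakeOperand {
  bool IsUse;  // the operand is read by the instruction
  bool IsSGPR; // the operand is an SGPR
};

struct FakeInstr {
  bool IsSALU;
  std::vector<FakeOperand> Operands;
};

// Treat an SALU instruction as waiting for SGPR writes only if it actually
// reads an SGPR; e.g. s_waitcnt is SALU but has no SGPR uses, so it should
// no longer cause delay_alu instructions to be removed.
bool instructionWaitsForSGPRWrites(const FakeInstr &MI) {
  if (!MI.IsSALU)
    return false;
  for (const FakeOperand &Op : MI.Operands)
    if (Op.IsUse && Op.IsSGPR)
      return true;
  return false;
}
```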
Rather than creating a new device info tree for each call to
`olGetDeviceInfo`, we instead do it on device initialisation. As well
as improving performance, this fixes a few lifetime issues with returned
strings.
This does unfortunately mean that device information is immutable,
but hopefully that shouldn't be a problem for any queries we want to
implement.
This also meant allowing offload initialization to fail, which it can
now do.
AMD treats this value as a string, so for consistency require this in
NVIDIA as well. This shouldn't change the output of the
`llvm-offload-device-info` tool, but does fix an issue in liboffload
when it tries to query the version.
This patch adds a new document describing the LLVM Qualification Group,
modeled after the Security Group documentation. The goal is to create an
open working group focused on enabling LLVM use in safety-critical
applications, such as those requiring ISO 26262 qualification.
The group is intended to be non-enforcing and collaborative, and to act
as a public coordination point for contributors working on
safety-relevant concerns in LLVM.
See:
https://discourse.llvm.org/t/rfc-proposal-to-establish-a-safety-group-in-llvm/86916
In this review, I’d really appreciate your feedback on both the overall
structure and wording, especially if anything could be made clearer,
more balanced, or more aligned with LLVM’s values and documentation
tone. What feels right? What could be improved to better reflect LLVM
community expectations?
---------
Co-authored-by: Wendi Urribarri (Woven by Toyota) <wendi.urribarri@woven-planet.global>
Most pacbti instructions are nops when the target does not have pacbti,
and are thus safe to execute, but bxaut is an undefined instruction. When we
don't have pacbti (e.g. if we're compiling compiler-rt with
-mbranch-protection=standard in order to be forward-compatible with
pacbti while still working on targets without it) then we need to use
separate aut and bx instructions.
This allows adding `__attribute__((swift_name("MyNamespace.MyType.my_method()")))` on standalone C++ functions to make Swift import them as member functions of a type that is nested in a C++ namespace.
rdar://138934888
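As a hedged sketch of the usage this enables (the names below are
illustrative, not taken from the patch or its tests):
```
// A standalone C++ function that Swift should import as a member of a type
// nested in a C++ namespace, using the attribute form quoted above.
namespace MyNamespace {
struct MyType {};
} // namespace MyNamespace

__attribute__((swift_name("MyNamespace.MyType.my_method()")))
void my_method();
// In Swift, this is expected to be callable as MyNamespace.MyType.my_method().
```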
This patch adds additional checks to the hoisting logic to prevent hoisting of
`vector.transfer_read` / `vector.transfer_write` pairs when the underlying
memref has users that introduce aliases via operations implementing
`ViewLikeOpInterface`.
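As a rough sketch of this kind of conservative check (not the exact code in
the patch), the aliasing condition could be expressed as:
```
#include "mlir/IR/Operation.h"
#include "mlir/IR/Value.h"
#include "mlir/Interfaces/ViewLikeInterface.h"

using namespace mlir;

// Conservatively report whether any user of `memref` creates an alias
// through ViewLikeOpInterface (e.g. memref.subview); hoisting of the
// transfer_read / transfer_write pair is skipped in that case.
static bool hasViewLikeUsers(Value memref) {
  for (Operation *user : memref.getUsers())
    if (isa<ViewLikeOpInterface>(user))
      return true;
  return false;
}
```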
Note: This may conservatively block some valid hoisting opportunities and could
affect performance. However, as demonstrated by the included tests, the current
logic is too permissive and can lead to incorrect transformations.
If this change prevents hoisting in cases that are provably safe, please share
a minimal repro - I'm happy to explore ways to relax the check.
Special treatment is given to `memref.assume_alignment`, mainly to accommodate
recent updates in:
* https://github.com/llvm/llvm-project/pull/139521
Note that such special casing does not scale and should generally be avoided.
The current hoisting logic lacks robust alias analysis. While better support
would require more work, the broader semantics of `memref.assume_alignment`
remain somewhat unclear. It's possible this op may eventually be replaced with
the "alignment" attribute added in:
* https://github.com/llvm/llvm-project/pull/144344
Added an early return when mapping named constants.
This prevents a linking error in the following example:
```
program test
use, intrinsic :: iso_c_binding, only: c_double
implicit none
real(c_double) :: x
integer :: i
x = 0.0_c_double
!$omp target teams distribute parallel do reduction(+:x)
do i = 0, 9
x = x + 1.0_c_double
end do
!$omp end target teams distribute parallel do
end program test
```
In gfx90a-gfx950, it's possible to emit MFMAs which use AGPRs or VGPRs
for vdst and src2. We do not want to use the AGPR form unless required
by register pressure, as it requires cross-bank register copies from
most other instructions. Currently we select the AGPR
or VGPR version depending on a crude heuristic for whether it's possible
AGPRs will be required. We really need the register allocation to
be complete to make a good decision, which is what this pass is for.
This adds the pass, but does not yet remove the selection patterns
for AGPRs. This is a WIP, and NFC-ish. It should be a no-op on any
currently selected code. It also does not yet trigger on the real
examples of interest, which require handling batches of MFMAs at
once.
AMDGPU: Add baseline tests for VGPR MFMA rewriting pass
Add baseline tests for a new pass that will rewrite VGPR MFMAs
with copies to AV_* classes into the AGPR form.
Add start of IR test that probably needs to be redone
If we are inserting into lane 0 of a zero vector, we can use the ldr
instructions to get the upper-lane zero for free. Do not attempt to make
post-inc operations in that case; using the ldr should result in fewer
micro-ops overall.
This patch adds a separate `extensions` directory, since there are quite
a few extensions in libc++ that aren't necessarily libc++-specific. For
example, the tests currently in `libcxx/test/libcxx/extensions` should
also pass with libstdc++, since libstdc++ originally added the extension.
This also "documents" what users are allowed to rely on and what parts
are just libc++ tests to make sure our implementation is behaving as we
expect, which may be subject to change.
This patch also formats the tests and refactors `.fail.cpp` tests to
`.verify.cpp` tests.
So far flang generates runtime derived type info global definitions (as
opposed to declarations) for all the types used in the current
compilation unit even when the derived types are defined in other
compilation units. It uses linkonce_odr to achieve the derived type
descriptor address "uniqueness" needed to match two derived types inside
the runtime.
This comes at a big compile time cost because of all the extra globals
and their definitions in apps with many and complex derived types.
This patch adds an experimental option to only generate the rtti
definitions for the types defined in the current compilation unit, and to
only generate external declarations for the derived type descriptor
objects of types defined elsewhere.
Note that objects compiled with this option are not compatible with object
files compiled without it, because files compiled without the option may
drop the rtti for types they define when it is not used in that
compilation unit (due to the linkonce_odr linkage).
I am adding the option so that we can better measure the extra cost of
the current approach on apps and allow speeding up some compilations
where devirtualization does not matter (and the build config links all
module file objects anyway).
Followup to #143760, which handled ISD::VSELECT.
I've moved ISD::SELECT/VSELECT under the "No poison except from flags
(which is handled above)" subgroup to try to remind people that these
can have poison-generating FMFs (NINF/NNAN), even though this hasn't
been well explained anywhere I can find :(
Helps with regressions from #145939
`ConversionPattern::getVoidPtrType` looks a little confusing now that the
opaque pointer migration is already done. Also, we cannot specify an
address space with this method.
Maybe we can mark them as deprecated and add a new method `getPtrType()`,
as this PR does. :)
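A hedged sketch of the shape such a helper could take (the actual name and
signature in the PR may differ); it relies on the opaque LLVMPointerType
builder, which takes an address space:
```
#include "mlir/Dialect/LLVMIR/LLVMTypes.h"
#include "mlir/IR/MLIRContext.h"

// Unlike getVoidPtrType(), this lets the caller pick an address space.
static mlir::LLVM::LLVMPointerType getPtrType(mlir::MLIRContext *ctx,
                                              unsigned addressSpace = 0) {
  return mlir::LLVM::LLVMPointerType::get(ctx, addressSpace);
}
```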
In C, a char array needs no "nonstring" attribute if its initializer is
a string literal that 1) explicitly ends with '\0' and 2) fits in the
array after a possible truncation.
For example
`char a[4] = "ABC\0"; // fine, needs no "nonstring" attr`
rdar://152506883
This reland disables the test on Linux so that it will not block the
buildbot: https://lab.llvm.org/buildbot/#/builders/144/builds/28591.
Until now, the vectorisation of some early exit loops with uncountable
exits was controlled by a flag that was off by default. Now that we have
efficient code generation for vectorising such loops (see PR #130766),
and we still have some time before the next LLVM release, it seems like
a good time to enable the feature by default. If any issues arise
post-commit it can easily be reverted.
Using this patch I built and ran the LLVM test suite successfully,
which on neoverse-v1 led to the vectorisation of 114 additional
early exit loops. I also built and ran SPEC2017 successfully for
both neoverse-v1 and neoverse-v2.
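For reference, a hypothetical example (not taken from the test suite or
SPEC2017) of the kind of loop this feature targets:
```
// A search loop with a known trip count (i < n) but an uncountable early
// exit (the key may be found at any iteration); loops of this shape can
// now be vectorised by default.
int find_first(const int *a, int n, int key) {
  for (int i = 0; i < n; ++i)
    if (a[i] == key)
      return i;
  return -1;
}
```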
.. from the guts of GDBRemoteCommunication to ~top level.
This is motivated by #131519 and by the fact that it's impossible to guess
whether the author of a symlink intended it to be a "convenience
shortcut" -- meaning it should be resolved before looking for related
files; or an "implementation detail" -- meaning the related files should
be located near the symlink itself.
This debate is particularly ridiculous when it comes to lldb-server
running in platform mode, because it also functions as a debug server,
so what we really need to do is pass /proc/self/exe in a
platform-independent manner.
Moving the location logic higher up achieves that, as lldb-platform (on
non-macOS) can pass `HostInfo::GetProgramFileSpec`, while liblldb can use
the existing complex logic (which only worked in liblldb anyway, as
lldb-platform doesn't have an lldb_private::Platform instance).
Another benefit of this patch is a reduction in the dependencies of
GDBRemoteCommunication on the rest of liblldb (achieved by avoiding the
Platform dependency).
These were bypassing the ordinary libcall emission mechanism. Make sure
we have entries in RuntimeLibcalls, which should include all possible
calls the compiler could emit.
Fixes not emitting the # prefix in the arm64ec case.
Work towards separating the ABI existence of libcalls vs. the
lowering selection. Set libcall selection through enums, rather
than through raw string names.
Replace RuntimeLibcalls.def with a tablegen-generated version. This
is in preparation for splitting RuntimeLibcalls into two components.
For now match the existing functionality.
Most of the diff is replacing `__parent_pointer` with
`__end_node_pointer`. The most interesting change is that the pointer
aliases are now defined directly inside `__tree` instead of in a separate
traits class.