Reland #118503. Added a fix for builds with `-DBUILD_SHARED_LIBS=ON`
(see last commit). Otherwise the changes are identical.
---
### New API
Previous discussions at the LLVM/Offload meeting have brought up the
need for a new API for exposing the functionality of the plugins. This
change introduces a very small subset of a new API, which is primarily
for testing the offload tooling and demonstrating how a new API can fit
into the existing code base without being too disruptive. Exact designs
for these entry points and future additions can be worked out over time.
The new API does however introduce the bare minimum functionality to
implement device discovery for Unified Runtime and SYCL. This means that
the `urinfo` and `sycl-ls` tools can be used on top of Offload. A
(rough) implementation of a Unified Runtime adapter (aka plugin) for
Offload is available
[here](https://github.com/callumfare/unified-runtime/tree/offload_adapter).
Our intention is to maintain this and use it to implement and test
Offload API changes with SYCL.
### Demoing the new API
```sh
# From the runtime build directory
$ ninja LibomptUnitTests
$ OFFLOAD_TRACE=1 ./offload/unittests/OffloadAPI/offload.unittests
```
### Open questions and future work
* Only some of the available device info is exposed, and not all the
possible device queries needed for SYCL are implemented by the plugins.
A sensible next step would be to refactor and extend the existing device
info queries in the plugins. The existing info queries are all strings,
but the new API introduces the ability to return any arbitrary type.
* It may be sensible at some point for the plugins to implement the new
API directly, and the higher level code on top of it could be made
generic, but this is more of a long-term possibility.
This patch adds a lit test to clang format to ensure that the
ClangFormatStyleOptions doc page has been updated appropriately. The
test just runs the automatic update script and diffs the outputs to
ensure they are the same.
We can support these via libcalls in libgcc/compiler-rt or integer
operations for fneg/fabs/fcopysign. fp128 values will be passed in two
64-bit GPRs according to the psABI.
Supporting RV32 requires sret which is not supported by libcall handling
in LegalizerHelper.cpp yet. It doesn't call canLowerReturn.
Extend MLIR to LLVM lowering by adding support for `omp.wsloop` for
delayed privatization. This also refactors a few bit of code to isolate
the logic needed for `firstprivate` initialization in a shared util that
can be used across constructs that need it. The same is done for
`dealloc`
regions.
Parent PR: https://github.com/llvm/llvm-project/pull/118447. Only latest
commit is relevant for this PR.
This refactors the logic needed to emit init logic for reductions by
moving some duplicated code into a shared util. The logic for doing is
quite involved and is needed for any construct that has reductions.
Moreover, when a construct has both private and reduction clauses, both
sets of clauses need to cooperate with each other when emitting the
logic needed for allocation and initialization. Therefore, this PR
clearly sets the boundaries for the logic needed to initialize
reductions.
For Frames, we prefer the inline notation for the brevity.
For PortableMemInfoBlock, we go through all member fields and print
out those that are populated.
This relands #118157 with a fix for the use of an uninitialized
variable and additional tests.
The System V ABI
(https://www.sco.com/developers/gabi/latest/ch5.pheader.html#note_section)
states that the note entries and their descriptor fields must be aligned
to 4 or 8 bytes for 32-bit or 64-bit objects respectively. In practice,
64-bit systems can use both alignments, with the actual format being
determined by the alignment of the segment. For example, the Linux
gABI extension (https://github.com/hjl-tools/linux-abi/wiki/linux-abi-draft.pdf)
contains a special note on this, see 2.1.7 "Alignment of Note Sections".
This patch adjusts the format of the generated notes to the specified
section alignment. Since `llvm-readobj` was fixed in a similar way in
https://reviews.llvm.org/D150022, "[Object] Fix handling of Elf_Nhdr
with sh_addralign=8", the generated notes can now be parsed
successfully by the tool.
This is a patch in preparation for the support stream ordered memory
allocator in CUDA Fortran.
This patch adds an asynchronous id to the AllocatableAllocate runtime
function and to Descriptor::Allocate so it can be passed down to the
registered allocator. It is up to the allocator to use this value or
not.
A follow up patch will implement that asynchronous allocator for CUDA
Fortran.
In #86751 we moved the IRELATIVE relocations to .rela.plt when
--pack-dyn-relocs=android was enabled but we neglected to also move
the __rela_iplt_{start,end} symbols. As a result, static binaries
linked with this flag were unable to find their IRELATIVE relocations.
Fix it by having the symbols surround the correct section.
Reviewers: MaskRay, smithp35
Reviewed By: MaskRay
Pull Request: https://github.com/llvm/llvm-project/pull/118585
This patch fixes:
clang/lib/AST/MicrosoftMangle.cpp:1006:11: error: enumeration value
'S_PPCDoubleDoubleLegacy' not handled in switch [-Werror,-Wswitch]
* Add missing semantics to the `Semantics` enum.
* Move all documentation of the semantics to the header file.
* Also rename some functions for consistency.
The AMDGPUAnnotateKernelFeatures pass infers the "amdgpu-calls" and
"amdgpu-stack-objects" attributes, which are used to infer whether we
need to initialize flat scratch. This is, however, not precise. Instead,
we should use AMDGPUAttributor and infer amdgpu-no-flat-scratch-init on
kernels. Refer to https://github.com/llvm/llvm-project/issues/63586 .
Further cleanups from old hdrgen removal. I didn't realize there were
cmake
variables related to old hdrgen spread out throughout more of the source
tree.
Link: #117220
Link: #117208
Adds support for the following MSVC intrinsics:
* `_InterlockedCompareExchangePointer_acq`
* `_InterlockedCompareExchangePointer_rel`
* `_InterlockedExchangePointer_acq`
* `_InterlockedExchangePointer_nf`
* `_InterlockedExchangePointer_rel`
These are documented at:
<https://learn.microsoft.com/en-us/cpp/intrinsics/arm64-intrinsics?view=msvc-170#interlocked-intrinsics>
NOTE: `_InterlockedCompareExchangePointer_nf` is not being added since
it already exists, although it was incorrectly added for all
architectures instead of being Arm & AArch64 specific.
This change also unifies how the pointer and non-pointer interlocked
compare-exchange intrinsics are being handled.
Add pattern that update fir.declare memref when it comes from a device
global and is not a descriptor. In that case, we recover the device
address that needs to be used in ops like `fir.array_coor` and so on.