Follow-up to #136022: this ensures formatting tests are run with an empty
`.clang-format-ignore` in their root directory, to prevent failures if
the file also exists higher in the tree.
The MSVC STL includes specializations of `_Is_memfunptr` for every
function pointer type, including every calling convention.
The problem is the AMDGPU target doesn't support the x86 `vectorcall`
calling convention so clang sets it to the default CC. This ends up
clashing with the already-existing overload for the default CC, so we
get a duplicate definition error when including `type_traits` (which we
heavily use in the SYCL STL) and compiling for AMDGPU on Windows.
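To illustrate the clash, here is a minimal sketch (illustrative only, not the actual MSVC STL code; the names are made up):
```
// One partial specialization per calling convention, mirroring the pattern
// the MSVC STL uses for _Is_memfunptr.
template <class T>
struct is_memfunptr_sketch { static constexpr bool value = false; };

template <class R, class C, class... A>
struct is_memfunptr_sketch<R (C::*)(A...)> { static constexpr bool value = true; };

// On x64 Windows this is a distinct type and everything is fine. If the
// target does not support __vectorcall and the compiler silently rewrites it
// to the default calling convention, this partial specialization becomes
// identical to the one above and we get a redefinition error.
template <class R, class C, class... A>
struct is_memfunptr_sketch<R (__vectorcall C::*)(A...)> { static constexpr bool value = true; };
```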
This doesn't happen for pure AMDGPU non-SYCL because it doesn't include
the C++ STL, and it doesn't happen for CUDA/HIP because a similar
workaround was done
[here](fa49c3a888).
I am not an expert in Sema, so this is a somewhat hardcoded fix; please
let me know if there is a better way to do it.
As far as I can tell we can't do exactly the same fix that was done for
CUDA because we can't differentiate between device and host code so
easily.
---------
Signed-off-by: Sarnie, Nick <nick.sarnie@intel.com>
This change removes the uint64_t constructor on LocationSize, preventing
implicit conversion, and fixes up the APIs that use it to adapt to
the change. Note that I'm adding a couple of explicit conversion points
on routines where passing in a fixed offset as an integer seems likely
to have well understood semantics.
We had an unfortunate case which arose if you tried to pass a TypeSize
value to a parameter of LocationSize type. We'd find the implicit
conversion path through TypeSize -> uint64_t -> LocationSize which works
just fine for fixed values, but loses information and fails assertions
if the TypeSize was scalable. This change breaks the first link in that
implicit conversion chain since that seemed to be the easier one.
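For illustration, a simplified sketch of the lossy step with stand-in classes (not the real LLVM types):
```
#include <cstdint>

// Stand-ins only, to show where information is lost.
struct TypeSizeLike {
  uint64_t MinValue;
  bool Scalable;
  // Converting to a plain integer drops the 'Scalable' bit.
  operator uint64_t() const { return MinValue; }
};

struct LocationSizeLike {
  uint64_t Raw;
  // With the uint64_t constructor removed (modelled here as 'explicit'),
  // a raw integer can no longer silently become a LocationSize.
  explicit LocationSizeLike(uint64_t R) : Raw(R) {}
};

void takesLocationSize(LocationSizeLike) {}

void caller(TypeSizeLike TS) {
  uint64_t Fixed = TS;  // the scalable flag is already gone here
  // The remaining conversion must now be written out, making such lossy
  // points easy to spot and audit during review.
  takesLocationSize(LocationSizeLike(Fixed));
}
```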
Repack `amdgpu.swizzle_bitmode` arguments and lower it to
`rocdl.ds_swizzle`.
The repacking logic is as follows:
* `sizeof(arg) < sizeof(i32)`: bitcast to an integer, zext to i32, then trunc and bitcast back.
* `sizeof(arg) == sizeof(i32)`: just bitcast to i32 and back if not already i32.
* `sizeof(arg) > sizeof(i32)`: bitcast to `vector<Nxi32>`, extract the individual elements, apply a series of `rocdl.ds_swizzle`, then recompose the vector and bitcast back.
Added repacking logic to LLVM utils so it can be used elsewhere. I'm
planning to use it for `gpu.shuffle` later.
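For reference, a plain C++ sketch of the packing scheme (conceptual only, not the MLIR lowering code; `swizzleLane` stands in for a single `rocdl.ds_swizzle`):
```
#include <cstdint>
#include <cstring>

// Stand-in for one per-lane rocdl.ds_swizzle; the identity is used here.
static uint32_t swizzleLane(uint32_t Lane) { return Lane; }

// Wide case: "bitcast" to 2 x i32, swizzle each lane, recompose the value.
static double swizzleF64(double Value) {
  uint32_t Lanes[2];
  std::memcpy(Lanes, &Value, sizeof(Value));
  for (uint32_t &Lane : Lanes)
    Lane = swizzleLane(Lane);
  double Result;
  std::memcpy(&Result, Lanes, sizeof(Result));
  return Result;
}

// Narrow case: zero-extend the raw bits to i32, swizzle, truncate back.
static uint16_t swizzleF16Bits(uint16_t Bits) {
  return static_cast<uint16_t>(swizzleLane(Bits));
}
```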
`FunctionOpInterface` assumed that the function type (an attribute of the
operation) can be cloned with arbitrary lists of function arguments and
results in order to support argument and result list mutation. This is not
always correct; in particular, LLVM dialect functions require exactly one
result, making it impossible to erase the result.
Allow function type cloning to fail and propagate this failure through
various APIs that use it. The common assumption is that, when cloning
fails, existing IR has not been modified.
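As a rough model of the new contract (stand-in types, not the actual interface methods):
```
#include <optional>
#include <vector>

// Stand-in for a function type: lists of argument and result "types".
struct FunctionTypeModel {
  std::vector<int> inputs, results;
};

// Cloning may now refuse the requested signature instead of asserting,
// e.g. for an LLVM-dialect-like function that requires exactly one result.
std::optional<FunctionTypeModel> cloneWith(std::vector<int> inputs,
                                           std::vector<int> results) {
  if (results.size() != 1)
    return std::nullopt;  // failure propagates to the caller, IR untouched
  return FunctionTypeModel{std::move(inputs), std::move(results)};
}
```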
Fixes #131142.
This also demonstrates a bug that's a consequence of the two different
paths for the single-threaded and multithreaded cases. The parallel path goes
through bitcode serialization and does preserve the uselistorder. It
therefore survives and we can observe a reduced uselistorder with deleted
instructions. In the CloneModule case, nothing is reduced.
This patch makes the __config_site header modular, which solves various
problems with non-modular headers. This requires going back to
generating the modulemap file, since we only know how to make
__config_site modular when we're not using the per-target runtime dir.
The patch also adds a test that we support
-Wnon-modular-include-in-module, which warns about non-modular includes
from modules.
---------
Co-authored-by: Konstantin Varlamov <varconst@apple.com>
This commit is in preparation of the One-Shot Dialect Conversion
refactoring, which removes the rollback from the dialect conversion
framework.
`GenericAtomicRMWOpLowering` (`generic_atomic_rmw`) triggered a rollback
in two test cases. The lowering pattern adds additional basic blocks to
the enclosing operation, which used to be a `func.func` (now
`llvm.func`). Adding a basic block triggers legalization of the op that
owns the basic block. This fails when running
`--convert-to-llvm="filter-dialects=memref"` because no lowering
patterns for the `func` dialect were populated and only `llvm` ops are
considered "legal" by the `convert-to-llvm` pass, causing a rollback of
the entire `GenericAtomicRMWOpLowering` pattern.
Also add extra `CHECK-INTERFACE` to make sure that all test cases are
correctly lowered with `--convert-to-llvm="filter-dialects=memref"`.
Summary:
Currently, we build a single `libomptarget.devicertl.a` which is a
fatbinary. It is a host object file that contains the embedded archive
files for both the NVIDIA and AMDGPU targets. This was done primarily as
a convenience due to naming conflicts. Now that the clang driver for the
GPU targets can link appropriately via the per-target runtime dir, we can
just make two separate static libraries and remove the indirection.
This patch creates two new static libraries that get installed into
```
/lib/amdgcn-amd-amdhsa/libomp.a
/lib/nvptx64-nvidia-cuda/libomp.a
```
for AMDGPU and NVPTX respectively. The link job created by the linker
wrapper now simply needs to do `-lomp` and it will search those
directories and link those static libraries. This requires far less
special handling.
This patch is a precursor to changing the build system entirely to be a
runtimes based one. Soon this target will be a standard `add_library`
and done through the GPU runtime targets.
NOTE that this actually does remove an additional optimization step.
Previously we merged all of the files into a single bitcode object and
forcibly internalized some definitions. This, instead, just treats them
like a normal static library. This may possibly affect performance for
some files, but I think it's better overall to use static library
semantics because it allows us to have an 'include-what-you-use'
relationship with the library.
Performance testing will be required. If we really need the merged blob
then we can simply pack that into a new static library.
getMergedLocation uses a common parent scope of the two input locations
as the scope of the output location.
It doesn't consider the case when the common parent scope is from a file
other than L1's and L2's files. In that case, it produces a merged location
with an erroneous scope (https://github.com/llvm/llvm-project/issues/122846).
In some cases, such as https://github.com/llvm/llvm-project/pull/125780#issuecomment-2651657856,
L1 and L2 having a common parent scope from another file indicates that
the code at L1 and L2 is included from the same source location.
With this commit, getMergedLocation detects whether the files of L1, L2, or their
common parent scope differ. If so, it assumes that L1 and L2 were included
from some source location, and tries to attach the output location to a scope
with the nearest common source location with regard to L1 and L2.
If the nearest common location is also from another file, getMergedLocation returns it
as the merged location, assuming that L1 and L2 belong to files that were both included
at that location.
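A toy model of the "nearest common source location" search (illustrative only; the real implementation works on DILocation/DIScope chains):
```
#include <string>
#include <vector>

struct Loc {
  std::string File;
  int Line;
  const Loc *IncludedAt = nullptr;  // where this location was included from
};

// Chain of "included at" locations, innermost first.
static std::vector<const Loc *> inclusionChain(const Loc *L) {
  std::vector<const Loc *> Chain;
  for (; L; L = L->IncludedAt)
    Chain.push_back(L);
  return Chain;
}

// Nearest location from which both L1 and L2 were (transitively) included,
// or nullptr if there is none.
static const Loc *nearestCommonInclusion(const Loc *L1, const Loc *L2) {
  for (const Loc *A : inclusionChain(L1))
    for (const Loc *B : inclusionChain(L2))
      if (A == B)
        return A;
  return nullptr;
}
```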
Fixes https://github.com/llvm/llvm-project/issues/122846.
It needs to be `TEST_F` to access `received_entries`.
Disabling also works based on the test name, not the fixture name.
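A minimal gtest sketch of the difference (hypothetical fixture and test names):
```
#include "gtest/gtest.h"
#include <vector>

class TelemetryTestFixture : public ::testing::Test {
protected:
  std::vector<int> received_entries;  // only reachable from TEST_F bodies
};

// TEST_F runs the body as a method on a class derived from the fixture, so
// received_entries is visible. A plain TEST(...) would fail with
// "use of undeclared identifier".
TEST_F(TelemetryTestFixture, CollectsEntries) {
  received_entries.push_back(1);
  ASSERT_EQ(1U, received_entries.size());
}

// To disable just this test, prefix the test name:
// TEST_F(TelemetryTestFixture, DISABLED_CollectsEntries) { ... }
```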
Build failure:
```
lldb/unittests/Core/TelemetryTest.cpp:110:17: error: use of undeclared identifier 'received_entries'
110 | ASSERT_EQ(1U, received_entries.size());
| ^
lldb/unittests/Core/TelemetryTest.cpp:112:61: error: use of undeclared identifier 'received_entries'
112 | llvm::dyn_cast<lldb_private::FakeTelemetryInfo>(received_entries[0])
| ^
```
Fixes: 159b872b37
- Move the code generated by DecoderEmitter to anonymous namespace.
- Move AMDGPU's usage of this code from header file to .cpp file.
Note, we get build errors like "call to function 'decodeInstruction'
that is neither visible in the template definition nor found by
argument-dependent lookup" if we do not change AMDGPU.
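A single-file sketch of the failure mode (hypothetical names; this snippet is expected to be rejected, which is the point):
```
struct Insn { int bits; };

// Template defined before decodeInstruction is visible, as when the caller
// lives in a header.
template <typename T>
int tryDecode(T insn) {
  return decodeInstruction(insn);  // dependent, unqualified call
}

namespace {
// The anonymous namespace is not an associated namespace of Insn, so ADL at
// the point of instantiation cannot find this either.
int decodeInstruction(Insn insn) { return insn.bits; }
} // namespace

int use() { return tryDecode(Insn{42}); }
// clang: call to function 'decodeInstruction' that is neither visible in the
// template definition nor found by argument-dependent lookup

// Moving the call site into the .cpp below the anonymous namespace (as done
// for AMDGPU here) makes the name visible where the template is defined.
```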
As reported in issue #103477, visibility of instantiated member
functions used to be ignored when calculating visibility of a
specialization.
This patch modifies `getLVForClassMember` to look up the source template
for an instantiated member, and changes `mergeTemplateLV` to apply it.
A similar issue was reported in #31462, but it seems that an `extern`
declaration with visibility prevents the function from being emitted as
hidden. This behavior seems correct, even though GCC emits it with
default visibility instead.
Both tests from #103477 and #31462 are added as LIT tests `test72` and
`test73` respectively.
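A rough reduction of the scenario (hypothetical, not the exact test72/test73 sources):
```
template <typename T>
struct __attribute__((visibility("default"))) Widget {
  __attribute__((visibility("hidden"))) void impl() {}
};

// With this patch, the visibility computed for the instantiated member
// Widget<int>::impl takes the attribute on the member of the source template
// into account instead of ignoring it.
template struct Widget<int>;
```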
As the title says, we need to call `Timer::clear` to avoid extra log output like this:
```
===-------------------------------------------------------------------------===
...
===-------------------------------------------------------------------------===
Total Execution Time: 0.0000 seconds (0.0000 wall clock)
---Wall Time--- --- Name ---
----- ....
----- Total
```
In current `FlattenCFG`, using `isNoAlias` for two instructions is
imprecise. For example, when passing a store instruction and a load
instruction directly into `AA->isNoAlias`, it will always return
`NoAlias`. This happens because when checking the types of the two
Values, the store instruction (which has a `void` type) causes the
analysis to return `NoAlias`.
For instructions, we should use `getModRefInfo` instead of `isNoAlias`,
as aliasing is a concept of memory locations.
In this patch, `AAResults::getModRefInfo` is extended to accept two
instructions. It checks whether the two instructions may access the same
memory location or not. And in `FlattenCFG`, we use this new helper
function to do the check instead of `isNoAlias`.
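A toy model of the distinction (illustrative only, not LLVM's AAResults or MemoryLocation):
```
#include <cassert>
#include <optional>

struct MemLoc { const void *Ptr; };  // stand-in for a memory location

struct Instr {
  std::optional<MemLoc> Accessed;  // location read or written, if any
  bool Writes = false;
};

enum class ModRef { NoModRef, Ref, Mod };

// "May these two instructions touch the same memory?" -- the question the
// two-instruction getModRefInfo helper answers; aliasing itself is only
// defined between memory locations, not between instruction Values.
static ModRef getModRefInfoToy(const Instr &A, const Instr &B) {
  if (!A.Accessed || !B.Accessed || A.Accessed->Ptr != B.Accessed->Ptr)
    return ModRef::NoModRef;
  return A.Writes ? ModRef::Mod : ModRef::Ref;
}

int main() {
  int X = 0;
  Instr Store{MemLoc{&X}, /*Writes=*/true};
  Instr Load{MemLoc{&X}, /*Writes=*/false};
  // A store and a load of the same address conflict, which a value-based
  // isNoAlias query on the instructions themselves fails to see.
  assert(getModRefInfoToy(Store, Load) == ModRef::Mod);
}
```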
Unit tests and lit tests are also included in this patch.
This avoids the need to have special handling at every use site.
Unfortunately this means we unnecessarily emit AssertZext in the DAG
(where we already directly understand the range of the intrinsic), and
we regress in undefined cases as we don't fold out asserts on undef.
925e195 introduced a regression: since then, we have accepted invalid
trailing commas in many expression lists where the grammar does not
allow them. The issue came from the fact that an additional invalid
state, previously handled by ParseExpressionList, was overlooked in
that patch.
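For example, input like the following must be rejected again, since the C++ grammar does not allow a trailing comma in a call's expression list:
```
int f(int, int);
int x = f(1, 2,);  // error: expected expression
```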
Fixes https://github.com/llvm/llvm-project/issues/136254
No release entry because I want to backport it.
Note that we don't have to worry about CallstackProfileData[Id]
default-constructing the value side of a new map entry. If that
happens, AccessHistogramSize > 0 wouldn't be true, and the new map
entry gets deleted right away.
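For reference, the container behavior being discussed (a plain std::map is used here for illustration; the actual container may differ):
```
#include <cassert>
#include <map>

int main() {
  std::map<int, int> Profile;
  // operator[] value-initializes the mapped value when the key is missing,
  // creating a new entry as a side effect of the lookup.
  int &Size = Profile[42];
  assert(Size == 0 && Profile.size() == 1);
  // If the entry turns out to be unneeded, erasing it right away restores
  // the previous state, which is what the note above relies on.
  Profile.erase(42);
  assert(Profile.empty());
}
```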
This patch replaces:
llvm::copy(Src, std::back_inserter(Dst));
with:
llvm::append_range(Dst, Src);
for brevity.
One side benefit is that llvm::append_range eventually calls
llvm::SmallVector::reserve if Dst is an llvm::SmallVector.
The fixup output is a debug aid and should not be used to test
target-specific relocation generation implementation. The llvm-mc
-filetype=obj output is what truly matters.
For RELA targets, fixup kinds that force relocations (GOT, TLS, ALIGN,
RELAX, etc) can bypass `applyFixup` and be encoded as
`FirstRelocationKind+i`, as seen in LoongArch. This patch removes
redundant fixup kinds and adopts the `FirstRelocationKind+i` encoding.
The `llvm-mc -show-encoding` output no longer displays descriptive fixup
names, as this information is removed from
`RISCVAsmBackend::getFixupKindInfo`. While a backend hook could be added
to call `llvm::object::getELFRelocationTypeName`, it's unnecessary since
the relocation in `-filetype=obj` output is what truly matters.
Pull Request: https://github.com/llvm/llvm-project/pull/136088