This relands commit f890f010f6.
The result value of `getelementptr inbounds (TY, null, not zero)` is a poison value.
We can think of it as undefined behavior.
async just takes an integral value, but it has a little bit of special
rules in sema, so it is implemented slightly differently than int-expr.
This patch implements async parsing.
```
AppProperties app;
```
is marked as a weak symbol in header now. One can just use `&app !=
nullptr` to check if `app` is defined. There is no need to define it for
overlay mode.
Fixes a regression from #78041 as reported in the review. The original
patch failed to compare the canonical type, which this adds. A slightly
modified test of the original report is added.
Unconditionally change std::string's alignment to 8.
This change saves memory by providing the allocator more freedom to
allocate the most
efficient size class by dropping the alignment requirements for
std::string's
pointer from 16 to 8. This changes the output of std::string::max_size,
which makes it ABI breaking.
That said, the discussion concluded that we don't care about this ABI
break. and would like this change enabled universally.
The ABI break isn't one of layout or "class size", but rather the value
of "max_size()" changes, which in turn changes whether `std::bad_alloc`
or `std::length_error` is thrown for large allocations.
This change is the child of PR #68807, which enabled the change behind
an ABI flag.
Summary:
The offloading wrapper is a object file that contains code necessary to
register offloading entries for the given runtime. Currently, we
expected only one of these to be present when we make the final
executable. However, in the case of redistributable linking with `-r` we
can end up with multiple of these being generated before finally
creating the executable.
This patch simply changes the defintiions of these globals to be
mergable. This allows multiples of these to participate in a single link
job. For ELF, we just make the dummy variable internal and used so it
sets up the section as expected. For COFF we make the entries weak_odr
so they merge to a single symbol
Summary:
A relocatable link through `clang -r` can go through the
clang-linker-wrapper if offloading is enabled. This will have the effect
of linking the device code and creating the wrapper module. It will then
be merged into the final file. This is useful behavior on its own, but
is likely not what is expected for a `-r` job.
This patch makes the linker wrapper ignore the device code when doing a
reloctable link. This has the effect of the linker merging the
`.llvm.offloading` sections in the output object. These will then be
parsed as normal when the executable is finally created.
Even though this doesn't actually perform a reloctable link on the
device code itself, it has a similar effect of combining multiple files
into a single one.
before erasing.
Before trying to erase the extractelement instruction, not enough to
check for single use, need to check that it is not used in several nodes
because of the preliminary nodes reordering.
The implementation of raw_string_ostream::write_impl() was moved to
raw_socket_stream.cpp when the raw_socket_ostream support was separated.
This patch moves it back to facilitate disabling socket support in
downstream projects.
Machines with vector support handle i128 in vector registers and
therefore only have the small displacement available for memory
accesses. Update isLegalAddressingMode() to reflect this.
The faults on invalid vec range in mincore seems to be handled
differently by the OS (it is an erroneous edge case after all). Remove
the tests for now.
When creating a new UsingType, the underlying type may change if it is a
declaration. This creates an inconsistency between the type searched and
type created. Update member and non-member Profile functions so that
they return the same ID.
Akin the `py_binary` rules for `lit`, these are scoped to binaries,
rather than exposing the library - binary split. The latter is available
to the package (pip package) users.
Tested:
```
cd utils/bazel
bazel build @llvm-project//llvm:extract_ir
bazel-bin/external/llvm-project/llvm/extract_ir --help
```
...and observed expected output (rather than import not found errors)
(Same for the other 2 targets).
b1ae461a53 renamed Cycle ->
ReleaseAtCycle.
7e09239e24 was committed without rebasing
but used the old Cycle syntax.
This caused a build failure when
7e09239e24 was squash-and-merged. This
patch fixes this problem.
TargetSchedule.td explicitly allows the usage of a ProcResource for zero
cycles, in order to represent that the ProcResource must be available
but is not consumed by the instruction. On the other hand,
ResourceSegments explicitly does not allow for a zero sized interval. In
order to remedy this, this patch handles the special case of when there
is an empty interval usage of a resource by not adding an empty
interval.
We ran into this issue downstream, but it makes sense to have
this upstream since it is explicitly allowed by TargetSchedule.td.
In Bazel, Clang current separates the clang executable into a
clang-driver library, and the actual clang executable. This allows
downstream users to make their own variations of clang, without having
to redo/maintain separate build pipelines.
This adds the same for opt for both CMake and Bazel.
Fix missing test coverage after #70740#70760
When compiling for CUDA/HIP, the driver creates a cc1 job to compile for
amdgcn/nvptx triple using most options.
Certain target-specific options should be ignored, not lead to an error
(`err_drv_unsupported_opt_for_target`).
Background: BPI stores a collection of edge branch-probabilities, and
also a set of Callback value-handles for the blocks in the
edge-collection. When a block is deleted, BPI's eraseBlock method is
called to clear the edge-collection of references to that block, to
avoid dangling pointers.
However, when move-constructing or assigning a BPI object, the
edge-collection gets moved, but the value-handles are discarded. This
can lead to to stale entries in the edge-collection when blocks are
deleted without the callback -- not normally a problem, but if a new
block is allocated with the same address as an old block, spurious
branch probabilities will be recorded about it. The fix is to transfer
the handles from the source BPI object.
This was exposed by an unrelated debug-info change, it probably just
shifted around allocation orders to expose this. Detected as
nondeterminism and reduced by Zequan Wu:
f1b0a54451 (commitcomment-136737090)
(No test because IMHO testing for a behaviour that varies with memory
allocators is likely futile; I can add the reproducer with a CHECK for
the relevant branch weights if it's desired though)
This patch addresses the issue regarding the call of bcopy function in a
conditional expression.
It is analogous to the already accepted patch which deals with the same
problem, just regarding the bzero function [0].
Here is the testcase which illustrates the issue:
```
void bcopy(const void *, void *, unsigned long);
void foo(void);
void test_bcopy() {
char dst[20];
char src[20];
int _sz = 20, len = 20;
return (_sz
? ((_sz >= len)
? bcopy(src, dst, len)
: foo())
: bcopy(src, dst, len));
}
```
When processing it with clang, following issue occurs:
Instruction does not dominate all uses!
%arraydecay2 = getelementptr inbounds [20 x i8], ptr %dst, i64 0, i64 0,
!dbg !38
%cond = phi ptr [ %arraydecay2, %cond.end ], [ %arraydecay5,
%cond.false3 ], !dbg !33
fatal error: error in backend: Broken module found, compilation aborted!
This happens because an incorrect phi node is created. It is created
because bcopy function call is lowered to the call of llvm.memmove
intrinsic and function memmove returns void *. Since llvm.memmove is
called in two places in the same return statement, clang creates a phi
node in the final basic block for the return value and that phi node is
incorrect. However, bcopy function should return void in the first
place, so this phi node is unnecessary. This is what this patch
addresses. An appropriate test is also added and no existing tests fail
when applying this patch.
Also, this crash only happens when LLVM is configured with
-DLLVM_ENABLE_ASSERTIONS=On option.
[0] https://reviews.llvm.org/D39746
The PTX ISA specifies that initializers may be incomplete ([5.4.4.
Initializers](https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#initializers))
> As in C, array initializers may be incomplete, i.e., the number of
initializer elements may be less than the extent of the corresponding
array dimension, with remaining array locations initialized to the
default value for the specified array type.
Emitting initializers in this form is preferable because it reduces the
size of the PTX, in some cases significantly, and can improve compile
time of ptxas as a result.
Moving the contents of Coefficients saves 0.43% of heap allocations
during the compilation of a large preprocessed file, namely
X86ISelLowering.cpp, for the X86 target.
The two single source cases aren't effected by the swap or select matching
as those are dual operand specific. Similarly, a two source shuffle can't
be a rotate.
We can extend this idea for some of the shuffle types above, but some of
them are validly either single or dual source. We don't want to loose that
and the code complexity of versioning early and having to repeat some shuffle
kinds doesn't (currently) seem worth it.
After building the ClangdXPC target, `ninja clean` fails with the
following error:
```
ninja: error: remove(lib/ClangdXPC.framework): Directory not empty
ninja: error: remove(<build>/lib/ClangdXPC.framework): Directory not empty
```
I did not find better way to make this work. I guess we could list all
generated files (and directories) in `OUTPUT` of the custom command, but
that seems fairly tedious/fragile.