This type trait represents whether move-assigning an object is
equivalent to destroying it and then move-constructing a new one from
the same argument. This will be useful in a few places where we may want
to destroy + construct instead of doing an assignment, in particular
when implementing some container operations in terms of relocation.
This is effectively adding a library emulation of P2786R12's
is_replaceable trait, similarly to what we do for trivial relocation.
Eventually, we can replace this library emulation with the real
compiler-backed trait.
This is building towards #129328.
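As a rough sketch of the intended use (the trait name `is_replaceable_v` and the helper below are hypothetical, not the actual libc++ spelling):
```
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for the emulated trait; trivially copyable types
// are safely replaceable, so this is a conservative approximation.
template <class T>
inline constexpr bool is_replaceable_v = std::is_trivially_copyable_v<T>;

// Assign by destroy + move-construct when that is known to be equivalent
// to move-assignment; fall back to plain move-assignment otherwise.
template <class T>
void assign_or_replace(T& dst, T&& src) {
  if constexpr (is_replaceable_v<T>) {
    std::destroy_at(std::addressof(dst));
    std::construct_at(std::addressof(dst), std::move(src));
  } else {
    dst = std::move(src);
  }
}
```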
When designating an array component that has non-default lower bounds,
the bridge was producing `hlfir.designate` ops yielding reference types,
which did not preserve the bounds information. Then, when creating
components, unadjusted indices were used when initializing the
structure.
We could look at the declaration to get the shape parameter, but this
would not be preserved if the component were passed as a block argument.
These results must be boxed, but we must not lose the contiguity
information either. To preserve contiguity, annotate these boxes with
the `contiguous` attribute during designation.
Note that other designated entities are handled inside the
HlfirDesignatorBuilder while component designators are built in
HlfirBuilder. I am not sure if this handling should be moved into the
designator builder or left in the general builder, so feedback is
welcome.
Also, I would like to find a test demonstrating that a box-designated
component carrying the `contiguous` attribute is actually treated as
contiguous by downstream passes that check for contiguity. I don't have
such a test yet.
(NFC modulo debug output changes)
Add a generic helper to print phi operands (incoming values) together
with their incoming blocks.
As more and more transforms are added, keeping track of the incoming
blocks of phis becomes more important. Print incoming blocks via
VPPhiAccessors to make debugging easier.
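A minimal self-contained sketch of the idea (simplified stand-in types, not the actual VPlan classes or the VPPhiAccessors interface):
```
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Simplified phi: pairs of (incoming value, incoming block).
struct Phi {
  std::vector<std::pair<std::string, std::string>> incoming;
};

// Print each incoming value together with the block it flows in from,
// so debug output shows where every phi operand originates.
void printPhiOperands(const Phi &phi, std::ostream &os) {
  for (const auto &[value, block] : phi.incoming)
    os << " [ " << value << ", " << block << " ]";
  os << '\n';
}

int main() {
  Phi p{{{"%iv.next", "loop.latch"}, {"%start", "preheader"}}};
  printPhiOperands(p, std::cout); // " [ %iv.next, loop.latch ] [ %start, preheader ]"
}
```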
Move the Darwin framework search path logic from
InitHeaderSearch::AddDefaultIncludePaths to
DarwinClang::AddClangSystemIncludeArgs. Add a new -internal-iframework
cc1 argument so that the toolchain can add these paths.
Now that the toolchain adds search paths via a cc1 flag, they're only
added if they exist, so the Preprocessor/cuda-macos-includes.cu test is
no longer relevant.
Change Driver/driverkit-path.c and Driver/darwin-subframeworks.c to do
-### style testing similar to the darwin-header-search and
darwin-embedded-search-paths tests. Rename darwin-subframeworks.c to
darwin-framework-search-paths.c and have it test all framework search
paths, not just SubFrameworks.
Add a unit test to validate that the myriad of search path flags result
in the expected search path list.
Fixes https://github.com/llvm/llvm-project/issues/75638
Commit 776476c282 (PR117404), which
introduced the radix tree representation of allocation context summary
records, incorrectly changed the description of the
FS_COMBINED_CALLSITE_INFO record instead of the intended
FS_COMBINED_ALLOC_INFO record.
This PR updates the `do concurrent` to OpenMP mapping pass to use the
`fir.do_concurrent` ops that were recently added upstream instead of
handling nests of `fir.do_loop ... unordered` ops.
Parent PR: https://github.com/llvm/llvm-project/pull/137928.
`GlobalOp` was parsing `thread_local` after `unnamed_addr` but printing them in the reverse order.
While here, make `AliasOp` follow the same behavior and share the common parts of global and alias printing.
Note that this change is possibly not NFC. The prior routines used
getConstant with XLenVT. The new wrappers will use getVectorIdxConstant
instead. Digging through the code, the type used for the index will be
the integer of pointer width from the DataLayout. For typical RV32 and
RV64 configurations the pointer will be of equal width to XLEN, but you
could have a 32-bit pointer on an RV64 machine.
Address post-commit review feedback for PR139092 (and fix another
instance of the same code). Save and restore option values via a saved
bool value, instead of invoking cl::ResetAllOptionOccurrences.
On AArch64, we create optional/weak relocations that may not be
processed due to the relocated value overflowing. When the overflow
happens, we used to enforce patching for all functions in the binary via
the --force-patch option. This PR relaxes the requirement and enforces
patching only for functions that are targets of optional relocations.
Moreover, if the compact code model is used, the relocation overflow is
guaranteed not to happen and the patching will be skipped.
Add a new instrumentation section type `[sample-coldcov]` to support
`-fprofile-list` for sample-PGO-based cold function coverage.
Note that the current cold function coverage is based on the sampling
PGO pipeline, which is incompatible with the existing [llvm] option (see
[PGOOptions](https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/Support/PGOOptions.h#L27-L43)),
so we can't reuse the IR-PGO (-fprofile-instrument=llvm) flag.
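For illustration, a `-fprofile-list` file using the new section could look like this (the function names are made up; entries follow the existing special-case-list syntax):
```
# Skip coverage instrumentation for a known-hot function,
# allow it everywhere else.
[sample-coldcov]
fun:expensive_hot_path=skip
fun:*=allow
```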
This PR adds support for moving scalar uniform ops (GPU index ops,
constants, etc.) outside the `gpu.warp_execute_on_lane_0` op. These
kinds of ops do not require distribution and are safe to move out of the
warp op. This also avoids adding separate distribution patterns for
these ops.
Example:
```
%1 = gpu.warp_execute_on_lane_0(%laneid) -> (index) {
...
%block_id_x = gpu.block_id x
gpu.yield %block_id_x
}
// use %1
```
To:
```
%block_id_x = gpu.block_id x
%1 = gpu.warp_execute_on_lane_0(%laneid) -> (index) {
...
gpu.yield %block_id_x
}
// use %1
```
Update the f64 to f16 lowering for targets that support f16 types. In
unsafe mode, lower to two FP_ROUND nodes (the patch
https://reviews.llvm.org/D154528 stops these two FP_ROUND nodes from
being combined back). In safe mode, select LowerF64ToF16
(round-to-nearest-even rounding mode).
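As an illustration of why the two-step lowering is the "unsafe" one, here is a host-side C++ sketch of the double rounding it can introduce (assumes a compiler and target with `_Float16` support; this is not the backend code):
```
#include <cstdio>

int main() {
  // Just above the halfway point between the f16 values 1.0 and 1.0 + 2^-10.
  double d = 1.0 + 0x1p-11 + 0x1p-40;
  _Float16 direct = (_Float16)d;          // rounds up to 1.0 + 2^-10
  _Float16 twostep = (_Float16)(float)d;  // f32 keeps exactly 1.0 + 2^-11,
                                          // then ties-to-even gives 1.0
  std::printf("direct:  %a\ntwostep: %a\n", (double)direct, (double)twostep);
}
```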
The C++26 standard trivially relocatable type trait has slightly
different semantics, so we introduced a new
`__builtin_is_cpp_trivially_relocatable`
when implementing trivial relocation in #127636.
However, having multiple relocatable traits would be confusing
in the long run, so we deprecate the old trait.
As discussed in #127636,
`__builtin_is_cpp_trivially_relocatable` should be used instead.
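A minimal usage example of the preferred spelling (assumes a Clang recent enough to provide the builtin):
```
struct S { int x; };

// Prefer the C++26-aligned builtin over the deprecated trait;
// a trivially copyable type like S is trivially relocatable.
static_assert(__builtin_is_cpp_trivially_relocatable(S));
```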
Credit to @krzysz00, who discovered this subtle bug in `MemRefUtils`.
The problem is in the `getLinearizedMemRefOffsetAndSize()` utility: the
way it computes the linearized size of a memref is incorrect for
non-packed memrefs.
### Background
As context, for a packed memref of `memref<8x8xf32>`, we'd compute the
size by multiplying the sizes of the dimensions together. This is
implemented by composing an affine_map of `affine_map<()[s0, s1] -> (s0
* s1)>` and then computing the size via `%size = affine.apply #map()[%c8,
%c8]`.
However, this is wrong for a non-packed memref of `memref<8x8xf32,
strided<[1024, 1]>>`. Since the multiplication map only considers the
dimension sizes, it would incorrectly conclude that the size of the
non-packed memref is 64.
### Solution
This PR fixes the linearized size computation to take strides into
consideration: it computes the maximum of (dim size * dim stride) over
all dimensions. With the second stride equal to 1, we'd compute the size
via the affine_map of `affine_map<()[stride0, size0, size1] -> (stride0
* size0, 1 * size1)>` and then take the maximum via `%size = affine.max
#map()[%stride0, %size0, %size1]`. In particular, for the non-packed
memref above, the size is derived as max(1024\*8, 1\*8) = 8192 (rather
than the wrong size 64 computed by the packed-memref equation).
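A small plain-C++ illustration of the fixed computation (names are illustrative; this is not the MLIR utility itself):
```
#include <algorithm>
#include <cstdint>
#include <vector>

// Linearized size of a (possibly non-packed) memref in elements:
// the max over dimensions of size[i] * stride[i], not the product of sizes.
int64_t linearizedSize(const std::vector<int64_t> &sizes,
                       const std::vector<int64_t> &strides) {
  int64_t result = 0;
  for (size_t i = 0; i < sizes.size(); ++i)
    result = std::max(result, sizes[i] * strides[i]);
  return result;
}

// For memref<8x8xf32, strided<[1024, 1]>>:
//   linearizedSize({8, 8}, {1024, 1}) == max(8 * 1024, 8 * 1) == 8192,
// not 8 * 8 == 64.
```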
Most of the processing in emitAction is in an unneeded else-block;
reduce indentation by exiting early after the recursive call.
`XXXGenCallingConv.inc` are identical before and after this patch for
all targets.
Update initial VPlan construction to include exit conditions and edges.
The loop region is now first constructed without entry/exiting. Those
are set after inserting the region in the CFG, to preserve the original
predecessor/successor order of blocks.
For now, all early exits are disconnected before forming the regions,
but a follow-up will update uncountable exit handling to also happen
here. This is required to enable VPlan predication and to remove the
dependence on any IR BBs
(https://github.com/llvm/llvm-project/pull/128420).
PR: https://github.com/llvm/llvm-project/pull/137709
This patch adds a couple of improvements to the LLVM emission of DWARF
variant parts. One of these is desirable for Ada, and the other is
required.
Currently, when emitting a discriminant, LLVM follows the precise letter
of the DWARF standard, which says:
> If the variant part has a discriminant, the discriminant is
> represented by a separate debugging information entry which is a
> child of the variant part entry.
However, for Ada this does not really make sense. In Ada, the
discriminant field exists outside of any variant part, and it makes more
sense to emit it separately rather than redundantly emit the field once
for each variant part.
This extension was arrived at when this was implemented in GCC, and was
accepted for DWARF 6, see:
https://dwarfstd.org/issues/180123.1.html
Here the patch simply lifts this restriction: if the discriminant field
was already emitted, it isn't re-emitted. This approach allows the Ada
compiler to do what it needs without affecting the Rust output.
Second, this patch extends the discriminant to allow multiple values.
This is needed by Ada. Here, I chose to use a ConstantDataArray of pairs
of integers, with each pair representing a range, as Ada also allows
ranges here. This seemed like a reasonably convenient representation.
Follow-up to 6e654caab: use the new routines in more places. Note that
I've excluded from this patch any case which uses a getConstant index
instead of a getVectorIdxConstant index, just to minimize room for
error. I'll get those in a separate follow-up.
RISCVVectorPeepholePass would replace instructions that have an all-ones
mask with their unmasked variants, so there isn't really a point in
keeping separate versions of these intrinsics.
Note that `riscv.segN.load/store.mask` does not take a pointer type
(i.e. address space) as part of its overloaded type signature, because
RISC-V doesn't really use address spaces other than the default one.
Fixes #137209
This PR:
- Adds a case to `expandIntrinsic()` in `DXILIntrinsicExpansion.cpp` to
expand `Intrinsic::is_fpclass` in the case of
`FPClassTest::fcNegZero`
- Defines the `IsNaN`, `IsFinite`, `IsNormal` DXIL ops in `DXIL.td`
- Adds a case to `lowerIntrinsics()` in `DXILOpLowering.cpp` to handle
the lowering of `Intrinsic::is_fpclass` to the DXIL ops `IsNaN`,
`IsInf`, `IsFinite`, `IsNormal` when the FPClassTest is `fcNan`,
`fcInf`, `fcFinite`, and `fcNormal` respectively
- Creates a test `llvm/test/CodeGen/DirectX/is_fpclass.ll` to exercise
the intrinsic expansion and DXIL op lowering of `Intrinsic::is_fpclass`
~~A separate PR will be made to remove the now-redundant `dx_isinf`
intrinsic to address #87777.~~
A proper implementation for the lowering of the `llvm.is.fpclass`
intrinsic to handle all possible combinations of FPClassTest can be
implemented in a separate PR. This PR's implementation focuses primarily
on addressing the current use-cases for DirectML and HLSL intrinsics.
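As a rough C++ analogue of what the `fcNegZero` expansion amounts to semantically (illustrative only, not the actual expansion code):
```
#include <bit>
#include <cstdint>

// is.fpclass(x, fcNegZero) reduces to a bit-pattern compare against -0.0f,
// since -0.0f == 0.0f under ordinary floating-point comparison.
bool isNegZero(float x) {
  return std::bit_cast<std::uint32_t>(x) == 0x80000000u;
}
```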
Change the default statusline format to print "no target" when lldb is
launched without a target. Currently, the statusline is empty, which
looks rather odd.
According to the documentation:
> A header declaration that does not contain `exclude` nor `textual`
> specifies a header that contributes to the enclosing module.
This means that `exclude` and `textual` headers don't contribute to the
enclosing module, and their presence isn't required to build such a
module. The keywords tell clang how a header should be treated in the
context of the module, but they don't add headers to the module.
When a textual header *is* used, clang still emits a "file not found"
error pointing to the location where the missing file is included.
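For illustration, given a module map along these lines (file names are hypothetical), neither `Excluded.h` nor `Textual.h` needs to exist in order to build module `M`:
```
module M {
  header "A.h"
  exclude header "Excluded.h"
  textual header "Textual.h"
}
```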
Add constant-folding support for the maximumnum and minimumnum
intrinsics, and extend the tests to show the qNaN vs. sNaN behavior
differences between maxnum/maximum/maximumnum.
This PR fixes `ExtractSliceOfPadTensorSwapPattern` to support
rank-reducing `tensor.extract_slice` ops, which were previously
unhandled and could cause crashes. To support this, an additional
`tensor.extract_slice` is inserted after `tensor.pad` to reduce the
result rank.
Now that the clang.arc.attachedcall bundle requires having an operand,
which we emit a call to in the RVMARKER sequence, we can achieve our
real goal: make the marker NOP optional.
The intention is that a new ObjC runtime call will be introduced, which
doesn't require the NOP to be present, but must be adjacent to the
possibly-autorelease-returning call (that the bundle is attached to).
This is achieved by having ISel encode whether the marker is necessary
in an additional boolean target immediate operand.
Co-authored-by: Ahmed Bougacha <ahmed@bougacha.org>