Despite previous efforts to fix the accidental setting of a deduplication factor and to avoid enforcing a callsite debug loc for pseudo probes, I'm still seeing an IR probe with a non-zero discriminator. This time it is due to the merge of two probes with irreconcilable debug locations, where the merged probe ends up getting the original callsite locs. Therefore I'm removing the assert that an IR probe should always have a zero discriminator. This is safe since
- Probe discriminators are only emitted in FS-AFDO mode; and
- The first FS discriminator assigning pass always clears non-zero discriminators left over from IR passes.
Reviewed By: wenlei
Differential Revision: https://reviews.llvm.org/D155252
This is a preparation for the upcoming LLVM 17 release.
Reviewed By: ldionne, jloser, H-G-Hristov, #libc
Differential Revision: https://reviews.llvm.org/D154874
Fat LTO objects contain both LTO-compatible IR and generated object
code. This allows users to defer the choice of whether to use LTO to
link time. GCC has offered this feature for some time, and this work
makes the existing -ffat-lto-objects flag functional in the same way
as GCC's.
This patch adds support for that flag in the driver, as well as setting the
necessary codegen options for the backend. Largely, this means we select
the newly added pass pipeline for generating fat objects.
Users are expected to pass -ffat-lto-objects to clang in addition to one
of the -flto variants. Without the -flto flag, -ffat-lto-objects has no
effect.
// Compile and link. Use the object code from the fat object w/o LTO.
clang -fno-lto -ffat-lto-objects -fuse-ld=lld foo.c
// Compile and link. Select full LTO at link time.
clang -flto -ffat-lto-objects -fuse-ld=lld foo.c
// Compile and link. Select ThinLTO at link time.
clang -flto=thin -ffat-lto-objects -fuse-ld=lld foo.c
// Compile and link. Use ThinLTO with the UnifiedLTO pipeline.
clang -flto=thin -ffat-lto-objects -funified-lto -fuse-ld=lld foo.c
// Compile and link. Use full LTO with the UnifiedLTO pipeline.
clang -flto -ffat-lto-objects -funified-lto -fuse-ld=lld foo.c
// Link separately, using ThinLTO.
clang -c -flto=thin -ffat-lto-objects foo.c
clang -flto=thin -fuse-ld=lld foo.o -ffat-lto-objects # pass --lto=thin --fat-lto-objects to ld.lld
// Link separately, using full LTO.
clang -c -flto -ffat-lto-objects foo.c
clang -flto -fuse-ld=lld foo.o # pass --lto=full --fat-lto-objects to ld.lld
Original RFC: https://discourse.llvm.org/t/rfc-ffat-lto-objects-support/63977
Depends on D146776
Reviewed By: tejohnson, MaskRay
Differential Revision: https://reviews.llvm.org/D146777
The post-condition on the functions is that the buffer is not full.
This post-condition is used as a pre-condition of the push_back function.
When a copy, fill, or transform function exactly fit in the buffer, this
post-condition was violated.
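The invariant described above can be sketched with a small stand-alone buffer. This is only an illustration, not the libc++ implementation: `OutputBuffer`, its 8-byte capacity, and `demo_exact_fit` are hypothetical names chosen for the example. The bulk `copy` flushes whenever the buffer becomes full, including the exact-fit case, so `push_back` can rely on there being room.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical buffer, for illustration only.
class OutputBuffer {
public:
  explicit OutputBuffer(std::string &sink) : sink_(sink) {}

  // Pre-condition: the buffer is not full.
  void push_back(char c) {
    assert(size_ < data_.size() && "pre-condition: buffer is not full");
    data_[size_++] = c;
    if (size_ == data_.size())
      flush(); // restore the post-condition
  }

  // Post-condition: the buffer is not full.
  void copy(std::string_view text) {
    while (!text.empty()) {
      std::size_t chunk = std::min(text.size(), data_.size() - size_);
      std::copy_n(text.data(), chunk, data_.data() + size_);
      size_ += chunk;
      text.remove_prefix(chunk);
      if (size_ == data_.size())
        flush(); // also taken when the input fits exactly
    }
  }

  void flush() {
    sink_.append(data_.data(), size_);
    size_ = 0;
  }

  bool full() const { return size_ == data_.size(); }

private:
  std::array<char, 8> data_{};
  std::size_t size_ = 0;
  std::string &sink_;
};

// Copies exactly one buffer's worth of data, then appends one more character.
inline std::string demo_exact_fit() {
  std::string out;
  OutputBuffer buffer(out);
  buffer.copy("12345678"); // exactly fills the 8-byte buffer
  buffer.push_back('9');   // safe only because copy() flushed
  buffer.flush();
  return out;
}
```

Without the flush in the exact-fit case, the `push_back` call in `demo_exact_fit` would write one element past the end of the array.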
Reviewed By: #libc, ldionne
Differential Revision: https://reviews.llvm.org/D155397
During call stack analysis, skip called noexcept functions:
they won't throw exceptions, they will crash instead.
Check will emit warnings for those functions separately.
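A minimal sketch of why a caller never observes an exception from a noexcept callee. The function names below are hypothetical and are not taken from the check or its tests. If `parse_digit` threw inside `checked_digit`, the program would call `std::terminate` rather than unwind into `caller`.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical functions, for illustration only.
inline int parse_digit(char c) {
  if (c < '0' || c > '9')
    throw std::invalid_argument("not a digit");
  return c - '0';
}

// An exception escaping from here ends the program via std::terminate;
// it never propagates to the caller.
inline int checked_digit(char c) noexcept { return parse_digit(c); }

// From the caller's point of view the call below cannot throw, so an
// analysis of caller() does not need to look inside checked_digit().
inline int caller(char c) { return checked_digit(c); }
```

The `noexcept` operator reports the same thing at compile time: `noexcept(checked_digit('1'))` is true while `noexcept(parse_digit('1'))` is false.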
Fixes: #43667, #49151, #51596, #54668, #54956
Reviewed By: carlosgalvezp
Differential Revision: https://reviews.llvm.org/D153458
Previously we returned i32 on RV32 and i64 on RV64. The instructions
only consume 32 bits and only produce 32 bits. For RV64, the result
is sign extended to 64 bits like *W instructions.
This patch removes this detail from the interface to improve
portability and consistency. This matches the proposal for scalar
intrinsics here https://github.com/riscv-non-isa/riscv-c-api-doc/pull/44
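As a rough model of the register-level detail being hidden: the helper below is hypothetical and only mimics how a 32-bit result is held sign-extended in a 64-bit register. It is not one of the builtins. Truncating the register image back to 32 bits always recovers the value, which is why a 32-bit interface loses nothing.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the 64-bit register image of a 32-bit result on
// RV64, sign-extended in the way *W instructions produce their results.
inline std::uint64_t as_rv64_register(std::uint32_t result) {
  std::uint64_t wide = result;
  if (result & 0x80000000u)
    wide |= 0xFFFFFFFF00000000ull; // replicate bit 31 into bits 63..32
  return wide;
}

// The low 32 bits are the only part that carries information.
inline std::uint32_t from_rv64_register(std::uint64_t reg) {
  return static_cast<std::uint32_t>(reg);
}
```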
I've included IR autoupgrade support as well.
I'll be doing this for other builtins/intrinsics that currently use
'long' in other patches.
Reviewed By: VincentWu
Differential Revision: https://reviews.llvm.org/D154647
Delete the backslash. It was there to make the TableGen file compile. It looks like a space also works fine.
Reviewed By: springerm
Differential Revision: https://reviews.llvm.org/D155474
When building with debug info enabled, some load/store instructions do
not have a DebugLocation attached. When using the default IRBuilder, it
attempts to copy the DebugLocation from the insertion-point instruction.
When there's no DebugLocation, no attempt is made to add one.
Add a fallback DebugLocation with the help of InstrumentationIRBuilder for
memintrinsics. In particular, the compiler may optimize load/store without
debug info into memintrinsics, which then are missing debug info as well.
When building the kernel with LTO, KCOV & debug information enabled,
multiple inlinable SanitizerCoverage functions require debug information
present.
In such cases we repurpose the InstrumentationIRBuilder, which ensures
that the required debug information is added when it is missing.
This has been done analogously to the work for the ThreadSanitizer
in D124937.
Bug: https://github.com/ClangBuiltLinux/linux/issues/1721
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D155377
When building the kernel with LTO, KASAN & debug information enabled,
multiple inlinable AddressSanitizer functions require debug information
present.
In such cases we repurpose the InstrumentationIRBuilder, which ensures
that the required debug information is added when it is missing.
This has been done analogously to the work for the ThreadSanitizer
in D124937.
Bug: https://github.com/ClangBuiltLinux/linux/issues/1721
Reviewed By: melver
Differential Revision: https://reviews.llvm.org/D155376
This revision fixes `hasTensorSemantics` and `hasBufferSemantics` for vector transfer ops, which may have a vector operand. `VectorType` implements `ShapedType` and such operands do not affect whether an op has tensor or buffer semantics. Also implement `DestinationStyleOpInterface` on `TransferReadOp` so that `hasTensorSemantics`/`hasBufferSemantics` can be called. (The op has no inits, but this makes it symmetric to `TransferWriteOp`.)
Differential Revision: https://reviews.llvm.org/D155469
This mirrors the test-lower-to-llvm pass pipeline that provides some sanity when running e2e examples.
One peculiarity of the GPU pipeline is that we want to allow 32b indexing in kernels.
This is currently not straightforward as there are dependencies between passes.
This new test pass orders passes in a way that connects end-to-end.
Differential Revision: https://reviews.llvm.org/D155463
This work introduces `cp.async.bulk.tensor.shared.cluster.global` in the NVVM dialect, which executes a load using TMA.
Depends on D155056
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D155060
This work improves the verifier for invalid cases. It is NFC.
Reviewed By: nicolasvasilache, springerm
Differential Revision: https://reviews.llvm.org/D155448
Inspired by some of the cases from D145468
Let SimplifyDemandedBits handle the narrowing of lshr to half-width if we don't require the upper bits, the narrowed shift is profitable and the zext/trunc are free.
A future patch will propose the equivalent shl narrowing combine.
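A scalar sketch of the lshr narrowing, under one sufficient condition that is my own assumption rather than a statement of the patch's exact logic: the demanded result bits, shifted back up by the shift amount, must all fall in the lower half of the source. The function names are hypothetical, and the profitability and free zext/trunc checks are not modelled.

```cpp
#include <cassert>
#include <cstdint>

// Reference: 64-bit logical shift right, keeping only the demanded bits.
inline std::uint64_t wide_lshr(std::uint64_t x, unsigned amt,
                               std::uint64_t demanded) {
  return (x >> amt) & demanded;
}

// Narrowed form: trunc to 32 bits, shift in 32 bits, zext back to 64 bits.
inline std::uint64_t narrow_lshr(std::uint64_t x, unsigned amt,
                                 std::uint64_t demanded) {
  std::uint32_t lo = static_cast<std::uint32_t>(x);
  return static_cast<std::uint64_t>(lo >> amt) & demanded;
}

// Assumed condition: no demanded result bit reads from the upper half of x.
inline bool can_narrow(unsigned amt, std::uint64_t demanded) {
  return amt < 32 && ((demanded << amt) >> 32) == 0;
}
```

With `x = 0x123456789ABCDEF0`, a shift of 4 and demanded mask `0x0FFFFFFF`, both forms yield `0x09ABCDEF`. Widening the mask to `0xFFFFFFFF` makes `can_narrow` return false, because result bits 28..31 come from the upper half of `x`.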
Differential Revision: https://reviews.llvm.org/D146121
Add a simple transform operation to the NVGPU extension that performs
software pipelining of copies to shared memory. The functionality is
extremely minimalistic in this version and only supports copies from
global to shared memory inside an `scf.for` loop with either
`vector.transfer` or `nvgpu.device_async_copy` operations when
pipelining preconditions are already satisfied in the IR. This is the
minimally useful version that uses the more general loop pipeliner in an
NVGPU-specific way. Further extensions and orthogonalizations will be
necessary.
This required a change to the loop pipeliner itself to properly
propagate errors should the predicate generator fail.
This is loosely inspired by the version in IREE, but has fewer unsafe
assumptions and a more principled way of communicating decisions.
Reviewed By: nicolasvasilache
Differential Revision: https://reviews.llvm.org/D155223
We have guarantees that the induction variable will not overflow in the
main loop after the loop is constrained. Therefore we can add no-wrap
flags on its base so as not to lose the information that the loop is
countable.
Add NSW flag now, since adding NUW flag requires a bit more complicated
analysis.
Reviewed By: skatkov
Differential Revision: https://reviews.llvm.org/D154954
XRayFileHeader storage was obtained from std::aligned_storage
using its default alignment and not the struct's alignment
requirement. This was causing a bus error on AArch32, on armv8
machines, where vld1.64/vst1.64 instructions with 128-bit
alignment requirement were being used to copy XRayFileHeader.
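One way to request the struct's own alignment for raw storage is sketched below. `FileHeader` is a hypothetical stand-in with an assumed 16-byte alignment, not the real XRayFileHeader layout, and the snippet does not claim to match the patch line for line.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical header type with a stricter-than-default alignment.
struct alignas(16) FileHeader {
  std::uint16_t version;
  std::uint16_t type;
  std::uint64_t cycle_frequency;
  char padding[16];
};

// Constructs a header in raw storage that explicitly carries the struct's
// alignment requirement, then checks the resulting address.
inline bool header_storage_is_aligned() {
  alignas(FileHeader) static unsigned char storage[sizeof(FileHeader)];
  FileHeader *header = new (storage) FileHeader{};
  header->version = 1;
  bool aligned =
      reinterpret_cast<std::uintptr_t>(header) % alignof(FileHeader) == 0;
  return aligned && header->version == 1;
}
```

Storage declared this way satisfies whatever alignment the copy instructions assume for the type, independent of the platform's default for a buffer of that size.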
There is still another issue with fdr-single-thread.cpp test on
armv7. Now it runs until completion and produces a valid log file,
but for some reason the function name appears as _end in it,
instead of the expected mangled fn name.
Reviewed By: MaskRay
Differential Revision: https://reviews.llvm.org/D155013
The ObjC-block detection code only supports a single token as the return type. Add support to detect pointers, too (ObjC has lots of object-pointers).
For example, using `BasedOnStyle: WebKit`, the following was the stable output before the patch:
```
int* p = ^int*(void)
{ //
return nullptr;
}
();
```
After the patch, this is stable:
```
int* p = ^int*(void) { //
return nullptr;
}();
```
Differential Revision: https://reviews.llvm.org/D146434
Our threading support layer is currently a huge mess. There are too many
configurations with too many confusing names, and none of them are tested
in the usual CI. Here's a list of names related to these configurations:
LIBCXX_BUILD_EXTERNAL_THREAD_LIBRARY
_LIBCPP_BUILDING_THREAD_LIBRARY_EXTERNAL
LIBCXXABI_BUILD_EXTERNAL_THREAD_LIBRARY
_LIBCPP_HAS_THREAD_LIBRARY_EXTERNAL
LIBCXX_HAS_EXTERNAL_THREAD_API
_LIBCPP_HAS_THREAD_API_EXTERNAL
This patch cleans this up by removing the ability to build libc++ with
an "external" threading library for testing purposes, removing 4 out of
6 "names" above. That setting was meant to be used by libc++ developers,
but we don't use it in-tree and it's not part of our CI.
I know the ability to use an external threading API is used by some folks
out-of-tree, and this patch doesn't change that. This only changes the
way they will have to test their external threading support. After this
patch, the intent would be for them to set `-DLIBCXX_HAS_EXTERNAL_THREAD_API=ON`
when building the library, and to provide their usual `<__external_threading>`
header when they are testing the library. This can be done easily now
that we support custom lit configuration files in test suites.
The motivation for this patch is that our threading support layer is
basically unmaintainable -- anything beyond adding a new "backend" in
the slot designed for it requires incredible attention. The complexity
added by this setting just doesn't pull its weight considering the
available alternatives.
Concretely, this will also allow future patches to clean up
`<__threading_support>` significantly.
Differential Revision: https://reviews.llvm.org/D154466
This pass will mark functions called from TargetOp's
and declare target functions as implicitly declare
target by adding the MLIR declare target attribute
directly to the function.
This pass executes after the initial lowering of Fortran's PFT
to MLIR (FIR/OMP+Arith etc.) and is one of a series of passes
that aim to clean up the MLIR for offloading (separate passes
in different patches, one for early outlining, another for declare
target function filtering).
Reviewers: jsjodin, skatrak, kiaranchandramohan
Differential Revision: https://reviews.llvm.org/D154247
* Move passes to `Transforms` directory.
* Add `Utils.h` (will be utilized in a subsequent change).
Differential Revision: https://reviews.llvm.org/D155427
We don't have any code to point at here, so the diagnostics would just
point to the record declaration. Make them point to the call site
instead.
Differential Revision: https://reviews.llvm.org/D154761
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D155181
We are trying to build compiler-rt as big-endian and found that the tests compiler-rt/test/builtins/Unit/arm/aeabi_cdcmpeq_test.c and compiler-rt/test/builtins/Unit/arm/aeabi_cfcmpeq_test.c do not work on big endian at the moment. This patch makes these tests work on big endian as well.
Reviewed By: peter.smith, simon_tatham
Differential Revision: https://reviews.llvm.org/D155208
This is a follow up on D154800 and D154770 to make the code structure more principled and avoid too many nested #ifdef/#endif.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D155174
This is to address a static analyzer warning:
The pointer field will point to an arbitrary memory location, any
attempt to write may cause corruption. In <unnamed>
R600DAGToDAGISel::R600DAGToDAGISel (llvm::TargetMachine &,
llvm::CodeGenOpt::Level): A pointer field is not initialized in the
constructor (CWE-457)
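A generic sketch of how this class of warning is commonly silenced, using a default member initializer. `Subtarget` and `Selector` are hypothetical types for illustration. Whether the patch uses this form or initializes the field in the constructor's initializer list is not stated here.

```cpp
#include <cassert>

// Hypothetical types, for illustration only.
struct Subtarget {
  int generation = 0;
};

class Selector {
public:
  // The constructor does not mention subtarget_, yet the field still has
  // a well-defined value thanks to the default member initializer.
  Selector() = default;

  void bind(const Subtarget &subtarget) { subtarget_ = &subtarget; }
  bool is_bound() const { return subtarget_ != nullptr; }

private:
  const Subtarget *subtarget_ = nullptr; // never left indeterminate
};

// Binds a selector to a local subtarget and reports the resulting state.
inline bool bound_after_bind() {
  Subtarget subtarget;
  Selector selector;
  selector.bind(subtarget);
  return selector.is_bound();
}
```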
Differential Revision: https://reviews.llvm.org/D154414
If somehow a vXi64 bool sign_extend_inreg pattern has been lowered to vector shifts (without PSRAQ support), then try to canonicalize to vXi32 shifts to improve likelihood of value tracking being able to fold them away.
Using a PSLLQ and a bitcasted PSRAD node makes it very difficult for later folds to recover from this.