This patch moves the CommonArgs utilities into a location visible by the
Frontend Drivers, so that the Frontend Drivers may share option parsing
code with the Compiler Driver. This is useful when the Frontend Drivers
would like to verify that their incoming options are well-formed and
also not reinvent the option parsing wheel.
We already see code in the Clang/Flang Drivers that is parsing and
verifying its incoming options. E.g. OPT_ffp_contract. This option is
parsed in the Compiler Driver, Clang Driver, and Flang Driver, all with
slightly different parsing code. It would be nice if the Frontend
Drivers were not required to duplicate this Compiler Driver code. That
way there is no/low maintenance burden on keeping all these parsing
functions in sync.
Along those lines, the Frontend Drivers will now have a useful mechanism
to verify their incoming options are well-formed. Currently, the
Frontend Drivers trust that the Compiler Driver is not passing back junk
in some cases. The Language Drivers may even accept junk with no error
at all. E.g.:
`clang -cc1 -mprefer-vector-width=junk test.c'
With this patch, we'll now be able to tighten up incomming options to
the Frontend drivers in a lightweight way.
---------
Co-authored-by: Cameron McInally <cmcinally@nvidia.com>
Co-authored-by: Shafik Yaghmour <shafik.yaghmour@intel.com>
When using `-fcs-generate-profile` with distributed thin-lto in the same
fashion we do for local thin-lto, we hit the following assertion:
6041c745f3/llvm/lib/Support/PGOOptions.cpp (L36)
Using local thin-lto with LLD for MachO, we set the missing path
automatically to a default value: https://reviews.llvm.org/D151589. In
this fix we add the same behavior.
---------
Co-authored-by: Nuri Amari <nuriamari@fb.com>
Sometimes, when a user writes invalid code, the minimization used for
scanning can create a stream of tokens that is invalid at lex time. This
patch protects against the case where there are valid (non-c++20) import
directives discovered in the middle of an invalid `import` declaration.
Mostly authored by: @akyrtzi
resolves: rdar://152335844
Add `CLOCKS_PER_SEC` and the older `CLK_TCK`. Allows the user to define
a `__CLK_TCK` to override if necessary.
Also add an extra column for embedded AArch64 in `time.rst`
Extend folding for `X Pred C2 ? X BOp C1 : C2 BOp C1` to `min/max(X, C2)
BOp C1` to allow min and max as `BOp`. This ensures a constant clamping
pattern is folded into a pair of min/max instructions. Here is a
simplified example of a case where this folding is not occurring
currently.
int clampToU8(int v) {
if (v < 0) return 0;
if (v > 255) return 255;
return v;
}
https://godbolt.org/z/78jhKPWbv
Generic proof: https://alive2.llvm.org/ce/z/cdpLYy
We currently generate code like this on x86 for a jump table with 5 elements,
assuming the call target is in rbx:
lea global_addr(%rip), %rax # initialize temporary rax with base address
mov %rbx, %rcx # initialize another temporary rcx for index (rbx will be used for the call, so it is still live)
sub %rax, %rcx # compute `address - base`
ror $0x3, %rcx # compute `(address - base) ror 3` i.e. index
cmp $0x4, %rcx # check index <= 4
ja .Ltrap
[...]
.Ltrap:
ud1
A more efficient instruction sequence, that only needs one temporary
register and one fewer instruction, is possible by subtracting the
address we are testing from the fixed address instead of vice versa:
lea (global_addr + 4*8)(%rip), %rax # initialize temporary rax with address of last element
sub %rbx, %rax # compute `last element - address`
ror $0x3, %rax # compute `(last element - address) ror 3` i.e. 4 - index
cmp $0x4, %rax # check 4 - index <= 4 (same as above)
ja .Ltrap
[...]
.Ltrap:
ud1
Change LowerTypeTests to generate that sequence. As a consequence, the
order of bits in the bitsets is reversed. Because it doesn't matter how we
do the subtraction on other architectures (to the best of my knowledge),
do so unconditionally.
Reviewers: fmayer, vitalybuka
Reviewed By: fmayer
Pull Request: https://github.com/llvm/llvm-project/pull/142887
The commands to run the compilation when printed with `-###` contain
various irrelevant lines for the perf-training. Most of them are
filtered out already but when configured with
`CLANG_CONFIG_FILE_SYSTEM_DIR` a new line like the following is
added and needs to be filtered out:
`Configuration file: /etc/clang/x86_64-redhat-linux-gnu-clang.cfg`
The string_utils.h file previously included both memcpy and bzero. There
were no uses of bzero, and only one use of memcpy which was replaced
with __builtin_memcpy.
Also fix strsep which was broken by this change, fix a useless assert of
"sizeof(char) == sizeof(cpp::byte)", and update the bazel.
'host_data' has its own Op kind, so this handles the lowering there, it
looks exactly like the other ones we've done so far, so nothing novel
here.
host_data takes 3 clauses, 1 of which is required.
'use_device' is required, and results in an acc.use_device operation,
which then feeds into the dataOperands list on acc.host_data.
'if_present' is a simple attribute on the operand.
'if' is a condition on the operand, identical to our other handling of
'if'.
This patch handles all of these.
This patch introduces support for the DWARF operation
`DW_OP_GNU_implicit_pointer `(value 0xf3) within LLVM's DWARF parsing
and expression evaluation infrastructure. This GNU extension is used to
describe the location of a variable that is itself a pointer, where the
value of this pointer is stored at an address derived from another DWARF
location expression plus a constant offset.
Motivation:
Compilers like GCC use `DW_OP_GNU_implicit_pointer `to represent the
location of certain variables.Without support for this opcode, debuggers
like LLDB (and other tools relying on LLVM's DWARF libraries) cannot
correctly resolve the location of such variables, leading to an
inability to inspect their values or an incorrect debugging experience.
Some of the QC relocations are relaxable, in particular there are
relaxations defined for:
- `R_RISCV_QC_E_JUMP_PLT`
- `R_RISCV_QC_E_32`
- `R_RISCV_QC_ABS20_U`
This change ensures that llvm-mc correctly emits R_RISCV_RELAX
relocations for the relevant fixups.
During `hlfir.assign` inlining and `ElementalAssignBufferization`
we can deduce the optimal shape from `lhs` and `rhs` shapes.
It is probably better be done in a separate pass that propagates
constant shapes, but I have not seen any benchmarks that would
benefit from this yet. So consider this as a workaround for a bigger
TODO issue.
The `ElementalAssignBufferization` case is from 465.tonto,
but I do not have performance results yet (I do not expect much).
If there is a read-effect operation inside `hlfir.elemental`,
there is no reason to block moving it to the assignment point
unless there are write-effect operations between the elemental
and the assignment. The previous code was disallowing the optimization
even if there were only read-effect operations in between.
This case is from 465.tonto, though this change does not improve
performance at all.
AArch64ISelLowering.cpp currently fails -Wunused-variable because SrcVT
is only used in assert(), so it is an unused variable if not using debug
builds. This behavior was introduced in 2c0a2261.
See discussion in https://cplusplus.github.io/LWG/issue4239
std::flat_map<std::string, int, std::less<>> m;
m.try_emplace("abc", 5); // hard error
The reason is that we specify in 23.6.8.7 [flat.map.modifiers]/p21
the effect to be as if `ranges::upper_bound` is called.
`ranges::upper_bound` requires indirect_strict_weak_order, which
requires the comparator to be invocable for all combinations. In this
case, it requires
const char (&)[4] < const char (&)[4]
to be well-formed, which is no longer the case in C++26 after
https://wg21.link/P2865R6.
This patch uses `std::upper_bound` instead.
This folds away subs(csel(1, 0, cc)) from brcond, that can be produced
in certain places from compares that are not already subs (like adc/sbc
generated from i128 add_with_overflow intrinsics).
This is the GISel equivalent of scalar_to_vector, making sure that when
we insert into undef we use a fmov that avoids the artificial dependency
on the previous register. This adds v2i32 and v2i64 patterns too for
similar reasons.
Currently, DWARFDataExtractor can extract data without performing
relocations, (eg, by checking if the section pointer is null) but is
coded such that it still depends on all the relocation machinery, like
DWARFSections and similar. All at build time.
Extract most functionality into a new class, DWARFDataExtractorBase, and
have DWARFDataExtractor add the relocation dependent pieces via CRTP.
Add a new class, DWARFDataExtractorSimple, which does no relocation at
all. This will allow moving DWARFDataExtractorSimple into a new lower-level,
lighter-weight library with fewer external build-time dependencies.
This is another in a series of refactoring changes to create a new
better-layered, low-level Dwarf library that can be called from
lower-level code without circular dependencies.
Move the parts that aren't dependent on the subtarget into
RuntimeLibcallInfo, which should contain the superset of all possible
runtime calls and be accurate outside of codegen.
This patch ensures we can find decls in submodules during expression
evaluation. Previously, submodules would have all their decls marked as
`Hidden`. When Clang asked LLDB for decls, it would see them in the
submodule but `clang::Sema` would reject them because they weren't
`Visible` (specifically, `getAcceptableDecl` would fail during
`CppNameLookup`). Here we just mark the submodule as visible to work
around this problem.
Transform `select_cc t1, t2, -1, 0` for floats into a vector comparison
which generates a mask, which is later on combined with potential
vectorized DUPs.
In 8de51375f1 and related patches, we
added some code to avoid triggering -Wenum-constexpr-conversion in some
cases. This isn't necessary anymore because -Wenum-constexpr-conversion
doesn't exist anymore. And the checks are subtly wrong: they exclude
cases where we actually do need to check the conversion. This patch gets
rid of the unnecessary checks.
To map `fir.boxchar` types reliably onto an offload target, such as a
GPU, the `omp.map.info` operation is used to map the underlying data
pointer (`fir.ref<fir.char<k, ?>>`) wrapped by the `fir.boxchar` MLIR
value. The `omp.map.info` operation needs a pointer to the underlying
data pointer.
Given a reference to a descriptor (`fir.box`), the `fir.box_offset` is
used to obtain the address of the underlying data pointer. This PR
extends `fir.box_offset` to provide the same functionality for
`fir.boxchar` as well.