We currently have a bug where the legalizer, when dealing with phi operands,
may create instructions in the phi's incoming blocks at points which are effectively
dead due to a possible exception throw.
Say we have:
  throwbb:
    EH_LABEL
    x0 = %callarg1
    BL @may_throw_call
    EH_LABEL
    B returnbb
  bb:
    %v = phi i1 %true, throwbb, %false....
When legalizing we may need to widen the i1 %true value, and to do that we need
to create new extension instructions in the incoming block. Our insertion point
is currently MBB::getFirstTerminator(), which puts the IP before the unconditional
branch terminator in throwbb. These extensions may never be executed if the call
throws, so we need to emit them before the call (but not too early, since
our new instructions may need values defined within throwbb as well).
With the current insertion point, legalization produces:
  throwbb:
    EH_LABEL
    x0 = %callarg1
    BL @may_throw_call
    EH_LABEL
    %true = G_CONSTANT i32 1 ; <<<-- ruh'roh, this never executes if may_throw_call() throws!
    B returnbb
  bb:
    %v = phi i32 %true, throwbb, %false....
To fix this, I've added two new instructions. The main idea is that G_INVOKE_REGION_START
is a terminator, which tries to model the fact that in the IR, the original invoke inst
is actually a terminator as well. By using that as the new insertion point, we
make sure to place new instructions on always executing paths.
Unfortunately we still need to make the legalizer use a new insertion point API
that I've added, since the existing `getFirstTerminator()` method does a reverse
walk up the block, and any non-terminator instructions cause it to bail out. To
avoid impacting compile time for all `getFirstTerminator()` uses, I've added a new
method that does a forward walk instead.
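The difference between the two walks can be sketched on a toy block model. This is only an illustration: `Instr`, `Block`, `firstTerminatorReverse`, `firstTerminatorForward` and `makeThrowBB` are hypothetical names, not the LLVM API, and placing G_INVOKE_REGION_START ahead of the call sequence is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a machine instruction.
struct Instr {
  std::string Name;
  bool IsTerminator;
};

using Block = std::vector<Instr>;

// Reverse walk: scan back from the end and stop at the first non-terminator.
// Returns the index of the earliest terminator in the trailing run, or
// BB.size() if the block does not end in a terminator.
std::size_t firstTerminatorReverse(const Block &BB) {
  std::size_t I = BB.size();
  while (I > 0 && BB[I - 1].IsTerminator)
    --I;
  return I;
}

// Forward walk: return the index of the first terminator anywhere in the
// block, or BB.size() if there is none.
std::size_t firstTerminatorForward(const Block &BB) {
  for (std::size_t I = 0; I < BB.size(); ++I)
    if (BB[I].IsTerminator)
      return I;
  return BB.size();
}

// Builds throwbb from the example above, with the region-start marker
// placed ahead of the call sequence (this placement is an assumption).
Block makeThrowBB() {
  return {{"G_INVOKE_REGION_START", true},
          {"EH_LABEL", false},
          {"x0 = %callarg1", false},
          {"BL @may_throw_call", false},
          {"EH_LABEL", false},
          {"B returnbb", true}};
}
```

On this block the reverse walk stops at the final branch, because the call sequence in between consists of non-terminators, while the forward walk finds the region-start marker and so yields an insertion point on the always-executed path.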
Differential Revision: https://reviews.llvm.org/D137905
Noticed while trying to use llvm-exegesis to get some accurate capture numbers on some old Atom/Silvermont hardware as part of the work with D103695.
These targets' frontends are particularly poor and the use of the xmm8-xmm15 SSE registers results in longer instruction encodings which were affecting the latency/throughput estimates.
Thanks to @lebedev.ri for the --skip-measurements command line argument which made testing much easier!
Differential Revision: https://reviews.llvm.org/D138832
Sometimes we only want to ensure that we can produce snippets (all the way
through `SnippetRepetitor`!), but don't care about the execution.
E.g. all of our tests are this way.
I've built LLVM without PFM and removed my CPU from `X86PfmCounters.td`,
and this produces the expected results in that configuration.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D139448
This reverts commit 7883e5b061.
The original commit was reverted because it didn't update test files after D136263
landed. This recommit fixes those.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139509
This patch makes VectorLegalizer expand ISD::VP_FSHL and ISD::VP_FSHR so that
code can be generated for them.
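As a reminder of what a funnel shift computes per element, here is a scalar sketch for 32-bit values. It illustrates the semantics only and is not the VectorLegalizer code; `fshl32` and `fshr32` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// fshl: concatenate Hi:Lo into a 64-bit value, shift it left by Amt modulo
// 32, and return the upper 32 bits.
std::uint32_t fshl32(std::uint32_t Hi, std::uint32_t Lo, std::uint32_t Amt) {
  Amt %= 32;
  if (Amt == 0)
    return Hi; // avoid a shift by 32, which is undefined in C++
  return (Hi << Amt) | (Lo >> (32 - Amt));
}

// fshr: concatenate Hi:Lo into a 64-bit value, shift it right by Amt modulo
// 32, and return the lower 32 bits.
std::uint32_t fshr32(std::uint32_t Hi, std::uint32_t Lo, std::uint32_t Amt) {
  Amt %= 32;
  if (Amt == 0)
    return Lo; // avoid a shift by 32, which is undefined in C++
  return (Hi << (32 - Amt)) | (Lo >> Amt);
}
```

The vector-predicated forms apply the same per-lane computation to the lanes enabled by the mask and explicit vector length.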
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D138379
These deprecated functions are incompatible with opaque pointers,
and have replacements that accept an explicit type. Drop them now
as a final warning to consumers of the C API to migrate their code
(while LLVMGetElementType still exists as a temporary workaround).
Differential Revision: https://reviews.llvm.org/D135271
Leaves the implementation and test files in-place for right now, but
deletes the ability to build the old sanitizer-common based scudo. This
has been on life-support for a long time, and the newer scudo_standalone
is much better supported and maintained.
Also patches up some GWP-ASan wording, primarily related to the fact
that -fsanitize=scudo now is scudo_standalone, and therefore the way to
reference the GWP-ASan options through the environment variable has
changed.
Future follow-up patches will delete the original scudo, and migrate all
its tests over to be part of the scudo_standalone test suite.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D138157
It may be necessary to build additional targets before running
perf-training, the typical use case would be builtins and runtimes.
This change allows users to specify those dependencies as:
set(CLANG_PERF_TRAINING_DEPS builtins runtimes CACHE STRING "")
Differential Revision: https://reviews.llvm.org/D138974
Since opt no longer supports running the default (O0/O1/O2/O3/Os/Oz)
pipelines using the legacy PM, there are no in-tree uses of
TargetMachine::adjustPassManager remaining. This patch removes the
no-longer-used adjustPassManager functions.
Reviewed By: aeubanks
Differential Revision: https://reviews.llvm.org/D137796
It broke the build, see comments on code review.
> Leaves the implementation and tests files in-place for right now, but
> deletes the ability to build the old sanitizer-common based scudo. This
> has been on life-support for a long time, and the newer scudo_standalone
> is much better supported and maintained.
>
> Also patches up some GWP-ASan wording, primarily related to the fact
> that -fsanitize=scudo now is scudo_standalone, and therefore the way to
> reference the GWP-ASan options through the environment variable has
> changed.
>
> Future follow-up patches will delete the original scudo, and migrate all
> its tests over to be part of the scudo_standalone test suite.
>
> Reviewed By: vitalybuka
>
> Differential Revision: https://reviews.llvm.org/D138157
This reverts commit ab1a5991fe.
Leaves the implementation and test files in-place for right now, but
deletes the ability to build the old sanitizer-common based scudo. This
has been on life-support for a long time, and the newer scudo_standalone
is much better supported and maintained.
Also patches up some GWP-ASan wording, primarily related to the fact
that -fsanitize=scudo now is scudo_standalone, and therefore the way to
reference the GWP-ASan options through the environment variable has
changed.
Future follow-up patches will delete the original scudo, and migrate all
its tests over to be part of the scudo_standalone test suite.
Reviewed By: vitalybuka
Differential Revision: https://reviews.llvm.org/D138157
On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a NEON instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which gives eight 16-bit lanes processed in parallel. This kind
of information impacts how long the instruction takes to execute and what dependencies it may cause.
On RISCV, however, the data that impacts scheduling is encoded in CSRs such as vtype or
vl, in addition to the instruction itself. MCA does not track or use the data in these
registers. This patch fixes this problem by introducing Instruments into MCA.
* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV use lmul instrument to override schedule class
* Fix unit tests to pass empty instruments
* Add -ignore-im clopt to disable this change
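The idea of overriding a schedule class from out-of-band data can be sketched as follows. This is a toy model under stated assumptions: `lanes`, `SchedTable` and `schedClassFor` are hypothetical names, and the opcode and LMUL strings are placeholders rather than the identifiers used by the patch.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Number of parallel lanes in a fixed-width SIMD register, e.g. 128-bit
// registers holding 16-bit elements.
constexpr unsigned lanes(unsigned RegBits, unsigned ElemBits) {
  return RegBits / ElemBits;
}

// Hypothetical table: (opcode, active LMUL) -> schedule class id.
using SchedTable = std::map<std::pair<std::string, std::string>, unsigned>;

// Returns the schedule class override for the active LMUL if the table has
// one, and the instruction's default schedule class otherwise.
unsigned schedClassFor(const SchedTable &Table, const std::string &Opcode,
                       const std::string &ActiveLMUL, unsigned DefaultClass) {
  auto It = Table.find(std::make_pair(Opcode, ActiveLMUL));
  return It == Table.end() ? DefaultClass : It->second;
}
```

With an empty LMUL (no instrument active) the lookup falls back to the default class, which mirrors the behaviour one would expect when instruments are ignored.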
A prior version of this patch was committed in 5e82ee5373. 2323a4ee61 reverted
that change because the unit test files caused build errors. The change with fixes
was committed in b88b8307bf but reverted once again in e8e92c8313 due to more
build errors.
This commit adds the prior changes and fixes the build error.
Differential Revision: https://reviews.llvm.org/D137440
Since D129288, we no longer use BlockAddress constants as operands of
callbr.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D138080
nearbyint must execute without raising floating-point exceptions.
To avoid modifying fflags, this patch adds a new machine opcode,
PseudoVFROUND_NOEXCEPT_V, that expands to vfcvt.x.f.v and vfcvt.f.x.v between a pair
of frflags and fsflags.
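The same save/convert/restore pattern can be sketched in portable C++ with `<cfenv>`. This is a scalar analogy only, not the RISC-V expansion: `roundNoExcept` is a hypothetical helper, and it assumes the input is finite and fits in a `long long`.

```cpp
#include <cassert>
#include <cfenv>
#include <cmath>

// Rounds X to an integral value in the current rounding mode and leaves the
// floating-point exception flags exactly as they were on entry.
// Assumes X is finite and within the range of long long.
double roundNoExcept(double X) {
  std::fexcept_t Saved;
  std::fegetexceptflag(&Saved, FE_ALL_EXCEPT); // analogous to frflags

  volatile double In = X;
  // float -> int -> float round trip; the first step may raise FE_INEXACT.
  volatile double Result = static_cast<double>(std::llrint(In));

  std::fesetexceptflag(&Saved, FE_ALL_EXCEPT); // analogous to fsflags
  return Result;
}
```

Restoring the saved flag state after the conversion is what hides the inexact exception from the caller.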
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137685
The patch also added function expandVPBSWAP to expand ISD::VP_BSWAP nodes.
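For reference, a byte swap can be expanded into shifts and masks as in the sketch below. It is not the code of expandVPBSWAP; `bswap32` and `vpBswap` are hypothetical names, and writing 0 to disabled lanes is a choice made for the example (in LLVM the result on disabled lanes is poison).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Byte swap of a 32-bit value built from shifts, masks and ORs.
std::uint32_t bswap32(std::uint32_t X) {
  return (X << 24) | ((X & 0xFF00u) << 8) | ((X >> 8) & 0xFF00u) | (X >> 24);
}

// Predicated form: only lanes below EVL whose mask bit is set are swapped.
// Disabled lanes are set to 0 here purely for illustration.
std::vector<std::uint32_t> vpBswap(const std::vector<std::uint32_t> &V,
                                   const std::vector<bool> &Mask,
                                   std::size_t EVL) {
  assert(Mask.size() == V.size() && "mask must cover every lane");
  std::vector<std::uint32_t> Out(V.size(), 0);
  for (std::size_t I = 0; I < V.size() && I < EVL; ++I)
    if (Mask[I])
      Out[I] = bswap32(V[I]);
  return Out;
}
```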
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D137928
On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a NEON instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which gives eight 16-bit lanes processed in parallel. This kind
of information impacts how long the instruction takes to execute and what dependencies it may cause.
On RISCV, however, the data that impacts scheduling is encoded in CSRs such as vtype or
vl, in addition to the instruction itself. MCA does not track or use the data in these
registers. This patch fixes this problem by introducing Instruments into MCA.
* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV use lmul instrument to override schedule class
* Fix unit tests to pass empty instruments
* Add -ignore-im clopt to disable this change
A prior version of this patch was committed in 5e82ee5373. 2323a4ee61 reverted
that change because the unit test files caused build errors. This commit adds the original changes
and the fixed test files.
Differential Revision: https://reviews.llvm.org/D137440
On x86 and AArch, SIMD instructions encode all of the scheduling information in the instruction
itself. For example, VADD.I16 q0, q1, q2 is a NEON instruction that operates on 16-bit integer
elements stored in 128-bit Q registers, which gives eight 16-bit lanes processed in parallel. This kind
of information impacts how long the instruction takes to execute and what dependencies it may cause.
On RISCV, however, the data that impacts scheduling is encoded in CSRs such as vtype or
vl, in addition to the instruction itself. MCA does not track or use the data in these
registers. This patch fixes this problem by introducing Instruments into MCA.
* Replace `CodeRegions` with `AnalysisRegions`
* Add `Instrument` and `InstrumentManager`
* Add `InstrumentRegions`
* Add RISCV Instrument and `InstrumentManager`
* Parse `Instruments` in driver
* Use instruments to override schedule class
* RISCV use lmul instrument to override schedule class
* Fix unit tests to pass empty instruments
* Add -ignore-im clopt to disable this change
Differential Revision: https://reviews.llvm.org/D137440
The G_IS_FPCLASS instruction had an operand that represented the floating-point
semantics of its first operand. It allowed types that have the same length,
like `bfloat16` and `half`, to be distinguished. Unfortunately, this is
not sufficient, as other operations still cannot distinguish such types.
A solution to this problem must be more general, so this operand is now removed.
Differential Revision: https://reviews.llvm.org/D138004
This change provides an implementation of the XVentanaCondOps vendor extension. This extension is defined in version 1.0.0 of the VTx-family custom instructions specification (https://github.com/ventanamicro/ventana-custom-extensions/releases/download/v1.0.0/ventana-custom-extensions-v1.0.0.pdf) by Ventana Micro Systems.
In addition to the technical contribution, this change is intended to be a test case for our vendor extension policy.
Once this lands, I plan to use this extension to prototype selection lowering to conditional moves. There's an RVI proposal in flight, and the expectation is that lowering to these and the new RVI instructions is likely to be substantially similar.
Differential Revision: https://reviews.llvm.org/D137350
This patch adds documentation into the advanced builds documentation on
how to use the BOLT caches, including the combinations with the PGO
multistage builds and (Thin)LTO.
Reviewed By: sylvestre.ledru, Amir
Differential Revision: https://reviews.llvm.org/D137899
This patch adds documentation on the AdvancedBuilds page on how to do
PGO builds with (Thin)LTO with the currently undocumented (as far as I
can tell) PGO_INSTRUMENT_LTO option in the Clang PGO caches.
Reviewed By: sylvestre.ledru
Differential Revision: https://reviews.llvm.org/D137898
This patch makes some minor fixups in the PGO section of the advanced
builds documentation in preparation for some future changes. Some minor
formatting and wording changes are included to hopefully make the
documentation more clear.
Reviewed By: sylvestre.ledru
Differential Revision: https://reviews.llvm.org/D137880
More sub-projects will be added to the table once they have been verified
to be buildable in stand-alone mode.
Reviewed By: MaskRay, mgorny
Differential Revision: https://reviews.llvm.org/D123968
GlobalVariable and Function can be available_externally. GlobalAlias is used
similarly. Allowing available_externally is a natural extension and helps
ThinLTO discard GlobalAlias in a non-prevailing COMDAT (see D135427).
For now, available_externally GlobalAlias must point to an
available_externally GlobalValue (not ConstantExpr).
Differential Revision: https://reviews.llvm.org/D137441
This switches everything to use the memory attribute proposed in
https://discourse.llvm.org/t/rfc-unify-memory-effect-attributes/65579.
The old argmemonly, inaccessiblememonly and inaccessiblemem_or_argmemonly
attributes are dropped. The readnone, readonly and writeonly attributes
are restricted to parameters only.
The old attributes are auto-upgraded both in bitcode and IR.
The bitcode upgrade is a policy requirement that has to be retained
indefinitely. The IR upgrade is mainly there so it's not necessary
to update all tests using memory attributes in this patch, which
is already large enough. We could drop that part after migrating
tests, or retain it longer term, to make it easier to import IR
from older LLVM versions.
High-level Function/CallBase APIs like doesNotAccessMemory() or
setDoesNotAccessMemory() are mapped transparently to the memory
attribute. Code that directly manipulates attributes (e.g. via
AttributeList) on the other hand needs to switch to working with
the memory attribute instead.
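The function-level upgrade can be pictured as a simple mapping from each legacy attribute to its `memory(...)` spelling. The sketch below is illustrative only: `upgradeLegacyMemoryAttr` is a hypothetical helper, not the auto-upgrade code, and it handles single attributes rather than combinations (for example, argmemonly together with readonly folds into `memory(argmem: read)`, which is not modelled here).

```cpp
#include <cassert>
#include <string>

// Maps one legacy function attribute to the equivalent memory(...) spelling.
// Returns an empty string for names that are not legacy memory attributes.
std::string upgradeLegacyMemoryAttr(const std::string &Old) {
  if (Old == "readnone")
    return "memory(none)";
  if (Old == "readonly")
    return "memory(read)";
  if (Old == "writeonly")
    return "memory(write)";
  if (Old == "argmemonly")
    return "memory(argmem: readwrite)";
  if (Old == "inaccessiblememonly")
    return "memory(inaccessiblemem: readwrite)";
  if (Old == "inaccessiblemem_or_argmemonly")
    return "memory(argmem: readwrite, inaccessiblemem: readwrite)";
  return "";
}
```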
Differential Revision: https://reviews.llvm.org/D135780