Invert the sense of the attribute and let the attributor figure this
out like everything else. If needed we can have the not-OpenCL
languages set amdgpu-no-default-queue and amdgpu-no-completion-action
up front so they never have to pay the cost.
There are also so many of these now, the offset use API should
probably consider all of them at once. Maybe they should merge into
one attribute with used fields. Having separate functions for each
field in AMDGPUBaseInfo is also not the greatest API (might as well
fix this when the patch to get the object version from the module
lands).
Before Windows 11 and Windows Server 2022, only one 'processor group' is assigned by default to a starting process, then the program is responsible for dispatching its own threads on more 'processor groups'. That is what 8404aeb56a was doing, allowing LLVM tools to automatically use all hardware threads in the machine.
After Windows 11 and Windows Server 2022, the OS takes care of that. This has an adverse effect reported in #56618 which is that using `GetProcessAffinityMask()` API in some edge cases seems buggy now. That API is used to detect if an affinity mask was set, and adjust accordingly the available threads for a ThreadPool.
With this patch, on one hand, we let the OS dispatch threads on all 'processor groups', but only for Windows 11 & Windows Server 2022 and after. We retain the old behavior for older OS versions. On the other hand, a workaround was added to mitigate the `GetProcessAffinityMask()` issue described above (see Threading.inc, L226).
Differential Revision: https://reviews.llvm.org/D138747
Prior to this patch, variadic DIExpressions (i.e. ones that contain
DW_OP_LLVM_arg) could only be created by salvaging debug values to create
stack value expressions, resulting in a DBG_VALUE_LIST being created. As of
the previous patch in this patch stack, DBG_INSTR_REF's syntax has been
changed to match DBG_VALUE_LIST in preparation for supporting variadic
expressions. This patch adds some minor changes needed to allow variadic
expressions that aren't stack values to exist, and allows variadic expressions
that are trivially reduceable to non-variadic expressions to be handled
similarly to non-variadic expressions.
Reviewed by: jmorse
Differential Revision: https://reviews.llvm.org/D133926
This patch fixes an error in commit e10e9363 in which the
added documentation contained an incorrectly-styled underline
for the title "Debug Instruction Reference Operands".
This patch makes two notable changes to the MIR debug info representation,
which result in different MIR output but identical final DWARF output (NFC
w.r.t. the full compilation). The two changes are:
* The introduction of a new MachineOperand type, MO_DbgInstrRef, which
consists of two unsigned numbers that are used to index an instruction
and an output operand within that instruction, having a meaning
identical to first two operands of the current DBG_INSTR_REF
instruction. This operand is only used in DBG_INSTR_REF (see below).
* A change in syntax for the DBG_INSTR_REF instruction, shuffling the
operands to make it resemble DBG_VALUE_LIST instead of DBG_VALUE,
and replacing the first two operands with a single MO_DbgInstrRef-type
operand.
This patch is the first of a set that will allow DBG_INSTR_REF
instructions to refer to multiple machine locations in the same manner
as DBG_VALUE_LIST.
Reviewed By: jmorse
Differential Revision: https://reviews.llvm.org/D129372
Amdgpu kernel with function attribute "uniform-work-group-size"="true" requires
uniform work group size (i.e. each dimension of global size is a multiple of
corresponding dimension of work group size). hipExtModuleLaunchKernel allows to
launch HIP kernel with non-uniform workgroup size, which makes it necessary for
runtime to check and enforce uniform workgroup size if kernel requires it. To
let runtime be able to enforce that, this metadata is needed to indicate that
the kernel requires uniform workgroup size.
Reviewed By: kzhuravl, arsenm
Differential Revision: https://reviews.llvm.org/D141012
While "skip measurements mode" is super useful for test coverage,
i've come to discover it's trade-offs. It still calls back-end
to actually codegen the target assembly, and that is what is taking
80%+ of the time regardless of whether or not we skip the measurements.
On the other hand, just being able to see that exegesis can come up
with a snippet to measure something, is already very useful,
and takes maybe a second for a all-opcode sweep.
Reviewed By: gchatelet
Differential Revision: https://reviews.llvm.org/D140702
Use deduction guides instead of helper functions.
The only non-automatic changes have been:
1. ArrayRef(some_uint8_pointer, 0) needs to be changed into ArrayRef(some_uint8_pointer, (size_t)0) to avoid an ambiguous call with ArrayRef((uint8_t*), (uint8_t*))
2. CVSymbol sym(makeArrayRef(symStorage)); needed to be rewritten as CVSymbol sym{ArrayRef(symStorage)}; otherwise the compiler is confused and thinks we have a (bad) function prototype. There was a few similar situation across the codebase.
3. ADL doesn't seem to work the same for deduction-guides and functions, so at some point the llvm namespace must be explicitly stated.
4. The "reference mode" of makeArrayRef(ArrayRef<T> &) that acts as no-op is not supported (a constructor cannot achieve that).
Per reviewers' comment, some useless makeArrayRef have been removed in the process.
This is a follow-up to https://reviews.llvm.org/D140896 that introduced
the deduction guides.
Differential Revision: https://reviews.llvm.org/D140955
By default, all benchmark results are analysed, but sometimes it may be useful
to only look at those that to not involve memory, or vice versa. This option
allows to either keep all benchmarks, or filter out (ignore) either all the
ones that do involve memory (involve instructions that may read or write to
memory), or the opposite, to only keep such benchmarks.
Personally, so far i have found the benchmarks that do involve memory
to have dubious results. But the ones that do not involve memory,
are generally actionable. So i would like to have a toggle to declutter results.
Reviewed By: courbet
Differential Revision: https://reviews.llvm.org/D140734
The patch also adds expandVPCTLZ and expandVPCTTZ to expand vp.ctlz/cttz nodes
and the cost model of vp.ctlz/cttz.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D140370
1. Minor editorial corrections.
2. Allow different call frames to be associated with different target
architectures in a single thread.
Reviewed By: scott.linder
Differential Revision: https://reviews.llvm.org/D140646
This patch fixes two incorrect descriptions in TestingGuide.rst.
1. test/lit.site.cfg --> test/lit.site.cfg.py
After https://reviews.llvm.org/D37838 , the `test/lit.site.cfg` had been added a .py extension.
So it should be `test/lit.site.cfg.py`.
2. $(LLVM_OBJ_ROOT)/$(BuildMode)/bin --> $(LLVM_OBJ_ROOT)/bin
The current build system doesn't create a $(BuildMode) directory any more.
So it should be removed.
Reviewed By: mehdi_amini, MaskRay
Differential Revision: https://reviews.llvm.org/D140780
This patch enables Function Specialization by default at all
optimization levels except Os, Oz.
Compilation Time Overhead:
--------------------------
Measured the Instruction Count increase (Geomean) for CTMark from
the llvm-testsuite as in https://llvm-compile-time-tracker.com.
* {-O3, Non-LTO}: +0.136% Instruction Count
* {-O3, LTO}: +0.346% Instruction Count
Performance Uplift:
-------------------
Measured +9.121% score increase for 505.mcf_r from SPEC Int 2017
(Tested on Neoverse N1 with -O3 + LTO)
Correctness Testing:
--------------------
* Passes bootstrap Clang with ASAN + LTO + FuncSpec aggressive options:
{ MaxClonesThreshold=10,
SmallFunctionThreshold=10,
AvgLoopIterationCount=30,
SpecializeOnAddresses=true,
EnableSpecializationForLiteralConstant=true,
FuncSpecializationMaxIters=10 }
* Builds Chromium and passes its unittests with the above options + ThinLTO.
For more info please refer to
https://discourse.llvm.org/t/rfc-should-we-enable-function-specialization/61518
Differential Revision: https://reviews.llvm.org/D140210
As topic, the variable to specify the python executable now should be this.
This is probably something that was left out in D78762.
Reviewed By: JDevlieghere
Differential Revision: https://reviews.llvm.org/D140652
As there have been a couple of questions about this recently, this
gives a hard timeline on typed pointers support.
Given that we are about a month away from LLVM 16 branching, I think
we should retain best-effort typed pointer support in LLVM 16 even
if we get all tests migrated before that point.
Conversely, regardless of what the actual test migration state will
be at that point, I believe we should un-support typed pointers as
a matter of policy immediately after branching. Once release/16.x
has been branched, typed pointers on main will no longer be
supported (and can be actively broken). We only need to keep
not-yet-migrated tests working, if there are any left at that point.
Differential Revision: https://reviews.llvm.org/D140487
Add `nooutline` + update LangRef to say it exists.
This makes it possible to say "don't outline from this function ever."
We want to be able to toggle whether or not a function should be in the search
set regardless of default behaviour.
Add testcases for the IR Outliner + Machine Outliner.
Also remove an unnecessary check for an empty function in the Machine Outliner.
Differential Revision: https://reviews.llvm.org/D140438
This is an alternative way of D139627 suggested by Craig. Creently only X86 backend uses this attribute. Let's just emit for X86 only.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139701
Target-extension types represent types that need to be preserved through
optimization, but otherwise are not introspectable by target-independent
optimizations. This patch doesn't add any uses of these types by an existing
backend, it only provides basic infrastructure such that these types would work
correctly.
Reviewed By: nikic, barannikov88
Differential Revision: https://reviews.llvm.org/D135202
Summary of changes:
- Enable register tuples with 9, 10, 11 and 12 registers (https://reviews.llvm.org/D138205).
- Small improvements and clarifications.
- Correct typos.
Uniformity analysis is a generalization of divergence analysis to
include irreducible control flow:
1. The proposed spec presents a notion of "maximal convergence" that
captures the existing convention of converging threads at the
headers of natual loops.
2. Maximal convergence is then extended to irreducible cycles. The
identity of irreducible cycles is determined by the choices made
in a depth-first traversal of the control flow graph. Uniformity
analysis uses criteria that depend only on closed paths and not
cycles, to determine maximal convergence. This makes it a
conservative analysis that is independent of the effect of DFS on
CycleInfo.
3. The analysis is implemented as a template that can be
instantiated for both LLVM IR and Machine IR.
Validation:
- passes existing tests for divergence analysis
- passes new tests with irreducible control flow
- passes equivalent tests in MIR and GMIR
Based on concepts originally outlined by
Nicolai Haehnle <nicolai.haehnle@amd.com>
With contributions from Ruiling Song <ruiling.song@amd.com> and
Jay Foad <jay.foad@amd.com>.
Support for GMIR and lit tests for GMIR/MIR added by
Yashwant Singh <yashwant.singh@amd.com>.
Differential Revision: https://reviews.llvm.org/D130746
Address the inconsistency between FLT_ROUNDS_ and SET_ROUNDING SDAG
node. Rename FLT_ROUNDS_ to GET_ROUNDING and add llvm.get.rounding
intrinsic to replace flt.rounds.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D139507
This operand bundle on an assume informs alias analysis that the
arguments point to regions of memory that were allocated separately
(i.e. different heap allocations, different allocas, or different
globals).
As a safety measure, we leave the analysis flag-disabled by default.
Reviewed By: nikic
Differential Revision: https://reviews.llvm.org/D136514
Always read bitcode according to the -opaque-pointers mode. Do not
perform auto-detection to implicitly switch to typed pointers.
This is a step towards removing typed pointer support, and also
eliminates the class of problems where linking may fail if a typed
pointer module is loaded before an opaque pointer module. (The
latest place where this was encountered is D139924, but this has
previously been fixed in other places doing bitcode linking as well.)
Differential Revision: https://reviews.llvm.org/D139940
The patch also adds expandVPCTPOP in TargetLowering to expand VP_CTPOP nodes.
Reviewed By: craig.topper
Differential Revision: https://reviews.llvm.org/D139920