Invert the sense of the attribute and let the attributor figure this
out like everything else. If needed we can have the not-OpenCL
languages set amdgpu-no-default-queue and amdgpu-no-completion-action
up front so they never have to pay the cost.
There are also so many of these now, the offset use API should
probably consider all of them at once. Maybe they should merge into
one attribute with used fields. Having separate functions for each
field in AMDGPUBaseInfo is also not the greatest API (might as well
fix this when the patch to get the object version from the module
lands).
This was looking for a specific constant cast of the function, when
the type doesn't matter. Doesn't bother trying to handle typed
pointers, it will just assert.
Things probably don't work completely correctly if the block kernel
address is captured somewhere else, but that wouldn't work before
either. The uses should really be loads out of the handle, and the
handle initializer should contain the kernel address.
In function SITargetLowering::performExtractVectorElt,
the output type was not considered which could lead to type mismatches
later.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D139943
Amdgpu kernel with function attribute "uniform-work-group-size"="true" requires
uniform work group size (i.e. each dimension of global size is a multiple of
corresponding dimension of work group size). hipExtModuleLaunchKernel allows to
launch HIP kernel with non-uniform workgroup size, which makes it necessary for
runtime to check and enforce uniform workgroup size if kernel requires it. To
let runtime be able to enforce that, this metadata is needed to indicate that
the kernel requires uniform workgroup size.
Reviewed By: kzhuravl, arsenm
Differential Revision: https://reviews.llvm.org/D141012
Since the divergence-driven ISel was fully enabled we have more VGPRs available.
MachineScheduler trying to take advantage of that bumps up the occupancy sacrificing
the hiding of memory access latency. This really spoils the initially good schedule.
A new metric that reflects the latency hiding quality of the schedule has been created
to make it to balance between occupancy and latency. The metric is based on the latency
model which computes the bubble to working cycles ratio. Then we use this ratio to decide
if the higher occupancy schedule is profitable as follows:
Profit = NewOccupancy/OldOccupancy * OldMetric/NewMetric
Reviewed By: rampitec
Differential Revision: https://reviews.llvm.org/D139710
This was completely broken with opaque pointers because it was
specifically looking for a constant expression with the global
variable as the first operand. Strip casts like normal, and properly
validate all of the restrictions rather than silently ignoring any
unhandled cases. Also be stricter that we aren't calling into some
unresolved or non-constant format string.
Also converts the test to opaque pointers and generated tests. There's
more broken initializer handling for strings inside the format string
processing too, but there's just no test coverage for this at all.
Add G_BUILD_VECTOR and G_BUILD_VECTOR_TRUNC to the list of opcodes in
`shouldCSEOpc`. This simplifies the code generated for vector splats.
Differential Revision: https://reviews.llvm.org/D140965
src2 was incorrectly defined as VSrc_f16 but it is tied to dst which is VGPR_32. As a result, disassembler failed to decode src2.
Differential Revision: https://reviews.llvm.org/D140299
It turns out we can codegen these on targets without flat addressing,
although the runtime probably didn't put anything useful there. The
proper diagnostic would be to disallow flat pointer uses or languages
with them, not this one edge case. Allows removing one of the special
cases requiring subtarget support in the device libraries.
If the dividend has leading zeros, we can use them to reduce the
size of the multiplier and avoid the fixup cases.
This patch is for scalars only, but we might be able to do this
for vectors in a follow up.
Differential Revision: https://reviews.llvm.org/D140750
Use the readfirstlane hack for the scalar cases as a hack to
combine globalisel and sdag tests. gfx6 stores are a bit broken
in globalisel, and scalar returns are totally broken in sdag.
D72845 implemented the equivalent IR optimization in InstCombine so it
seems that there's no advantage to doing it during isel too.
This partially reverts D72844.
Differential Revision: https://reviews.llvm.org/D140546
Change EmitCopyFromReg to check all users of cloned nodes (as well as
non-cloned nodes) instead of assuming that they all copy the defined
value back to the same physical register.
This partially reverts 968e2e7b3d (svn r62356) which claimed:
CreateVirtualRegisters does trivial copy coalescing. If a node def is
used by a single CopyToReg, it reuses the virtual register assigned to
the CopyToReg. This won't work for SDNode that is a clone or is itself
cloned. Disable this optimization for those nodes or it can end up
with non-SSA machine instructions.
This is true for CreateVirtualRegisters but r62356 also updated
EmitCopyFromReg where it is not true. Firstly EmitCopyFromReg only
coalesces physical register copies, so the concern about SSA form does
not apply. Secondly making the loop over users in EmitCopyFromReg
conditional on `!IsClone && !IsCloned` breaks the handling of cloned
nodes, because it leaves MatchReg set to true by default, so it assumes
that all users will copy the defined value back to the same physical
register instead of actually checking.
Differential Revision: https://reviews.llvm.org/D140417
vectorize-buffer-fat-pointer.ll required a manual check line fix.
vector-alloca-addrspacecast.ll required a manual fixup of a check
line. partial-regcopy-and-spill-missed-at-regalloc.ll required
re-running update_mir_test_checks. The HSA metadata tests required
avoiding the script touching the type name in the metadata.
annotate-noclobber.ll ran into one update script bug. It deleted a
check line with a 0 offset GEP, moving the following -NEXT check
logically up one line.
This was trying to merge 2 32-bit pointers into a 64-bit pointer. The
artifact combiner was assuming merges to pointers use scalar sources,
and ended up inserting invalid bitcast from a pointer to a scalar. It
should probably be a verifier error to have pointer merge sources with
a pointer result.
Fixes verifier errors with EXPENSIVE_CHECKS.
Due to poor support for non-0 null pointers, clang always emits
addrspacecast from a null flat constant for private/local null. We can
trivially handle this case for old hardware.
Should fix issue 55679.
Fixes inconsistent handling of constant-32bit case. Turns out we can
lower all the casts just fine, it's just accessing the flat results
that's a problem.
Previously reverted in 8b446ea2ba
Reapplying because this commit is NOT DEPENDENT on the reverted commit
fc21f2d7ba, which broke the ASAN buildbot.
See https://reviews.llvm.org/rGfc21f2d7bae2e0be630470cc7ca9323ed5859892 for
more information.
The arguments to a PHI may represent a recurrence by eventually using the output
of the PHI itself. This is now handled by checking for cycles in the control
flow. If a PHI is not in a recurrence, it is now able to report multiple offsets
instead of conservatively reporting unknown.
Reviewed By: jdoerfert
Differential Revision: https://reviews.llvm.org/D138991
Currently, the custom SGPR spill lowering pass spills
SGPRs into physical VGPR lanes and the remaining VGPRs
are used by regalloc for vector regclass allocation.
This imposes many restrictions that we ended up with
unsuccessful SGPR spilling when there won't be enough
VGPRs and we are forced to spill the leftover into
memory during PEI. The custom spill handling during PEI
has many edge cases and often breaks the compiler time
to time.
This patch implements spilling SGPRs into virtual VGPR
lanes. Since we now split the register allocation for
SGPRs and VGPRs, the virtual registers introduced for
the spill lanes would get allocated automatically in
the subsequent regalloc invocation for VGPRs.
Spill to virtual registers will always be successful,
even in the high-pressure situations, and hence it avoids
most of the edge cases during PEI. We are now left with
only the custom SGPR spills during PEI for special registers
like the frame pointer which isn an unproblematic case.
This patch also implements the whole wave spills which
might occur if RA spills any live range of virtual registers
involved in the whole wave operations. Earlier, we had
been hand-picking registers for such machine operands.
But now with SGPR spills into virtual VGPR lanes, we are
exposing them to the allocator.
Reviewed By: arsenm
Differential Revision: https://reviews.llvm.org/D124196