GlobalValue::getAlignment() is deceptive at best: it doesn't return what you'd expect.
If you have an arbitrary GlobalValue and you want to determine the
alignment of that pointer, Value::getPointerAlignment() returns the
correct value. If you want the actual declared alignment of a function
or variable, GlobalObject::getAlignment() returns that.
This patch switches all the users of GlobalValue::getAlignment to an
appropriate alternative.
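As a rough sketch (not code from this patch; function names here are
placeholders and the API spelling is as it stood around the time of the
change), the intended replacements look like this:

    #include "llvm/IR/DataLayout.h"
    #include "llvm/IR/GlobalObject.h"
    #include "llvm/IR/GlobalValue.h"
    #include "llvm/Support/Alignment.h"
    using namespace llvm;

    // Alignment of the pointer value; valid for an arbitrary GlobalValue.
    Align pointerAlign(const GlobalValue &GV, const DataLayout &DL) {
      return GV.getPointerAlignment(DL);
    }

    // Declared alignment of a function or variable.
    unsigned declaredAlign(const GlobalObject &GO) {
      return GO.getAlignment();
    }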
Differential Revision: https://reviews.llvm.org/D80368
This was passing in all the parameters needed to construct a
LegalizerHelper in the custom legalization, when it's simpler to just
pass in the existing helper.
This is slightly more annoying to use in the common case where you
don't need the legalizer helper, but we could add the common
parameters back in addition to the helper.
I didn't propagate this to all the internal target changes that this
logically implies, but did update a sample one for
legalizeMinNumMaxNum.
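Roughly, the member declaration in the target's LegalizerInfo goes from
the first form to the second (illustrative sketch, not the exact diff):

    // Before: pass every piece needed to construct a LegalizerHelper.
    bool legalizeMinNumMaxNum(MachineInstr &MI, MachineRegisterInfo &MRI,
                              MachineIRBuilder &B) const;

    // After: pass in the existing helper and pull pieces from it as needed.
    bool legalizeMinNumMaxNum(LegalizerHelper &Helper, MachineInstr &MI) const;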
This is in preparation for moving AMDGPU load/store legalization
entirely into custom lowering. The current set of legalization actions
is really constraining and not capable of expressing all the
actions needed to legalize loads/stores. In particular there's no way
to express when the memory access itself needs to change size vs. the
result type. There's also a lot of redundancy since the same
split/widen actions need to be applied in both vector and scalar
cases. All of the sub-cases logically belong as steps in the legalizer
helper, but it will be easier to consider everything at once in custom
lowering.
The logic is written in terms of which loads/stores should be selectable. There
are a set of cases that should be selectable, but due to missing MVTs
and/or selection patterns, will fail to select. I think eventually
load/store select patterns should ignore the type and only look at the
value size, but until that happens, bitcast these to equivalent i32
vectors.
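A sketch of that last step (the helper name is hypothetical; the LLT
spelling and header location match the API of the time):

    #include "llvm/Support/LowLevelTypeImpl.h"
    #include <cassert>
    using namespace llvm;

    // Map an oddly-typed but legally-sized value, e.g. <8 x s8>, to the
    // equivalent 32-bit element type, e.g. <2 x s32>, for selection.
    static LLT getEquivalentI32Type(LLT Ty) {
      unsigned Size = Ty.getSizeInBits();
      assert(Size % 32 == 0 && "expected a multiple of 32 bits");
      return Size == 32 ? LLT::scalar(32) : LLT::vector(Size / 32, 32);
    }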
This was implicitly assuming the branch instruction was the next after
the pseudo. It's possible for another non-terminator instruction to be
inserted between the intrinsic and the branch, so adjust the insertion
point. Fixes a non-terminator after terminator verifier error (which,
without the verifier, manifested itself as an infinite loop in
analyzeBranch much later on).
The baffling thing is that this passed the OpenCL conformance test for
32-bit integer divisions, but only failed in the 32-bit path of
BypassSlowDivision for the 64-bit tests.
This was promoting booleans to i32 to perform a comparison against
them, feeding the result to a select condition. Just use the booleans
directly. This produces the same final code, since the combiner is
unable to undo the mess this creates. I untangled this logic when I
ported this code to GlobalISel, so port the cleanups back.
It was annoying enough that every custom lowering needed to set the
insert point, and this was made worse now that they all also needed to
be updated to use setInstrAndDebugLoc. Consolidate these so every
legalization action has the right insert position by default.
This should fix dropping debug info in every custom AMDGPU
legalization.
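For reference, this is the boilerplate every custom lowering otherwise
needs at its start (B is the MachineIRBuilder, MI the instruction being
legalized):

    // Set both the insert point and the DebugLoc from the instruction
    // being legalized, so newly built instructions keep its location.
    B.setInstrAndDebugLoc(MI);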
The current set is an incomprehensible mess riddled with ordering
hacks for various limitations in the legalizer at the time of writing,
many of which have been fixed. This takes a very small step in
correcting this.
The core change is to start checking for fully legal cases first,
rather than trying to figure out all of the actions that could need to
be performed. It's recommended to check the legal cases first for
faster legality checks in the common case. This still has a table
listing some common cases, but whether this really helps needs to be
measured.
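The intended shape, very roughly, inside the target's LegalizerInfo
constructor (the types, memory sizes, and predicates below are
illustrative placeholders, not the real AMDGPU rules):

    const LLT S32 = LLT::scalar(32);
    const LLT V2S32 = LLT::vector(2, 32);
    const LLT GlobalPtr = LLT::pointer(1, 64);

    getActionDefinitionsBuilder(G_LOAD)
        // Cheap checks for the fully legal, common cases come first.
        .legalForTypesWithMemDesc({{S32, GlobalPtr, 32, 32},
                                   {V2S32, GlobalPtr, 64, 32}})
        // Everything else falls through to slower predicates and actions.
        .lowerIfMemSizeNotPow2();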
More significantly, stop trying to allow any arbitrary type with a
legal bitwidth as a legal memory type, and start using the bitcast
legalize action for them. Allowing loads of these weird vector types
produced new burdens we don't need for handling all of the
legalization artifacts. Unlike the SelectionDAG handling, this is
still not casting 64- or 16-bit element vectors to 32-bit element
vectors. These cases should still be handled by increasing/decreasing
the number of 16-bit elements. This is primarily to fix 8-bit element
vectors.
Another change is to stop trying to handle the load-widening based on
a higher alignment. We should still do this, but the way it was
handled wasn't really correct. We really need to modify the MMO's size
at the same time, and not just increase the result type. The
LegalizerHelper does not do this, and I think this would really
require a separate WidenMemory action (or to add a memory action
payload to the LegalizeMutation). These will now fail to legalize.
The structure of the legalizer rules makes writing concise rules here
difficult. It would be easier if the same function could answer the
legality query and report the action to perform at the same
time. Instead these two are split into distinct predicate and action
functions. This is mostly tolerable for other cases, but the
load/store rules get pretty complicated so it's difficult to keep two
versions of these functions in sync.
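For example, a rule like the following (an illustrative fragment of a
getActionDefinitionsBuilder chain, with field names as they were at the
time) has to keep its predicate and its mutation in sync even though
both are really answering the same question:

    .narrowScalarIf(
        // Predicate: decide *whether* this action applies...
        [](const LegalityQuery &Q) {
          return Q.MMODescrs[0].SizeInBits > 128;
        },
        // ...and a separate mutation: decide *what* to change it to.
        [](const LegalityQuery &Q) {
          return std::make_pair(0, LLT::scalar(128));
        });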
Tweak a few constant expressions involving numbers::pi etc to avoid
rounding errors. NFCI, though it's possible some of these will now be
more accurate in the last bit.
I get confused by a lot of the predicate names here, since I would
assume they apply to vectors as well. Rename to reflect they only
apply to scalars.
Also add a few predicates AMDGPU uses that should be generally useful.
Also add any() to complement all(). I've wanted to use this a few times
but then worked around it not being there.
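An illustrative combination inside a rule (the opcode and types are
placeholders, and this assumes the usual using namespace
LegalityPredicates in the target file):

    getActionDefinitionsBuilder(G_PTRTOINT)
        .legalIf(all(isPointer(0),
                     any(typeIs(1, LLT::scalar(32)),
                         typeIs(1, LLT::scalar(64)))));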
Confusingly, these were unrelated and had different semantics. The
G_PTR_MASK instruction predates the llvm.ptrmask intrinsic, but has a
different format. G_PTR_MASK only allows clearing the low bits of a
pointer, and only a constant number of bits. The ptrmask intrinsic
allows an arbitrary mask. Replace G_PTR_MASK so it matches the
intrinsic.
This only selects the cases that look like the old instruction. More
work is needed to select the general case. New legalization code is
also still needed to deal with the case where the incoming mask size
does not match the pointer size, which has specified behavior in the
LangRef.
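For contrast, at the IR level the intrinsic takes a fully general mask
operand (IRBuilder sketch; the wrapper function is hypothetical):

    #include "llvm/IR/IRBuilder.h"
    #include "llvm/IR/Intrinsics.h"
    using namespace llvm;

    // llvm.ptrmask: the mask is an arbitrary value, not just "clear N
    // low bits" as the old G_PTR_MASK required.
    Value *emitPtrMask(IRBuilder<> &B, Value *Ptr, Value *Mask) {
      return B.CreateIntrinsic(Intrinsic::ptrmask,
                               {Ptr->getType(), Mask->getType()},
                               {Ptr, Mask});
    }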
Unlike SelectionDAGBuilder, IRTranslator omits the unconditional
branch in fallthrough cases. Confusingly, the control flow pseudos
function in the opposite way from how the intrinsics are used, so the branch
targets always need to be swapped. We're inverting the target blocks,
so we need to figure out the old fallthrough block and insert a branch
to the original unconditional branch target.
Currently this code exists in widenScalar for G_MERGE_VALUES
sources. I'm not sure if the existing expansion in widenScalar should
be removed or not. The widenScalar variant tries to extend to the
requested size, but this just uses the original bitwidth.
We currently don't have a way to map to the equivalent intrinsic
opcode, so track immediate 0s in place of the address, which tells
selection to change the final opcode.
This reverts commit 9bca8fc4cf.
Rearrange handling to avoid changing the instruction in the case where
it's going to be erased and replaced with undef.
For normal loads, fully eliminate the load. For the TFE case, adjust
the dmask value in the instruction so the selector doesn't need to
handle it. For the TFE special case, I guess it would be possible to
replace the loaded data register with undef, but as-is this will start
treating it as a well-defined value.
Trim elements that won't be written. The equivalent still needs to be
done for writes. Also start widening 3 elements to 4
elements. Selection will get the count from the dmask.