This commit moves the logb and ilogb builtins to the CLC library.
It simultaneously optimizes them both for vector types and for half
types. Vector types were being scalarized in some cases. Half types were
previously promoting to float, whereas this commit provides them a
native implementation.
Everything passes the OpenCL-CTS.
I had to intuit some magic numbers used by these implementations in
order to generate the half variants. I gave them clearer definitions
derived from what I believe are their actual component numbers, but
named them 'magic' to convey that they weren't derived from first
principles.
This commit also refactors how geometric builtins are defined and
declared, by sharing more helpers. It also removes an unnecessary
gentype-like helper in favour of the more complete math/gentype.inc.
There are no changes to the IR for any of these four builtins.
The 'normalize' builtin will follow in a subsequent commit because it
would involve the addition of missing halfn-type overloads for
completeness.
When -use-constant-{int,fp}-for-fixed-length-splat are enabled, constant
vector splats take the form of ConstantInt/FP instead of ConstantVector.
These constants get linked to MachineInstrs via constant pools for later
processing. The processing assumes ConstantInt/FP to always represent
scalar constants with this PR extending the code to support vector
types.
NOTE: The test choices are somewhat artificial because pretty much all
the vector tests failed without these changes when the new constants are
enabled.
---------
Co-authored-by: Matt Arsenault <arsenm2@gmail.com>
We cannot consume annotation tokens with ConsumeToken(), so any pragmas
present in an invalid initializer would previously crash. Now we handle
annotation tokens more generally and avoid the crash.
Fixes#113722
GlobalISel was previously inefficient in handling bitreverses of vector
types. This deals with i16, i32, i64 vector types and converts them into
i8 bitreverses and rev instructions.
This patch fixes several typographical errors in comments and test
files:
1. Corrected "achive" to "archive" in archive-update.test.
2. Fixed "achive" to "achieve" in a comment in
XeGPUSubgroupDistribute.cpp.
3. Corrected "achived" to "achieved" in a test note in
SimpleSIVNoValidityCheckFixedSize.ll.
These changes are non-functional and intended to improve readability and
documentation accuracy.
Signed-off-by: Kane Wang <wangqiang1@kylinos.cn>
Co-authored-by: Kane Wang <wangqiang1@kylinos.cn>
When a read(first)lane is used on a binary operator and the intrinsic is
the only user of the operator, we can move the read(first)lane into the
operand if the other operand is uniform.
Unfortunately IC doesn't let us access UniformityAnalysis and thus we
can't truly check uniformity, we have to do with a basic uniformity
check which only allows constants or trivially uniform intrinsics calls.
We can also do the same for unary and cast operators.
Step 2 tells you to checkout "llvm-test-suite" to "test-suite", but I
don't see a particular reason to use a non-default path.
If you're following the instructions exactly, it all works, but if you
autopilot that step it is surprising later when things do not work.
It's not hard for an individual to fix later, but we should suggest the
least surprising thing where we can.
Give every leaf register a unique regunit, even if it has ad hoc
aliases.
Previously only leaf registers *without* ad hoc aliases would get a
unique regunit, but that caused situations where regunits could not be
used to distinguish a register from its subregs. For example:
- Registers A and B alias. They both get regunit 0 only.
- Register C has subregs A and B. It inherits regunits from its subregs,
so it also gets regunit 0 only.
After this fix, registers A and B will get a unique regunit in addition
to the regunit representing the alias, for example:
- A will get regunits 0 and 1.
- B will get regunits 0 and 2.
- C will get regunits 0, 1 and 2.
If we have `icmp eq or(a, shl(b)), 0` then the shift can be removed so
long as it is nuw or nsw. It is still comparing that some bits are
non-zero.
https://alive2.llvm.org/ce/z/nhrBVX.
This is also true of ne, and true for longer or chains.
Fixes buildbot failure
https://lab.llvm.org/buildbot/#/builders/16/builds/18873 originating
from #138448. Normally ignored silently but fails on higher error
levels.
Buildbot errors:
```
/b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc < /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll -mtriple=aarch64-unknown-linux-gnu | /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll # RUN: at line 6
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/FileCheck /b/1/llvm-clang-x86_64-expensive-checks-debian/llvm-project/llvm/test/CodeGen/AArch64/reserveXreg.ll
+ /b/1/llvm-clang-x86_64-expensive-checks-debian/build/bin/llc -mtriple=aarch64-unknown-linux-gnu
'+reserve-x8' is not a recognized feature for this target (ignoring feature)
'+reserve-x8' is not a recognized feature for this target (ignoring feature)
'+reserve-x16' is not a recognized feature for this target (ignoring feature)
'+reserve-x16' is not a recognized feature for this target (ignoring feature)
'+reserve-x17' is not a recognized feature for this target (ignoring feature)
'+reserve-x17' is not a recognized feature for this target (ignoring feature)
```
Three main changes:
- The pass createRequestCWrappersPass is renamed as
createLLVMRequestCWrappersPass
- createOptimizeForTargetPass is now under the LLVM namespace. It’s
unclear why the NVVM namespace was used initially, as all passes in
LLVMIR/Transforms/Passes.h consistently reside in the LLVM namespace.
- DuplicateFunctionEliminationPass is now in the func namespace.
Fixes#136583
Normally the flush argument list would contain a DataRef to some
variable. All DataRefs are handled generically in resolve-names and so
the problem wasn't observed. But when a common block name is specified,
this is not parsed as a DataRef. There was already handling in
resolve-directives for OmpObjectList but not for argument lists. I've
added a visitor for FLUSH which ensures all of the arguments have been
resolved.
The test is there to make sure the compiler doesn't crashed encountering
the unresolved symbol. It shows that we currently deny flushing a common
block. I'm not sure that it is right to restrict common blocks from
flush argument lists, but fixing that can come in a different patch.
This one is fixing an ICE.
This reapplies 067caaa and 382a085 (reverting b35f6e2) with fixes to
issues detected by the address sanitizer (MIs have to be removed from
live intervals before being removed from their parent MBB).
Original commit description below.
AMDGPU scheduler's `PreRARematStage` attempts to increase function
occupancy w.r.t. ArchVGPR usage by rematerializing trivial
ArchVGPR-defining instruction next to their single use. It first
collects all eligible trivially rematerializable instructions in the
function, then sinks them one-by-one while recomputing occupancy in all
affected regions each time to determine if and when it has managed to
increase overall occupancy. If it does, changes are committed to the
scheduler's state; otherwise modifications to the IR are reverted and
the scheduling stage gives up.
In both cases, this scheduling stage currently involves repeated queries
for up-to-date occupancy estimates and some state copying to enable
reversal of sinking decisions when occupancy is revealed not to
increase. The current implementation also does not accurately track
register pressure changes in all regions affected by sinking decisions.
This commit refactors this scheduling stage, improving RP tracking and
splitting the stage into two distinct steps to avoid repeated occupancy
queries and IR/state rollbacks.
- Analysis and collection (`canIncreaseOccupancyOrReduceSpill`). The
number of ArchVGPRs to save to reduce spilling or increase function
occupancy by 1 (when there is no spilling) is computed. Then,
instructions eligible for rematerialization are collected, stopping as
soon as enough have been identified to be able to achieve our goal
(according to slightly optimistic heuristics). If there aren't enough of
such instructions, the scheduling stage stops here.
- Rematerialization (`rematerialize`). Instructions collected in the
first step are rematerialized one-by-one. Now we are able to directly
update the scheduler's state since we have already done the occupancy
analysis and know we won't have to rollback any state. Register
pressures for impacted regions are recomputed only once, as opposed to
at every sinking decision.
In the case where the stage attempted to increase occupancy, and if both
rematerializations alone and rescheduling after were unable to improve
occupancy, then all rematerializations are rollbacked.
The change would also merge *non-equivalent* symbols under some circumstances,
see comment with a reproducer on the PR.
> Fixes a correctness issue for AArch64 when ADRP and LDR instructions are
> outlined in separate sections and sections are fed to ICF for
> deduplication.
>
> See test case (based on
> https://github.com/llvm/llvm-project/issues/129122) for details. All
> rodata.* sections are folded into a single section with ICF. This leads
> to all f2_* function sections getting folded into one (as their
> relocation target symbols g* belong to .rodata.g* sections that have
> already been folded into one). Since relocations still refer original g*
> symbols, we end up creating duplicate GOT entry for all such symbols.
> This PR addresses that by tracking such folded symbols and create one
> GOT entry for all such symbols.
>
> Fixes https://github.com/llvm/llvm-project/issues/129122
>
> Co-authored by: @jyknight
This reverts commit 8389d6fad7.
Here we were initializing & locking a shared_mutex in a thread, while
releasing it in the parent which may/often turned out to be a different
thread (shared_mutex::unlock_shared is undefined behavior if called from
a thread that doesn't hold the lock).
Switch to counter to more simply keep track of number of readers and
simply lock/unlock rather than utilizing reader mutex to verify last
freed (and so requiring this matching thread init/destroy behavior).
method body or default impl must be true empty. Even they contain only
spaces, ``mlir-tblgen`` considers they are non-empty and generates
invalid code lead to segment fault. It's very hard to debug.
```c++
InterfaceMethod<
...
/*methodBody=*/ [{ }], // This must be true empty. Leaving a space here can lead to segment fault which is hard to figure out why
/*defaultImpl=*/ [{
...
}]
```
This PR trim spaces when method body or default implementation of
interface method is not empty. Now ``mlir-tblgen`` generates valid code
even when they contain only spaces.
---------
Co-authored-by: Fung Xie <ftse@nvidia.com>
Co-authored-by: Mehdi Amini <joker.eph@gmail.com>
The MemoryDepChecker::DepCandidates instance in each LoopAccessInfo had multiple names (AccessSets, DepCands, DependentAccesses), which was confusing. This patch renames all references to DepCands for consistency.
Currently the apss tries to combine floating point reductions, without
checking for the correct fast-math flags and it also creates invalid IR
(using llvm.reduce.add for FP types).
For now, just bail out for non-integer types.
PR: https://github.com/llvm/llvm-project/pull/139469
Match icmps of binops where both operands are select with constant arms,
i.e., `icmp pred (select A ? C1 : C2) binop (select B ? C3 : C4), C5`.
Fold such patterns by creating a truth table of the possible four
constant variants, and materialize back the optimal logic from it via
`createLogicFromTable` helper. This also generalizes an existing fold,
which has therefore been dropped.
Proofs: https://alive2.llvm.org/ce/z/NS7Vzu.
Fixes: https://github.com/llvm/llvm-project/issues/138212.
The `DXILResourceImplicitBinding` pass uses the results of
`DXILResourceBindingAnalysis` to assigns register slots to resources
that do not have explicit binding. It replaces all
`llvm.dx.resource.handlefromimplicitbinding` calls with
`llvm.dx.resource.handlefrombinding` using the newly assigned binding.
If a binding cannot be found for a resource, the pass will raise a
diagnostic error. Currently this diagnostic message does not include the
resource name, which will be addressed in a separate task (#137868).
Part 2/2 of #136786Closes#136786