Commit Graph

4610 Commits

Author SHA1 Message Date
LiqinWeng
c3edeaa61b [Test] Rename the test function name suffix. NFC (#114504) 2024-11-01 13:49:34 +08:00
Manish Kausik H
0856592f6f Ensure collectTransitivePredecessors returns Pred only from the Loop. (#113831)
It's possible to encounter irreducible control flow, in which case a
few predecessors of BB may not be part of CurLoop. Currently the
function crashes in such cases. This patch ensures that we only return
predecessors that are part of CurLoop and gracefully ignore the others.

For example, consider irreducible IR of this form:
```
define i64 @baz() {
bb:
  br label %bb1

bb1:                                              ; preds = %bb3, %bb
  br label %bb3

bb2:                                              ; No predecessors!
  br label %bb3

bb3:                                              ; preds = %bb2, %bb1
  %load = load ptr addrspace(1), ptr addrspace(1) null, align 8
  br label %bb1
}
```

This crashes when `collectTransitivePredecessors` is called on the
`%bb1<Header>, %bb3<latch>` loop, because the loop body has a
predecessor `%bb2` which is not a part of the loop.

See https://godbolt.org/z/E9fM1q3cT for the crash
2024-10-31 11:08:15 -07:00
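
The filtering described in the commit above can be illustrated with a small
model. This is only a sketch in Python, not the LLVM implementation: the
function name mirrors `collectTransitivePredecessors`, but the dictionary CFG
encoding and the parameters `preds`, `loop_blocks`, and `header` are
assumptions made for illustration.

```python
def collect_transitive_predecessors(preds, loop_blocks, header, bb):
    """Collect in-loop blocks that reach `bb` without going past the header.

    preds:       dict mapping a block name to the list of its predecessors
    loop_blocks: set of block names that belong to the loop
    header:      name of the loop header
    """
    assert bb in loop_blocks, "block must belong to the loop"
    result = set()
    if bb == header:
        return result
    # Only seed the worklist with predecessors that are inside the loop.
    worklist = [p for p in preds.get(bb, []) if p in loop_blocks]
    while worklist:
        pred = worklist.pop()
        if pred in result:
            continue
        result.add(pred)
        # Do not walk through the header: that would follow backedges
        # or leave the loop.
        if pred == header:
            continue
        # Predecessors outside the loop (e.g. from irreducible control
        # flow) are skipped instead of being treated as an error.
        worklist.extend(p for p in preds.get(pred, []) if p in loop_blocks)
    return result


# CFG of @baz from the commit message: %bb2 jumps into the loop body.
BAZ_PREDS = {"bb": [], "bb1": ["bb3", "bb"], "bb2": [], "bb3": ["bb2", "bb1"]}
BAZ_LOOP = {"bb1", "bb3"}
```

Calling `collect_transitive_predecessors(BAZ_PREDS, BAZ_LOOP, "bb1", "bb3")`
returns only `{"bb1"}`; `%bb2` is ignored because it is not in the loop.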
Luke Lau
9c7188871c [RISCV] Cost ordered bf16/f16 w/ zvfhmin reductions as invalid (#114250)
In #111000 we removed promotion of fadd/fmul reductions for bf16 and f16
without zvfh, and marked the cost as invalid to prevent the vectorizers
from emitting them. However it inadvertently didn't change the cost for
ordered reductions, so this moves the check earlier to fix this.

This also uses BasicTTIImpl instead which now assigns a valid but
expensive cost for fixed-length vectors, which reflects how codegen will
actually scalarize them.
2024-10-31 23:36:09 +08:00
Luke Lau
e989e31a47 [RISCV] Mark f16/bf16 lrint and llrint cost as invalid (#113924)
We currently can't lower scalable vector lrint and llrint nodes for bf16
and f16, even with zvfh, and will crash.

Mark the cost as invalid for now to prevent the vectorizers from
emitting them.

Note that we can actually lower fixed-length vectors fine by scalarizing
them, but we were still undercosting these too so I've also included
them. I presume there's an opportunity to improve the codegen later on.
2024-10-30 17:21:18 +02:00
David Sherwood
7f498a865f [CostModel][LoopVectorize] Move some loop vectoriser tests (#113702)
Many tests that were in test/Analysis/CostModel were actually
loop vectoriser tests. I've moved them as follows:

Analysis/CostModel/X86 -> Transforms/LoopVectorize/X86/CostModel
Analysis/CostModel/AArch64/arith-fp-frem.ll ->
  Transforms/LoopVectorize/AArch64/arith-fp-frem-costs.ll
2024-10-30 13:50:02 +00:00
Luke Lau
8b55162e19 [RISCV] Add cost model tests for scalable FP reductions. NFC
There are already some in reduce-scalable-fp.ll but this makes it a
bit easier to see the difference alongside their fixed-length
counterparts.
2024-10-29 23:58:06 +02:00
Fangrui Song
318bdd0aeb [StackSafetyAnalysis] Bail out when calling ifunc
An assertion failure arises when a call instruction calls a GlobalIFunc.
Since we cannot reason about the underlying function, just bail out.

Fix #87923

Pull Request: https://github.com/llvm/llvm-project/pull/113841
2024-10-29 09:26:47 -07:00
Luke Lau
40363d506d [RISCV] Add cost model tests for fp rounding ops for bf16. NFC 2024-10-28 14:59:06 +00:00
Yingwei Zheng
f78610af3f [InstCombine] Add function attribute instcombine-no-verify-fixpoint (#113822)
This patch introduces a function attribute
`instcombine-no-verify-fixpoint` to avoid disabling fix-point
verification for unrelated tests in the same file.
Addresses the comment at
https://github.com/llvm/llvm-project/pull/112642#discussion_r1804714387.
2024-10-28 17:45:08 +08:00
Kyungwoo Lee
0dd9fdcf83 [StructuralHash] Support Differences (#112638)
This computes a structural hash while allowing for selective ignoring of
certain operands based on a custom function that is provided. Instead of
a single hash value, it now returns FunctionHashInfo which includes a
hash value, an instruction mapping, and a map to track the operand
location and its corresponding hash value that is ignored.

Depends on https://github.com/llvm/llvm-project/pull/112621.
This is a patch for
https://discourse.llvm.org/t/rfc-global-function-merging/82608.
2024-10-26 20:02:05 -07:00
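
To make the idea in the commit above concrete, here is a rough Python sketch
of hashing a function while skipping selected operands. It is not the LLVM
`StructuralHash` code: the instruction encoding as `(opcode, operands)` tuples,
the helper `_mix`, and the callback signature `ignore_operand(inst_index,
operand_index, operand)` are all hypothetical.

```python
import hashlib


def _mix(*parts):
    """Deterministic 64-bit hash of the given parts (illustrative only)."""
    data = "\x00".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "little")


def structural_hash_with_differences(instructions, ignore_operand):
    """Hash `instructions`, leaving out operands selected by `ignore_operand`.

    Returns a triple loosely modelled on the FunctionHashInfo described
    above: (function hash, index -> instruction map,
    (instruction index, operand index) -> hash of the ignored operand).
    """
    function_hash = 0
    index_to_instruction = {}
    ignored_operand_hashes = {}
    for i, (opcode, operands) in enumerate(instructions):
        index_to_instruction[i] = (opcode, operands)
        function_hash = _mix(function_hash, opcode)
        for j, operand in enumerate(operands):
            operand_hash = _mix(operand)
            if ignore_operand(i, j, operand):
                # Record the location and hash, but keep it out of the
                # function hash so that differing operands still match.
                ignored_operand_hashes[(i, j)] = operand_hash
            else:
                function_hash = _mix(function_hash, operand_hash)
    return function_hash, index_to_instruction, ignored_operand_hashes


def ignore_globals(inst_index, operand_index, operand):
    """Example policy: ignore operands that look like global symbols."""
    return operand.startswith("@")


F1 = [("call", ["@g1", "%x"]), ("ret", ["%r"])]
F2 = [("call", ["@g2", "%x"]), ("ret", ["%r"])]
```

With this policy, `F1` and `F2` get the same function hash, while the map of
ignored operands records that position `(0, 0)` differs between them.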
Paul Walker
5bb34803a4 [NFC] Migrate tests to use autoupdate for CHECK lines. 2024-10-22 12:55:15 +00:00
Ramkumar Ramachandra
d897ea37db LAA: check nusw on GEP in place of inbounds (#112223)
With the introduction of the nusw flag in GEPNoWrapFlags, it should be
safe to weaken the check in LoopAccessAnalysis to just check the nusw
flag on the GEP, instead of inbounds.
2024-10-22 09:58:54 +01:00
Ramkumar Ramachandra
f719cfa868 LAA: be less conservative in isNoWrap (#112553)
isNoWrap has exactly one caller which handles Assume = true separately,
but too conservatively. Instead, pass Assume to isNoWrap, so it is
threaded into getPtrStride, which has the correct handling for the
Assume flag. Also note that the Stride == 1 check in isNoWrap is
incorrect: getPtrStride returns Strides == 1 or -1, except when
isNoWrapAddRec or Assume are true, assuming ShouldCheckWrap is true; we
can include the case of -1 Stride, and when isNoWrapAddRec is true. With
this change, passing Assume = true to getPtrStride could return a
non-unit stride, and we correctly handle that case as well.
2024-10-22 09:55:51 +01:00
Han-Kuan Chen
12bcea3292 [RISCV][TTI] Recognize CONCAT_VECTORS if a shufflevector mask is multiple insert subvector. (#111459)
reference: https://github.com/llvm/llvm-project/pull/110457
2024-10-18 20:16:56 +07:00
Yingwei Zheng
095d49da76 [InstCombine] Set samesign when converting signed predicates into unsigned (#112642)
Alive2: https://alive2.llvm.org/ce/z/6cqdt-
2024-10-17 20:43:48 +08:00
Elvis Wang
566012a64e [RISCV][TTI] Implement instruction cost for vp_merge. (#112327)
This patch implements the instruction cost for `vp_merge`, which will
generate a similar instruction sequence to the `select` instruction.
2024-10-17 07:47:43 +08:00
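
For readers unfamiliar with `vp_merge`, the following Python sketch models its
lane-wise behaviour as I understand it from the LangRef: lanes below the pivot
are selected by the mask, and lanes at or beyond the pivot come from the
false operand. The list-based encoding and the parameter names are
assumptions; this says nothing about the actual RISC-V cost values.

```python
def vp_merge(mask, on_true, on_false, pivot):
    """Lane-wise model of a vp.merge-style operation.

    A lane takes the `on_true` element only if its mask bit is set and
    its index is below `pivot`; every other lane takes `on_false`.
    """
    return [
        t if (m and i < pivot) else f
        for i, (m, t, f) in enumerate(zip(mask, on_true, on_false))
    ]
```

For example, `vp_merge([True, False, True, True], [1, 2, 3, 4],
[10, 20, 30, 40], 3)` yields `[1, 20, 3, 40]`: the last lane is masked on but
lies beyond the pivot, so it comes from the false operand. This is why the
operation resembles a `select` with an adjusted mask.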
Luke Lau
2b6b7f664d [RISCV] Mark math functions as expanded for zvfhmin/zvfbfmin (#112508)
For regular floating point types we mark these as expanded on scalable
vectors so they're not legal in the cost model, so this does the same
for f16 w/ zvfhmin and bf16.
2024-10-16 21:40:37 +01:00
Luke Lau
e88bcc1204 [RISCV] Lower vector_splice on zvfhmin/zvfbfmin (#112579)
Similar to other permutation ops, we can just reuse the existing
lowering.
2024-10-16 21:40:18 +01:00
Luke Lau
1d40fefb08 [RISCV] Add zvfhmin/zvfbfmin cost model tests for libcall ops. NFC 2024-10-16 10:09:34 +01:00
Elvis Wang
f3648046ec [RISCV] Fix vp-intrinsics args in cost model tests. NFC (#112463)
This patch contains the following changes to fix vp intrinsics tests.
1. v*float -> v*f32, v*double -> v*f64 and v*half -> v*f16
2. Fix the order of the vp-intrinsics.
2024-10-16 12:57:43 +08:00
Luke Lau
4c894730a1 [RISCV] Fix bf16 cost model tests. NFC
These were inadvertently changed in #112393
2024-10-15 23:01:53 +01:00
Florian Hahn
7f06d8afb0 [SCEV] Retain SCEVSequentialMinMaxExpr if an operand may trigger UB. (#110824)
Retain SCEVSequentialMinMaxExpr if an operand may trigger UB, e.g. if
there is a UDiv operand that may divide by 0 or poison.

PR: https://github.com/llvm/llvm-project/pull/110824
2024-10-14 13:08:49 +01:00
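
The reason the sequential form matters can be sketched in Python. This is an
analogy, not SCEV code: undefined behaviour is modelled as Python's
`ZeroDivisionError`, and the thunk-based `umin_seq` helper is a hypothetical
illustration of "later operands are not evaluated once an earlier one is
zero".

```python
def umin(first, second):
    """Plain unsigned min: both operands are always evaluated."""
    return min(first, second)


def umin_seq(first, second_thunk):
    """Sequential unsigned min: if `first` is 0 the result is 0 and the
    second operand is never evaluated."""
    if first == 0:
        return 0
    return min(first, second_thunk())


def risky_udiv(numerator, denominator):
    """Stand-in for a UDiv whose divisor may be zero."""
    return numerator // denominator
```

`umin_seq(0, lambda: risky_udiv(8, 0))` returns 0 without touching the
division, whereas `umin(0, risky_udiv(8, 0))` raises before `min` is even
called. Rewriting the sequential form into the plain one is therefore only
safe when the later operands cannot misbehave.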
Tim Renouf
76007138f4 [LLVM] New NoDivergenceSource function attribute (#111832)
A call to a function that has this attribute is not a source of
divergence, as used by UniformityAnalysis. That allows a front-end to
use known-name calls as an instruction extension mechanism (e.g.
https://github.com/GPUOpen-Drivers/llvm-dialects ) without such a call
being a source of divergence.
2024-10-12 09:34:45 +01:00
Luke Lau
a3cd269fbe [RISCV] Remove {s,u}int_to_fp custom op action for f16/bf16 (#111471)
It turns out that {s,u}int_to_fp nodes get their operation action from
their operand's type, not the result type, so we don't need to set it
for fp16 or bf16. vp_{s,u}int_to_fp uses the result type though so we
need to keep it.

This also means that we can lower int_to_fp for fixed length bf16
vectors already, so this adds tests for that.

The cost model test changes are due to BasicTTIImpl's getCastInstrCost
not taking into account that int_to_fp needs its legal type swapped.
This can be fixed in a later patch, but it's worth noting that the
affected types in the tests currently crash when lowered anyway (due to
them needing to be split at LMUL > 8).
2024-10-10 14:40:24 +01:00
Philip Reames
f11568bcb0 Revert "[RISCV][TTI] Recognize CONCAT_VECTORS if a shufflevector mask is multiple insert subvector. (#110457)"
This reverts commit 554eaec639.  Change was not approved when landed.
2024-10-07 11:31:57 -07:00
Luke Lau
20864d2cf6 [ValueTypes][RISCV] Add v1bf16 type (#111112)
When trying to add RISC-V fadd reduction cost model tests for bf16, I
noticed a crash when the vector was of <1 x bfloat>.

It turns out that this was being scalarized because unlike f16/f32/f64,
there's no v1bf16 value type, and the existing cost model code assumed
that the legalized type would always be a vector.

This adds v1bf16 to bring bf16 in line with the other fp types.

It also adds some more RISC-V bf16 reduction tests which previously
crashed, including tests to ensure that SLP won't emit fadd/fmul
reductions for bf16 or f16 w/ zvfhmin after #111000.
2024-10-06 22:20:51 +08:00
Han-Kuan Chen
554eaec639 [RISCV][TTI] Recognize CONCAT_VECTORS if a shufflevector mask is multiple insert subvector. (#110457) 2024-10-05 14:58:44 +08:00
Florian Hahn
dec4cfdb09 [LAA] Use loop guards when checking invariant accesses.
Apply loop guards to start and end pointers like done in other places to
improve results.
2024-10-04 12:23:13 +01:00
Florian Hahn
972353fdfa [LAA] Add tests where results can be improved using loop guards. 2024-10-04 11:26:16 +01:00
Luke Lau
3b0e120336 [RISCV] Add tests for @llvm.vector.reduce.fmul. NFC 2024-10-04 14:27:45 +08:00
RolandF77
06c8210a67 update P7 32-bit partial vector load cost (#108261)
Update the cost model to reflect the codegen change to use lfiwzx
for 32-bit partial vector loads on pwr7 with
https://github.com/llvm/llvm-project/pull/104507.
2024-10-03 12:28:43 -04:00
Luke Lau
487686b82e [SDAG][RISCV] Don't promote VP_REDUCE_{FADD,FMUL} (#111000)
In https://reviews.llvm.org/D153848, promotion was added for a variety
of f16 ops with zvfhmin, including VP reductions.

However I don't believe it's correct to promote f16 fadd or fmul
reductions to f32 since we need to round the intermediate results.

Today if we lower @llvm.vp.reduce.fadd.nxv1f16 on RISC-V, we'll get two
different results depending on whether we compiled with +zvfh or
+zvfhmin, for example with a 3 element reduction:

	; v9 = [0.1563, 5.97e-8, 0.00006104]

	; zvfh
	vsetivli x0, 3, e16, m1, ta, ma
	vmv.v.i v8, 0
	vfredosum.vs v8, v9, v8
	vfmv.f.s fa0, v8
	; fa0 = 0.1563

	; zvfhmin
	vsetivli x0, 3, e16, m1, ta, ma
	vfwcvt.f.f.v v10, v9
	vsetivli x0, 3, e32, m1, ta, ma
	vmv.v.i v8, 0
	vfredosum.vs v8, v10, v8
	vfmv.f.s fa0, v8
	fcvt.h.s fa0, fa0
	; fa0 = 0.1564

This same thing happens with reassociative reductions e.g. vfredusum.vs,
and this also applies for bf16.

I couldn't find anything in the LangRef for reductions that suggests the
excess precision is allowed. There may be something we can do in Clang
with -fexcess-precision=fast, but I haven't looked into this yet.

I presume the same precision issue occurs with fmul, but not with
fmin/fmax/fminimum/fmaximum.

I can't think of another way of lowering these other than scalarizing,
and we can't scalarize scalable vectors, so this just removes the
promotion and adjusts the cost model to return an invalid cost. (It
looks like we also don't currently cost fmul reductions, so presumably
they also have an invalid cost?)

I think this should be enough to stop the loop vectorizer or SLP from
emitting these intrinsics.
2024-10-04 00:17:45 +08:00
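
The rounding difference in the commit above can be reproduced with a short
Python model. This is a sketch under assumptions: the three inputs are taken
to be the exact binary16 values 0.15625, 2^-24 and 2^-14 (which the message
prints as rounded decimals), and `struct`'s `e` and `f` formats are used to
emulate rounding to binary16 and binary32. The helper names are made up.

```python
import struct


def to_f16(x):
    """Round a Python float to the nearest IEEE binary16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_f32(x):
    """Round a Python float to the nearest IEEE binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ordered_fadd_f16(start, values):
    """Ordered reduction that rounds to f16 after every addition."""
    acc = to_f16(start)
    for v in values:
        acc = to_f16(acc + to_f16(v))
    return acc


def ordered_fadd_promoted(start, values):
    """Ordered reduction carried out in f32, rounded to f16 once at the end."""
    acc = to_f32(start)
    for v in values:
        acc = to_f32(acc + to_f32(to_f16(v)))
    return to_f16(acc)


# Assumed exact f16 inputs corresponding to the v9 vector in the message.
V9 = [0.15625, 2.0 ** -24, 2.0 ** -14]
NATIVE = ordered_fadd_f16(0.0, V9)
PROMOTED = ordered_fadd_promoted(0.0, V9)
```

In this model `NATIVE` stays at 0.15625, because each small addend is rounded
away individually, while `PROMOTED` ends up one f16 ulp higher at
0.15625 + 2^-13, since the f32 accumulator keeps both small terms until the
final rounding. These correspond to the 0.1563 and 0.1564 results quoted
above.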
Florian Hahn
dce5bf8efc [ValueTracking] AllowEphemerals for alignment assumptions. (#108632)
Allow AllowEphemerals in isValidAssumeForContext, as the CxtI might
be the producer of the pointer in the bundle. At the moment, align
assumptions aren't optimized away.

This allows using the assumption in the computeKnownBits call in
getConstantMultipleImpl.

We could extend the computeKnownBits API to allow callers to specify if
ephemerals are allowed, if the info from computeKnownBitsFromContext is
used to remove alignment assumptions.

PR: https://github.com/llvm/llvm-project/pull/108632
2024-10-03 16:02:34 +01:00
Florian Hahn
bdd40e39a4 [SCEV] Add tests for umin_seq change in #92177
SCEV-only tests for https://github.com/llvm/llvm-project/pull/92177
2024-10-02 11:06:00 +01:00
Nikita Popov
9f3d1695eb [SCEVExpander] Preserve gep nuw during expansion (#102133)
When expanding SCEV adds to geps, transfer the nuw flag to the resulting
gep. (Note that this doesn't apply to IV increment GEPs, which go
through a different code path.)
2024-10-02 11:45:00 +02:00
Florian Hahn
383a67042a [SCEV] Add early exit tests with alignment assumptions.
Precommit tests from https://github.com/llvm/llvm-project/pull/108632.
2024-10-02 10:30:04 +01:00
Ramkumar Ramachandra
7eea55fd4b LoopLoadElim: re-org tests after invalid #96656 (#97598)
After pr96656.ll was added to LAA and LoopVersioning, it was decided
that the bug is in a caller of LoopVersioning, not in LAA or
LoopVersioning itself. The new candidate was LoopLoadElim, but #96656
has since been marked invalid. Hence, re-organize the added tests to
avoid confusion, and move the testcase from the investigation to
LoopLoadElim.
2024-09-30 15:46:34 +01:00
Florian Hahn
2f7ccaf4a8 [SCEV] Add predicate in SolveLinEq to ensure B is a multiple of A. (#108777)
This can help in cases where pointer alignment info is missing, e.g.
https://github.com/llvm/llvm-project/pull/108210

The predicate is formed for the complex expression that's passed to
SolveLinEquationWithOverflow and the checks could probably be pushed
closer to the root nodes, which in some cases may be cheaper to check.


PR: https://github.com/llvm/llvm-project/pull/108777
2024-09-28 14:19:57 +01:00
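
The role of the "B is a multiple of A" condition can be seen in a small
modular-arithmetic sketch. This Python code is an illustration of solving
A * X = B modulo 2^W, not the SCEV implementation; the function name and
parameters are assumptions, and it needs Python 3.8+ for the three-argument
`pow` with a negative exponent.

```python
def solve_lin_eq(a, b, width):
    """Return the smallest X with (a * X) % 2**width == b % 2**width,
    or None when no solution exists."""
    modulus = 1 << width
    a %= modulus
    b %= modulus
    if a == 0:
        return None
    # The gcd of a and 2**width is the largest power of two dividing a.
    trailing_zeros = (a & -a).bit_length() - 1
    power_of_two = 1 << trailing_zeros
    # The divisibility condition: b must be a multiple of that factor.
    if b % power_of_two != 0:
        return None
    reduced_modulus = 1 << (width - trailing_zeros)
    # After dividing out the common factor, a is odd and hence invertible.
    inverse = pow(a >> trailing_zeros, -1, reduced_modulus)
    return (inverse * (b >> trailing_zeros)) % reduced_modulus
```

For instance, `solve_lin_eq(6, 18, 8)` gives 3, while `solve_lin_eq(4, 10, 8)`
gives `None` because 10 is not a multiple of 4. When divisibility cannot be
proven statically, checking it at runtime is what a predicate of this kind
buys.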
Florian Hahn
ac946e615c [SCEV] Re-organize tests requiring remainder predicates.
Also adds additional test coverage in
Analysis/ScalarEvolution/trip-count-urem.ll

Extra test coverage is for https://github.com/llvm/llvm-project/pull/108777.
2024-09-27 21:03:52 +01:00
Philip Reames
50afafbf29 [RISCV][TTI] Adjust constant materialization cost for (z/s)ext from i1 (#110282)
When we're lowering to a split sequence, we only need one
materialization of the zero constant. Our codegen looks something like
this:

  vmv.v.i	v24, 0
  vmerge.vim	v8, v24, -1, v0
  vmv1r.v	v0, v16
  vmerge.vim	v16, v24, -1, v0

Note: Doing this specific case since it was pointed out in
https://github.com/llvm/llvm-project/pull/110164#discussion_r1778268391,
but it's worth noting that we have the same basic problem (over costing
split operations with split invariant terms) at multiple places through
this file.
2024-09-27 10:53:45 -07:00
Philip Reames
1a9569c4f0 [RISCV][TTI] Avoid an infinite recursion issue in getCastInstrCost (#110164)
Calling into BasicTTI is not always safe. In particular, BasicTTI does
not have a full legalization implementation (vector widening is
missing), and falls back on scalarization. The problem is that
scalarization for <N x i1> vectors is costed in terms of the cast API and
we can end up in an infinite recursive cycle.

The "right" fix for this would be to teach BasicTTI how to model the full
legalization state machine, but several attempts at doing so have
resulted in dead ends or undesirable cost changes for targets I don't
understand.

This patch instead papers over the issue by avoiding the call to the
base class when dealing with an i1 source or dest. This doesn't
necessarily produce correct costs, but it should at least return
something semi-sensible and not crash.

Fixes https://github.com/llvm/llvm-project/issues/108708
2024-09-27 07:47:09 -07:00
sstipano
eb16acedf5 [AMDGPU] Overload resource descriptor in image intrinsics. (#107255) 2024-09-27 15:33:52 +02:00
Ramkumar Ramachandra
3fee3e83a8 KnownBits: refine srem for high-bits (#109121)
KnownBits::srem does not correctly set the leading zero-bits, omitting
the fact that LHS may be known-negative or known-non-negative. Fix this.

Alive2 proof: https://alive2.llvm.org/ce/z/Ugh-Dq
2024-09-27 12:00:50 +01:00
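
The property behind this refinement can be checked by brute force. The Python
sketch below is not the KnownBits code; it only verifies, for 8-bit values,
that a signed remainder takes the sign of its left-hand side and never
exceeds it in magnitude. The helper names are hypothetical.

```python
def srem(a, b):
    """Signed remainder with the sign of the dividend (truncating division)."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def srem_stays_within_lhs(width=8):
    """Check that the result always lies between 0 and the dividend."""
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    for a in range(lo, hi + 1):
        for b in range(lo, hi + 1):
            if b == 0:
                continue
            r = srem(a, b)
            if a >= 0 and not (0 <= r <= a):
                return False
            if a < 0 and not (a <= r <= 0):
                return False
    return True
```

Since the result lies between 0 and the LHS, a known-non-negative LHS passes
its leading zero bits on to the result, and a known-negative LHS passes on its
leading one bits whenever the result is non-zero.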
Ramkumar Ramachandra
d781df2006 ValueTracking/test: cover known-high-bits of rem (#109006)
There is an underlying bug in KnownBits, and we should theoretically be
able to determine the high-bits of an srem as shown in the test, just
like urem. In preparation to fix this bug, add pre-commit tests testing
high-bits of srem and urem.
2024-09-26 16:08:51 +01:00
Florian Hahn
28439a19c1 [SCEV] Add tests with non-power-of-2 steps for #108777.
Adds extra tests for https://github.com/llvm/llvm-project/pull/108777.
2024-09-26 12:57:04 +01:00
jofrn
3e65c30eee [Lint][AMDGPU] No store to const addrspace (#109181)
Ensure store to const addrspace is not allowed by Linter.
2024-09-25 19:18:17 -04:00
Mircea Trofin
c8365feed7 [ctx_prof] Simple ICP criteria during module inliner (#109881)
This is mostly for test: under contextual profiling, we perform ICP for those indirect callsites which have targets marked as `alwaysinline`.

This helped uncover a bug with the way the profile was updated upon ICP, where we were skipping over the update if the target wasn't called in that context. That was resulting in incorrect counts for the indirect BB.

Also includes a flyby fix to the total/direct count values: they should be 64-bit (as all counters in the contextual profile are).
2024-09-25 15:05:52 -07:00
Philip Reames
d288574363 [TTI][RISCV] Model cost of loading constants arms of selects and compares (#109824)
This follows in the spirit of 7d82c99403,
and extends the costing API for compares and selects to provide
information about the operands passed in an analogous manner. This
allows us to model the cost of materializing the vector constant, as
some select-of-constants are significantly more expensive than others
when you account for the cost of materializing the constants involved.

This is a stepping stone towards fixing
https://github.com/llvm/llvm-project/issues/109466. A separate SLP patch
will be required to utilize the new API.
2024-09-25 07:25:57 -07:00
Luke Lau
f43ad88ae1 [RISCV] Handle zvfhmin and zvfbfmin promotion to f32 in half arith costs (#108361)
Arithmetic half or bfloat ops on zvfhmin and zvfbfmin respectively will
be promoted and carried out in f32, so this updates
getArithmeticInstrCost to check for this.
2024-09-25 18:50:16 +08:00
Philip Reames
bcbdf7ad6b [RISCV][TTI/SLP] Add test coverage for select of constants costing
Provides coverage for an upcoming change which accounts for the cost
of materializing the vector constants in the vector select.
2024-09-24 08:15:40 -07:00