We can stop using a graph representation of the SLP structure and switch
directly to a tree by relying on a single user for each tree node. If a
node has multiple uses, the other uses must be represented as separate
gather/buildvector nodes, which are then combined with the existing
vectorized node(s) during cost estimation/codegen.
This simplifies the internal structure and enables some extra
optimizations that could not be enabled for nodes with multiple
users (reordering, minbitwidth analysis).
AVX512, -O3+LTO
Metric: size..text
results results0 diff
test-suite :: MultiSource/Benchmarks/ASCI_Purple/SMG2000/smg2000.test 253453.00 254253.00 0.3%
test-suite :: External/SPEC/CFP2006/444.namd/444.namd.test 251411.00 252051.00 0.3%
test-suite :: SingleSource/Benchmarks/Misc/oourafft.test 19114.00 19146.00 0.2%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1399200.00 1399520.00 0.0%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1399200.00 1399520.00 0.0%
test-suite :: MicroBenchmarks/LCALS/SubsetALambdaLoops/lcalsALambda.test 304310.00 304326.00 0.0%
test-suite :: MicroBenchmarks/LCALS/SubsetARawLoops/lcalsARaw.test 304662.00 304678.00 0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12566919.00 12567511.00 0.0%
test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test 1146300.00 1146316.00 0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 1159864.00 1159880.00 0.0%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 9407880.00 9407864.00 -0.0%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 9407880.00 9407864.00 -0.0%
test-suite :: MultiSource/Benchmarks/tramp3d-v4/tramp3d-v4.test 1011612.00 1011596.00 -0.0%
test-suite :: MultiSource/Benchmarks/Prolangs-C/TimberWolfMC/timberwolfmc.test 280584.00 280536.00 -0.0%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test 93016.00 93000.00 -0.0%
ASCI_Purple/SMG2000 - extra code vectorized, small variations
CFP2006/444.namd - small variations, fewer shuffles
Benchmarks/Misc/oourafft - small variations
CFP2017rate/538.imagick_r
CFP2017speed/638.imagick_s - small variations, fewer shuffles
LCALS/SubsetALambdaLoops - fewer shuffles
LCALS/SubsetARawLoops - fewer shuffles
CFP2017rate/526.blender_r - small variations, extra vector code
CFP2006/453.povray - small variations
CFP2017rate/511.povray_r - small variations
CINT2017rate/502.gcc_r
CINT2017speed/602.gcc_s - small variations
Benchmarks/tramp3d-v4 - small variations
Prolangs-C/TimberWolfMC - small variations
DOE-ProxyApps-C++/miniFE - extra code vectorized, small variations
DOE-ProxyApps-C++/CLAMR - extra code vectorized, small variations
ASCI_Purple/SMG2000 - no significant changes
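The diff column is not defined explicitly. The rows are consistent with diff = (results0 - results) / results, shown as a percentage rounded to one decimal place: for GCC-C-execute-pr28982b in the RISCV results, (1866 - 1812) / 1812 is about 2.98%, which rounds to the 3.0% shown, whereas dividing by results0 would give about 2.9%. The C++ sketch below encodes that inferred formula; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Inferred (not documented) meaning of the diff column: relative change of
// results0 against results, in percent.
double relDiffPercent(double Results, double Results0) {
  return (Results0 - Results) / Results * 100.0;
}

// The tables appear to print the percentage rounded to one decimal place.
double roundedDiffPercent(double Results, double Results0) {
  return std::round(relDiffPercent(Results, Results0) * 10.0) / 10.0;
}
```

With this formula, a change of a few bytes in a multi-megabyte binary, such as 502.gcc_r (9407880 vs. 9407864), rounds to -0.0%, which is why several rows show 0.0% or -0.0%.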
RISCV, -O3+LTO
Metric: size..text
results results0 diff
test-suite :: SingleSource/Regression/C/gcc-c-torture/execute/GCC-C-execute-pr28982b.test 1812.00 1866.00 3.0%
test-suite :: MultiSource/Benchmarks/Olden/health/health.test 3946.00 4016.00 1.8%
test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test 513180.00 513550.00 0.1%
test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test 513180.00 513550.00 0.1%
test-suite :: External/SPEC/CINT2017rate/502.gcc_r/502.gcc_r.test 7672198.00 7672202.00 0.0%
test-suite :: External/SPEC/CINT2017speed/602.gcc_s/602.gcc_s.test 7672198.00 7672202.00 0.0%
test-suite :: External/SPEC/CFP2017rate/508.namd_r/508.namd_r.test 746060.00 746044.00 -0.0%
test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 9497716.00 9497364.00 -0.0%
test-suite :: External/SPEC/CFP2006/453.povray/453.povray.test 948266.00 948214.00 -0.0%
test-suite :: External/SPEC/CFP2006/433.milc/433.milc.test 89874.00 89862.00 -0.0%
test-suite :: MultiSource/Benchmarks/7zip/7zip-benchmark.test 835492.00 835346.00 -0.0%
test-suite :: MultiSource/Benchmarks/DOE-ProxyApps-C++/miniFE/miniFE.test 66230.00 66202.00 -0.0%
test-suite :: External/SPEC/CFP2017rate/511.povray_r/511.povray_r.test 946090.00 944206.00 -0.2%
test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test 1136404.00 1131854.00 -0.4%
test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test 1136404.00 1131854.00 -0.4%
gcc-c-torture/execute/GCC-C-execute-pr28982b - better vector code
Olden/health - extra vector code
CINT2017speed/625.x264_s
CINT2017rate/525.x264_r - small variations + improvements in reordering; @pixel_hadamard_ac
is no longer vectorized because of ineffective shuffle recognition in
the compiler
CINT2017rate/502.gcc_r
CINT2017speed/602.gcc_s - small variations
CFP2017rate/508.namd_r - small variations
CFP2017rate/526.blender_r - small variations
CFP2006/453.povray - extra vector code
Benchmarks/7zip - extra vector code
DOE-ProxyApps-C++/miniFE - small variations
CFP2017rate/511.povray_r - extra vector code
CFP2017speed/638.imagick_s
CFP2017rate/538.imagick_r - extra vector code
Reviewers: RKSimon, hiraditya
Reviewed By: RKSimon
Pull Request: https://github.com/llvm/llvm-project/pull/126771
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt -S --passes=slp-vectorizer -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s

define i1 @test(i64 %v1, ptr %v2, i32 %v3, i1 %v4) {
; CHECK-LABEL: define i1 @test(
; CHECK-SAME: i64 [[V1:%.*]], ptr [[V2:%.*]], i32 [[V3:%.*]], i1 [[V4:%.*]]) {
; CHECK-NEXT: [[NEWFUNCROOT:.*:]]
; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i64> poison, i64 [[V1]], i32 0
; CHECK-NEXT: [[TMP1:%.*]] = shufflevector <2 x i64> [[TMP0]], <2 x i64> poison, <2 x i32> zeroinitializer
; CHECK-NEXT: [[TMP2:%.*]] = lshr <2 x i64> [[TMP1]], <i64 32, i64 40>
; CHECK-NEXT: [[TMP3:%.*]] = trunc <2 x i64> [[TMP2]] to <2 x i8>
; CHECK-NEXT: [[TMP4:%.*]] = and <2 x i8> [[TMP3]], <i8 1, i8 -1>
; CHECK-NEXT: [[TMP5:%.*]] = zext <2 x i8> [[TMP4]] to <2 x i32>
; CHECK-NEXT: [[TMP9:%.*]] = zext <2 x i8> [[TMP4]] to <2 x i32>
; CHECK-NEXT: [[TMP6:%.*]] = icmp eq <2 x i32> [[TMP9]], zeroinitializer
; CHECK-NEXT: [[TMP8:%.*]] = insertelement <4 x i32> poison, i32 [[V3]], i32 0
; CHECK-NEXT: [[TMP30:%.*]] = shufflevector <4 x i32> [[TMP8]], <4 x i32> poison, <4 x i32> <i32 poison, i32 poison, i32 0, i32 0>
; CHECK-NEXT: [[TMP10:%.*]] = call <4 x i32> @llvm.vector.insert.v4i32.v2i32(<4 x i32> [[TMP30]], <2 x i32> [[TMP5]], i64 0)
; CHECK-NEXT: [[TMP11:%.*]] = uitofp <4 x i32> [[TMP10]] to <4 x float>
; CHECK-NEXT: [[TMP12:%.*]] = fdiv <4 x float> zeroinitializer, [[TMP11]]
; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x i1> poison, i1 [[V4]], i32 0
; CHECK-NEXT: [[TMP14:%.*]] = shufflevector <4 x i1> [[TMP13]], <4 x i1> poison, <4 x i32> <i32 poison, i32 poison, i32 0, i32 0>
; CHECK-NEXT: [[TMP15:%.*]] = call <4 x i1> @llvm.vector.insert.v4i1.v2i1(<4 x i1> [[TMP14]], <2 x i1> [[TMP6]], i64 0)
; CHECK-NEXT: [[TMP16:%.*]] = select <4 x i1> [[TMP15]], <4 x float> zeroinitializer, <4 x float> [[TMP12]]
; CHECK-NEXT: [[TMP17:%.*]] = extractelement <4 x float> [[TMP16]], i32 3
; CHECK-NEXT: [[CONV_I_I1743_3:%.*]] = fptoui float [[TMP17]] to i32
; CHECK-NEXT: [[TMP18:%.*]] = icmp ne i32 [[CONV_I_I1743_3]], 0
; CHECK-NEXT: [[TMP19:%.*]] = bitcast <4 x float> [[TMP16]] to <4 x i32>
; CHECK-NEXT: [[TMP20:%.*]] = icmp ult <4 x i32> [[TMP19]], splat (i32 1333788672)
; CHECK-NEXT: [[TMP21:%.*]] = extractelement <4 x i1> [[TMP20]], i32 3
; CHECK-NEXT: [[NARROW:%.*]] = select i1 [[TMP21]], i1 [[TMP18]], i1 false
; CHECK-NEXT: [[TMP22:%.*]] = extractelement <4 x float> [[TMP16]], i32 2
; CHECK-NEXT: [[CONV_I_I1743_2:%.*]] = fptoui float [[TMP22]] to i32
; CHECK-NEXT: [[TMP23:%.*]] = extractelement <4 x i1> [[TMP20]], i32 2
; CHECK-NEXT: [[NARROW1:%.*]] = select i1 [[TMP23]], i32 [[CONV_I_I1743_2]], i32 0
; CHECK-NEXT: [[TMP24:%.*]] = zext i1 [[NARROW]] to i32
; CHECK-NEXT: [[TMP25:%.*]] = or i32 [[NARROW1]], [[TMP24]]
; CHECK-NEXT: [[TMP26:%.*]] = extractelement <4 x float> [[TMP16]], i32 1
; CHECK-NEXT: [[CONV_I_I1743_1:%.*]] = fptoui float [[TMP26]] to i32
; CHECK-NEXT: [[TMP27:%.*]] = extractelement <4 x i1> [[TMP20]], i32 1
; CHECK-NEXT: [[NARROW2:%.*]] = select i1 [[TMP27]], i32 [[CONV_I_I1743_1]], i32 0
; CHECK-NEXT: [[RV3:%.*]] = or i32 [[TMP25]], [[NARROW2]]
; CHECK-NEXT: [[TMP28:%.*]] = extractelement <4 x float> [[TMP16]], i32 0
; CHECK-NEXT: [[CONV_I_I1743:%.*]] = fptoui float [[TMP28]] to i32
; CHECK-NEXT: [[TMP29:%.*]] = extractelement <4 x i1> [[TMP20]], i32 0
; CHECK-NEXT: [[NARROW4:%.*]] = select i1 [[TMP29]], i32 [[CONV_I_I1743]], i32 0
; CHECK-NEXT: [[RT5:%.*]] = or i32 [[RV3]], [[NARROW4]]
; CHECK-NEXT: [[RT:%.*]] = zext i32 [[RT5]] to i64
; CHECK-NEXT: store i64 [[RT]], ptr [[V2]], align 1
; CHECK-NEXT: ret i1 false
;
newFuncRoot:
  %conv.i147.i1756.3 = uitofp i32 %v3 to float
  %div.i.i.i1749.3 = fdiv float 0.000000e+00, %conv.i147.i1756.3
  %cond.i.i.i1751.3 = select i1 %v4, float 0.000000e+00, float %div.i.i.i1749.3
  %conv.i147.i1756.2 = uitofp i32 %v3 to float
  %div.i.i.i1749.2 = fdiv float 0.000000e+00, %conv.i147.i1756.2
  %cond.i.i.i1751.2 = select i1 %v4, float 0.000000e+00, float %div.i.i.i1749.2
  %0 = lshr i64 %v1, 40
  %1 = trunc i64 %0 to i32
  %tt2 = and i32 %1, 255
  %cmp1.i.i.i1746.1 = icmp eq i32 %tt2, 0
  %conv.i147.i1756.1 = uitofp i32 %tt2 to float
  %div.i.i.i1749.1 = fdiv float 0.000000e+00, %conv.i147.i1756.1
  %cond.i.i.i1751.1 = select i1 %cmp1.i.i.i1746.1, float 0.000000e+00, float %div.i.i.i1749.1
  %tt3 = lshr i64 %v1, 32
  %2 = trunc i64 %tt3 to i32
  %tt1 = and i32 %2, 1
  %cmp1.i.i.i1746 = icmp eq i32 %tt1, 0
  %conv.i147.i1756 = uitofp i32 %tt1 to float
  %div.i.i.i1749 = fdiv float 0.000000e+00, %conv.i147.i1756
  %cond.i.i.i1751 = select i1 %cmp1.i.i.i1746, float 0.000000e+00, float %div.i.i.i1749
  %3 = bitcast float %cond.i.i.i1751.3 to i32
  %cmp.i99.i1736.3 = icmp ult i32 %3, 1333788672
  %conv.i.i1743.3 = fptoui float %cond.i.i.i1751.3 to i32
  %4 = icmp ne i32 %conv.i.i1743.3, 0
  %narrow = select i1 %cmp.i99.i1736.3, i1 %4, i1 false
  %5 = bitcast float %cond.i.i.i1751.2 to i32
  %cmp.i99.i1736.2 = icmp ult i32 %5, 1333788672
  %conv.i.i1743.2 = fptoui float %cond.i.i.i1751.2 to i32
  %narrow1 = select i1 %cmp.i99.i1736.2, i32 %conv.i.i1743.2, i32 0
  %6 = zext i1 %narrow to i32
  %7 = or i32 %narrow1, %6
  %8 = bitcast float %cond.i.i.i1751.1 to i32
  %cmp.i99.i1736.1 = icmp ult i32 %8, 1333788672
  %conv.i.i1743.1 = fptoui float %cond.i.i.i1751.1 to i32
  %narrow2 = select i1 %cmp.i99.i1736.1, i32 %conv.i.i1743.1, i32 0
  %rv3 = or i32 %7, %narrow2
  %9 = bitcast float %cond.i.i.i1751 to i32
  %cmp.i99.i1736 = icmp ult i32 %9, 1333788672
  %conv.i.i1743 = fptoui float %cond.i.i.i1751 to i32
  %narrow4 = select i1 %cmp.i99.i1736, i32 %conv.i.i1743, i32 0
  %rt5 = or i32 %rv3, %narrow4
  %rt = zext i32 %rt5 to i64
  store i64 %rt, ptr %v2, align 1
  ret i1 false
}