Files
clang-p2996/llvm/test/Transforms/SLPVectorizer/X86/same-scalar-in-same-phi-extract.ll
Alexey Bataev 31eaf86a1e [SLP]Improve minbitwidth analysis.
This improves overall analysis for minbitwidth in SLP. It allows to
analyze the trees with store/insertelement root nodes. Also, instead of
using single minbitwidth, detected from the very first analysis stage,
it tries to detect the best one for each trunc/ext subtree in the graph
and use it for the subtree.
Results in better code and less vector register pressure.

Metric: size..text

Program                                                                                                                                                size..text
                                                                                                                                                       results     results0    diff
                                                                      test-suite :: SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant.test    92549.00    92609.00  0.1%
                                                                                  test-suite :: External/SPEC/CINT2017speed/625.x264_s/625.x264_s.test   663381.00   663493.00  0.0%
                                                                                   test-suite :: External/SPEC/CINT2017rate/525.x264_r/525.x264_r.test   663381.00   663493.00  0.0%
                                                                                               test-suite :: MultiSource/Benchmarks/Bullet/bullet.test   307182.00   307214.00  0.0%
                                                                             test-suite :: External/SPEC/CFP2017speed/638.imagick_s/638.imagick_s.test  1394420.00  1394484.00  0.0%
                                                                              test-suite :: External/SPEC/CFP2017rate/538.imagick_r/538.imagick_r.test  1394420.00  1394484.00  0.0%
                                                                                test-suite :: External/SPEC/CFP2017rate/510.parest_r/510.parest_r.test  2040257.00  2040273.00  0.0%

                                                                              test-suite :: External/SPEC/CFP2017rate/526.blender_r/526.blender_r.test 12396098.00 12395858.00 -0.0%
                                                                                         test-suite :: External/SPEC/CINT2006/445.gobmk/445.gobmk.test   909944.00   909768.00 -0.0%

SingleSource/Benchmarks/Adobe-C++/simple_types_loop_invariant - 4 scalar
instructions remain scalar (good).
Spec2017/x264 - the whole function idct4x4dc is vectorized using <16
x i16> instead of <16 x i32>, also zext/trunc are removed. In other
places last vector zext/sext removed and replaced by
extractelement + scalar zext/sext pair.
MultiSource/Benchmarks/Bullet/bullet - reduce or <4 x i32> replaced by
reduce or <4 x i8>
Spec2017/imagick - Removed extra zext from 2 packs of the operations.
Spec2017/parest - Removed extra zext, replaced by extractelement+scalar
zext
Spec2017/blender - the whole bunch of vector zext/sext replaced by
extractelement+scalar zext/sext, some extra code vectorized in smaller
types.
Spec2006/gobmk - fixed cost estimation, some small code remains scalar.

Original Pull Request: https://github.com/llvm/llvm-project/pull/84334

The patch has the same functionality (no test changes, no changes in
benchmarks) as the original patch, just has some compile time
improvements + fixes for xxhash unittest, discovered earlier in the
previous version of the patch.

Reviewers:

Pull Request: https://github.com/llvm/llvm-project/pull/84536
2024-03-19 08:19:45 -07:00

77 lines
2.3 KiB
LLVM

; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 4
; RUN: opt -S --passes=slp-vectorizer -slp-threshold=-99999 -mtriple=x86_64-unknown-linux-gnu < %s | FileCheck %s
define void @test(i32 %arg) {
; CHECK-LABEL: define void @test(
; CHECK-SAME: i32 [[ARG:%.*]]) {
; CHECK-NEXT: bb:
; CHECK-NEXT: [[TMP0:%.*]] = insertelement <2 x i32> <i32 poison, i32 0>, i32 [[ARG]], i32 0
; CHECK-NEXT: br label [[BB2:%.*]]
; CHECK: bb2:
; CHECK-NEXT: switch i32 0, label [[BB10:%.*]] [
; CHECK-NEXT: i32 0, label [[BB9:%.*]]
; CHECK-NEXT: i32 11, label [[BB9]]
; CHECK-NEXT: i32 1, label [[BB4:%.*]]
; CHECK-NEXT: ]
; CHECK: bb3:
; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i32> [[TMP0]], i32 0
; CHECK-NEXT: [[TMP2:%.*]] = zext i32 [[TMP1]] to i64
; CHECK-NEXT: switch i32 0, label [[BB10]] [
; CHECK-NEXT: i32 18, label [[BB7:%.*]]
; CHECK-NEXT: i32 1, label [[BB7]]
; CHECK-NEXT: i32 0, label [[BB10]]
; CHECK-NEXT: ]
; CHECK: bb4:
; CHECK-NEXT: [[TMP3:%.*]] = phi <2 x i32> [ [[TMP0]], [[BB2]] ]
; CHECK-NEXT: [[TMP4:%.*]] = extractelement <2 x i32> [[TMP3]], i32 0
; CHECK-NEXT: [[TMP5:%.*]] = zext i32 [[TMP4]] to i64
; CHECK-NEXT: [[GETELEMENTPTR:%.*]] = getelementptr i32, ptr null, i64 [[TMP5]]
; CHECK-NEXT: [[TMP7:%.*]] = extractelement <2 x i32> [[TMP3]], i32 1
; CHECK-NEXT: [[TMP6:%.*]] = zext i32 [[TMP7]] to i64
; CHECK-NEXT: [[GETELEMENTPTR6:%.*]] = getelementptr i32, ptr null, i64 [[TMP6]]
; CHECK-NEXT: ret void
; CHECK: bb7:
; CHECK-NEXT: [[PHI8:%.*]] = phi i64 [ [[TMP2]], [[BB3:%.*]] ], [ [[TMP2]], [[BB3]] ]
; CHECK-NEXT: br label [[BB9]]
; CHECK: bb9:
; CHECK-NEXT: ret void
; CHECK: bb10:
; CHECK-NEXT: ret void
;
bb:
%zext = zext i32 %arg to i64
%zext1 = zext i32 0 to i64
br label %bb2
bb2:
switch i32 0, label %bb10 [
i32 0, label %bb9
i32 11, label %bb9
i32 1, label %bb4
]
bb3:
switch i32 0, label %bb10 [
i32 18, label %bb7
i32 1, label %bb7
i32 0, label %bb10
]
bb4:
%phi = phi i64 [ %zext, %bb2 ]
%phi5 = phi i64 [ %zext1, %bb2 ]
%getelementptr = getelementptr i32, ptr null, i64 %phi
%getelementptr6 = getelementptr i32, ptr null, i64 %phi5
ret void
bb7:
%phi8 = phi i64 [ %zext, %bb3 ], [ %zext, %bb3 ]
br label %bb9
bb9:
ret void
bb10:
ret void
}