Consider IR such as this: for.body: %iv = phi i64 [ 0, %entry ], [ %iv.next, %for.body ] %accum = phi i32 [ 0, %entry ], [ %add, %for.body ] %gep.a = getelementptr i8, ptr %a, i64 %iv %load.a = load i8, ptr %gep.a, align 1 %ext.a = zext i8 %load.a to i32 %add = add i32 %ext.a, %accum %iv.next = add i64 %iv, 1 %exitcond.not = icmp eq i64 %iv.next, 1025 br i1 %exitcond.not, label %for.exit, label %for.body Conceptually we can vectorise this using partial reductions too, although the current loop vectoriser implementation requires the accumulation of a multiply. For AArch64 this is easily done with a udot or sdot with an identity operand, i.e. a vector of (i16 1). In order to do this I had to teach getScaledReductions that the accumulated value may come from a unary op, hence there is only one extension to consider. Similarly, I updated the vplan and AArch64 TTI cost model to understand the possible unary op. --------- Co-authored-by: Matt Devereau <matthew.devereau@arm.com>
The LLVM Compiler Infrastructure ================================ This directory and its subdirectories contain source code for LLVM, a toolkit for the construction of highly optimized compilers, optimizers, and runtime environments. LLVM is open source software. You may freely distribute it under the terms of the license agreement found in LICENSE.txt. Please see the documentation provided in docs/ for further assistance with LLVM, and in particular docs/GettingStarted.rst for getting started with LLVM and docs/README.txt for an overview of LLVM's documentation setup. If you are writing a package for LLVM, see docs/Packaging.rst for our suggestions.