In [1], a few new insns are proposed to expand the BPF ISA, in order to
. fix limitations of existing insns (e.g., the 16-bit jmp offset)
. add new insns which may improve code quality
  (sign_ext_ld, sign_ext_mov, st)
. complete the feature set (sdiv, smod)
. improve the user experience (bswap)
This patch implements insn encoding for
. sign-extended load
. sign-extended mov
. sdiv/smod
. bswap insns
. unconditional jump with a 32-bit offset
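For quick reference, a hand-written sketch of the new assembler syntax
(register and label names are arbitrary; the actual encodings are
exercised by the MC test at the end of this patch):
  r1 = *(s8 *)(r2 + 0)     // sign-extended load (s8/s16/s32)
  r3 = (s16)r4             // sign-extended mov (s8/s16/s32)
  r5 s/= r6                // signed division
  r5 s%= r6                // signed modulo
  r7 = bswap32 r7          // byte swap (bswap16/bswap32/bswap64)
  gotol LBB0_far           // unconditional jump with 32-bit offset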
The new bswap insns are generated under cpu=v4 for __builtin_bswap.
For cpu=v3 or earlier, __builtin_bswap generates be or le insns instead,
which is not intuitive for the user.
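For illustration (a sketch, not output copied from the compiler), on a
little-endian (bpfel) target the same __builtin_bswap32 lowers to:
  r1 = be32 r1       // cpu=v3 and earlier: byte swap expressed as a
                     // to-big-endian conversion
  r1 = bswap32 r1    // cpu=v4: the byte swap is explicit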
To support 32-bit branch offsets, a 32-bit ja (JMPL) insn is implemented.
For a conditional branch whose target is beyond the 16-bit offset range,
llvm performs the transformation 'cond_jmp' -> 'cond_jmp + jmpl' to
simulate a 32-bit conditional jmp. See BPFMIPeephole.cpp for details.
The algorithm is heuristic based. I have tested the bpf selftest pyperf600
with unroll count 600, which does generate 32-bit jump insns, e.g.,
  13: 06 00 00 00 9b cd 00 00 gotol +0xcd9b <LBB0_6619>
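Roughly, the rewrite has the following shape (an illustrative sketch
with invented labels, not actual pass output):
  // before: the offset to LBB_far does not fit in the 16-bit branch field
  //   if r1 > r2 goto LBB_far
  // after the peephole:
  if r1 <= r2 goto LBB_next    // inverted condition, short branch
  gotol LBB_far                // jmpl: unconditional jump, 32-bit offset
LBB_next: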
Eduard is working on adding the 'st' insn to cpu=v4.
A list of llc flags,
  disable-ldsx, disable-movsx, disable-bswap,
  disable-sdiv-smod, disable-gotol,
can be used to disable a particular insn for cpu v4.
For example, a user can run
  llc -march=bpf -mcpu=v4 -disable-movsx t.ll
to enable cpu v4 without movsx insns.
References:
[1] https://lore.kernel.org/bpf/4bfe98be-5333-1c7e-2f6d-42486c8ec039@meta.com/
Differential Revision: https://reviews.llvm.org/D144829
// RUN: llvm-mc -triple bpfel --mcpu=v4 --assemble --filetype=obj %s \
// RUN: | llvm-objdump -d --mattr=+alu32 - \
// RUN: | FileCheck %s

// CHECK: d7 01 00 00 10 00 00 00 r1 = bswap16 r1
// CHECK: d7 02 00 00 20 00 00 00 r2 = bswap32 r2
// CHECK: d7 03 00 00 40 00 00 00 r3 = bswap64 r3
r1 = bswap16 r1
r2 = bswap32 r2
r3 = bswap64 r3

// CHECK: 91 41 00 00 00 00 00 00 r1 = *(s8 *)(r4 + 0x0)
// CHECK: 89 52 04 00 00 00 00 00 r2 = *(s16 *)(r5 + 0x4)
// CHECK: 81 63 08 00 00 00 00 00 r3 = *(s32 *)(r6 + 0x8)
r1 = *(s8 *)(r4 + 0)
r2 = *(s16 *)(r5 + 4)
r3 = *(s32 *)(r6 + 8)

// CHECK: 91 41 00 00 00 00 00 00 r1 = *(s8 *)(r4 + 0x0)
// CHECK: 89 52 04 00 00 00 00 00 r2 = *(s16 *)(r5 + 0x4)
r1 = *(s8 *)(r4 + 0)
r2 = *(s16 *)(r5 + 4)

// CHECK: bf 41 08 00 00 00 00 00 r1 = (s8)r4
// CHECK: bf 52 10 00 00 00 00 00 r2 = (s16)r5
// CHECK: bf 63 20 00 00 00 00 00 r3 = (s32)r6
r1 = (s8)r4
r2 = (s16)r5
r3 = (s32)r6

// CHECK: bc 31 08 00 00 00 00 00 w1 = (s8)w3
// CHECK: bc 42 10 00 00 00 00 00 w2 = (s16)w4
w1 = (s8)w3
w2 = (s16)w4

// CHECK: 3f 31 01 00 00 00 00 00 r1 s/= r3
// CHECK: 9f 42 01 00 00 00 00 00 r2 s%= r4
r1 s/= r3
r2 s%= r4

// CHECK: 3c 31 01 00 00 00 00 00 w1 s/= w3
// CHECK: 9c 42 01 00 00 00 00 00 w2 s%= w4
w1 s/= w3
w2 s%= w4