Files
clang-p2996/llvm/test/CodeGen/BPF/assembler-disassembler-v4.s
Peilin Ye 17bfc00f7c [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (#108636)
As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v4.  Define 2 new flags:

  BPF_LOAD_ACQ    0x100
  BPF_STORE_REL   0x110

A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_LOAD_ACQ (0x100).

Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
the 'imm' field set to BPF_STORE_REL (0x110).
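
As a minimal sketch (not part of the patch), here is what a 64-bit
load-acquire looks like when hand-encoded in C; the struct mirrors the
layout of struct bpf_insn from linux/bpf.h (little-endian host assumed),
and every constant except BPF_LOAD_ACQ is a pre-existing BPF opcode flag:

  #include <stdint.h>

  #define BPF_STX      0x03   /* instruction class */
  #define BPF_ATOMIC   0xc0   /* atomic instruction mode */
  #define BPF_DW       0x18   /* 64-bit size modifier */
  #define BPF_LOAD_ACQ 0x100  /* new: load-acquire */

  struct insn {               /* same layout as struct bpf_insn */
      uint8_t code;           /* opcode */
      uint8_t dst_reg:4;      /* destination register */
      uint8_t src_reg:4;      /* source register */
      int16_t off;            /* signed 16-bit offset */
      int32_t imm;            /* signed 32-bit immediate */
  };

  /* r0 = load_acquire((u64 *)(r1 + 0x0)) */
  struct insn load_acquire_dw = {
      .code    = BPF_STX | BPF_ATOMIC | BPF_DW,  /* 0xdb */
      .dst_reg = 0,            /* r0 receives the loaded value */
      .src_reg = 1,            /* r1 holds the address */
      .off     = 0,
      .imm     = BPF_LOAD_ACQ, /* 0x100 */
  };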

Unlike existing atomic read-modify-write operations that only support
BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and
store-releases also support BPF_B (8-bit) and BPF_H (16-bit).  An 8- or
16-bit load-acquire zero-extends the value before writing it to a 32-bit
register, just like the ARM64 LDAPRH instruction and friends.

As an example (assuming little-endian):

  long foo(long *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

foo() can be compiled to:

  db 10 00 00 00 01 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00  exit

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (0x00000100): BPF_LOAD_ACQ
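
An 8-bit load-acquire shows the zero-extension described earlier.  As an
illustration (this function is not from the patch itself):

  unsigned int load8(unsigned char *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

load8() can be compiled to:

  d3 10 00 00 00 01 00 00  w0 = load_acquire((u8 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00  exit

  opcode (0xd3): BPF_ATOMIC | BPF_B | BPF_STX
  imm (0x00000100): BPF_LOAD_ACQ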

Similarly:

  void bar(short *ptr, short val) {
      __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
  }

bar() can be compiled to:

  cb 21 00 00 10 01 00 00  store_release((u16 *)(r1 + 0x0), w2)
  95 00 00 00 00 00 00 00  exit

  opcode (0xcb): BPF_ATOMIC | BPF_H | BPF_STX
  imm (0x00000110): BPF_STORE_REL

Inline assembly is also supported.
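
For example (a sketch only; the operand constraints and the "memory"
clobber follow common clang BPF inline-asm usage and are not taken from
the patch):

  long acquire_load(long *ptr)
  {
      long val;

      asm volatile("%[val] = load_acquire((u64 *)(%[ptr] + 0))"
                   : [val] "=r"(val)
                   : [ptr] "r"(ptr)
                   : "memory");
      return val;
  }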

Add a pre-defined macro, __BPF_FEATURE_LOAD_ACQ_STORE_REL, to let
developers detect this new feature.  The feature can also be disabled
using a new llc option, -disable-load-acq-store-rel.
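
For example, a program can guard its use of the new feature like so (a
sketch; ptr and val are placeholders):

  #ifdef __BPF_FEATURE_LOAD_ACQ_STORE_REL
      __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
  #else
      /* fall back to whatever the program did before -mcpu=v4 */
  #endif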

Using __ATOMIC_RELAXED for __atomic_store{,_n}() will generate a "plain"
store (BPF_MEM | BPF_STX) instruction:

  void foo(short *ptr, short val) {
      __atomic_store_n(ptr, val, __ATOMIC_RELAXED);
  }

  6b 21 00 00 00 00 00 00  *(u16 *)(r1 + 0x0) = w2
  95 00 00 00 00 00 00 00  exit

Similarly, using __ATOMIC_RELAXED for __atomic_load{,_n}() will generate
a zero-extending, "plain" load (BPF_MEM | BPF_LDX) instruction:

  int foo(char *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_RELAXED);
  }

  71 11 00 00 00 00 00 00  w1 = *(u8 *)(r1 + 0x0)
  bc 10 08 00 00 00 00 00  w0 = (s8)w1
  95 00 00 00 00 00 00 00  exit

Currently __ATOMIC_CONSUME is an alias for __ATOMIC_ACQUIRE.  Using
__ATOMIC_SEQ_CST ("sequentially consistent") is not supported yet and
will cause an error:

  $ clang --target=bpf -mcpu=v4 -c bar.c > /dev/null
  bar.c:1:5: error: sequentially consistent (seq_cst) atomic load/store is
  not supported
      1 | int foo(int *ptr) { return __atomic_load_n(ptr, __ATOMIC_SEQ_CST); }
        |     ^
  ...

Finally, rename the isST*() and isLD*() helper functions in
BPFMISimplifyPatchable.cpp based on what the instructions actually do,
rather than on their instruction class.

[1]
https://lore.kernel.org/all/20240729183246.4110549-1-yepeilin@google.com/

// RUN: llvm-mc -triple bpfel --mcpu=v4 --assemble --filetype=obj %s \
// RUN: | llvm-objdump -d --mattr=+alu32 - \
// RUN: | FileCheck %s
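
// Unconditional byte swaps (BPF_ALU64 | BPF_END); imm gives the swap width.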
// CHECK: d7 01 00 00 10 00 00 00 r1 = bswap16 r1
// CHECK: d7 02 00 00 20 00 00 00 r2 = bswap32 r2
// CHECK: d7 03 00 00 40 00 00 00 r3 = bswap64 r3
r1 = bswap16 r1
r2 = bswap32 r2
r3 = bswap64 r3
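
// Sign-extending loads (BPF_LDX | BPF_MEMSX).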
// CHECK: 91 41 00 00 00 00 00 00 r1 = *(s8 *)(r4 + 0x0)
// CHECK: 89 52 04 00 00 00 00 00 r2 = *(s16 *)(r5 + 0x4)
// CHECK: 81 63 08 00 00 00 00 00 r3 = *(s32 *)(r6 + 0x8)
r1 = *(s8 *)(r4 + 0)
r2 = *(s16 *)(r5 + 4)
r3 = *(s32 *)(r6 + 8)
// CHECK: 91 41 00 00 00 00 00 00 r1 = *(s8 *)(r4 + 0x0)
// CHECK: 89 52 04 00 00 00 00 00 r2 = *(s16 *)(r5 + 0x4)
r1 = *(s8 *)(r4 + 0)
r2 = *(s16 *)(r5 + 4)
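
// Sign-extending register moves (movsx); the offset field encodes the source width.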
// CHECK: bf 41 08 00 00 00 00 00 r1 = (s8)r4
// CHECK: bf 52 10 00 00 00 00 00 r2 = (s16)r5
// CHECK: bf 63 20 00 00 00 00 00 r3 = (s32)r6
r1 = (s8)r4
r2 = (s16)r5
r3 = (s32)r6
// CHECK: bc 31 08 00 00 00 00 00 w1 = (s8)w3
// CHECK: bc 42 10 00 00 00 00 00 w2 = (s16)w4
w1 = (s8)w3
w2 = (s16)w4
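
// Signed division and remainder; offset field 1 distinguishes them from udiv/umod.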
// CHECK: 3f 31 01 00 00 00 00 00 r1 s/= r3
// CHECK: 9f 42 01 00 00 00 00 00 r2 s%= r4
r1 s/= r3
r2 s%= r4
// CHECK: 3c 31 01 00 00 00 00 00 w1 s/= w3
// CHECK: 9c 42 01 00 00 00 00 00 w2 s%= w4
w1 s/= w3
w2 s%= w4
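
// Load-acquire (BPF_STX | BPF_ATOMIC, imm = BPF_LOAD_ACQ = 0x100); 8/16/32-bit forms zero-extend into w0.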
// CHECK: d3 10 00 00 00 01 00 00 w0 = load_acquire((u8 *)(r1 + 0x0))
// CHECK: cb 10 00 00 00 01 00 00 w0 = load_acquire((u16 *)(r1 + 0x0))
// CHECK: c3 10 00 00 00 01 00 00 w0 = load_acquire((u32 *)(r1 + 0x0))
w0 = load_acquire((u8 *)(r1 + 0))
w0 = load_acquire((u16 *)(r1 + 0))
w0 = load_acquire((u32 *)(r1 + 0))
// CHECK: db 10 00 00 00 01 00 00 r0 = load_acquire((u64 *)(r1 + 0x0))
r0 = load_acquire((u64 *)(r1 + 0))
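
// Store-release (BPF_STX | BPF_ATOMIC, imm = BPF_STORE_REL = 0x110).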
// CHECK: d3 21 00 00 10 01 00 00 store_release((u8 *)(r1 + 0x0), w2)
// CHECK: cb 21 00 00 10 01 00 00 store_release((u16 *)(r1 + 0x0), w2)
// CHECK: c3 21 00 00 10 01 00 00 store_release((u32 *)(r1 + 0x0), w2)
store_release((u8 *)(r1 + 0), w2)
store_release((u16 *)(r1 + 0), w2)
store_release((u32 *)(r1 + 0), w2)
// CHECK: db 21 00 00 10 01 00 00 store_release((u64 *)(r1 + 0x0), r2)
store_release((u64 *)(r1 + 0), r2)