AAPCS32 specifies that the fp16 and bf16 types are passed as if they were extended to 32 bits, with the high 16 bits unspecified. The extension is specified as happening as if it were performed in a register, which means that on big-endian targets the actual value is passed in the higher-addressed half of the stack slot, rather than the lower-addressed half as on little-endian targets. Previously, for targets with the fp16 extension, we passed these types in a 16-bit stack slot. That worked for little endian, because every later stack slot is 4-byte aligned and so leaves a 2-byte gap, but it was incorrect for big endian.
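The layout difference can be modelled with a small Python sketch. This is an illustration only, not code from the patch: the helper names `store_half_in_slot` and `load_half_from_slot` are hypothetical, and the unspecified high 16 bits are zeroed here purely to make the output readable.

```python
import struct

def store_half_in_slot(bits16: int, big_endian: bool) -> bytes:
    """Model a 16-bit value extended to 32 bits in a register,
    then stored to a 4-byte stack slot in the target's byte order."""
    # The high 16 bits are unspecified by the ABI; zero is an assumption
    # made here for illustration.
    reg = bits16 & 0xFFFF
    return struct.pack(">I" if big_endian else "<I", reg)

def load_half_from_slot(slot: bytes, big_endian: bool) -> int:
    """Read the 16-bit value back from where the register store put it."""
    # Big endian: the low half of the register lands at the higher address.
    offset = 2 if big_endian else 0
    return struct.unpack_from(">H" if big_endian else "<H", slot, offset)[0]

ONE_FP16 = 0x3C00  # bit pattern of 1.0 in IEEE half precision

le_slot = store_half_in_slot(ONE_FP16, big_endian=False)
be_slot = store_half_in_slot(ONE_FP16, big_endian=True)

print(le_slot.hex())  # value bytes occupy offsets 0..1
print(be_slot.hex())  # value bytes occupy offsets 2..3
```

Reading the big-endian slot at offset 0, which is where a bare 16-bit stack slot would place the value, yields the unspecified padding instead of the argument. That mismatch is the one described above.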