In https://reviews.llvm.org/D30114, support for mismatching address
spaces was introduced to CodeGenPrepare's optimizeMemoryInst, using
addrspacecast as it was argued that only no-op addrspacecasts would be
considered when constructing the address mode. However, by doing
inttoptr/ptrtoint, it's possible to get CGP to emit an addrspace
that's not actually no-op, introducing a miscompilation:
define void @kernel(i8* %julia_ptr) {
%intptr = ptrtoint i8* %julia_ptr to i64
%ptr = inttoptr i64 %intptr to i32 addrspace(3)*
br label %end
end:
store atomic i32 1, i32 addrspace(3)* %ptr unordered, align 4
ret void
}
Gets compiled to:
define void @kernel(i8* %julia_ptr) {
end:
%0 = addrspacecast i8* %julia_ptr to i32 addrspace(3)*
store atomic i32 1, i32 addrspace(3)* %0 unordered, align 4
ret void
}
In the case of NVPTX, this introduces a cvta.to.shared, whereas
leaving out the %end block and branch doesn't trigger this
optimization. This results in illegal memory accesses as seen in
https://github.com/JuliaGPU/CUDA.jl/issues/558
In this change, I introduced a check before doing the pointer cast
that verifies address spaces are the same. If not, it emits a
ptrtoint/inttoptr combination to get a no-op cast between address
spaces. I decided against disallowing ptrtoint/inttoptr with
non-default AS in matchOperationAddr, because now its still possible
to look through multiple sequences of them that ultimately do not
result in a address space mismatch (i.e. the second lit test).