The AMDGPU target has run into a situation that the following test case
illustrates:
define void @dont_merge_cbranches(i32 %V) {
  %divergent_cond = icmp ne i32 %V, 0
  %uniform_cond = call i1 @uniform_result(i1 %divergent_cond)
  br i1 %uniform_cond, label %bb2, label %exit, !prof !0
bb2:
  br i1 %divergent_cond, label %bb3, label %exit
bb3:
  call void @bar()
  br label %exit
exit:
  ret void
}
!0 = !{!"branch_weights", i32 1, i32 100000}
SimplifyCFG merges the branches on %uniform_cond and %divergent_cond. This is undesirable because the first branch goes to bb2 extremely rarely, while the second branch (on the divergent condition) is expensive. After merging, the combined branch is executed on every path and is as expensive as the second branch.
This patch prevents such merging when the edge leading to the second branch is unlikely to be taken.
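
As a rough illustration of the kind of check described above, here is a minimal standalone sketch. It is not the code from the patch: the helper names isEdgeUnlikely and shouldMergeBranches and the 1/100 threshold are hypothetical, chosen only to show how the branch_weights from the test case could drive the decision.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): returns true when the edge with
// weight `edgeWeight` is taken with probability strictly below 1/`ratio`,
// given the weight of the other outgoing edge of the same branch.
bool isEdgeUnlikely(uint32_t edgeWeight, uint32_t otherWeight, uint32_t ratio) {
  // Widen to 64 bits so neither the sum nor the product can overflow.
  uint64_t total = uint64_t(edgeWeight) + otherWeight;
  if (total == 0)
    return false; // no usable profile data: do not claim the edge is unlikely
  return uint64_t(edgeWeight) * ratio < total;
}

// Hypothetical decision function: refuse to merge two conditional branches
// when the edge from the first branch to the second one is unlikely.
// The default ratio of 100 is an assumed threshold, not taken from the patch.
bool shouldMergeBranches(uint32_t weightToSecondBranch,
                         uint32_t weightElsewhere,
                         uint32_t ratio = 100) {
  return !isEdgeUnlikely(weightToSecondBranch, weightElsewhere, ratio);
}
```

With the weights from the test case (1 for the edge to bb2, 100000 for the edge to exit), shouldMergeBranches(1, 100000) returns false, so the two branches would stay separate. With balanced weights, or with no profile data at all, this sketch leaves the existing merging behavior unchanged.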