We compute atan2f(y, x) in 2 stages: - Fast step: perform computations in double precision , with relative errors < 2^-50 - Accurate step: if the result from the Fast step fails Ziv's rounding test, then we perform computations in double-double precision, with relative errors < 2^-100. On Ryzen 5900X, worst-case latency is ~ 200 clocks, compared to average latency ~ 60 clocks, and average reciprocal throughput ~ 20 clocks.
35 KiB
35 KiB