The previous behavior could be harmful in some edge cases, such as emitting a call to `fma()` in the `fma()` implementation itself. Do this by just being more accurate in `isFMAFasterThanFMulAndFAdd()`. This was already done for PowerPC; this commit just extends that to Arm, z/Arch, and x86. MIPS and SPARC already got it right, but I added tests for them too, for good measure. Note: I don't have commit access.