Apply loop guards to the backedge-taken count (BTC) before checking
whether the last iteration should be peeled off. This also adds an
assert that applying the guards does not pessimize the result, to catch
any cases where it does. I checked on a large test set and the assert
did not trigger there.
Peels ~15% more loops.
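
As a rough illustration of why guards help, here is a toy model in plain
C++. It is not the LoopPeel code and does not use any LLVM API. `Range`,
`applyGuard`, `btcRange` and `canPeelLastToy` are hypothetical names, and
the precondition "BTC is provably non-zero" is an assumption made for
this sketch only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>

// Toy stand-in for what an analysis knows about a value: an inclusive
// unsigned interval. This is NOT LLVM's ConstantRange or SCEV.
struct Range {
  uint64_t Lo;
  uint64_t Hi;
};

// Hypothetical: refine the range of the trip count N using a dominating
// guard of the form `N >= MinN`.
Range applyGuard(Range N, uint64_t MinN) {
  return {std::max(N.Lo, MinN), N.Hi};
}

// Hypothetical: BTC = N - 1. If N may be 0 the subtraction may wrap, so
// nothing is known about the result.
Range btcRange(Range N) {
  if (N.Lo == 0)
    return {0, UINT64_MAX};
  return {N.Lo - 1, N.Hi - 1};
}

// Assumed precondition for this sketch: BTC is provably non-zero, i.e.
// the loop runs at least two iterations.
bool btcKnownNonZero(Range BTC) { return BTC.Lo > 0; }

bool canPeelLastToy(Range N, std::optional<uint64_t> GuardMinN) {
  Range Plain = btcRange(N);
  if (!GuardMinN)
    return btcKnownNonZero(Plain);
  Range Guarded = btcRange(applyGuard(N, *GuardMinN));
  // Same idea as the new assert: using guards must never lose a fact
  // that was provable without them.
  assert(!btcKnownNonZero(Plain) || btcKnownNonZero(Guarded));
  return btcKnownNonZero(Guarded);
}
```

With an unconstrained `N`, `canPeelLastToy` returns false without a
guard and true once a guard `N >= 2` is supplied, which is the kind of
case the guarded check can now handle.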
PR: https://github.com/llvm/llvm-project/pull/142605