Try to runtime-unroll loops with early-continues depending on loop-varying loads; this helps with branch-prediction for the early-continues and can significantly improve performance for such loops.

Builds on top of https://github.com/llvm/llvm-project/pull/118317.

PR: https://github.com/llvm/llvm-project/pull/118499
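
For illustration only, here is a rough source-level sketch of the kind of loop this targets and of what unrolling it with a remainder loop looks like. The function names, the data layout, and the unroll factor of 4 are assumptions made for the example; they are not taken from the patch, and the actual transformation happens on LLVM IR rather than on C++ source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical input loop: the early-continue depends on a value that is
// loaded anew in every iteration (flags[i]), so the branch is data-dependent.
int64_t sum_selected(const std::vector<int32_t> &flags,
                     const std::vector<int32_t> &vals) {
  assert(flags.size() == vals.size());
  int64_t sum = 0;
  for (std::size_t i = 0; i < flags.size(); ++i) {
    if (flags[i] == 0) // loop-varying load feeds the early-continue
      continue;
    sum += vals[i];
  }
  return sum;
}

// Hand-written approximation of the same loop unrolled by 4 (assumed factor).
// The trip count is only known at run time, so a remainder loop handles the
// leftover iterations. Each copy of the check is now a separate branch.
int64_t sum_selected_unrolled(const std::vector<int32_t> &flags,
                              const std::vector<int32_t> &vals) {
  assert(flags.size() == vals.size());
  const std::size_t n = flags.size();
  int64_t sum = 0;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    if (flags[i] != 0)
      sum += vals[i];
    if (flags[i + 1] != 0)
      sum += vals[i + 1];
    if (flags[i + 2] != 0)
      sum += vals[i + 2];
    if (flags[i + 3] != 0)
      sum += vals[i + 3];
  }
  // Remainder loop: same body as the original, at most 3 iterations.
  for (; i < n; ++i) {
    if (flags[i] == 0)
      continue;
    sum += vals[i];
  }
  return sum;
}
```

Both functions compute the same result; the second only shows the shape of the unrolled control flow, in which every unrolled copy of the early-continue is its own branch instruction.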