Replace iterators of the outermost loop with region arguments of the innermost one. This change prevents later `bufferization` passes from inserting allocations within the body of the innermost loop.

Reviewed By: mravishankar

Differential Revision: https://reviews.llvm.org/D130083