This revision updates the op semantics to also allow rank-reducing behavior, and updates the implementation to share code between the sequential and parallel versions of the op.

Depends on D128920

Differential Revision: https://reviews.llvm.org/D128985