Currently loop cache analysis uses following formula to evaluate cost of
an RefGroup for a consecutive memory access:
`RefCost=(TripCount*Stride)/CLS`
This cost evaluates to zero when `TripCount*Stride` is smaller than
cache-line-size. This results in wrong cost value for a loop and
misleads loopInterchange decisions as shown in [this
case](https://llvm.godbolt.org/z/jTz1vn4hn).
This patch fixes the problem by rounding the cost to 1 once this problem
happens.