Previous implementation may took some extra time, when walked over the
same instructions several times. And also it did not include proper
analysis for cross-basic-block use of the vectorized values. This
version fixes it.
It walks over the tree and checks the deps between entries and their
operands. If there are non-vectorized calls in between, it adds
a single(!) spill cost, because the vector value should be
spilled/reloaded only once.
Also, this version caches analysis for each entries, which are detected,
and do not repeats it, uses data, found during previous analysis for
previous nodes.
Also, it has the internal limit. If the number of instructions
between nodes and their operands is too big (> than ScheduleRegionSizeBudget / VectorizableTree.size()), it is considered that the spill is required. It allows to improve compile time.
Reviewers: preames, RKSimon, mikhailramalho
Reviewed By: preames
Pull Request: https://github.com/llvm/llvm-project/pull/129258