unroll/unroll-and-jam more powerful; add additional affine expr builder methods
- use previously added analysis/simplification to infer multiple of unroll
factor trip counts, making loop unroll/unroll-and-jam more general.
- for loop unroll, support bounds that are single result affine map's with the
same set of operands. For unknown loop bounds, loop unroll will now work as
long as trip count can be determined to be a multiple of unroll factor.
- extend getConstantTripCount to deal with single result affine map's with the
same operands. move it to mlir/Analysis/LoopAnalysis.cpp
- add additional builder utility methods for affine expr arithmetic
(difference, mod/floordiv/ceildiv w.r.t postitive constant). simplify code to
use the utility methods.
- move affine analysis routines to AffineAnalysis.cpp/.h from
AffineStructures.cpp/.h.
- Rename LoopUnrollJam to LoopUnrollAndJam to match class name.
- add an additional simplification for simplifyFloorDiv, simplifyCeilDiv
- Rename AffineMap::getNumOperands() getNumInputs: an affine map by itself does
not have operands. Operands are passed to it through affine_apply, from loop
bounds/if condition's, etc., operands are stored in the latter.
This should be sufficiently powerful for now as far as unroll/unroll-and-jam go for TPU
code generation, and can move to other analyses/transformations.
Loop nests like these are now unrolled without any cleanup loop being generated.
for %i = 1 to 100 {
// unroll factor 4: no cleanup loop will be generated.
for %j = (d0) -> (d0) (%i) to (d0) -> (5*d0 + 3) (%i) {
%x = "foo"(%j) : (affineint) -> i32
}
}
for %i = 1 to 100 {
// unroll factor 4: no cleanup loop will be generated.
for %j = (d0) -> (d0) (%i) to (d0) -> (d0 - d mod 4 - 1) (%i) {
%y = "foo"(%j) : (affineint) -> i32
}
}
for %i = 1 to 100 {
for %j = (d0) -> (d0) (%i) to (d0) -> (d0 + 128) (%i) {
%x = "foo"() : () -> i32
}
}
TODO(bondhugula): extend this to LoopUnrollAndJam as well in the next CL (with minor
changes).
PiperOrigin-RevId: 212661212
loop counts. Improve / refactor loop unroll / loop unroll and jam.
- add utility to remove single iteration loops.
- use this utility to promote single iteration loops after unroll/unroll-and-jam
- use loopUnrollByFactor for loopUnrollFull and remove most of the latter.
- add methods for getting constant loop trip count
PiperOrigin-RevId: 212039569
- for test purposes, the unroll-jam pass unroll jams the first outermost loop.
While on this:
- fix StmtVisitor to allow overriding of function to iterate walk over children
of a stmt.
PiperOrigin-RevId: 210644813
This revamps implementation of the loop bounds in the ForStmt, using general representation that supports operands. The frequent case of constant bounds is supported
via special access methods.
This also includes:
- Operand iterators for the Statement class.
- OpPointer::is() method to query the class of the Operation.
- Support for the bound shorthand notation parsing and printing.
- Validity checks for the bound operands used as dim ids and symbols
I didn't mean this CL to be so large. It just happened this way, as one thing led to another.
PiperOrigin-RevId: 210204858
operation and statement to have a location, and make it so a location is
required to be specified whenever you make one (though a null location is still
allowed). This is to encourage compiler authors to propagate loc info
properly, allowing our failability story to work well.
This is still a WIP - it isn't clear if we want to continue abusing Attribute
for location information, or whether we should introduce a new class heirarchy
to do so. This is good step along the way, and unblocks some of the tf/xla
work that builds upon it.
PiperOrigin-RevId: 210001406
Collect loops through a post order walk instead of a pre-order so that loops
are collected from inner loops are collected before outer surrounding ones.
Add a complex test case.
PiperOrigin-RevId: 209041057
an operand mapping, which simplifies it a bit. Implement cloning for IfStmt,
rename getThenClause() to getThen() which is unambiguous and less repetitive in
use cases.
PiperOrigin-RevId: 207915990
- fix/complete forStmt cloning for unrolling to work for outer loops
- create IV const's only when needed
- test outer loop unrolling by creating a short trip count unroll pass for
loops with trip counts <= <parameter>
- add unrolling test cases for multiple op results, outer loop unrolling
- fix/clean up StmtWalker class while on this
- switch unroll loop iterator values from i32 to affineint
PiperOrigin-RevId: 207645967
- deal with non-operation stmt's (if/for stmt's) in loops being unrolled
(unrolling of non-innermost loops works).
- update uses in unrolled bodies to use results of new operations that may be
introduced in the unrolled bodies.
Unrolling now works for all kinds of loop nests - perfect nests, imperfect
nests, loops at any depth, and with any kind of operation in the body. (IfStmt
support not done, hence untested there).
Added missing dump/print method for StmtBlock.
TODO: add test case for outer loop unrolling.
PiperOrigin-RevId: 207314286
MLFunctions.
- MLStmt cloning and IV replacement
- While at this, fix the innermostLoopGatherer to actually gather all the
innermost loops (it was stopping its walk at the first innermost loop it
found)
- Improve comments for MLFunction statement classes, fix inheritance order.
- Fixed StmtBlock destructor.
PiperOrigin-RevId: 207049173
Fix b/112039912 - we were recording 'i' instead of '%i' for loop induction variables causing "use of undefined SSA value" error.
PiperOrigin-RevId: 206884644
- Sketch out a TensorFlow/IR directory that will hold op definitions and common TF support logic. We will eventually have TensorFlow/TF2HLO, TensorFlow/Grappler, TensorFlow/TFLite, etc.
- Add sketches of a Switch/Merge op definition, including some missing stuff like the TwoResults trait. Add a skeleton of a pass to raise this form.
- Beef up the Pass/FunctionPass definitions slightly, moving the common code out of LoopUnroll.cpp into a new IR/Pass.cpp file.
- Switch ConvertToCFG.cpp to be a ModulePass.
- Allow _ to start bare identifiers, since this is important for TF attributes.
PiperOrigin-RevId: 206502517
- Update InnermostLoopGatherer to use a post order traversal (linear
time/single traversal).
- Drop getNumNestedLoops().
- Update isInnermost() to use the StmtWalker.
When using return values in conjunction with walkers, the StmtWalker CRTP
pattern doesn't appear to be of any use. It just requires overriding nearly all
of the methods, which is what InnermostLoopGatherer currently does. Please see
FIXME/ENLIGHTENME comments. TODO: figure this out from this CL discussion.
Note
- Comments on visitor/walker base class are out of date; will update when this
CL is finalized.
PiperOrigin-RevId: 206340901
pointer, and ensure that functions are deleted when the module is destroyed.
This exposed the fact that MLFunction had no dtor, and that the dtor in
CFGFunction was broken with cyclic references. Fix both of these problems.
PiperOrigin-RevId: 206051666
- Implement a full loop unroll for innermost loops.
- Use it to implement a pass that unroll all the innermost loops of all
mlfunction's in a module. ForStmt's parsed currently have constant trip
counts (and constant loop bounds).
- Implement StmtVisitor based (Visitor pattern)
Loop IVs aren't currently parsed and represented as SSA values. Replacing uses
of loop IVs in unrolled bodies is thus a TODO. Class comments are sparse at some places - will add them after one round of comments.
A cmd-line flag triggers this for now.
Original:
mlfunc @loops() {
for x = 1 to 100 step 2 {
for x = 1 to 4 {
"Const"(){value: 1} : () -> ()
}
}
return
}
After unrolling:
mlfunc @loops() {
for x = 1 to 100 step 2 {
"Const"(){value: 1} : () -> ()
"Const"(){value: 1} : () -> ()
"Const"(){value: 1} : () -> ()
"Const"(){value: 1} : () -> ()
}
return
}
PiperOrigin-RevId: 205933235