- hoist DMAs past all loops immediately surrounding the region that the latter
is invariant on - do this at DMA generation time itself
PiperOrigin-RevId: 234628447
generation pass to make it drop certain assumptions, complete TODOs.
- multiple fixes for getMemoryFootprintBytes
- pass loopDepth correctly from getMemoryFootprintBytes()
- use union while computing memory footprints
- bug fixes for addAffineForOpDomain
- take into account loop step
- add domains of other loop IVs in turn that might have been used in the bounds
- dma-generate: drop assumption of "non-unit stride loops being tile space loops
and skipping those and recursing to inner depths"; DMA generation is now purely
based on available fast mem capacity and memory footprint's calculated
- handle memory region compute failures/bailouts correctly from dma-generate
- loop tiling cleanup/NFC
- update some debug and error messages to use emitNote/emitError in
pipeline-data-transfer pass - NFC
PiperOrigin-RevId: 234245969
for dma-generate, loop-unroll.
- add -tile-sizes command line option for loop tiling to specify different tile
sizes for loops in a band
- clean up command line options for loop-unroll, dma-generate (remove
cl::hidden)
PiperOrigin-RevId: 234006232
- for the DMA buffers being allocated (and their tags), generate corresponding deallocs
- minor related update to replaceAllMemRefUsesWith and PipelineDataTransfer pass
Code generation for DMA transfers was being done with the initial simplifying
assumption that the alloc's would map to scoped allocations, and so no
deallocations would be necessary. Drop this assumption to generalize. Note that
even with scoped allocations, unrolling loops that have scoped allocations
could create a series of allocations and exhaustion of fast memory. Having a
end of lifetime marker like a dealloc in fact allows creating new scopes if
necessary when lowering to a backend and still utilize scoped allocation.
DMA buffers created by -dma-generate are guaranteed to have either
non-overlapping lifetimes or nested lifetimes.
PiperOrigin-RevId: 233502632
* AffineStructures has moved to IR.
* simplifyAffineExpr/simplifyAffineMap/getFlattenedAffineExpr have moved to IR.
* makeComposedAffineApply/fullyComposeAffineMapAndOperands have moved to AffineOps.
* ComposeAffineMaps is replaced by AffineApplyOp::canonicalize and deleted.
PiperOrigin-RevId: 232586468
- use getAccessMap() instead of repeating it
- fold getMemRefRegion into MemRefRegion ctor (more natural, avoid heap
allocation and unique_ptr where possible)
- change extractForInductionVars - MutableArrayRef -> ArrayRef for the
arguments. Since the method is just returning copies of 'Value *', the client
can't mutate the pointers themselves; it's fine to mutate the 'Value''s
themselves, but that doesn't mutate the pointers to those.
- change the way extractForInductionVars returns (see b/123437690)
PiperOrigin-RevId: 232359277
loops), (2) take into account fast memory space capacity and lower 'dmaDepth'
to fit, (3) add location information for debug info / errors
- change dma-generate pass to work on blocks of instructions (start/end
iterators) instead of 'for' loops; complete TODOs - allows DMA generation for
straightline blocks of operation instructions interspersed b/w loops
- take into account fast memory capacity: check whether memory footprint fits
in fastMemoryCapacity parameter, and recurse/lower the depth at which DMA
generation is performed until it does fit in the provided memory
- add location information to MemRefRegion; any insufficient fast memory
capacity errors or debug info w.r.t dma generation shows location information
- allow DMA generation pass to be instantiated with a fast memory capacity
option (besides command line flag)
- change getMemRefRegion to return unique_ptr's
- change getMemRefFootprintBytes to work on a 'Block' instead of 'ForInst'
- other helper methods; add postDomInstFilter option for
replaceAllMemRefUsesWith; drop forInst->walkOps, add Block::walkOps methods
Eg. output
$ mlir-opt -dma-generate -dma-fast-mem-capacity=1 /tmp/single.mlir
/tmp/single.mlir:9:13: error: Total size of all DMA buffers' for this block exceeds fast memory capacity
for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
^
$ mlir-opt -debug-only=dma-generate -dma-generate -dma-fast-mem-capacity=400 /tmp/single.mlir
/tmp/single.mlir:9:13: note: 8 KiB of DMA buffers in fast memory space for this block
for %i3 = (d0) -> (d0)(%i1) to (d0) -> (d0 + 32)(%i1) {
PiperOrigin-RevId: 232297044
Example:
dma-generate options:
-dma-fast-mem-capacity - Set fast memory space ...
-dma-fast-mem-space=<uint> - Set fast memory space ...
loop-fusion options:
-fusion-compute-tolerance=<number> - Fractional increase in ...
-fusion-maximal - Enables maximal loop fusion
loop-tile options:
-tile-size=<uint> - Use this tile size for ...
loop-unroll options:
-unroll-factor=<uint> - Use this unroll factor ...
-unroll-full - Fully unroll loops
-unroll-full-threshold=<uint> - Unroll all loops with ...
-unroll-num-reps=<uint> - Unroll innermost loops ...
loop-unroll-jam options:
-unroll-jam-factor=<uint> - Use this unroll jam factor ...
PiperOrigin-RevId: 231019363
- introduce a way to compute union using symbolic rectangular bounding boxes
- handle multiple load/store op's to the same memref by taking a union of the regions
- command-line argument to provide capacity of the fast memory space
- minor change to replaceAllMemRefUsesWith to not generate affine_apply if the
supplied index remap was identity
PiperOrigin-RevId: 230848185
- switch some debug info to emitError
- use a single constant op for zero index to make it easier to write/update
test cases; avoid creating new constant op's for common zero index cases
- test case cleanup
This is in preparation for an upcoming major update to this pass.
PiperOrigin-RevId: 230728379
- update fusion cost model to fuse while tolerating a certain amount of redundant
computation; add cl option -fusion-compute-tolerance
evaluate memory footprint and intermediate memory reduction
- emit debug info from -loop-fusion showing what was fused and why
- introduce function to compute memory footprint for a loop nest
- getMemRefRegion readability update - NFC
PiperOrigin-RevId: 230541857
- ForInst::walkOps will also be used in an upcoming CL (cl/229438679); better to have
this instead of deriving from the InstWalker
PiperOrigin-RevId: 230413820
- the size of the private memref created for the slice should be based on
the memref region accessed at the depth at which the slice is being
materialized, i.e., symbolic in the outer IVs up until that depth, as opposed
to the region accessed based on the entire domain.
- leads to a significant contraction of the temporary / intermediate memref
whenever the memref isn't reduced to a single scalar (through store fwd'ing).
Other changes
- update to promoteIfSingleIteration - avoid introducing unnecessary identity
map affine_apply from IV; makes it much easier to write and read test cases
and pass output for all passes that use promoteIfSingleIteration; loop-fusion
test cases become much simpler
- fix replaceAllMemrefUsesWith bug that was exposed by the above update -
'domInstFilter' could be one of the ops erased due to a memref replacement in
it.
- fix getConstantBoundOnDimSize bug: a division by the coefficient of the identifier was
missing (the latter need not always be 1); add lbFloorDivisors output argument
- rename getBoundingConstantSizeAndShape -> getConstantBoundingSizeAndShape
PiperOrigin-RevId: 230405218
- the load/store forwarding relies on memref dependence routines as well as
SSA/dominance to identify the memref store instance uniquely supplying a value
to a memref load, and replaces the result of that load with the value being
stored. The memref is also deleted when possible if only stores remain.
- add methods for post dominance for MLFunction blocks.
- remove duplicated getLoopDepth/getNestingDepth - move getNestingDepth,
getMemRefAccess, getNumCommonSurroundingLoops into Analysis/Utils (were
earlier static)
- add a helper method in FlatAffineConstraints - isRangeOneToOne.
PiperOrigin-RevId: 227252907
Function::walk functionality into f->walkInsts/Ops which allows visiting all
instructions, not just ops. Eliminate Function::getBody() and
Function::getReturn() helpers which crash in CFG functions, and were only kept
around as a bridge.
This is step 25/n towards merging instructions and statements.
PiperOrigin-RevId: 227243966
consistent and moving the using declarations over. Hopefully this is the last
truly massive patch in this refactoring.
This is step 21/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227178245
The last major renaming is Statement -> Instruction, which is why Statement and
Stmt still appears in various places.
This is step 19/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227163082
FuncBuilder class. Also rename SSAValue.cpp to Value.cpp
This is step 12/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227067644
is the new base of the SSA value hierarchy. This CL also standardizes all the
nomenclature and comments to use 'Value' where appropriate. This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing.
This is step 11/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 227064624
making it more similar to the CFG side of things. It is true that in a deeply
nested case that this is not a guaranteed O(1) time operation, and that 'get'
could lead compiler hackers to think this is cheap, but we need to merge these
and we can look into solutions for this in the future if it becomes a problem
in practice.
This is step 9/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226983931
from it. This is necessary progress to squaring away the parent relationship
that a StmtBlock has with its enclosing if/for/fn, and makes room for functions
to have more than one block in the future. This also removes IfClause and ForStmtBody.
This is step 5/n towards merging instructions and statements, NFC.
PiperOrigin-RevId: 226936541
StmtBlock. This is more consistent with IfStmt and also conceptually makes
more sense - a forstmt "isn't" its body, it contains its body.
This is step 1/N towards merging BasicBlock and StmtBlock. This is required
because in the new regime StmtBlock will have a use list (just like BasicBlock
does) of operands, and ForStmt already has a use list for its induction
variable.
This is a mechanical patch, NFC.
PiperOrigin-RevId: 226684158
- generate DMAs correctly now using strided DMAs where needed
- add support for multi-level/nested strides; op still supports one level of
stride for now.
Other things
- add test case for symbolic lower/upper bound; cases where the DMA buffer
size can't be bounded by a known constant
- add test case for dynamic shapes where the DMA buffers are however bounded by
constants
- refactor some of the '-dma-generate' code
PiperOrigin-RevId: 224584529
update/improve/clean up API.
- update FlatAffineConstraints::getConstBoundDifference; return constant
differences between symbolic affine expressions, look at equalities as well.
- fix buffer size computation when generating DMAs symbolic in outer loops,
correctly handle symbols at various places (affine access maps, loop bounds,
loop IVs outer to the depth at which DMA generation is being done)
- bug fixes / complete some TODOs for getMemRefRegion
- refactor common code b/w memref dependence check and getMemRefRegion
- FlatAffineConstraints API update; added methods employ trivial checks /
detection - sufficient to handle hyper-rectangular cases in a precise way
while being fast / low complexity. Hyper-rectangular cases fall out as
trivial cases for these methods while other cases still do not cause failure
(either return conservative or return failure that is handled by the caller).
PiperOrigin-RevId: 224229879
cases.
- fix bug in calculating index expressions for DMA buffers in certain cases
(affected tiled loop nests); add more test cases for better coverage.
- introduce an additional optional argument to replaceAllMemRefUsesWith;
additional operands to the index remap AffineMap can now be supplied by the
client.
- FlatAffineConstraints::addBoundsForStmt - fix off by one upper bound,
::composeMap - fix position bug.
- Some clean up and more comments
PiperOrigin-RevId: 222434628
and getMemRefRegion() to work with specified loop depths; add support for
outgoing DMAs, store op's.
- add support for getMemRefRegion symbolic in outer loops - hence support for
DMAs symbolic in outer surrounding loops.
- add DMA generation support for outgoing DMAs (store op's to lower memory
space); extend getMemoryRegion to store op's. -memref-bound-check now works
with store op's as well.
- fix dma-generate (references to the old memref in the dma_start op were also
being replaced with the new buffer); we need replace all memref uses to work
only on a subset of the uses - add a new optional argument for
replaceAllMemRefUsesWith. update replaceAllMemRefUsesWith to take an optional
'operation' argument to serve as a filter - if provided, only those uses that
are dominated by the filter are replaced.
- Add missing print for attributes for dma_start, dma_wait op's.
- update the FlatAffineConstraints API
PiperOrigin-RevId: 221889223
- constant bounded memory regions, static shapes, no handling of
overlapping/duplicate regions (through union) for now; also only, load memory
op's.
- add build methods for DmaStartOp, DmaWaitOp.
- move getMemoryRegion() into Analysis/Utils and expose it.
- fix addIndexSet, getMemoryRegion() post switch to exclusive upper bounds;
update test cases for memref-bound-check and memref-dependence-check for
exclusive bounds (missed in a previous CL)
PiperOrigin-RevId: 220729810