Commit Graph

31 Commits

Author SHA1 Message Date
River Riddle
3e656599f1 Define a PassID class to use when defining a pass. This allows the type used for the ID field to be self-documenting. It also lets the compiler know the alignment of the ID object, which is useful for storing pointer identifiers within LLVM data structures.
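A minimal sketch of how a pass might use this; the member name and the base-class constructor shape here are assumptions, not verbatim API:

```c++
// Hypothetical sketch: a static PassID member whose address uniquely
// identifies the pass. Because PassID has a known alignment, the address
// can be packed into llvm::PointerIntPair and similar structures.
struct MyPass : public FunctionPass {
  static PassID passID;               // address serves as the unique ID
  MyPass() : FunctionPass(&passID) {} // assumed constructor shape
};
PassID MyPass::passID;
```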
PiperOrigin-RevId: 235107957
2019-03-29 16:37:12 -07:00
River Riddle
48ccae2476 NFC: Refactor the files related to passes.
* PassRegistry is split into its own source file.
* Pass-related files are moved to a new library 'Pass'.

PiperOrigin-RevId: 234705771
2019-03-29 16:32:56 -07:00
Alex Zinenko
b4dba895a6 EDSC: make Expr typed and extensible
Expose the result types of edsc::Expr, which are now stored for all types of
Exprs and not only for the variadic ones.  Require return types when an Expr is
constructed, if it will ever have some.  An empty return type list is
interpreted as an Expr that does not create a value (e.g. `return` or `store`).

Conceptually, all edsc::Exprs are now typed, with the type being a (potentially
empty) tuple of return types.  Unbound expressions and Bindables must now be
constructed with a specific type they will take.  This makes EDSC less
evidently type-polymorphic, but we can still write generic code such as

    Expr sumOfSquares(Expr lhs, Expr rhs) { return lhs * lhs + rhs * rhs; }

and use it to construct different typed expressions as

    sumOfSquares(Bindable(IndexType::get(ctx)), Bindable(IndexType::get(ctx)));
    sumOfSquares(Bindable(FloatType::getF32(ctx)),
                 Bindable(FloatType::getF32(ctx)));

On the positive side, we get the following.
1. We can now perform type checking when constructing Exprs rather than during
   MLIR emission.  Nevertheless, this still duplicates Op::verify() until we can
   factor type checking out of it.
2. MLIREmitter is significantly simplified.
3. ExprKind enum is only used for actual kinds of expressions.  Data structures
   are converging with AbstractOperation, and users can now create a
   VariadicExpr("canonical_op_name", {types}, {exprs}) for any operation, even
   an unregistered one, without having to extend the enum and make pervasive
   changes to EDSCs (see the sketch after this list).
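For illustration, a hedged sketch of point 3; the argument order follows the call shape quoted above, and all other names are illustrative:

```c++
// Hypothetical sketch: build an Expr for any operation by canonical name,
// even an unregistered one, without extending the ExprKind enum.
Expr makeCustomOp(MLIRContext *ctx, Expr lhs, Expr rhs) {
  Type f32 = FloatType::getF32(ctx);
  return VariadicExpr("mydialect.my_op", {f32}, {lhs, rhs});
}
```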

On the negative side, we get the following.
1. Typed bindables are more verbose, even in Python.
2. We lose the ability to do print debugging for higher-level EDSC abstractions
   that are implemented as multiple MLIR Ops, for example logical disjunction.

This is step 2/n towards making EDSC extensible.

***

Move MLIR Op construction from MLIREmitter::emitExpr to Expr::build since Expr
now has sufficient information to build itself.

This is step 3/n towards making EDSC extensible.

Both of these changes strive to minimize the amount of irrelevant churn.  In
particular, this introduces more complex pretty-printing for affine and binary
expressions to make sure tests continue to pass.  It also relies on string
comparison to identify specific operations that an Expr produces.

PiperOrigin-RevId: 234609882
2019-03-29 16:31:26 -07:00
Alex Zinenko
0a4c940c1b EDSC: introduce support for blocks
EDSCs currently implement a block as a statement that is itself a list of
statements.  This suffers from two modeling problems: (1) these blocks are not
addressable, i.e. one cannot create an instruction that takes such a block as a
successor; (2) they support block nesting, which MLIR blocks do not.
Furthermore, emitting such a "compound statement" (misleadingly named `Block`
in Python bindings) does not actually produce a new Block in the IR.

Implement support for creating actual IR Blocks in EDSC.  In particular, define
a new StmtBlock EDSC class that is neither an Expr nor a Stmt but contains a
list of Stmts.  Additionally, StmtBlock may have (early-) typed arguments.
These arguments are Bindable expressions that can be used inside the block.
Provide two calls in the MLIREmitter: `emitBlock`, which actually emits a new
block, and `emitBlockBody`, which only emits the instructions contained in the
block without creating a new block.  In the latter case, the instructions must
not use block arguments.

Update Python bindings to make it clear when instruction emission happens
without creating a new block.
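A hedged sketch of the two emission paths; the StmtBlock constructor and emitter signatures are assumptions:

```c++
// Hypothetical sketch: a block with one early-typed argument.
Bindable iv(IndexType::get(ctx));
StmtBlock block({iv}, {store(value, buffer, {iv})});

// Emits a real IR Block with one argument bound to `iv`.
emitter.emitBlock(block);
// Alternatively, inline only the instructions into the current block;
// in that case the body must not use block arguments.
// emitter.emitBlockBody(block);
```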

PiperOrigin-RevId: 234556474
2019-03-29 16:30:56 -07:00
Alex Zinenko
0e59e5c49b EDSC: move Expr and Stmt construction operators to a namespace
In the current state, edsc::Expr and edsc::Stmt overload operators to construct
other Exprs and Stmts.  This includes some unconventional overloads of the
`operator==` to create a comparison expression and of the `operator!` to create
a negation expression.  This situation could lead to unpleasant surprises where
the code does not behave as expected.  Make all Expr and Stmt construction
operators free functions and move them to the `edsc::op` namespace.  Callers
willing to use these operators must explicitly bring them in with a `using`
declaration; this can be done in a local scope, as sketched below.
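A minimal sketch of the opt-in pattern, assuming the operators live in `edsc::op` as described (`select` is reused from other examples in this log):

```c++
Expr clamp(Expr x, Expr lo, Expr hi) {
  // Opt in to the construction operators in this scope only; elsewhere,
  // operator== and operator! keep their conventional C++ meaning.
  using namespace edsc::op;
  return select(x < lo, lo, select(x < hi, x, hi));
}
```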

Additionally, we currently emit signed comparisons for order-comparison
operators.  With namespaces, we can later introduce two sets of operators in
different namespaces, e.g. `edsc::op::sign` and `edsc::op::unsign`, to clearly
state which kind of comparison is implied.

PiperOrigin-RevId: 233578674
2019-03-29 16:25:08 -07:00
Uday Bondhugula
4ba8c9147d Automated rollback of changelist 232717775.
PiperOrigin-RevId: 232807986
2019-03-29 16:19:33 -07:00
River Riddle
90d10b4e00 NFC: Rename the 'for' operation in the AffineOps dialect to 'affine.for'. This is the second step to adding a namespace to the AffineOps dialect.
PiperOrigin-RevId: 232717775
2019-03-29 16:17:59 -07:00
River Riddle
b499277fb6 Remove remaining usages of OperationInst in lib/Transforms.
PiperOrigin-RevId: 232323671
2019-03-29 16:10:53 -07:00
Nicolas Vasilache
0353ef99eb Cleanup EDSCs and start a functional auto-generated library of custom Ops
This CL applies the following simplifications to EDSCs:
1. Rename Block to StmtList because an MLIR Block is a different, not yet
supported, notion;
2. Rework Bindable to drop specific storage and just use it as a simple wrapper
around Expr. The only value of Bindable is to force a static cast when used by
the user to bind into the emitter. For all intents and purposes, Bindable is
just a lightweight check that an Expr is Unbound. This simplifies usage and
reduces the API footprint; after playing with it for some time, the extra API
surface wasn't worth the cognitive overhead;
3. Replace makeExprs and makeBindables with makeNewExprs and copyExprs, which
are more explicit and harder to misuse;
4. Add generally useful functionality to MLIREmitter:
  a. expose zero and one for the ubiquitous common lower bounds and step;
  b. add support to create already bound Exprs for all function arguments as
  well as shapes and views for Exprs bound to memrefs.
5. Delete Stmt::operator= and replace it with a `Stmt::set` method, which is
more explicit (see the sketch after this list).
6. Make Stmt::operator Expr() explicit.
7. Remove the Indexed.indices assertions to pave the way for expressing slices
and views as well as to work with 0-D memrefs.
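A hedged sketch of points 5 and 6; the exact signatures are assumptions:

```c++
// Hypothetical sketch: re-pointing a Stmt is now an explicit call, and
// converting a Stmt to its result Expr requires an explicit cast.
Stmt s = load(A, ivs);
s.set(load(B, ivs));                 // was: s = load(B, ivs);
Expr result = static_cast<Expr>(s);  // operator Expr() is now explicit
```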

The CL connects these simplifications with TableGen and allows emitting a full
MLIR function for a pointwise add.

This "x.add" op is both type and rank-agnostic (by allowing ArrayRef of Expr
passed to For loops) and opens the door to spinning up a composable library of
existing and custom ops that should automate a lot of the tedious work in
TF/XLA -> MLIR.

Testing needs to be significantly improved but can be done in a separate CL.

PiperOrigin-RevId: 231982325
2019-03-29 16:05:23 -07:00
Nicolas Vasilache
81c7f2e2f3 Cleanup resource management and rename recursive matchers
This CL follows up on a memory leak issue related to SmallVector growth that
escapes the BumpPtrAllocator.
The fix is to properly use ArrayRef and placement new to define away the
issue.
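A hedged sketch of the idiom, with illustrative element types: copy the data into allocator-owned storage and hand out an ArrayRef, so no SmallVector growth can reallocate memory behind the allocator's back:

```c++
#include <memory>
#include "llvm/ADT/ArrayRef.h"
#include "llvm/Support/Allocator.h"

// Copy `matches` into memory owned by the BumpPtrAllocator; the returned
// ArrayRef points at storage that no container can ever reallocate.
llvm::ArrayRef<NestedMatch> copyInto(llvm::BumpPtrAllocator &allocator,
                                     llvm::ArrayRef<NestedMatch> matches) {
  NestedMatch *storage = allocator.Allocate<NestedMatch>(matches.size());
  std::uninitialized_copy(matches.begin(), matches.end(), storage);
  return llvm::ArrayRef<NestedMatch>(storage, matches.size());
}
```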

The following renaming is also applied:
1. MLFunctionMatcher -> NestedPattern
2. MLFunctionMatches -> NestedMatch

As a consequence all allocations are now guaranteed to live on the BumpPtrAllocator.

PiperOrigin-RevId: 231047766
2019-03-29 15:39:53 -07:00
River Riddle
6859f33292 Migrate the VectorOrTensorType/MemRefType shape API to use int64_t instead of int.
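A hedged sketch of the effect on call sites; the accessor name is an assumption:

```c++
// Hypothetical sketch: static shapes are now 64-bit, so large dimensions
// no longer truncate when stored or multiplied.
llvm::ArrayRef<int64_t> shape = memRefType.getShape(); // was ArrayRef<int>
int64_t numElements = 1;
for (int64_t dim : shape)
  numElements *= dim; // real code would skip dynamic dimensions
```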
PiperOrigin-RevId: 230605756
2019-03-29 15:33:20 -07:00
Nicolas Vasilache
9f3f39d61a Cleanup EDSCs
This CL performs a bunch of cleanups related to EDSCs that are generally
useful in the context of using them with a simple wrapping C API (not in this
CL) and with simple language bindings to Python and Swift.

PiperOrigin-RevId: 230066505
2019-03-29 15:27:58 -07:00
Nicolas Vasilache
4573a8da9a Fix improperly indexed DimOp in LowerVectorTransfers.cpp
This CL fixes a misunderstanding in how to build DimOp, which triggered
execution issues in the CPU path.

The problem is that, given a `memref<?x4x?x8x?xf32>`, the expressions to
construct the dynamic dimensions should be:
`dim %arg, 0 : memref<?x4x?x8x?xf32>`
`dim %arg, 2 : memref<?x4x?x8x?xf32>`
and
`dim %arg, 4 : memref<?x4x?x8x?xf32>`

Before this CL, we would construct:
`dim %arg, 0 : memref<?x4x?x8x?xf32>`
`dim %arg, 1 : memref<?x4x?x8x?xf32>`
`dim %arg, 2 : memref<?x4x?x8x?xf32>`

and expect the other dimensions to be constants.
This assumption seems consistent at first glance with the syntax of alloc:

```
    %tensor = alloc(%M, %N, %O) : memref<?x4x?x8x?xf32>
```

But this was actually incorrect: the operands of `alloc` bind only to the
dynamic dimensions (here positions 0, 2, and 4), not to the first three
dimensions.

This CL also makes the relevant functions available to EDSCs and removes
duplication of the incorrect function.
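A hedged sketch of the corrected construction; the builder call shape is an assumption:

```c++
// Hypothetical sketch: emit `dim` only at the positions of the dynamic
// dimensions, e.g. 0, 2 and 4 for memref<?x4x?x8x?xf32>.
SmallVector<Value *, 4> dynamicDims;
ArrayRef<int64_t> shape = memRefType.getShape();
for (unsigned pos = 0; pos < shape.size(); ++pos)
  if (shape[pos] == -1) // -1 marked a dynamic dimension at the time
    dynamicDims.push_back(builder.create<DimOp>(loc, memref, pos));
```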

PiperOrigin-RevId: 229622766
2019-03-29 15:24:13 -07:00
Nicolas Vasilache
424041ad58 Add EDSC sugar
This allows load, store and ForNest to be used with both Expr and Bindable,
which simplifies writing generic MLIR snippets.

For instance, a generic pointwise add can now be written:

```cpp
// Different Bindable ivs, one per loop in the loop nest.
auto ivs = makeBindables(shapeA.size());
Bindable zero, one;
// Same bindable, all equal to `zero`.
SmallVector<Bindable, 8> zeros(ivs.size(), zero);
// Same bindable, all equal to `one`.
SmallVector<Bindable, 8> ones(ivs.size(), one);
// clang-format off
Bindable A, B, C;
Stmt scalarA, scalarB, tmp;
Stmt block = edsc::Block({
  ForNest(ivs, zeros, shapeA, ones, {
    scalarA = load(A, ivs),
    scalarB = load(B, ivs),
    tmp = scalarA + scalarB,
    store(tmp, C, ivs)
  }),
});
// clang-format on
```

This CL also adds some extra support for pretty printing that will be used in
a future CL when we introduce standalone testing of EDSCs.  At the moment we
are lacking the basic infrastructure to write such tests.

PiperOrigin-RevId: 229375850
2019-03-29 15:16:53 -07:00
Nicolas Vasilache
d734c50c5f [MLIR] Clip all access dimensions during LowerVectorTransfers
This CL adds a short term remedy to an issue that was found during execution
tests.

Lowering of vector transfer ops uses the permutation map to determine which
ForInsts have been super-vectorized.  During materialization to HW vector
sizes, however, some of those dimensions may be fully unrolled and do not
appear in the permutation map.  Accesses along such dimensions were then not
clipped and may have gone out of bounds.

This CL conservatively clips all dimensions to ensure no out-of-bounds access,
as sketched below.  The longer-term solution is still up for debate but will
probably require either passing more information between Materialization and
lowering, or just merging the two passes.
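A hedged sketch of the conservative clipping, reusing the select-based clamp shown in the EDSC commit below; all names are illustrative:

```c++
// Hypothetical sketch: clamp every access dimension into [0, size - 1],
// whether or not the dimension appears in the permutation map.
for (unsigned d = 0, e = accesses.size(); d < e; ++d) {
  Expr idx = accesses[d], size = memRefSizes[d];
  accesses[d] =
      select(idx < zero, zero, select(idx < size, idx, size - one));
}
```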

PiperOrigin-RevId: 228980787
2019-03-29 15:12:26 -07:00
Nicolas Vasilache
73f5c9c380 [MLIR] Sketch a simple set of EDSCs to declaratively write MLIR
This CL introduces a simple set of Embedded Domain-Specific Components (EDSCs)
in MLIR:
1. a `Type` system of shell classes that closely matches the MLIR type system. These
types are subdivided into `Bindable` leaf expressions and non-bindable `Expr`
expressions;
2. an `MLIREmitter` class whose purpose is to:
  a. maintain a map of `Bindable` leaf expressions to concrete SSAValue*;
  b. provide helper functionality to specify bindings of `Bindable` classes to
     SSAValue* while verifying conformable types;
  c. traverse the `Expr` and emit the MLIR.

This is used on a concrete example to implement MemRef load/store with clipping in the
LowerVectorTransfer pass. More specifically, the following pseudo-C++ code:
```c++
MLFuncBuilder *b = ...;
Location location = ...;
Bindable zero, one, expr, size;
// EDSL expression
auto access = select(expr < zero, zero, select(expr < size, expr, size - one));
auto ssaValue = MLIREmitter(b)
    .bind(zero, ...)
    .bind(one, ...)
    .bind(expr, ...)
    .bind(size, ...)
    .emit(location, access);
```
is used to emit all the MLIR for a clipped MemRef access.

This simple EDSL can easily be extended to more powerful patterns and should
serve as the counterpart to pattern matchers (and could potentially be unified
once we get enough experience).

In the future, most of this code should be TableGen'd, but for now it has a
concrete valuable use: making MLIR programmable in a declarative fashion.

This CL also adds Stmt, proper supporting free functions and rewrites
VectorTransferLowering fully using EDSCs.

The code for creating the EDSCs emitting a VectorTransferReadOp as loops
with clipped loads is:

```c++
  Stmt block = Block({
    tmpAlloc = alloc(tmpMemRefType),
    vectorView = vector_type_cast(tmpAlloc, vectorMemRefType),
    ForNest(ivs, lbs, ubs, steps, {
      scalarValue = load(scalarMemRef, accessInfo.clippedScalarAccessExprs),
      store(scalarValue, tmpAlloc, accessInfo.tmpAccessExprs),
    }),
    vectorValue = load(vectorView, zero),
    tmpDealloc = dealloc(tmpAlloc.getLHS())});
  emitter.emitStmt(block);
```

where `accessInfo.clippedScalarAccessExprs` is created with:

```c++
select(i + ii < zero, zero, select(i + ii < N, i + ii, N - one));
```

The generated MLIR resembles:

```mlir
    %1 = dim %0, 0 : memref<?x?x?x?xf32>
    %2 = dim %0, 1 : memref<?x?x?x?xf32>
    %3 = dim %0, 2 : memref<?x?x?x?xf32>
    %4 = dim %0, 3 : memref<?x?x?x?xf32>
    %5 = alloc() : memref<5x4x3xf32>
    %6 = vector_type_cast %5 : memref<5x4x3xf32>, memref<1xvector<5x4x3xf32>>
    for %i4 = 0 to 3 {
      for %i5 = 0 to 4 {
        for %i6 = 0 to 5 {
          %7 = affine_apply #map0(%i0, %i4)
          %8 = cmpi "slt", %7, %c0 : index
          %9 = affine_apply #map0(%i0, %i4)
          %10 = cmpi "slt", %9, %1 : index
          %11 = affine_apply #map0(%i0, %i4)
          %12 = affine_apply #map1(%1, %c1)
          %13 = select %10, %11, %12 : index
          %14 = select %8, %c0, %13 : index
          %15 = affine_apply #map0(%i3, %i6)
          %16 = cmpi "slt", %15, %c0 : index
          %17 = affine_apply #map0(%i3, %i6)
          %18 = cmpi "slt", %17, %4 : index
          %19 = affine_apply #map0(%i3, %i6)
          %20 = affine_apply #map1(%4, %c1)
          %21 = select %18, %19, %20 : index
          %22 = select %16, %c0, %21 : index
          %23 = load %0[%14, %i1, %i2, %22] : memref<?x?x?x?xf32>
          store %23, %5[%i6, %i5, %i4] : memref<5x4x3xf32>
        }
      }
    }
    %24 = load %6[%c0] : memref<1xvector<5x4x3xf32>>
    dealloc %5 : memref<5x4x3xf32>
```

In particular, notice that only 3 out of the 4 access dimensions are clipped:
this indeed corresponds to the number of dimensions in the super-vector.

This CL also addresses the cleanups resulting from the review of the previous
CL and performs some refactoring to simplify the abstraction.

PiperOrigin-RevId: 227367414
2019-03-29 14:50:23 -07:00
Chris Lattner
dffc589ad2 Extend InstVisitor and Walker to handle arbitrary CFG functions, and expand the
Function::walk functionality into f->walkInsts/Ops, which allows visiting all
instructions, not just ops.  Eliminate the Function::getBody() and
Function::getReturn() helpers, which crash on CFG functions and were only kept
around as a bridge.
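A hedged sketch of the new walkers; the callback parameter types are assumptions:

```c++
// Hypothetical sketch: visit every instruction, or only operations.
f->walkInsts([](Instruction *inst) {
  // ... runs for every instruction in the function ...
});
f->walkOps([](OperationInst *op) {
  // ... runs only for operation instructions ...
});
```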

This is step 25/n towards merging instructions and statements.

PiperOrigin-RevId: 227243966
2019-03-29 14:46:58 -07:00
Chris Lattner
456ad6a8e0 Standardize naming of statements -> instructions, revisting the code base to be
consistent and moving the using declarations over.  Hopefully this is the last
truly massive patch in this refactoring.

This is step 21/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227178245
2019-03-29 14:44:30 -07:00
Chris Lattner
315a466aed Rename BasicBlock and StmtBlock to Block, and make a pass cleaning it up. I did not make an effort to rename all of the 'bb' names in the codebase, since they are still correct and any specific missed ones can be fixed up on demand.
The last major renaming is Statement -> Instruction, which is why Statement and
Stmt still appears in various places.

This is step 19/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227163082
2019-03-29 14:43:58 -07:00
Chris Lattner
69d9e990fa Eliminate the using decls for MLFunction and CFGFunction standardizing on
Function.

This is step 18/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227139399
2019-03-29 14:43:13 -07:00
Chris Lattner
d798f9bad5 Rename BBArgument -> BlockArgument, Op::getOperation -> Op::getInst(),
StmtResult -> InstResult, StmtOperand -> InstOperand, and remove the old names.

This is step 17/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227121537
2019-03-29 14:42:40 -07:00
Chris Lattner
5187cfcf03 Merge Operation into OperationInst and standardize nomenclature around
OperationInst.  This is a big mechanical patch.

This is step 16/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227093712
2019-03-29 14:42:23 -07:00
Chris Lattner
4c05f8cac6 Merge CFGFuncBuilder/MLFuncBuilder/FuncBuilder together into a single new
FuncBuilder class.  Also rename SSAValue.cpp to Value.cpp

This is step 12/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227067644
2019-03-29 14:40:22 -07:00
Chris Lattner
3f190312f8 Merge SSAValue, CFGValue, and MLValue together into a single Value class, which
is the new base of the SSA value hierarchy.  This CL also standardizes all the
nomenclature and comments to use 'Value' where appropriate.  This also eliminates a large number of cast<MLValue>(x)'s, which is very soothing.

This is step 11/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 227064624
2019-03-29 14:40:06 -07:00
Chris Lattner
d613f5ab65 Refactor MLFunction to contain a StmtBlock for its body instead of inheriting
from it.  This is necessary progress to squaring away the parent relationship
that a StmtBlock has with its enclosing if/for/fn, and makes room for functions
to have more than one block in the future.  This also removes IfClause and ForStmtBody.

This is step 5/n towards merging instructions and statements, NFC.

PiperOrigin-RevId: 226936541
2019-03-29 14:36:35 -07:00
Chris Lattner
1301f907a1 Refactor ForStmt: having it contain a StmtBlock instead of subclassing
StmtBlock.  This is more consistent with IfStmt and also conceptually makes
more sense: a ForStmt "isn't" its body, it contains its body.

This is step 1/N towards merging BasicBlock and StmtBlock.  This is required
because in the new regime StmtBlock will have a use list (just like BasicBlock
does) of operands, and ForStmt already has a use list for its induction
variable.

This is a mechanical patch, NFC.

PiperOrigin-RevId: 226684158
2019-03-29 14:35:19 -07:00
Alex Zinenko
4dbd94b543 Refactor LowerVectorTransfersPass using pattern rewriters
This introduces a generic lowering pass for ML functions.  The pass is
parameterized by template arguments defining individual pattern rewriters.
Concrete lowering passes define individual pattern rewriters and inherit from
the generic class that takes care of allocating rewriters, traversing ML
functions and performing the actual rewrite.

While this is similar to the greedy pattern rewriter available in
Transform/Utils, it requires adjustments due to the ML/CFG duality.  In
particular, ML function rewriters must be able to create statements, not only
operations, and need access to an MLFuncBuilder.  When we move to using the
unified function type, the ML-specific rewriting will become unnecessary.

Use LowerVectorTransfers as a testbed for the generic pass.
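A hedged sketch of the template parameterization; apart from LowerVectorTransfers, every name here is illustrative:

```c++
// Hypothetical sketch: a concrete lowering pass is the generic pass
// instantiated with its pattern rewriters. The generic pass allocates the
// rewriters, traverses ML functions, and applies matching rewrites using
// an MLFuncBuilder.
struct VectorTransferRewriter {
  // match() / rewrite() hooks expected by the generic pass.
};

using LowerVectorTransfersPass =
    MLPatternLoweringPass<VectorTransferRewriter /*, MoreRewriters... */>;
```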

PiperOrigin-RevId: 225887424
2019-03-29 14:31:43 -07:00
Alex Zinenko
51c8a095a3 Materialize vector_type_cast operation in the SuperVector dialect
This operation is produced and used by the super-vectorization passes and has
been emitted as an abstract unregistered operation until now.  For end-to-end
testing purposes, it eventually has to be lowered to LLVM IR.  Matching an
abstract operation by name goes in the opposite direction of the generic
lowering approach that is expected to be used for LLVM IR lowering in the
future.  Register the vector_type_cast operation as a part of the SuperVector
dialect.

Arguably, this operation is a special case of the `view` operation from the
Standard dialect.  The semantics of `view` is not fully specified at this point
so it is safer to rely on a custom operation.  Additionally, using a custom
operation may help to achieve clear dialect separation.

PiperOrigin-RevId: 225887305
2019-03-29 14:31:13 -07:00
Alex Zinenko
bc52a639f9 Extract vector_transfer_* Ops into a SuperVectorDialect.
From the beginning, the vector_transfer_read and vector_transfer_write operations
were intended as a mid-level vectorization abstraction.  In particular, they
are lowered to the StandardOps dialect before further processing.  As such, it
does not make sense to keep them at the same level as StandardOps.  Introduce
the new SuperVectorOps dialect and move vector_transfer_* operations there.
This will be used as a testbed for the generic lowering/legalization pass.

PiperOrigin-RevId: 225554492
2019-03-29 14:28:58 -07:00
Nicolas Vasilache
c28aeef901 [MLIR] Drop bug-prone global map indexed by MLFunction*
PiperOrigin-RevId: 224610805
2019-03-29 14:23:49 -07:00
Nicolas Vasilache
d9b6420fc9 [MLIR] Add LowerVectorTransfersPass
This CL adds a pass that lowers VectorTransferReadOp and VectorTransferWriteOp
to a simple loop nest via local buffer allocations.

This is an MLIR->MLIR lowering based on builders.

A few TODOs are left to address in particular:
1. invert the permutation map so the accesses to the remote memref are coalesced;
2. pad the alloc for bank conflicts in local memory (e.g. GPUs shared_memory);
3. support broadcast / avoid copies when permutation_map is not of full column rank;
4. add a proper "element_cast" op.

One notable limitation is that this pass does not plan on supporting boundary
conditions: it should be significantly easier to use pre-baked MLIR functions
to handle such padding, so this is left for future consideration.  Therefore
the current CL only works properly for full-tile cases for now.

This CL also adds 2 simple tests:

```mlir
  for %i0 = 0 to %M step 3 {
    for %i1 = 0 to %N step 4 {
      for %i2 = 0 to %O {
        for %i3 = 0 to %P step 5 {
          vector_transfer_write %f1, %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, d1, d0)} : vector<5x4x3xf32>, memref<?x?x?x?xf32, 0>, index, index, index, index
        }
      }
    }
  }
```

lowers into:
```mlir
for %i0 = 0 to %arg0 step 3 {
  for %i1 = 0 to %arg1 step 4 {
    for %i2 = 0 to %arg2 {
      for %i3 = 0 to %arg3 step 5 {
        %1 = alloc() : memref<5x4x3xf32>
        %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>>
        store %cst, %2[%c0] : memref<1xvector<5x4x3xf32>>
        for %i4 = 0 to 5 {
          %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4)
          for %i5 = 0 to 4 {
            %4 = affine_apply (d0, d1) -> (d0 + d1) (%i1, %i5)
            for %i6 = 0 to 3 {
              %5 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6)
              %6 = load %1[%i4, %i5, %i6] : memref<5x4x3xf32>
              store %6, %0[%5, %4, %i2, %3] : memref<?x?x?x?xf32>
            }
          }
        }
        dealloc %1 : memref<5x4x3xf32>
      }
    }
  }
}
```

and
```mlir
  for %i0 = 0 to %M step 3 {
    for %i1 = 0 to %N {
      for %i2 = 0 to %O {
        for %i3 = 0 to %P step 5 {
          %f = vector_transfer_read %A, %i0, %i1, %i2, %i3 {permutation_map: (d0, d1, d2, d3) -> (d3, 0, d0)} : (memref<?x?x?x?xf32, 0>, index, index, index, index) -> vector<5x4x3xf32>
        }
      }
    }
  }
```

lowers into:
```mlir
for %i0 = 0 to %arg0 step 3 {
  for %i1 = 0 to %arg1 {
    for %i2 = 0 to %arg2 {
      for %i3 = 0 to %arg3 step 5 {
        %1 = alloc() : memref<5x4x3xf32>
        %2 = "element_type_cast"(%1) : (memref<5x4x3xf32>) -> memref<1xvector<5x4x3xf32>>
        for %i4 = 0 to 5 {
          %3 = affine_apply (d0, d1) -> (d0 + d1) (%i3, %i4)
          for %i5 = 0 to 4 {
            for %i6 = 0 to 3 {
              %4 = affine_apply (d0, d1) -> (d0 + d1) (%i0, %i6)
              %5 = load %0[%4, %i1, %i2, %3] : memref<?x?x?x?xf32>
              store %5, %1[%i4, %i5, %i6] : memref<5x4x3xf32>
            }
          }
        }
        %6 = load %2[%c0] : memref<1xvector<5x4x3xf32>>
        dealloc %1 : memref<5x4x3xf32>
      }
    }
  }
}
```

PiperOrigin-RevId: 224552717
2019-03-29 14:23:05 -07:00