Commit Graph

76 Commits

Author SHA1 Message Date
Nicolas Vasilache
b3f7cf80a7 Add a CL option to Standard to LLVM lowering to use alloca instead of malloc/free.
In the future, a more configurable malloc and free interface should be used and exposed via
extra parameters to the `createLowerToLLVMPass`. Until requirements are gathered, a simple CL flag allows generating code that runs successfully on hardware that cannot use the stdlib.
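For illustration, a hedged C++ analogy of the two strategies the flag selects between (this is not the MLIR lowering itself, which emits LLVM-dialect alloca rather than C calls; names here are illustrative):
```cpp
// Illustrative analogy only: the default path depends on malloc/free from the
// C stdlib, while the alloca-style path uses stack storage that needs no
// runtime library and is released automatically on return.
#include <cstdlib>

void heap_style(std::size_t nbytes) {
  void *buf = std::malloc(nbytes); // default lowering: heap, needs the stdlib
  // ... use buf ...
  std::free(buf);
}

void stack_style() {
  char buf[1024]; // stand-in for stack allocation of the computed size
  // ... use buf; no stdlib call, storage ends with the function ...
  (void)buf;
}
```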

PiperOrigin-RevId: 283833424
2019-12-04 14:16:00 -08:00
Alex Zinenko
993e79e9bd Fix ViewOp to have at most one offset operand
As described in the documentation, ViewOp is expected to take an optional
dynamic offset followed by a list of dynamic sizes. However, the ViewOp parser
did not include a check for the offset being a single value and accepted a
list of values instead.

Furthermore, several tests have been exercising the wrong syntax of a ViewOp,
passing multiple values to the dynamic stride list, which was not caught by the
parser. The trailing values could have been erroneously interpreted as dynamic
sizes. This is likely due to a resyntaxing of the ViewOp, with the previous
syntax taking the list of sizes before the offset. Update the tests to use the
syntax with the offset preceding the sizes.

Worse, the conversion of ViewOp to the LLVM dialect assumed the wrong order of
operands with offset in the trailing position, and erroneously relied on the
permissive parsing that interpreted trailing dynamic offset values as leading
dynamic sizes. Fix the lowering to use the correct order of operands.

PiperOrigin-RevId: 283532506
2019-12-03 06:23:04 -08:00
Stephan Herhut
2125c0e3a8 Extend conversion of SubViewOp to llvm to also support cases where size and stride
are constant (i.e., there are no size and stride operands).

We recently added canonicalization that rewrites constant size and stride operands to
SubViewOp into static information in the type, so these patterns now occur during code
generation.

PiperOrigin-RevId: 283524688
2019-12-03 05:11:49 -08:00
Alex Zinenko
fdbb99cd62 Add linkage support to LLVMFuncOp
A recent commit introduced the Linkage attribute to the LLVM dialect and used
it in the Global Op. Also use it in LLVMFuncOp. As per LLVM Language Reference,
if the linkage attribute is omitted, the function is assumed to have external
linkage.

PiperOrigin-RevId: 283493299
2019-12-03 00:26:44 -08:00
Lei Zhang
0d22a3fdc8 NFC: Update std.subview op to use AttrSizedOperandSegments
This turns a few manually written helper methods into auto-generated ones.

PiperOrigin-RevId: 283339617
2019-12-02 07:52:00 -08:00
Mahesh Ravishankar
19212105dd Changes to SubViewOp to make it more amenable to canonicalization.
The current SubViewOp specification allows for either all offsets,
shape and stride to be dynamic or all of them to be static. There are
opportunities for more fine-grained canonicalization based on which of
these are static. For example, if the sizes are static, the result
memref is of static shape. The specification of SubViewOp is modified
to allow one or more of offsets, shapes and strides to be statically
specified. The verification is updated to ensure that the result type
of the subview op is consistent with which of these are static and
which are dynamic.

PiperOrigin-RevId: 281560457
2019-11-20 12:32:51 -08:00
Alex Zinenko
8961d8e32f Change conversion CLI flag from -lower-to-llvm to -convert-std-to-llvm
The command-line flag name `lower-to-llvm` for the pass performing dialect
conversion from the Standard dialect to the LLVM dialect is misleading and
inconsistent with most of the conversion passes. It leads the user to believe
that there are no restrictions on what can be converted, while in fact only a
subset of the Standard dialect can be converted (with operations from other
dialects converted by separate passes). Use `convert-std-to-llvm` that better
reflects what the pass does and is consistent with most other conversions.

PiperOrigin-RevId: 281238797
2019-11-19 00:34:51 -08:00
Alex Zinenko
062dd406b1 ConvertStandardToLLVM: replace assertion with graceful failure
The assertion was introduced in the early days of dialect conversion
infrastructure when we had the matching function separate from the rewriting
function. The infrastructure evolved to have a common matchAndRewrite function
and the separate matching function was dropped without chaning the rewriting
that became matchAndRewrite. This has led to assertion being triggered. Return
a matchFailure instead of failing an assertion on unsupported types.

Closes tensorflow/mlir#230

PiperOrigin-RevId: 281113741
2019-11-18 11:26:24 -08:00
Alex Zinenko
b34a861d5a Make positions of elements in MemRef descriptor private
Previous commits removed all uses of LLVMTypeConverter::k*PosInMemRefDescriptor
outside of the MemRefDescriptor class. These numbers are an implementation
detail and can be hidden under a layer of more semantic APIs.

PiperOrigin-RevId: 280442444
2019-11-14 09:17:38 -08:00
Alex Zinenko
bf5916e7a4 Use MemRefDescriptor in Vector-to-LLVM conversion
Following up on the consolidation of MemRef descriptor conversion, update
Vector-to-LLVM conversion to use the helper class that abstracts away the
implementation details of the MemRef descriptor. This also makes the types of
the attributes in emitted llvm.insert/extractelement operations consistently
i64 instead of a mix of index and i64.

PiperOrigin-RevId: 280441451
2019-11-14 09:05:42 -08:00
Alex Zinenko
7c28de4aef Use MemRefDescriptor in Linalg-to-LLVM conversion
Following up on the consolidation of MemRef descriptor conversion, update
Linalg-to-LLVM conversion to use the helper class that abstracts away the
implementation details of the MemRef descriptor. This required MemRefDescriptor
to become publicly visible. Since this conversion is heavily EDSC-based,
introduce locally an additional wrapper that uses the builder and location pointed
to by the EDSC context while emitting descriptor manipulation operations.

PiperOrigin-RevId: 280429228
2019-11-14 08:04:10 -08:00
Alex Zinenko
ee5c2256ef Concentrate memref descriptor manipulation logic in one place
Memref descriptor is becoming increasingly complex. Memrefs are manipulated by
multiple standard instructions, each of which has a non-trivial lowering to the
LLVM dialect. This leads to verbose code that manipulates the descriptors
exposing the internals of insert/extractelement operations. Implement a wrapper
class that contains a memref descriptor and provides semantically named methods
that build the primitive IR operations instead.

PiperOrigin-RevId: 280371225
2019-11-14 00:49:12 -08:00
Nicolas Vasilache
51de3f688e Add LLVM lowering of std.subview
A followup CL will replace usage of linalg.subview by std.subview.

PiperOrigin-RevId: 279961981
2019-11-12 07:23:18 -08:00
Nicolas Vasilache
f51a155337 Add support for alignment attribute in std.alloc.
This CL adds an extra pointer to the memref descriptor to allow specifying alignment.

In a previous implementation, we used 2 types: `linalg.buffer` and `view` where the buffer type was the unit of allocation/deallocation/alignment and `view` was the unit of indexing.

After multiple discussions it was decided to use a single type, which conflates both, so the memref descriptor now needs to carry both pointers.

This is consistent with the [RFC-Proposed Changes to MemRef and Tensor MLIR Types](https://groups.google.com/a/tensorflow.org/forum/#!searchin/mlir/std.view%7Csort:date/mlir/-wKHANzDNTg/4K6nUAp8AAAJ).
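A hedged sketch of the resulting descriptor layout, in the same C++ template style used elsewhere in this log (field and type names are illustrative, not the exact MLIR definitions):
```cpp
// Sketch only: the descriptor carries both the allocation pointer (the unit of
// allocation/deallocation) and an aligned pointer used for indexing.
#include <cstddef>
#include <cstdint>

template <typename Elem, std::size_t Rank>
struct AlignedMemRefDescriptor {  // hypothetical name
  Elem *allocatedPtr;  // what the allocator returned; passed to free
  Elem *alignedPtr;    // satisfies the requested alignment; used for indexing
  int64_t offset;
  int64_t sizes[Rank];
  int64_t strides[Rank];
};
```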

PiperOrigin-RevId: 279959463
2019-11-12 07:06:54 -08:00
Nicolas Vasilache
72040bf7c8 Update Linalg to use std.view
Now that a view op has graduated to the std dialect, we can update Linalg to use it and remove ops that have become obsolete. As a byproduct, the linalg buffer and associated ops can also disappear.

PiperOrigin-RevId: 279073591
2019-11-07 06:33:10 -08:00
Nicolas Vasilache
7f6c6084b5 Add lowering of std.view to LLVM
This CL ports the lowering of linalg.view to the newly introduced std.view.
Differences in implementation relate to std.view having slightly different semantics:
1. a static or dynamic offset can be specified.
2. the size of the (contiguous) shape is passed instead of a range.
3. static size and stride information is extracted from the memref type rather than the range.

Besides these differences, lowering behaves the same.
A future CL will update Linalg to use this unified infrastructure.

PiperOrigin-RevId: 278948853
2019-11-06 15:06:16 -08:00
Nicolas Vasilache
2823b68580 Implement lowering of VectorTypeCastOp to LLVM
A VectorTypeCastOp can only be used to lower between statically sized contiguous memrefs of scalar and matching vector type. The sizes and strides are thus fully static and easy to determine.

A relevant test is added.

This is a step towards solving tensorflow/mlir#189.

PiperOrigin-RevId: 275538981
2019-10-18 14:00:06 -07:00
River Riddle
dfe09cc621 Add support for PatternRewriter::eraseOp.
This hook is useful when an operation is known to be dead, and no replacement values make sense.

PiperOrigin-RevId: 275052756
2019-10-16 09:50:57 -07:00
Nicolas Vasilache
abf5c60af9 Add conversion for splat of vectors of 2+D
This CL adds a missing lowering for splat of multi-dimensional vectors.
Additional support is also added to the runtime utils library to allow printing memrefs with such vectors.

PiperOrigin-RevId: 274794723
2019-10-15 06:53:08 -07:00
Alex Zinenko
8c2ea32072 Emit LLVM IR equivalent of sizeof when lowering alloc operations
Originally, the lowering of `alloc` operations computed the number of
bytes to allocate based on the properties of the MLIR type. This does
not take into account type legalization that happens when compiling LLVM IR
down to target assembly. This legalization can widen the type, potentially
leading to out-of-bounds accesses to `alloc`ed data due to mismatches between
address computation that takes the widening into account and allocation that
does not. Use the LLVM IR's equivalent of `sizeof` to compute the number of
bytes to be allocated:
  %0 = getelementptr %type* null, %indexType 1
  %1 = ptrtoint %type* %0 to %indexType
adapted from
http://nondot.org/sabre/LLVMNotes/SizeOf-OffsetOf-VariableSizedStructs.txt
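For reference, a minimal sketch of the same idiom written against the plain LLVM C++ IRBuilder API (not the MLIR lowering code itself; the helper name is illustrative):
```cpp
// "sizeof via GEP from null": advance a null pointer by one element and
// convert the resulting address to an integer.
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"

llvm::Value *emitSizeOf(llvm::IRBuilder<> &builder, llvm::Type *elemTy,
                        llvm::Type *indexTy) {
  // %0 = getelementptr %elemTy, %elemTy* null, 1
  auto *nullPtr = llvm::ConstantPointerNull::get(
      llvm::PointerType::getUnqual(elemTy));
  llvm::Value *gep = builder.CreateConstGEP1_64(elemTy, nullPtr, /*Idx0=*/1);
  // %1 = ptrtoint %elemTy* %0 to %indexTy
  return builder.CreatePtrToInt(gep, indexTy);
}
```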

PiperOrigin-RevId: 274159900
2019-10-11 06:33:26 -07:00
Uday Bondhugula
47596f5345 Drop obsolete code from std to llvm memref lowering
- dropping what looks like outdated code following some of the previous
  updates

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Closes tensorflow/mlir#179

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/179 from bondhugula:llfix 2a72ea441fe1b3924802273ffbe9870afeb90f91
PiperOrigin-RevId: 274158273
2019-10-11 06:31:18 -07:00
Alexander Belyaev
7301ac72bc Rename LLVM::exp and LLVM::fmuladd to LLVM::ExpOP and LLVM::FMulAddOp.
PiperOrigin-RevId: 274154655
2019-10-11 05:38:37 -07:00
Alexander Belyaev
00d2a37e32 Add unary ops and ExpOp to Standard Dialect.
PiperOrigin-RevId: 274152154
2019-10-11 05:13:55 -07:00
Alex Zinenko
08a2ce8a14 Standard-to-LLVM conversion: check that operands have LLVM types
In Standard to LLVM dialect conversion, the binary op conversion pattern
implicitly assumed some operands were of LLVM IR dialect type. This is not
necessarily true, for example if the Ops that produce those operands did not
match the existing conversion patterns. Check if all operands are of LLVM IR
dialect type and, if not, fail to match in the binary op pattern.

Closes tensorflow/mlir#168

PiperOrigin-RevId: 274063207
2019-10-10 17:19:57 -07:00
Alex Zinenko
5e7959a353 Use llvm.func to define functions with wrapped LLVM IR function type
This function-like operation allows one to define functions that have a wrapped
LLVM IR function type, in particular variadic functions. The operation was
added in parallel to the existing lowering flow; this commit only switches the
flow to use it.

Using a custom function type makes the LLVM IR dialect type system more
consistent and avoids complex conversion rules for functions that previously
had to use the built-in function type instead of a wrapped LLVM IR dialect type
and perform conversions during the analysis.

PiperOrigin-RevId: 273910855
2019-10-10 01:34:06 -07:00
Nicolas Vasilache
754ea72794 Replace constexpr MemRefType::kDynamicStrideOrOffset by a MemRefType::getDynamicStrideOrOffset() method - NFC
This fixes global ODR-use issues, some of which manifest in Parser.cpp.

Fixes tensorflow/mlir#167.
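A hedged, self-contained illustration of the pre-C++17 ODR-use pitfall this change sidesteps (stand-in names, not the MLIR code): an in-class `static constexpr` data member is only a declaration, so odr-using it, e.g. by binding it to a reference, needs an out-of-line definition or the link may fail; a static member function returning the value avoids the issue.
```cpp
#include <cstdint>
#include <limits>

struct MemRefTypeLike {  // hypothetical stand-in type
  static constexpr int64_t kDynamicStrideOrOffset =
      std::numeric_limits<int64_t>::min();  // declaration only (pre-C++17)
  static int64_t getDynamicStrideOrOffset() {  // the replacement pattern
    return std::numeric_limits<int64_t>::min();
  }
};

static bool isDynamic(const int64_t &v) {  // reference parameter odr-uses v
  return v == MemRefTypeLike::getDynamicStrideOrOffset();
}

int main() {
  // isDynamic(MemRefTypeLike::kDynamicStrideOrOffset); // may not link in C++14
  return isDynamic(MemRefTypeLike::getDynamicStrideOrOffset()) ? 0 : 1;
}
```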

PiperOrigin-RevId: 272886347
2019-10-04 08:58:09 -07:00
Christian Sigg
85dcaf19c7 Fix typos, NFC.
PiperOrigin-RevId: 272851237
2019-10-04 04:37:53 -07:00
MLIR Team
0dfa7fc908 Add fpext and fptrunc to the Standard dialect and includes conversion to LLVM
PiperOrigin-RevId: 272768027
2019-10-03 16:37:24 -07:00
Alex Zinenko
bd4762502c Add parentheses around boolean operators in assert
This removes a warning and is generally a good practice.

PiperOrigin-RevId: 272613597
2019-10-03 01:47:14 -07:00
Alex Zinenko
e0d78eac23 NFC: rename Conversion/ControlFlowToCFG to Conversion/LoopToStandard
This makes the name of the conversion pass more consistent with the naming
scheme, since it actually converts from the Loop dialect to the Standard
dialect rather than working with arbitrary control flow operations.

PiperOrigin-RevId: 272612112
2019-10-03 01:35:03 -07:00
Nicolas Vasilache
9604bb6269 Extract MemRefType::getStridesAndOffset as a free function and fix dynamic offset determination.
This also adds coverage with a missing test, which uncovered a bug in the conditional for testing whether an offset is dynamic or not.

PiperOrigin-RevId: 272505798
2019-10-02 13:25:05 -07:00
Alex Zinenko
c760f233b3 Fix and simplify CallOp/CallIndirectOp to LLVM::CallOp conversion
A recent ABI compatibility change affected the conversion from standard
CallOp/CallIndirectOp to LLVM::CallOp by changing its signature. In order to
analyze the signature, the code was looking up the callee symbol in the module.
This is incorrect since, during the conversion, the module may contain both the
original and the converted function op that have the same symbol name. There is
no strict guarantee on which of the two symbols will be found by the lookup.
The conversion was not failing because the type legalizer converts the LLVM
types to themselves making the original and the converted function signatures
ultimately produce the same type.

Instead of looking up the function signature to get the list of result types,
use the types of the CallOp/CallIndirectOp results which must match those of
the function in valid IR. These types are guaranteed to be the original,
unconverted types when converting the operation. Furthermore, this avoids the
need to perform a lookup of a symbol name in the module which may be expensive.

Finally, propagate attributes as-is from the original op to the converted op
since they share the attribute name for the callee of direct calls and the rest
of attributes are not affected by the conversion. This removes the need for
additional contortions between direct and indirect calls to extract the name of
the optional callee attribute only to insert it back. This also prevents the
conversion from unintentionally dropping the other attributes of the op.

PiperOrigin-RevId: 272218871
2019-10-01 08:41:50 -07:00
Nicolas Vasilache
e36337a998 Unify Linalg types by using strided memrefs
This CL finishes the implementation of the Linalg + Affine type unification of the [strided memref RFC](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
As a consequence, the !linalg.view type, linalg::DimOp, linalg::LoadOp and linalg::StoreOp can now disappear and Linalg can use standard types everywhere.

PiperOrigin-RevId: 272187165
2019-10-01 05:23:21 -07:00
Nicolas Vasilache
923b33ea16 Normalize MemRefType lowering to LLVM as strided MemRef descriptor
This CL finishes the implementation of the lowering part of the [strided memref RFC](https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).

Strided memrefs correspond conceptually to the following templated C++ struct:
```
template <typename Elem, size_t Rank>
struct {
  Elem *ptr;
  int64_t offset;
  int64_t sizes[Rank];
  int64_t strides[Rank];
};
```
The linearization procedure for address calculation for strided memrefs is the same as for linalg views:
`base_offset + SUM_i index_i * stride_i`.
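A hedged C++ illustration of that address computation for a rank-2 descriptor (field names follow the struct sketch above; this is not the generated code):
```cpp
// base_offset + SUM_i index_i * stride_i, spelled out for Rank = 2.
#include <cstddef>
#include <cstdint>

template <typename Elem, std::size_t Rank>
struct StridedMemRef {  // mirrors the sketch above; illustrative only
  Elem *ptr;
  int64_t offset;
  int64_t sizes[Rank];
  int64_t strides[Rank];
};

float load2d(const StridedMemRef<float, 2> &m, int64_t i, int64_t j) {
  return m.ptr[m.offset + i * m.strides[0] + j * m.strides[1]];
}
```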

The following CL will unify Linalg and Standard by removing !linalg.view in favor of strided memrefs.

PiperOrigin-RevId: 272033399
2019-09-30 11:58:54 -07:00
Nicolas Vasilache
bc4984e4f7 Add TODO to revisit coupling of CallOp to MemRefType lowering
PiperOrigin-RevId: 271619132
2019-09-27 12:03:00 -07:00
Nicolas Vasilache
ddf737c5da Promote MemRefDescriptor to a pointer to struct when passing function boundaries in LLVMLowering.
The strided MemRef RFC discusses a normalized descriptor and interaction with library calls (https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).
Lowering of nested LLVM structs as value types does not play nicely with externally compiled C/C++ functions due to ABI issues.
Solving the ABI problem generally is a very complex problem and most likely involves taking
a dependence on clang that we do not want at the moment.

A simple workaround is to pass pointers to memref descriptors at function boundaries, which this CL implements.
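A hedged sketch of what this buys at a call boundary: the externally compiled function sees an ordinary pointer to the descriptor, so no assumptions about by-value struct passing are needed (all names are illustrative):
```cpp
#include <cstddef>
#include <cstdint>

template <typename Elem, std::size_t Rank>
struct StridedMemRefDescriptor {  // layout mirrors the sketches in this log
  Elem *ptr;
  int64_t offset;
  int64_t sizes[Rank];
  int64_t strides[Rank];
};

// External library function taking the descriptor by pointer.
extern "C" void fill_2d_f32(StridedMemRefDescriptor<float, 2> *view,
                            float value) {
  for (int64_t i = 0; i < view->sizes[0]; ++i)
    for (int64_t j = 0; j < view->sizes[1]; ++j)
      view->ptr[view->offset + i * view->strides[0] + j * view->strides[1]] =
          value;
}
```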

PiperOrigin-RevId: 271591708
2019-09-27 09:57:36 -07:00
Uday Bondhugula
458ede8775 Introduce splat op + provide its LLVM lowering
- introduce splat op in standard dialect (currently for int/float/index input
  type, output type can be vector or statically shaped tensor)
- implement LLVM lowering (when result type is 1-d vector)
- add constant folding hook for it
- while on Ops.cpp, fix some stale names

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Closes tensorflow/mlir#141

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/141 from bondhugula:splat 48976a6aa0a75be6d91187db6418de989e03eb51
PiperOrigin-RevId: 270965304
2019-09-24 12:44:58 -07:00
Nicolas Vasilache
42d8fa667b Normalize lowering of MemRef types
The RFC for unifying Linalg and Affine compilation passes into an end-to-end flow with a predictable ABI and linkage to external function calls raised the question of why we have variable sized descriptors for memrefs depending on whether they have static or dynamic dimensions  (https://groups.google.com/a/tensorflow.org/forum/#!topic/mlir/MaL8m2nXuio).

This CL standardizes the ABI on the rank of the memrefs.
The LLVM struct for a memref becomes equivalent to:
```
template <typename Elem, size_t Rank>
struct {
  Elem *ptr;
  int64_t sizes[Rank];
};
```

PiperOrigin-RevId: 270947276
2019-09-24 11:21:49 -07:00
Manuel Freiberger
2c11997d48 Add integer sign- and zero-extension and truncation to standard.
This adds sign- and zero-extension and truncation of integer types to the
standard dialects. This allows to perform integer type conversions without
having to go to the LLVM dialect and introduce custom type casts (between
standard and LLVM integer types).

Closes tensorflow/mlir#134

COPYBARA_INTEGRATE_REVIEW=https://github.com/tensorflow/mlir/pull/134 from ombre5733:sext-zext-trunc-in-std c7657bc84c0ca66b304e53ec03797e09152e4d31
PiperOrigin-RevId: 270479722
2019-09-21 16:14:56 -07:00
River Riddle
f1b100c77b NFC: Finish replacing FunctionPassBase/ModulePassBase with OpPassBase.
These directives were temporary during the generalization of FunctionPass/ModulePass to OpPass.

PiperOrigin-RevId: 268970259
2019-09-13 13:34:27 -07:00
MLIR Team
36508528c7 Overload LLVM::TerminatorOp::build() for empty operands list.
PiperOrigin-RevId: 268041584
2019-09-09 11:39:03 -07:00
MLIR Team
b5652720c1 Retain address space during MLIR > LLVM conversion.
PiperOrigin-RevId: 267206460
2019-09-04 12:26:52 -07:00
Mehdi Amini
765d60fd4d Add missing lowering to CFG in mlir-cpu-runner + related cleanup
- the list of passes run by mlir-cpu-runner included -lower-affine and
  -lower-to-llvm but was missing -lower-to-cfg (because -lower-affine at
  some point used to lower straight to CFG); add -lower-to-cfg in
  between. IR with affine ops can now be run by mlir-cpu-runner.

- update -lower-to-cfg to be consistent with other passes (create*Pass methods
  were changed to return unique ptrs, but -lower-to-cfg appears to have been
  missed).

- mlir-cpu-runner was unable to parse the custom form of affine ops - fix
  link options

- drop unnecessary run options from test/mlir-cpu-runner/simple.mlir
  (none of the test cases had loops)

- -convert-to-llvmir was changed to -lower-to-llvm at some point, but the
  create pass method name wasn't updated (this pass converts/lowers to LLVM
  dialect as opposed to LLVM IR). Fix this.

(If we prefer "convert", the cmd-line options could be changed to
"-convert-to-llvm/cfg" then.)

Signed-off-by: Uday Bondhugula <uday@polymagelabs.com>

Closes tensorflow/mlir#115

PiperOrigin-RevId: 266666909
2019-09-01 11:33:22 -07:00
Nicolas Vasilache
fa592908af Let LLVMOpLowering specify a PatternBenefit - NFC
Currently the benefit is always set to 1, which limits the ability to do A->B->C lowering.

PiperOrigin-RevId: 264854146
2019-08-22 09:38:42 -07:00
Nicolas Vasilache
f55ac5c076 Add support for LLVM lowering of binary ops on n-D vector types
This CL allows binary operations on n-D vector types to be lowered to LLVMIR by performing an (n-1)-D extractvalue, 1-D vector operation and an (n-1)-D insertvalue.

PiperOrigin-RevId: 264339118
2019-08-20 02:00:22 -07:00
River Riddle
ba0fa92524 NFC: Move LLVMIR, SDBM, and StandardOps to the Dialect/ directory.
PiperOrigin-RevId: 264193915
2019-08-19 11:01:25 -07:00
Nicolas Vasilache
9bf69e6a2e Refactor linalg lowering to LLVM
The linalg.view type used to be lowered to a struct containing a data pointer, offset, sizes/strides information. This was problematic when passing to external functions due to ABI, struct padding and alignment issues.

The linalg.view type is now lowered to LLVMIR as a *pointer* to a struct containing the data pointer, offset and sizes/strides. This simplifies the interfacing with external library functions and makes it trivial to add new functions without creating a shim that would go from a value type struct to a pointer type.

The consequences are that:
1. lowering explicitly uses llvm.alloca in lieu of llvm.undef and performs the proper llvm.load/llvm.store where relevant.
2. the shim creation function `getLLVMLibraryCallDefinition` disappears.
3. views are passed by pointer, scalars are passed by value. In the future, other structs will be passed by pointer (on a per-need basis).

PiperOrigin-RevId: 264183671
2019-08-19 10:21:40 -07:00
Jacques Pienaar
79f53b0cf1 Change from llvm::make_unique to std::make_unique
Switch to the C++14 standard method as llvm::make_unique has been removed
(https://reviews.llvm.org/D66259). Also mark some targets as C++14 to ease the
next integrates.
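A tiny hedged example of the change in caller code (the type name is a stand-in):
```cpp
#include <memory>

struct SomePass {  // stand-in type for illustration
  int option = 0;
};

std::unique_ptr<SomePass> makePass() {
  return std::make_unique<SomePass>();  // replaces llvm::make_unique<SomePass>()
}
```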

PiperOrigin-RevId: 263953918
2019-08-17 11:06:03 -07:00
Mehdi Amini
926fb685de Express ownership transfer in PassManager API through std::unique_ptr (NFC)
Since raw pointers are always passed around for IR constructs without
implying any ownership transfer, it can be error-prone to have ownership
transferred implicitly the same way.
For example this code can seem harmless:

  Pass *pass = ....
  pm.addPass(pass);
  pm.addPass(pass);
  pm.run(module);
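With ownership expressed through std::unique_ptr, the transfer is spelled out and the double-add becomes visible at the call site. A hedged, self-contained sketch with stub types (not the real MLIR API):
```cpp
#include <memory>
#include <utility>
#include <vector>

struct Pass { virtual ~Pass() = default; };
struct MyPass : Pass {};

struct PassManager {
  std::vector<std::unique_ptr<Pass>> passes;
  void addPass(std::unique_ptr<Pass> p) { passes.push_back(std::move(p)); }
};

int main() {
  PassManager pm;
  std::unique_ptr<Pass> pass = std::make_unique<MyPass>();
  pm.addPass(std::move(pass));
  // pm.addPass(std::move(pass)); // now clearly suspect: 'pass' was moved from
}
```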

PiperOrigin-RevId: 263053082
2019-08-12 19:13:12 -07:00
Nicolas Vasilache
252ada4932 Add lowering of vector dialect to LLVM dialect.
This CL is step 3/n towards building a simple, programmable and portable vector abstraction in MLIR that can go all the way down to generating assembly vector code via LLVM's opt and llc tools.

This CL adds support for converting MLIR n-D vector types to (n-1)-D arrays of 1-D LLVM vectors and a conversion VectorToLLVM that lowers the `vector.extractelement` and `vector.outerproduct` instructions to the proper mix of `llvm.vectorshuffle`, `llvm.extractelement` and `llvm.mulf`.

This has been independently verified to produce proper avx2 code.

Input:
```
func @vec_1d(%arg0: vector<4xf32>, %arg1: vector<8xf32>) -> vector<8xf32> {
  %2 = vector.outerproduct %arg0, %arg1 : vector<4xf32>, vector<8xf32>
  %3 = vector.extractelement %2[0 : i32]: vector<4x8xf32>
  return %3 : vector<8xf32>
}
```

Command:
```
mlir-opt vector-to-llvm.mlir -vector-lower-to-llvm-dialect --disable-pass-threading | mlir-opt -lower-to-cfg -lower-to-llvm | mlir-translate --mlir-to-llvmir | opt -O3 | llc -O3 -march=x86-64 -mcpu=haswell -mattr=fma,avx2
```

Output:
```
vec_1d:                                 # @vec_1d
# %bb.0:
        vbroadcastss    %xmm0, %ymm0
        vmulps  %ymm1, %ymm0, %ymm0
        retq
```
PiperOrigin-RevId: 262895929
2019-08-12 04:08:57 -07:00