Currently module asm ends up emitted twice and at the wrong place in the PTX.
This patch moves module asm generation into emitStartOfAsmFile() which puts at
the correct location in the generated PTX.
Differential Revision: https://reviews.llvm.org/D82280