Use 512-bit instructions with subvector insertion/extraction like we do in a number of similar circumstances.

llvm-svn: 287762