This patch updates the cp.async.bulk.{commit/wait}_group Ops to use NVVM
intrinsics.
* The documentation is updated for the commit_group Op.
* Tests are added to verify the lowering to the intrinsics.
While at it, fix the FileCheck directive in the
'nvvm.setmaxregister' test.
Signed-off-by: Durgadoss R <durgadossr@nvidia.com>