Adds `mapa` and `mapa.shared.cluster` MLIR Ops to generate mapa
instructions.
`mapa` - Map the address of the shared variable in the target CTA.
- `mapa` - source is a register containing generic address pointing to
shared memory.
- `mapa.shared.cluster` - source is a shared memory variable or a
register containing a valid shared memory address.
PTX Spec Reference:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-mapa