The current implementation supports sharding only for tensor axes whose size is divisible by the size of the mesh axis they are sharded along.
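
As an illustration of the divisibility constraint, here is a minimal Python sketch. It is not the implementation's own code: the function name `shard_shape`, its parameters, and the assumption that each tensor axis is split along at most one mesh axis are all hypothetical and chosen only for this example.

```python
from typing import Dict, Sequence, Tuple


def shard_shape(
    tensor_shape: Sequence[int],
    mesh_shape: Sequence[int],
    sharding: Dict[int, int],
) -> Tuple[int, ...]:
    """Return the per-device shard shape for an evenly divisible sharding.

    `sharding` maps a tensor axis index to the mesh axis index it is split
    along (hypothetical representation, one mesh axis per tensor axis).
    Raises ValueError when a tensor axis size is not divisible by the
    corresponding mesh axis size.
    """
    local = list(tensor_shape)
    for tensor_axis, mesh_axis in sharding.items():
        size = tensor_shape[tensor_axis]
        parts = mesh_shape[mesh_axis]
        if size % parts != 0:
            raise ValueError(
                f"tensor axis {tensor_axis} (size {size}) is not divisible "
                f"by mesh axis {mesh_axis} (size {parts})"
            )
        # Each device along this mesh axis holds an equal slice.
        local[tensor_axis] = size // parts
    return tuple(local)


# An 8x6 tensor on a 2x3 mesh: axis 0 split along mesh axis 0,
# axis 1 split along mesh axis 1.
print(shard_shape((8, 6), (2, 3), {0: 0, 1: 1}))  # (4, 2)
```

Under these assumptions, a shape such as `(7, 6)` on the same mesh would be rejected, because 7 is not divisible by 2.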