Summary: Currently we dispatch the writing mode off of a runtime enum passed in by the constructor. This causes very unfortunate codegen for the GPU targets where we get worst-case codegen because of the unused function pointer for `sprintf`. Instead, this patch moves all of this to a template so it can be masked out. This results in no dynamic stack and uses 60 VGPRs instead of 117. It also compiles about 5x as fast.
15 KiB
15 KiB