Are you using MoltenVK to run the dispatches? I've had OK results with it, but I ultimately built my own GPU API abstraction layer, so my code calls into Metal directly. There are certain features, such as access to "memoryless" buffers, that may be difficult to reach through MoltenVK, as I believe the concept doesn't really exist in Vulkan yet.

Does the M1 silicon have anything like cooperative matrices? They're a huge performance bump, but currently require fiddly vendor-specific extensions. Conversely, are cooperative matrices used in the A100 comparison? And what is the size of the matrix elements? My understanding is that float32 is super easy to represent in SPIR-V, while float16 is possible but the path is not as smoothly paved. Of course, on the Metal side "half" is fine. Hopefully a vendor-neutral standard will emerge before long.

Do you support operations like prefix sums? If so, are they implemented with the decoupled look-back algorithm, or with multiple dispatches? Is that something that might be useful in a machine learning context? I'm excited to see more portable use of compute shaders; I think to a large extent it's the future.

RE cooperative matrix: Apple has functional units (AMX/ANE) that could hopefully be exposed in MSL via something shaped like cooperative matrix, and I'm pretty sure it'd be fantastic. Everything today is locked behind CoreML and Accelerate, and those are poor targets for modern compiler-based approaches :( On the Vulkan side there have been rumblings of a vendor-agnostic extension for cooperative matrix, with support from major vendors - at which point I'm hoping that leads to Apple wanting to show off their own HW features.
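Since cooperative matrices come up several times: here is a minimal CPU sketch of what a single cooperative-matrix multiply-accumulate computes, assuming a typical 8x8 tile (real shapes and element types vary by vendor, and the name `coop_mad` is purely illustrative). On the GPU, the tile is held collectively by one subgroup/simdgroup, and the inner loops map onto dedicated matrix hardware.

```cpp
// CPU model of one cooperative-matrix op: D = A*B + C on a small
// fixed-size tile (8x8 here; shapes and element types are
// vendor-dependent assumptions, not from the thread).
#include <array>

using Tile = std::array<float, 8 * 8>; // row-major 8x8 tile

Tile coop_mad(const Tile& a, const Tile& b, const Tile& c) {
    Tile d = c; // accumulate on top of C
    for (int i = 0; i < 8; ++i)
        for (int k = 0; k < 8; ++k)
            for (int j = 0; j < 8; ++j)
                d[i * 8 + j] += a[i * 8 + k] * b[k * 8 + j];
    return d;
}
```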
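On the float16 point, a sketch of why the SPIR-V path is "not as smoothly paved" on Vulkan: fp16 shader arithmetic and fp16 storage are separate, optional features that have to be queried (and then enabled at device creation) before the corresponding SPIR-V capabilities are legal. The helper below is illustrative, not from the thread.

```cpp
// Query whether a Vulkan physical device supports both fp16
// arithmetic in shaders and reading/writing half in storage buffers.
#include <vulkan/vulkan.h>

bool supports_shader_fp16(VkPhysicalDevice phys) {
    // shaderFloat16 gates 16-bit *arithmetic* (SPIR-V Float16 capability).
    VkPhysicalDeviceShaderFloat16Int8Features f16i8{};
    f16i8.sType =
        VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_SHADER_FLOAT16_INT8_FEATURES;

    // storageBuffer16BitAccess gates 16-bit *storage* in SSBOs.
    VkPhysicalDevice16BitStorageFeatures storage16{};
    storage16.sType =
        VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_16BIT_STORAGE_FEATURES;
    storage16.pNext = &f16i8;

    VkPhysicalDeviceFeatures2 features2{};
    features2.sType = VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_FEATURES_2;
    features2.pNext = &storage16;

    vkGetPhysicalDeviceFeatures2(phys, &features2);
    return f16i8.shaderFloat16 && storage16.storageBuffer16BitAccess;
}
```

The same structs then need to be chained into `VkDeviceCreateInfo::pNext` to actually enable the features. On the Metal side none of this ceremony exists, since `half` is a first-class MSL type, which is why "half is fine" there.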
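To make the prefix-sum question concrete, here is a CPU model of the "multiple dispatches" approach; each loop stands in for one compute dispatch, with a pipeline barrier between dispatches. The block size and all names are illustrative assumptions.

```cpp
// CPU model of a three-dispatch GPU exclusive prefix sum.
#include <algorithm>
#include <cstddef>
#include <vector>

constexpr std::size_t kBlock = 256; // elements handled per workgroup

std::vector<float> exclusive_scan(const std::vector<float>& in) {
    const std::size_t nBlocks = (in.size() + kBlock - 1) / kBlock;
    std::vector<float> out(in.size());
    std::vector<float> blockSum(nBlocks, 0.0f);

    // Dispatch 1: scan each block independently, recording block totals.
    for (std::size_t b = 0; b < nBlocks; ++b) {
        float acc = 0.0f;
        const std::size_t end = std::min(in.size(), (b + 1) * kBlock);
        for (std::size_t i = b * kBlock; i < end; ++i) {
            out[i] = acc;   // exclusive: write the prefix before adding
            acc += in[i];
        }
        blockSum[b] = acc;
    }

    // Dispatch 2: exclusive scan of the block totals (a tiny problem,
    // typically a single workgroup).
    float acc = 0.0f;
    for (std::size_t b = 0; b < nBlocks; ++b) {
        const float t = blockSum[b];
        blockSum[b] = acc;
        acc += t;
    }

    // Dispatch 3: add each block's scanned offset back into its elements.
    for (std::size_t b = 0; b < nBlocks; ++b) {
        const std::size_t end = std::min(in.size(), (b + 1) * kBlock);
        for (std::size_t i = b * kBlock; i < end; ++i)
            out[i] += blockSum[b];
    }
    return out;
}
```

Decoupled look-back collapses all three passes into a single dispatch: each workgroup atomically publishes its block aggregate, then "looks back" across its predecessors' published values to assemble its exclusive offset, trading the extra dispatch round trips for an atomic communication protocol.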