23 points | by mikepapadim 2 days ago
5 comments
https://github.com/beehive-lab/GPULlama3.java
Does it support flash attention? Use tensor cores? Can I write custom kernels?
UPD: I found no evidence that it supports tensor cores, so it's going to be many times slower than implementations that do.
Yes, when you use the PTX backend it supports Tensor Cores. It also has an implementation of flash attention. You can also write your own kernels; have a look here: https://github.com/beehive-lab/GPULlama3.java/blob/main/src/... https://github.com/beehive-lab/GPULlama3.java/blob/main/src/...
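For reference, here is a minimal sketch of what a hand-written kernel looks like, assuming the TaskGraph API from TornadoVM 1.x; the class and variable names are illustrative, not taken from GPULlama3.java:

    import uk.ac.manchester.tornado.api.ImmutableTaskGraph;
    import uk.ac.manchester.tornado.api.TaskGraph;
    import uk.ac.manchester.tornado.api.TornadoExecutionPlan;
    import uk.ac.manchester.tornado.api.annotations.Parallel;
    import uk.ac.manchester.tornado.api.enums.DataTransferMode;
    import uk.ac.manchester.tornado.api.types.arrays.FloatArray;

    public class CustomKernelSketch {

        // The kernel is a plain Java method; TornadoVM JIT-compiles it
        // for the GPU. @Parallel marks the loop for parallel execution.
        public static void vectorAdd(FloatArray a, FloatArray b, FloatArray c) {
            for (@Parallel int i = 0; i < a.getSize(); i++) {
                c.set(i, a.get(i) + b.get(i));
            }
        }

        public static void main(String[] args) {
            int n = 1024;
            FloatArray a = new FloatArray(n);
            FloatArray b = new FloatArray(n);
            FloatArray c = new FloatArray(n);
            for (int i = 0; i < n; i++) {
                a.set(i, 1.0f);
                b.set(i, 2.0f);
            }

            // Build a task graph: copy inputs to the device on first run,
            // execute the kernel, copy the result back after every run.
            TaskGraph graph = new TaskGraph("s0")
                    .transferToDevice(DataTransferMode.FIRST_EXECUTION, a, b)
                    .task("t0", CustomKernelSketch::vectorAdd, a, b, c)
                    .transferToHost(DataTransferMode.EVERY_EXECUTION, c);

            ImmutableTaskGraph itg = graph.snapshot();
            TornadoExecutionPlan plan = new TornadoExecutionPlan(itg);
            plan.execute();

            System.out.println(c.get(0)); // expect 3.0
        }
    }

The method body stays ordinary Java; which code is generated (OpenCL, PTX, or SPIR-V) depends on the backend you select at run time.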
The TornadoVM GitHub repository has no mention of tensor cores or WMMA instructions in its code. The only reference to tensor cores is a 2024 discussion stating they are not used: https://github.com/beehive-lab/TornadoVM/discussions/393
https://github.com/beehive-lab/TornadoVM/pull/732 https://github.com/beehive-lab/TornadoVM/pull/313