I've been interested in ZML for a long time: how does it work, and what makes it good? To understand the stack properly, I built a minimal version of ZML in Rust and wrote a blog post about it. It's essentially a trace-based tensor compiler for Rust: you build computation graphs with a familiar tensor API, lower them to StableHLO MLIR, and execute them through PJRT on CPU.
It even runs SmolLM2 on CPU at 5 tok/s.
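
For anyone new to the trace-then-lower idea, here is a rough sketch of what "build a graph with a tensor API, then lower it to StableHLO" can look like. This is not the project's actual code: `Graph`, `Tensor`, `param`, `record`, `lower`, and `example_mlir` are hypothetical names made up for illustration, every tensor is assumed to be `f32` with one shared shape, and the emitted text only aims to resemble StableHLO's pretty-printed form (it has not been run through an MLIR parser). The PJRT execution step is left out entirely.

```rust
use std::cell::RefCell;
use std::ops::{Add, Mul};
use std::rc::Rc;

// One recorded node in the traced graph (hypothetical IR, for illustration only).
enum Op {
    Param,
    Add(usize, usize),
    Mul(usize, usize),
}

// The graph being traced. Assumption: every tensor is f32 with the same shape.
struct Graph {
    shape: Vec<usize>,
    nodes: Vec<Op>,
}

// A symbolic tensor: just a node id plus a handle to the shared graph.
#[derive(Clone)]
struct Tensor {
    id: usize,
    graph: Rc<RefCell<Graph>>,
}

fn new_graph(shape: &[usize]) -> Rc<RefCell<Graph>> {
    Rc::new(RefCell::new(Graph { shape: shape.to_vec(), nodes: Vec::new() }))
}

// Append an op to the graph and return a tensor that refers to it.
fn record(graph: &Rc<RefCell<Graph>>, op: Op) -> Tensor {
    let mut g = graph.borrow_mut();
    g.nodes.push(op);
    Tensor { id: g.nodes.len() - 1, graph: Rc::clone(graph) }
}

// Declare a function input.
fn param(graph: &Rc<RefCell<Graph>>) -> Tensor {
    record(graph, Op::Param)
}

// Operator overloads do no arithmetic; they only record nodes.
impl Add for &Tensor {
    type Output = Tensor;
    fn add(self, rhs: Self) -> Tensor {
        record(&self.graph, Op::Add(self.id, rhs.id))
    }
}

impl Mul for &Tensor {
    type Output = Tensor;
    fn mul(self, rhs: Self) -> Tensor {
        record(&self.graph, Op::Mul(self.id, rhs.id))
    }
}

// Walk the recorded nodes in order and print StableHLO-style text.
fn lower(result: &Tensor) -> String {
    let g = result.graph.borrow();
    let dims: String = g.shape.iter().map(|d| format!("{d}x")).collect();
    let ty = format!("tensor<{dims}f32>");
    let mut args = Vec::new();
    let mut body = String::new();
    for (id, op) in g.nodes.iter().enumerate() {
        match op {
            Op::Param => args.push(format!("%v{id}: {ty}")),
            Op::Add(a, b) => {
                body.push_str(&format!("  %v{id} = stablehlo.add %v{a}, %v{b} : {ty}\n"))
            }
            Op::Mul(a, b) => {
                body.push_str(&format!("  %v{id} = stablehlo.multiply %v{a}, %v{b} : {ty}\n"))
            }
        }
    }
    format!(
        "func.func @main({}) -> {ty} {{\n{body}  return %v{} : {ty}\n}}\n",
        args.join(", "),
        result.id
    )
}

// Trace y = x * w + b (elementwise) and return the lowered text.
fn example_mlir() -> String {
    let graph = new_graph(&[2, 2]);
    let x = param(&graph);
    let w = param(&graph);
    let b = param(&graph);
    let y = &(&x * &w) + &b;
    lower(&y)
}
```

Calling `example_mlir()` returns a `func.func @main` with three `tensor<2x2xf32>` arguments, a `stablehlo.multiply`, a `stablehlo.add`, and a `return`. The point is that the Rust code never touches tensor data: the overloaded operators only append nodes, and the compiler/runtime behind PJRT does the real work later.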