Hi HN,
I have spent the last few months diving deep into SIMD and x86-64 assembly to see if I could push JSON parsing throughput to the absolute limits of modern hardware. The result is Tachyon (v0.7.2), a solo project built around a dual-engine architecture for AVX2 and AVX-512.
Key technical highlights:
Vectorized Depth Skipping: Uses structural bitmasks to skip over nested objects and arrays at memory-bus speeds.
Direct-Key-Jump (Apex Mode): Maps JSON fields directly to C++ structures, avoiding DOM materialization overhead.
AVX-512 "God Mode": Leverages 64-byte registers and k-masking for branchless filtering.
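As a rough illustration of the depth-skipping idea, here is a simplified, portable sketch. It is not Tachyon's actual kernel, and every name in it (`BlockMasks`, `classify`, `prefix_xor`, `skip_container`) is hypothetical. It builds one 64-bit mask per 64-byte block for opening brackets, closing brackets, and unescaped quotes, masks out anything inside a string, and then walks only the set bits to track depth. The scalar `classify` loop stands in for what an AVX2 build would presumably do with `_mm256_cmpeq_epi8` and `_mm256_movemask_epi8`, and the shift-based `prefix_xor` stands in for a carry-less multiply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// One bit per byte of a block of up to 64 bytes (bit i describes byte i).
struct BlockMasks {
    std::uint64_t open;   // '{' or '['
    std::uint64_t close;  // '}' or ']'
    std::uint64_t quote;  // '"' not preceded by an escaping backslash
};

// Scalar stand-in for the SIMD classification step (assumption: a real
// kernel would produce these masks with vector compares and a movemask).
// `escaped` carries backslash state across block boundaries.
static BlockMasks classify(std::string_view block, bool& escaped) {
    BlockMasks m{0, 0, 0};
    for (std::size_t i = 0; i < block.size(); ++i) {
        const char c = block[i];
        const std::uint64_t bit = std::uint64_t{1} << i;
        if (escaped) { escaped = false; continue; }
        if (c == '\\') { escaped = true; continue; }
        if (c == '"') m.quote |= bit;
        else if (c == '{' || c == '[') m.open |= bit;
        else if (c == '}' || c == ']') m.close |= bit;
    }
    return m;
}

// Bit i of the result is the XOR of bits 0..i of x. Applied to the quote
// mask, this marks bytes from an opening quote up to (not including) the
// matching closing quote.
static std::uint64_t prefix_xor(std::uint64_t x) {
    x ^= x << 1;  x ^= x << 2;  x ^= x << 4;
    x ^= x << 8;  x ^= x << 16; x ^= x << 32;
    return x;
}

// Index of the lowest set bit; x must be nonzero. Real code would use a
// single tzcnt-style instruction here.
static unsigned lowest_set_bit(std::uint64_t x) {
    unsigned i = 0;
    while ((x & 1) == 0) { x >>= 1; ++i; }
    return i;
}

// Given the index of a structural '{' or '[', returns the index one past
// its matching closer, or npos if the container never closes. Bracket
// kinds are not cross-checked; this only tracks depth.
std::size_t skip_container(std::string_view json, std::size_t start) {
    assert(start < json.size() && (json[start] == '{' || json[start] == '['));
    std::size_t depth = 0;
    bool escaped = false;
    std::uint64_t in_string_carry = 0;  // all ones if the previous block ended inside a string
    for (std::size_t base = start; base < json.size(); base += 64) {
        const BlockMasks m = classify(json.substr(base, 64), escaped);
        const std::uint64_t in_string = prefix_xor(m.quote) ^ in_string_carry;
        in_string_carry = (in_string >> 63) ? ~std::uint64_t{0} : 0;
        const std::uint64_t open = m.open & ~in_string;
        std::uint64_t structural = open | (m.close & ~in_string);
        // Visit only brackets outside strings; all other bytes are skipped.
        while (structural != 0) {
            const std::uint64_t lowest = structural & (~structural + 1);
            if (open & lowest) {
                ++depth;
            } else if (--depth == 0) {
                return base + lowest_set_bit(lowest) + 1;
            }
            structural &= structural - 1;  // clear the bit just handled
        }
    }
    return std::string_view::npos;
}
```

Calling `skip_container(doc, i)` with `i` pointing at a `{` or `[` returns the offset where parsing can resume after that value, without inspecting the bytes in between one at a time. Brackets inside string values and escaped quotes are ignored by construction.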
In my benchmarks, Tachyon consistently outperforms standard parsers on AVX2 (reaching ~6 GB/s on Canada.json). On AVX-512, it is roughly on par with simdjson, with the two trading blows at around 10-11 GB/s. On large 256 MB datasets, they are within 1-2% of each other, a gap small enough that the result is essentially limited by memory bandwidth and CPU jitter.
It is still a very fresh project, so I expect there are some bugs in edge cases. I am currently looking for feedback on the skipping logic and the proprietary license model (which I’ve kept free for anyone with under $1M in revenue).
I would be happy to answer any technical questions about the SIMD kernels or the optimization techniques used.