Parameter-free KV cache compression for memory-efficient long-context LLMs

(arxiv.org)

65 points | by PaulHoule 5 days ago

19 comments