Lossless LLM compression for efficient GPU inference via dynamic-length float

(arxiv.org)

354 points | by CharlesW 17 hours ago

107 comments