Qwen 3.5 was plagued by some premature quant releases and unclear/incomplete guidelines for the sampling parameters. Especially if you are having looping problems, make sure you are using the very latest model files, executables, and recommended params.
If all the stars are aligned, Qwen 3.5 will not exhibit outright looping, although it will still burn more thinking tokens than some other models. There are ways to tone down the overthinking or disable it entirely, though, and the models are still quite capable when configured that way.
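For context, a minimal sketch of how this works on Qwen3 with the Hugging Face `transformers` chat template; I'm assuming Qwen 3.5 keeps the same conventions (the soft `/no_think` switch and the `enable_thinking` template flag), which is not confirmed:

```python
# Two documented ways to rein in thinking on Qwen3 (assumed to carry
# over to Qwen 3.5 -- check the model card before relying on this).

# 1) Soft switch: appending "/no_think" to a user turn asks the model
#    to skip the <think> block for that reply only.
messages = [
    {"role": "user", "content": "Summarize this diff in one line. /no_think"},
]

# 2) Hard switch: pass enable_thinking=False when rendering the chat
#    template. Shown as a comment because it needs the real tokenizer:
#
# text = tokenizer.apply_chat_template(
#     messages,
#     tokenize=False,
#     add_generation_prompt=True,
#     enable_thinking=False,  # disables the <think> block entirely
# )

# The Qwen3 card also recommends different sampler settings per mode
# (e.g. lower temperature for thinking mode); worth rechecking for 3.5.
```

Again, treat the `/no_think` token and `enable_thinking` parameter as Qwen3-era details that may or may not apply unchanged to 3.5.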
Lots of conversation on this topic yesterday: https://news.ycombinator.com/item?id=47363754
I think only Claude Sonnet/Opus, GPT 5.2+, and Minimax M2.5 are useful. Unfortunately, they are all nearly impossible to self-host.