Qwen 3.5 was plagued by some premature quant releases and unclear/incomplete guidelines for the sampling parameters. Especially if you are having looping problems, make sure you are using the very latest model files, executables, and recommended params.
If all the stars are aligned, Qwen 3.5 will not exhibit outright looping, although it will still burn more thinking tokens than some other models. There are ways to tone down the overthinking or disable it entirely, though, and the models are still quite capable when configured that way.
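For context, a minimal sketch of how this works on Qwen3 with the Hugging Face `transformers` chat template; I'm assuming Qwen 3.5 keeps the same conventions (the soft `/no_think` switch and the `enable_thinking` template flag), which is not confirmed:

```python
# Two documented ways to rein in thinking on Qwen3 (assumed to carry
# over to Qwen 3.5 -- check the model card before relying on this).

# 1) Soft switch: appending "/no_think" to a user turn asks the model
#    to skip the <think> block for that reply only.
messages = [
    {"role": "user", "content": "Summarize this diff in one line. /no_think"},
]

# 2) Hard switch: pass enable_thinking=False when rendering the chat
#    template. Shown as a comment because it needs the real tokenizer:
#
# text = tokenizer.apply_chat_template(
#     messages,
#     tokenize=False,
#     add_generation_prompt=True,
#     enable_thinking=False,  # disables the <think> block entirely
# )

# The Qwen3 card also recommends different sampler settings per mode
# (e.g. lower temperature for thinking mode); worth rechecking for 3.5.
```

Again, treat the `/no_think` token and `enable_thinking` parameter as Qwen3-era details that may or may not apply unchanged to 3.5.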
Lots of conversation on this topic yesterday: https://news.ycombinator.com/item?id=47363754
I think only Claude Sonnet/Opus, GPT 5.2+, and Minimax M2.5 are useful. Unfortunately, they are all nearly impossible to self-host.