Post History

Current Version (Jun 01, 2026 at 03:17)

My go-to model is now consistently Qwen 3.6 35B-A3B (MoE). It's even more blazingly fast now with MTP, and extremely capable, even at 4-bit quantization. It has also done a great job on knowledge questions: my go-to tests are very specific questions about paramotoring and about lesser-known frameworks such as jam.py. I have been absolutely amazed at how deep that small MoE's knowledge is on obscure topics that even the frontier models knew nothing about last year. I ran these knowledge tests in LM Studio, with no Internet search tools available to the model.

The very few times I've seen the 3.6 MoE need help completing a coding task, I've watched both varieties of Gemma 4 (the 26B-A4B MoE and the 31B dense) help get the task completed.

With all these other fast-running MTP models, and even the fast Bartowski version of the 122B Qwen 3.5 MoE, there are lots of performant alternatives, for both AMD and Nvidia, when a task would benefit from connecting to a bigger, more knowledgeable model.

Version 2 (Jun 01, 2026 at 03:17)

My go-to model is now consistently Qwen 3.6 35B-A3B (MoE). It's even more blazingly fast now with MTP, and extremely capable, even at 4-bit quantization. It has also done a great job on knowledge questions: my go-to tests are very specific questions about paramotoring and about lesser-known frameworks such as jam.py. I have been absolutely amazed at how deep that small MoE's knowledge is on obscure topics that even the frontier models knew nothing about last year. I ran these knowledge tests in LM Studio, with no Internet search tools available to the model.

The very few times I've seen the 3.6 MoE need help completing a coding task, I've watched both varieties of Gemma 4 (the 26B-A4B MoE and the 31B dense) help get the task completed.

With all these other fast-running MTP models, and even the fast Bartowski version of the 122B Qwen 3.5 MoE, there are lots of fast alternatives, for both AMD and Nvidia, when a task would benefit from connecting to a bigger, more knowledgeable model.

Version 1 (May 25, 2026 at 15:51)

My go-to model is now consistently Qwen 3.6 35B-A3B (MoE). It's even more blazingly fast now with MTP, and extremely capable, even at 4-bit quantization. It has also done a great job on knowledge questions: my go-to tests are very specific questions about paramotoring and about lesser-known frameworks such as jam.py. I have been absolutely amazed at how deep that MoE's knowledge is on obscure topics that even the frontier models knew nothing about last year.

The very few times I've seen the 3.6 MoE need help completing a coding task, I've watched both varieties of Gemma 4 (the 26B-A4B MoE and the 31B dense) help get the task completed.

With all these other fast-running MTP models, and even the fast Bartowski version of the 122B Qwen 3.5 MoE, there are lots of fast alternatives, for both AMD and Nvidia, when a task would benefit from connecting to a bigger model.
