Qwen 3.6 27b punches so far above its weight, it's just unreal. This morning I quickly vibe-coded a pile of demo 3D driving games using a wide variety of the best and biggest models (deepseek-4-pro, glm5.1, grok4.3, mimo2.5pro, and more), and most produced absolutely terrible results. For example, after $2.94 in tokens on OpenRouter, GLM5.1 gave me this unusable result:
http://1y1z.com:8284/3ddriving-glm51.html
Deepseek-4-pro produced this laughably bad piece of junk:
http://1y1z.com:8284/3ddriving-deepseek4pro.html
Grok 4.3, Ling-2.6-1t (a model with 1 trillion parameters), and Hy3 each made something at least kinda-sorta playable, but extremely basic:
- http://1y1z.com:8284/3ddriving-grok43.html
- http://1y1z.com:8284/3d-driving-game--ling26.html
- http://1y1z.com:8284/3ddriving-hy3/3ddriving-hy3.html
Most of the other models produced various kinds of garbage that were totally unusable, even after a few rounds of iteration.
Then Qwen 3.6 27b provided this useful start of a game, right out of the gate:
http://1y1z.com:8284/3d-driving-qwen36-27b.html
That example cost $0.05 to create via OpenRouter. It was not only the best first-shot result, but also better than every other model's final result, even after some iteration with those models.
I just keep seeing surprisingly high-quality results come from the Qwen 3.6 models, not just in my own work, but in videos by many YouTubers and among my friends who are testing LLMs.
Lately I've been fond of demonstrating games and website layouts, even though I don't need to write that sort of code, because visual applications let onlookers immediately see the successes and failures of LLM-generated code just by casually looking at them.
I've noticed that the quality a model shows in these highly visual demos roughly parallels the quality of its output for all sorts of other tasks. When a model builds a horribly shoddy-looking website layout and produces games with broken controls and little playability, you'll tend to see that same model struggle to provide working back-end logic for business apps too.
You can see some CRUD business app examples built by Qwen 3.6 scattered around my demo links on this site, such as:
- http://1y1z.com:3929 A full CRUD demo of the Northwind database
- http://1y1z.com:5938 A little invoicing & payroll app
There are also many quick demos oriented toward building useful applications, such as:
- http://1y1z.com:5994 A little public forum
- https://com-pute.com/nick/ui_controls_qwen36-35a3_3080_16Gb.html A simple UI control demo
I've just been absolutely blown away by the Qwen 3.6 models. The dense 27b model often goes head to head with many frontier models in code quality across all sorts of tasks, and the 35a3b MoE model is outrageously good for its speed, especially given that it can run on such small GPUs.
16GB of VRAM is the smallest GPU I've tried with 35a3b, and it works well even with q4 quantization on that class of small consumer-grade hardware. This model takes on deeper challenges, tends to understand more of the scope of a goal, and its output is generally more creative, stylish, and reliable than that of any similarly sized model (and often much better than that of far larger models).
The takeaway is that you can actually accomplish useful development goals with Qwen 3.6. It seems to be by far the most practical, genuinely effective, and impressive model yet for small GPUs.
I hope this is a sign of even better quality and performance to come in smaller models this year!