Post History

Current version by Nick Antonaccio

Current Version: May 11, 2026 at 16:31

Qwen 3.6 27b punches so far above its weight, it's just unreal. This morning I quickly vibe-coded a pile of demo 3D driving games, using a wide variety of the best and biggest models (deepseek-4-pro, glm5.1, grok4.3, mimo2.5pro, and more), and most produced absolutely terrible results. For example, after $2.94 in tokens on OpenRouter, GLM5.1 gave me this unusable result:

http://1y1z.com:8284/3ddriving-glm51.html

Deepseek-4-pro produced this laughably bad piece of junk:

http://1y1z.com:8284/3ddriving-deepseek4pro.html

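Grok 4.3, Ling-2.6-1t (a model with 1 trillion parameters), and Hy3 each made something at least kinda-sorta playable, but extremely basic:

http://1y1z.com:8284/3ddriving-grok43.html http://1y1z.com:8284/3d-driving-game--ling26.html http://1y1z.com:8284/3ddriving-hy3/3ddriving-hy3.html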

Most of the other models produced a mix of various garbage results that were totally unusable, even after a few iteration attempts.

Then Qwen 3.6 27b provided this useful start of a game, right out of the gate:

http://1y1z.com:8284/3d-driving-qwen36-27b.html

That example cost $0.05 to create via OpenRouter, and it was not only the best first-shot code result, it was better than every other final result, even after some iteration with other models.
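
To put those two price tags side by side, here is a minimal sketch of the arithmetic. The two dollar amounts are the ones quoted above; the function and variable names are just illustrative, and this compares two single runs, not a benchmark.

```python
# Costs reported in this post (USD, via OpenRouter).
glm_cost = 2.94   # GLM5.1 run that produced the unusable result
qwen_cost = 0.05  # Qwen 3.6 27b run that produced the best first-shot result


def cost_ratio(expensive: float, cheap: float) -> float:
    """Return how many times more the expensive run cost than the cheap one."""
    if cheap <= 0:
        raise ValueError("cheap cost must be positive")
    return expensive / cheap


ratio = cost_ratio(glm_cost, qwen_cost)
print(f"The GLM5.1 run cost about {ratio:.0f}x as much as the Qwen 3.6 27b run")
```

In other words, the better result came from the run that cost roughly one sixtieth as much.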

I just keep seeing surprisingly high-quality results come from the Qwen 3.6 models, not just in my own work, but in videos by many YouTubers, and among my friends who are testing LLMs.

Lately I've been fond of demonstrating game results and web site layout results, even though I don't have any need to write that sort of code, because it's easy for onlookers to immediately understand the successes and failures of code an LLM has provided, just by casually looking at visual applications.

I've noticed that the quality a model shows in highly visual demos roughly parallels the quality of its output for all sorts of other tasks. When you see that one model builds a horribly shoddy-looking web site layout, and produces games with broken controls and little playability, you'll tend to see that same model struggle to provide working back-end logic for business apps too.

You can see some business app CRUD examples by Qwen 3.6 scattered around my demo links on this site, such as:

As well as many quick demos oriented towards useful application building, such as:

http://1y1z.com:5994 https://com-pute.com/nick/ui_controls_qwen36-35a3_3080_16Gb.html

I've just been absolutely blown away by the Qwen 3.6 models. The 27b dense model often goes head-to-head with many of the frontier models in terms of code quality, for all sorts of tasks - and the 35a3b MoE model is outrageously good for the speed, especially given that it can run on such small GPUs.

16GB VRAM is the smallest I've tried with 35a3b - it works well even with q4 quantization on that class of small consumer-grade hardware. This model takes on deeper challenges, tends to understand more of the scope of a goal, and its output is generally more creative, stylish, and reliable than that of any similarly sized model (and often much better than far larger models).

The takeaway is that you can actually accomplish useful development goals with Qwen 3.6. It seems to be by far the most practical, genuinely effective, and impressive model yet, for small GPUs.

I hope this is a sign of even better quality and performance to come in smaller models this year!
