Step 3.7 is great

Nick Antonaccio (Admin)
May 29, 2026 at 12:20 (edited, 1 revision)
#1

Step 3.7 is looking like a fantastic model for local inference. It appears to be much more knowledgeable than any other model I can run on a single local machine, and with all layers offloaded to the GPU at IQ4_XS quantization, it runs at 25.51 tokens per second in LM Studio on the Asus GX10.
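
For a rough sense of what that speed means in practice, here is a back-of-the-envelope sketch. The 25.51 tokens per second is the figure measured above; the response lengths are illustrative values only, and prompt processing time is ignored, so real wall-clock times will be somewhat longer.

```python
# Rough generation-time estimate from a measured decode speed.
# 25.51 tok/s is the figure reported in the post; the response lengths
# below are illustrative assumptions, and prompt processing is ignored.

MEASURED_TPS = 25.51


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Seconds needed to generate num_tokens at a steady decode speed."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    for n in (250, 1000, 4000):
        secs = generation_seconds(n, MEASURED_TPS)
        print(f"{n:>5} tokens: ~{secs:.0f} s")
```

At this speed, a long agentic coding session is mostly limited by how many tokens the model has to write, which is why one-prompt or two-prompt results matter so much on local hardware.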

Benchmarks and community feedback so far seem to indicate it's better at coding and agentic tasks than Deepseek flash and Qwen 27b. I'll believe it when I see it do better than Qwen 27b at coding, but either way, it looks to be the best model so far for encyclopedic information in a package small enough to run on a single consumer-grade machine (you need 128 GB of VRAM).

Nick Antonaccio (Admin)
May 29, 2026 at 17:22
#2

First impressions continue to improve with Stepfun Flash 3.7. I built another Northwind full CRUD database example app, similar to what I've built with Deepseek-4-pro, Qwen 3.6, and other favorite models. It only took 2 prompts to complete, and would likely have been only 1 prompt if I'd asked for full CRUD functionality in the initial query:

https://com-pute.com/nick/northwind-flask-app-stepfun-flash-3.7-no-db.zip

Along the way, it provided a link to a web site containing the Northwind database as a downloadable SQLite file, and built the code of the app to work with this file:

https://github.com/jpwhite3/northwind-SQLite3

The result worked perfectly on the first shot.
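
For anyone unfamiliar with what "full CRUD" against a SQLite file involves, here is a minimal sketch using only Python's standard-library sqlite3 module. This is not the code the model generated (that is in the zip above), and it leaves out Flask entirely. The simplified `Customers` table and the function names are assumptions for illustration; the real Northwind file has more tables and columns, so the names would need to be adjusted to match it.

```python
import sqlite3

# Minimal CRUD sketch with the standard-library sqlite3 module.
# Assumption: a simplified, hypothetical "Customers" table created in memory.
# The real Northwind database has more tables and columns.


def connect_demo() -> sqlite3.Connection:
    """Open an in-memory database and create the demo table."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row  # allows access to columns by name
    conn.execute(
        "CREATE TABLE Customers ("
        "CustomerID TEXT PRIMARY KEY, CompanyName TEXT NOT NULL, City TEXT)"
    )
    return conn


def create_customer(conn, customer_id, company, city):
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO Customers (CustomerID, CompanyName, City) "
            "VALUES (?, ?, ?)",
            (customer_id, company, city),
        )


def read_customer(conn, customer_id):
    row = conn.execute(
        "SELECT CustomerID, CompanyName, City FROM Customers "
        "WHERE CustomerID = ?",
        (customer_id,),
    ).fetchone()
    return dict(row) if row else None


def update_customer_city(conn, customer_id, city) -> int:
    with conn:
        cur = conn.execute(
            "UPDATE Customers SET City = ? WHERE CustomerID = ?",
            (city, customer_id),
        )
    return cur.rowcount  # number of rows changed


def delete_customer(conn, customer_id) -> int:
    with conn:
        cur = conn.execute(
            "DELETE FROM Customers WHERE CustomerID = ?", (customer_id,)
        )
    return cur.rowcount  # number of rows removed
```

The `?` placeholders keep user input out of the SQL string, which is the main thing to check for when reviewing model-generated database code.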

Then I had the agent build a zip file package with everything needed to install the app on another server, and it yielded the file linked above, which included detailed install instructions.

I also had it build a 3D space invaders game. This was the first-shot result - bullets don't work yet, but a decent initial output:

https://com-pute.com/nick/space-invaders-3d-step37flash-1.html

Additionally, I asked the model to set up Rustdesk to be restarted every time the computer restarts, and it did a perfect job on the first shot.

So far, step-3.7-flash feels more similar to a frontier model than most of the other models I've self-hosted. I do wish it were just a bit smaller at q4 compression, because there's not a ton of room left for KV cache (I had it set up to use ~89k tokens of context).
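
To see why context length eats memory so quickly, here is a rough KV cache estimate for a standard attention transformer. Every architecture number in the example call (layers, KV heads, head size) is a hypothetical placeholder, not the real configuration of step-3.7-flash, so substitute the values from the model's config. Models that use sliding-window attention, compressed attention, or a quantized KV cache will need less than this formula suggests.

```python
# Back-of-the-envelope KV cache size for a standard attention transformer.
# The architecture numbers used in the example call are hypothetical
# placeholders, NOT the real configuration of step-3.7-flash.


def kv_cache_bytes(
    context_tokens: int,
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    bytes_per_value: int = 2,  # 2 bytes per value assumes an fp16 cache
) -> int:
    """Bytes needed to cache keys and values for context_tokens tokens."""
    # Factor of 2: one key tensor and one value tensor per layer.
    return (
        2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value
    )


if __name__ == "__main__":
    size = kv_cache_bytes(
        context_tokens=89_000,  # roughly the context setting mentioned above
        n_layers=48,            # hypothetical
        n_kv_heads=8,           # hypothetical
        head_dim=128,           # hypothetical
    )
    print(f"Estimated KV cache: {size / 2**30:.1f} GiB")
```

The cost grows linearly with context length, so whatever memory is left after loading the weights sets a hard ceiling on how much context can be used.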

I don't think it will dethrone Qwen 3.6 for basic coding tasks yet, but it's another great tool - the best yet for knowledge, and it still runs fast.

