Post History

Current version by Nick Antonaccio

Current Version: May 29, 2026 at 12:20

Step 3.7 is looking like a fantastic model for local inference. It appears to be much more knowledgeable than any other model I can run on a single local machine. With all layers offloaded to the GPU at IQ4_XS quantization, it runs at 25.51 tokens per second in LM Studio on the Asus GX10.

Benchmarks and community feedback so far seem to indicate it's better at coding and agentic tasks than DeepSeek Flash and Qwen 27B. I'll believe it when I see it do better than Qwen 27B at coding, but either way, it looks to be the best model so far for encyclopedic information in a package small enough to run on a single consumer-grade machine (you need 128GB VRAM).
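
For anyone wondering whether a given quant would fit in that 128GB, here is a rough back-of-the-envelope sketch in Python. The parameter count below is a placeholder I made up for illustration, not Step 3.7's actual size, and the 4.25 bits per weight figure is the approximate average usually quoted for IQ4_XS. It only counts the weights, so KV cache and runtime overhead come on top.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to generate num_tokens at a steady decode speed."""
    return num_tokens / tokens_per_second


# Hypothetical parameter count, for illustration only (not a published figure).
PARAMS_BILLION = 200.0
# Approximate average bits per weight for IQ4_XS (assumption).
BITS_PER_WEIGHT = 4.25
# Memory available on the machine, from the post.
MEMORY_GB = 128.0
# Measured decode speed, from the post.
TOKENS_PER_SECOND = 25.51

size = weights_gb(PARAMS_BILLION, BITS_PER_WEIGHT)
print(f"Weights: ~{size:.1f} GB, leaving ~{MEMORY_GB - size:.1f} GB for KV cache and overhead")
print(f"A 1000-token reply takes ~{generation_seconds(1000, TOKENS_PER_SECOND):.1f} s")
```

Swap in the real parameter count and the file size LM Studio reports to get numbers that mean something for your setup.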

Previous Versions
Version 1: May 29, 2026 at 12:20

Step 3.7 is looking like a fantastic model for local inference. It appears to be much more knowledgeable than any other model I can run on a single local machine. With all layers offloaded to the GPU, it runs at 25.51 tokens per second in LM Studio on the Asus GX10.

Benchmarks and community feedback so far seem to indicate it's better at coding and agentic tasks than DeepSeek Flash and Qwen 27B. I'll believe it when I see it do better than Qwen 27B at coding, but either way, it looks to be the best so far for encyclopedic information in a package small enough to run on a single consumer-grade machine (you need 128GB VRAM).