Step 3.7 is looking like a fantastic model for local inference. It appears to be much more knowledgeable than any other model I can run on a single local machine, and with all layers offloaded to the GPU at IQ4_XS quantization, it runs at 25.51 tokens per second in LM Studio on the Asus GX10.
Benchmarks and community feedback so far seem to indicate it's better at coding and agentic tasks than DeepSeek Flash and Qwen 27B. I'll believe it when I see it beat Qwen 27B at coding, but either way, it looks to be the best model so far for encyclopedic information in a package small enough to run on a single consumer-grade machine (you need 128GB of VRAM).
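
For anyone checking whether a given model will fit in 128GB, here is a rough back-of-the-envelope sketch. Everything in it is an assumption on my part: about 4.25 bits per weight is the commonly quoted approximate figure for IQ4_XS, the 16GB overhead for KV cache and runtime is a guess, and the parameter counts in the usage lines are made-up placeholders, not Step 3.7's actual size.

```python
def quantized_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * bits / 8 bits-per-byte / 1e9 bytes-per-GB
    return params_billion * bits_per_weight / 8.0


def fits_in_memory(
    params_billion: float,
    bits_per_weight: float = 4.25,  # assumed approximate figure for IQ4_XS
    memory_gb: float = 128.0,       # memory budget mentioned in the post
    overhead_gb: float = 16.0,      # guessed allowance for KV cache and runtime
) -> bool:
    """Return True if weights plus the assumed overhead fit in the budget."""
    needed = quantized_weight_gb(params_billion, bits_per_weight) + overhead_gb
    return needed <= memory_gb


if __name__ == "__main__":
    # Hypothetical parameter counts, for illustration only.
    for params in (100, 200, 300):
        size = quantized_weight_gb(params, 4.25)
        print(f"{params}B params: ~{size:.1f} GB weights, fits: {fits_in_memory(params)}")
```

This only estimates the weights. Real GGUF files mix quant types per tensor, and context length changes the KV cache size a lot, so treat the result as a sanity check rather than a guarantee.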