Post History - AI By Nick

Current Version - May 30, 2026 at 12:13

Step-3.7-Flash by StepFun looks like a fantastic model for local inference. It appears to be much more knowledgeable than any other model I can run on a single local machine, and with all layers offloaded to the GPU at IQ4_XS quantization, it runs at 25.51 tokens per second in LM Studio on the Asus GX10 (DGX Spark).
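
To put the 25.51 tokens per second figure in practical terms, here is a rough sketch of how long a reply of a given length would take to generate. It assumes the speed stays constant and ignores prompt processing time, so treat the results as ballpark numbers. The function name and the sample reply lengths are only illustrative.

```python
# Decode speed reported in the post (LM Studio, IQ4_XS, all layers on GPU).
TOKENS_PER_SECOND = 25.51


def generation_seconds(num_tokens: int, tokens_per_second: float = TOKENS_PER_SECOND) -> float:
    """Estimate generation time in seconds, ignoring prompt processing."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    # Hypothetical reply lengths, chosen only for illustration.
    for n in (256, 1024, 4096):
        print(f"{n:>5} tokens: about {generation_seconds(n):.1f} s")
```

In practice the speed will likely drop as the context fills up, so longer replies may take more time than this linear estimate suggests.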

Benchmarks and community feedback so far seem to indicate it's better at coding and agentic tasks than DeepSeek-V4-Flash and Qwen 27B. I'll believe it when I see it beat Qwen 27B at coding, but either way, it looks like the best model so far for encyclopedic knowledge in a package small enough to run on a single consumer-grade machine (you do need at least 128 GB of VRAM to run it).

Previous Versions

Version 2 - May 30, 2026 at 12:13

Step 3.7 is looking like a fantastic model for local inference. It appears to be much more knowledgeable than any other model I can run on a single local machine, and if I offload all layers to GPU at IQ4_XS quantization, it's running at 25.51 tokens per second in LM Studio on the Asus GX10.

Benchmarks and community feedback so far seem to indicate it's better at coding and agentic tasks than Deepseek flash and Qwen 27b. I'll believe it when I see it do better than qwen27b at coding, but either way, it looks to be the best model so far for encyclopedic information in a small enough package to run on a single consumer grade machine (you need 128GB VRAM).

Version 1 - May 29, 2026 at 12:20

Step 3.7 is looking like a fantastic model for local inference. It appears to be much more knowledgeable than any other model I can run on a single local machine, and if I offload all layers to GPU, it's running at 25.51 tokens per second in LM Studio on the Asus GX10.

Benchmarks and community feedback so far seem to indicate it's better at coding and agentic tasks than Deepseek flash and Qwen 27b. I'll believe it when I see it do better than qwen27b at coding, but either way, it looks to be the best so far for encyclopedic information in a small enough package to run on a single consumer grade machine (you need 128GB VRAM).