Tencent/hy3-preview is the current medium-sized model to watch

Tencent/hy3-preview has absolutely impressed me so far. For local inference, it could be the first genuinely viable alternative to huge trillion+ parameter models like DeepSeek, Kimi, and GLM.

Hy3 is a 295B-parameter MoE model with 21B active parameters and a 262K context window. It's ridiculously cheap ($0.066/M input tokens, $0.26/M output tokens) and extremely fast to run on OpenRouter.
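
To put those prices in perspective, here is a rough sketch of the per-request arithmetic. The two rates are the ones quoted above; the function name and the example token counts are made up for illustration, and actual billing may differ (caching discounts, provider routing, and so on).

```python
# Per-million-token prices quoted in the post (USD).
INPUT_USD_PER_M = 0.066
OUTPUT_USD_PER_M = 0.26


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Hypothetical helper: plain linear pricing, no caching or discounts.
    """
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Illustrative request: a 100K-token prompt with a 4K-token reply.
    cost = request_cost_usd(100_000, 4_000)
    print(f"Estimated cost: ${cost:.5f}")
```

Even a prompt that fills a large chunk of the context window comes out to under a cent at these rates.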

In my use so far, output from Hy3 has been very high quality, comparable to what I get from trillion+ parameter models.

A 4-bit quantized version would likely run nicely on two clustered DGX Spark machines, for example, so you wouldn't need a datacenter or a new electrical circuit in your house to run it. This looks like a truly viable step up from smaller models in the Qwen3.6/Gemma4 class.
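
The back-of-the-envelope math behind the two-Spark estimate looks roughly like this. It's a sketch under assumptions: I'm taking 128 GB of unified memory per DGX Spark, and exactly 4 bits per weight. Real 4-bit formats store extra scale data, so the true footprint would be somewhat higher, and the KV cache and runtime need room too.

```python
TOTAL_PARAMS = 295e9       # total parameter count from the post
BITS_PER_WEIGHT = 4        # idealized 4-bit quantization (ignores scale/zero-point overhead)
NODE_MEMORY_GB = 128       # assumption: unified memory per DGX Spark
NODES = 2                  # two clustered machines


def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    weights_gb = weight_footprint_gb(TOTAL_PARAMS, BITS_PER_WEIGHT)
    cluster_gb = NODES * NODE_MEMORY_GB
    headroom_gb = cluster_gb - weights_gb
    print(f"Weights:  {weights_gb:.1f} GB")
    print(f"Cluster:  {cluster_gb} GB")
    print(f"Headroom: {headroom_gb:.1f} GB for KV cache, activations, and the OS")
```

Under those assumptions the weights take about 147.5 GB, which is too much for a single 128 GB machine but leaves around 100 GB of headroom across two.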