Post History

Current Version, May 30, 2026 at 17:33

Qwen 3.5 122a10b on Strix Halo runs at:

10.05 tokens per second without MTP (q5_k_l Bartowski version)
25.30 tokens per second with MTP (q4_k_s Unsloth version). Be aware that this compares q5 against q4, so it is not a like-for-like comparison.

On the Nvidia GX10:

11.10 tokens per second without MTP (q5_k_l Unsloth version)
15.26 tokens per second with MTP (q5_k_l Unsloth version). Be aware that this compares q5 against q5, using the exact same model.

Strangely, the Bartowski version of Qwen 3.5 122a10b at q5_k_l quantization ran at 19.13 tokens per second on the Nvidia machine. That is faster than the MTP version of the exact same model from Unsloth, on the same machine, with all other settings the same. What magic is Bartowski wielding?
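
To put the numbers above side by side, here is a minimal sketch that turns the tokens-per-second figures into speedup ratios. The variable names and the `speedup` helper are my own, made up for illustration. The only data is what is quoted in this post, and the post does not say whether MTP was enabled for the 19.13 run.

```python
# Generation speeds in tokens per second, copied from the figures in this post.
STRIX_BASE_Q5 = 10.05      # Strix Halo, no MTP, q5_k_l (Bartowski)
STRIX_MTP_Q4 = 25.30       # Strix Halo, MTP, q4_k_s (Unsloth)
GX10_BASE_Q5 = 11.10       # Nvidia GX10, no MTP, q5_k_l (Unsloth)
GX10_MTP_Q5 = 15.26        # Nvidia GX10, MTP, q5_k_l (Unsloth)
GX10_BARTOWSKI_Q5 = 19.13  # Nvidia GX10, q5_k_l (Bartowski), MTP state not given


def speedup(new_tps: float, base_tps: float) -> float:
    """Return how many times faster new_tps is than base_tps."""
    if base_tps <= 0:
        raise ValueError("base_tps must be positive")
    return new_tps / base_tps


# (label, faster run, baseline run)
comparisons = [
    ("Strix Halo: MTP q4 vs no-MTP q5 (mixed quants)", STRIX_MTP_Q4, STRIX_BASE_Q5),
    ("GX10: MTP q5 vs no-MTP q5 (same model)", GX10_MTP_Q5, GX10_BASE_Q5),
    ("GX10: Bartowski q5 vs Unsloth q5 no-MTP", GX10_BARTOWSKI_Q5, GX10_BASE_Q5),
    ("GX10: Bartowski q5 vs Unsloth q5 with MTP", GX10_BARTOWSKI_Q5, GX10_MTP_Q5),
]

for label, new_tps, base_tps in comparisons:
    print(f"{label}: {speedup(new_tps, base_tps):.2f}x")
```

Only the second ratio (about 1.37x) isolates the effect of MTP, since it is the one pair where the model file and machine are identical. The Strix Halo ratio (about 2.52x) also includes whatever gain comes from dropping from q5 to q4.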

BTW, the Nvidia machine is always faster than the Strix Halo at loading models and at processing input tokens, with or without MTP.
