Be sure to download MTP versions of the most recent models - they run much faster. For example, in LM Studio qwen3.6 27b_q4_k_s (that's the big dense model) sees these improvements in performance:
On the Asus gx10:
- 4.11 tokens per second without MTP (LM Studio community version)
- 11.45 tokens per second with MTP (Unsloth version, same model, exact same machine, with all default settings)
On the Strix Halo, ASUS ROG Flow Z13 laptop, the exact same models ran significantly faster:
- 9.31 tokens per second without MTP
- 20.24 tokens per second with MTP
Those are fantastic improvements, entirely for free - and boy oh boy is that Strix Halo impressive here! It even runs the q6_k_xl quant of qwen36:27b at 13.48t tps. What a beast.
You can get really serious work done with qwen 3.6 27b, at 20tps. For coding, that model is competitive with much bigger models that were previously impossible to run on anything but datacenter class hardware.
So with this one update, we've really seen some tremendous improvements in capability become available to the self-hosting crowd, very quickly overnight.
Just be sure:
- You've got the most recent version of LM Studio and the Llama CPP runtime. MTP is only supported in the most recent release of both the runtime and the application.
- The MTP speculative decoding toggle in the model load parameters is switched on every session
- You've got an MTP version of the most recent LLMs downloaded - specifically with Qwen 3.6, if you downloaded a version in the past few weeks that wasn't explicitly labeled MTP, you'll need to download a newer version with MTP enabled (the Unsloth MTP models have all worked well for me). All Gemma models and Nemotron 3 Super have supported MTP out of the box.
I used LM Studio 0.4.14 (Build 3) and Llama CPP 2.16.0 for the tests above.
If you're using other inference software, be sure to have the most recent version of Llama CPP.