BTW, qwen36:35a3 q4_k_s MTP runs at 59 tps on the DGX Spark and 54 tps on Strix Halo. The 8-bit version even runs at 40 tps on the DGX Spark.
Holy crap, that's quick for a very capable model on a single piece of consumer hardware at low wattage.
Current version by Nick Antonaccio