I'm looking for the best price/performance, power-efficient way to run trillion-parameter models like Kimi K2.5 and DeepSeek-V4. I think a 4-node Mac Studio M3 Ultra cluster ($25,000+) may be the most sensible option: it gives 1TB of combined memory and should be capable of ~35–45 tokens/sec on those huge models. Parts list:
Apple Mac Studio (M3 Ultra, 256GB RAM, 1TB SSD):
4 x $5,999.00 = $23,996.00
OWC Thunderbolt 5 Hub (5-Port):
$189.99
OWC Thunderbolt 5 Cable (2.0m):
3 x $79.99 = $239.97
CyberPower CP1500PFCLCD 1500VA/1000W Pure Sine UPS:
$239.95
Sonnet RackMac Studio Pro 3U (Holds 2 Studios):
2 x $549.99 = $1,099.98
Sound Town 12U 4-Post Open Frame Rack:
$157.99
ESTIMATED TOTAL: $25,923.88
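For anyone who wants to swap parts in and out, here is a minimal Python sketch that re-adds the list above and works out cost per GB of combined memory. The prices and quantities are copied from this post; the function names are just placeholders I made up, and tax/shipping are not included.

```python
from decimal import Decimal

# (item, quantity, unit price in USD), copied from the parts list in this post
PARTS = [
    ("Apple Mac Studio (M3 Ultra, 256GB RAM, 1TB SSD)", 4, Decimal("5999.00")),
    ("OWC Thunderbolt 5 Hub (5-Port)", 1, Decimal("189.99")),
    ("OWC Thunderbolt 5 Cable (2.0m)", 3, Decimal("79.99")),
    ("CyberPower CP1500PFCLCD 1500VA/1000W UPS", 1, Decimal("239.95")),
    ("Sonnet RackMac Studio Pro 3U", 2, Decimal("549.99")),
    ("Sound Town 12U 4-Post Open Frame Rack", 1, Decimal("157.99")),
]


def total_cost(parts):
    """Sum quantity * unit price using exact decimal arithmetic."""
    return sum((qty * price for _, qty, price in parts), Decimal("0"))


def cluster_memory_gb(nodes=4, gb_per_node=256):
    """Combined memory across all nodes (ignores OS and runtime overhead)."""
    return nodes * gb_per_node


if __name__ == "__main__":
    total = total_cost(PARTS)
    memory = cluster_memory_gb()
    print(f"Estimated total: ${total:,.2f}")
    print(f"Combined memory: {memory} GB")
    print(f"Cost per GB of memory: ${total / memory:.2f}")
```

Decimal is used instead of float so the cents add up exactly. To compare another build, replace the PARTS entries and the node count / memory per node.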
I'm also considering a cluster of 8x ASUS Ascent GX10, which would cost around $28,000 but should perform faster: more like ~50+ tokens/sec, with faster prompt processing (lower Time to First Token, TTFT), plus CUDA support for non-LLM tasks like image/video generation. It would draw up to ~2000W, so it would require a dedicated 20-amp, 120V circuit.
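As a rough sanity check on the circuit requirement, here is a small Python sketch that converts watts to amps and compares the result with a breaker rating. The ~2000W figure is my estimate from above; the 80% derating for continuous loads is a common rule of thumb that I'm assuming applies here, so check your local electrical code. Function names are hypothetical.

```python
def circuit_amps(watts, volts):
    """Current drawn at a given power and voltage (I = P / V)."""
    return watts / volts


def continuous_limit_watts(breaker_amps, volts, derate=0.8):
    """Sustained load a breaker can carry, assuming an 80% derating rule."""
    return breaker_amps * volts * derate


if __name__ == "__main__":
    load_w = 2000   # estimated max draw of the 8-node cluster (assumption)
    volts = 120
    for breaker in (15, 20):
        amps = circuit_amps(load_w, volts)
        limit = continuous_limit_watts(breaker, volts)
        status = "within" if load_w <= limit else "over"
        print(f"{breaker}A breaker: load {amps:.1f}A, "
              f"continuous limit {limit:.0f}W, {status} limit")
```

By this estimate a full 2000W draw fits under the 20-amp breaker's peak rating but sits slightly above the derated continuous figure, so sustained max load would be cutting it close.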
I could also consider getting 6 more Strix Halo machines... (I'm not sure whether anyone has tested that yet with really fast networking.)
If you have any experience with any of these configurations, or another alternative, please share!