At the recently concluded WWDC 2026, LM Studio and Apple delivered a remarkable technical demonstration: running Moonshot's 1-trillion-parameter large model, Kimi K2.6, on a cluster of just four Mac Studios. The achievement challenges the conventional belief that "trillion-parameter models must rely on cloud GPU clusters" and shows that consumer-grade hardware can support cutting-edge AI inference.

Kimi K2.6 has 1 trillion total parameters and uses a Mixture-of-Experts (MoE) architecture with 32 billion activated parameters. It supports long context, multimodal input, and agent task processing. In the demonstration, four Mac Studios were linked into a cluster through Apple's memory sharing and interconnect technologies, giving a combined unified memory of approximately 1.5TB, enough to meet the inference requirements of this massive model. Earlier developer tests showed that, under similar configurations, Kimi K2.6 could generate about 28 tokens/s while consuming far less power than traditional GPU solutions.
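
The memory figures above can be sanity-checked with some rough arithmetic. The sketch below is only an illustration: the parameter counts and the 1.5TB total come from the report, while the bit widths are assumptions (the report does not say which weight precision or quantization the demonstration used), and the estimate ignores the KV cache, activations, and runtime overhead.

```python
# Rough weight-memory estimate for a 1-trillion-parameter model.
# Figures taken from the report:
TOTAL_PARAMS = 1_000_000_000_000   # 1 trillion total parameters
ACTIVE_PARAMS = 32_000_000_000     # 32 billion activated parameters (MoE)
CLUSTER_MEMORY_BYTES = 1.5e12      # approx. 1.5TB unified memory (decimal TB)


def weight_bytes(num_params: int, bits_per_param: int) -> float:
    """Bytes needed to store the weights alone at a given precision."""
    return num_params * bits_per_param / 8


def fits(num_params: int, bits_per_param: int, memory_bytes: float) -> bool:
    """True if the weights alone fit in the given memory budget."""
    return weight_bytes(num_params, bits_per_param) <= memory_bytes


if __name__ == "__main__":
    # The bit widths below are hypothetical; the demo's precision is not stated.
    for bits in (16, 8, 4):
        size_tb = weight_bytes(TOTAL_PARAMS, bits) / 1e12
        ok = fits(TOTAL_PARAMS, bits, CLUSTER_MEMORY_BYTES)
        print(f"{bits:>2}-bit weights: {size_tb:.2f} TB, fits in 1.5 TB: {ok}")

    # Share of parameters an MoE model of this shape activates.
    print(f"active fraction: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Under these assumptions, 16-bit weights alone (about 2TB) would exceed the cluster's memory, while 8-bit or lower precision would leave room to spare. Only about 3.2% of the parameters are activated, which helps explain how generation speed can stay usable on this kind of hardware.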

Direct connection from iPhone to a local cluster: data never leaves the premises

More notably, the demonstration also showcased LM Link, LM Studio's remote access feature. Users can connect securely to the Mac Studio cluster from a MacBook Neo laptop or an iPhone and interact with the running model in real time, with all data and communication processed locally rather than routed through the cloud.

LM Link is now built into LM Studio's Mac application and Locally AI's iOS application, and it supports end-to-end encrypted connections. This design lets users tap cluster-level AI computing power at any time, even from lightweight devices, without worrying about privacy leaks. Combined with Apple's Thunderbolt 5 RDMA and other multi-device memory sharing technologies, the ecosystem is rapidly forming a closed loop for local AI deployment.

This collaboration sends a clear signal: deploying trillion-parameter large models locally is no longer an unreachable lab concept but is becoming an engineering reality on developers' desks. As Apple's hardware connectivity continues to evolve, consumer devices are expected to take on even larger-scale AI inference workloads.