
layered-prefill

Public

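Layered prefill moves the scheduling axis from tokens to layers: instead of splitting a prompt into token chunks, it splits the model into groups of layers. This removes redundant reloads of MoE expert weights while keeping decode stall-free. The result is lower time to first token (TTFT), lower end-to-end latency, and lower energy per token, without hurting the stability of time between tokens (TBT).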
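The following minimal Python sketch makes the token-axis vs. layer-axis distinction concrete. It builds both kinds of prefill plan for a single prompt and counts how often prefill visits a layer. The sketch rests on a simplifying assumption that does not come from the project: a layer that processes a prefill batch in an iteration loads essentially all of its MoE experts once in that iteration. The function names (`chunked_prefill`, `layered_prefill`, `prefill_expert_loads`) and the sizes used are hypothetical. Decode tokens are left out. In both schemes they still pass through every layer on every iteration.

```python
def chunked_prefill(num_layers: int, prompt_len: int, chunk_size: int):
    """Token-axis split: each iteration pushes one prompt chunk through ALL layers."""
    plan = []
    for start in range(0, prompt_len, chunk_size):
        tokens = range(start, min(start + chunk_size, prompt_len))
        plan.append((tokens, range(num_layers)))
    return plan


def layered_prefill(num_layers: int, prompt_len: int, num_groups: int):
    """Layer-axis split: each iteration pushes the WHOLE prompt through one layer group."""
    base, extra = divmod(num_layers, num_groups)
    plan, first = [], 0
    for g in range(num_groups):
        size = base + (1 if g < extra else 0)  # spread leftover layers over the first groups
        plan.append((range(prompt_len), range(first, first + size)))
        first += size
    return plan


def prefill_expert_loads(plan) -> int:
    """Count (iteration, layer) pairs in which prefill tokens run through a layer.

    Assumption (hypothetical model): each such pair costs one full load of that
    layer's MoE experts, because a large prefill batch activates nearly all experts.
    """
    return sum(len(layers) for _, layers in plan)


# Illustrative sizes only: a 48-layer MoE model and an 8192-token prompt.
L, P = 48, 8192
chunked = chunked_prefill(L, P, chunk_size=2048)
layered = layered_prefill(L, P, num_groups=4)
print(len(chunked), prefill_expert_loads(chunked))  # 4 192
print(len(layered), prefill_expert_loads(layered))  # 4 48
```

Both plans finish prefill in four iterations. Under this counting model, the token-axis plan sends prefill through all 48 layers four times (192 layer visits). The layer-axis plan visits each layer exactly once (48). That difference is the kind of redundant expert reloading the layered approach targets.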

Created: 2025-10-13T16:59:57
Updated: 2025-10-14T08:21:49
Stars: 5
Stars increase: 0