
Muon-RMS-Norm

Public

This version of Muon converges slightly faster in some cases than the Muon from modded-nanogpt. The change is an RMS-Norm applied after orthogonalization, over the first dimension of the weight matrix (the last dimension of the nn.Linear weight). The code here assumes the weights are stored as in nn.Linear, i.e. used as x = x @ W.T.
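The update step described above could look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's code: the function names are hypothetical, the Newton-Schulz coefficients are the ones commonly used in the modded-nanogpt Muon, the computation runs in float32 rather than bfloat16, and "last dimension of nn.Linear" is read as `dim=-1` of a weight of shape `(out_features, in_features)`. Momentum, learning rate, and any shape-dependent scaling are left out.

```python
import torch


def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    """Approximately orthogonalize a 2D matrix with a quintic Newton-Schulz iteration."""
    assert G.ndim == 2
    # Coefficients as commonly used in the modded-nanogpt Muon (assumption).
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.float()
    # Work on the wide orientation so X @ X.T is the smaller Gram matrix.
    transposed = X.size(0) > X.size(1)
    if transposed:
        X = X.T
    # Scale so the spectral norm is at most 1 (Frobenius norm bounds it).
    X = X / (X.norm() + eps)
    for _ in range(steps):
        A = X @ X.T
        B = b * A + c * (A @ A)
        X = a * X + B @ X
    if transposed:
        X = X.T
    return X


def rms_normalize(X, dim=-1, eps=1e-8):
    """Rescale X so that its RMS along `dim` is approximately 1."""
    rms = X.pow(2).mean(dim=dim, keepdim=True).sqrt()
    return X / (rms + eps)


def muon_rms_norm_update(grad, dim=-1, steps=5):
    """Hypothetical helper: orthogonalize first, then RMS-normalize along `dim`."""
    return rms_normalize(newton_schulz_orthogonalize(grad, steps), dim=dim)


torch.manual_seed(0)
W = torch.randn(8, 16)        # nn.Linear layout: (out_features, in_features)
grad = torch.randn_like(W)    # stand-in for a gradient or momentum buffer
update = muon_rms_norm_update(grad)
row_rms = update.pow(2).mean(dim=-1).sqrt()
```

With this layout every output row of the update ends up with roughly unit RMS over the input features. If the repository normalizes along the other axis, only the `dim` argument would change.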

Created: 2025-06-08T19:24:41
Updated: 2025-06-13T06:15:16
Stars: 1
Stars Increase: 0