super-lazy-autograd
PublicHand-derived memory-efficient super lazy PyTorch VJPs for training LLMs on laptop, all using one op (bundled scaled matmuls).
Hand-derived memory-efficient super lazy PyTorch VJPs for training LLMs on laptop, all using one op (bundled scaled matmuls).