karpathy/llm.c is a project that implements LLM training in simple C/CUDA. It aims to provide a clean, straightforward reference implementation alongside optimized versions that can approach the performance of PyTorch with drastically less code and far fewer dependencies. Work currently in progress includes a direct CUDA implementation, SIMD optimization of the CPU version, and support for more modern architectures such as Llama2 and Gemma.