LLaMA-Omni is a low-latency, high-quality end-to-end speech interaction model built on the Llama-3.1-8B-Instruct architecture, aimed at achieving speech capabilities comparable to GPT-4o. The model supports low-latency speech interactions, generating text and speech responses simultaneously. It completed training in less than 3 days using only 4 GPUs, demonstrating its efficient training capabilities.