MotionGPT is a multimodal motion-language model that handles language and human motion in a single framework, turning text instructions into 3D human movements. Inspired by prompt learning, the model is pre-trained on a mixture of motion-language data and then fine-tuned on prompt-based question-and-answer tasks, which yields strong performance. Its central idea is to treat human motion as a specific kind of language: discrete vector quantization converts 3D motion into motion tokens, much as text is split into word tokens, so motion and text can be modeled together. As a result, MotionGPT can interpret even short or fragmentary instructions, such as a request to kick or to dance, and quickly generate a matching movement. This opens up new possibilities for fields such as virtual reality and film production.
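To make the tokenization step more concrete, here is a minimal sketch of discrete vector quantization: each frame-level motion feature is replaced by the index of its nearest entry in a codebook. This is an illustration under stated assumptions, not MotionGPT's actual implementation. The function names, the tiny hand-written codebook, the two-dimensional features, and the `<motion_id_k>` token format are all hypothetical; a real system would learn the codebook and use much higher-dimensional features.

```python
import numpy as np


def quantize_motion(features: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Return, for each feature vector, the index of its nearest codebook entry.

    features: array of shape (T, D), one row per motion frame (assumed layout).
    codebook: array of shape (K, D), one row per discrete motion code.
    """
    # Pairwise squared Euclidean distances, shape (T, K).
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    return np.argmin(dists, axis=1)


def tokens_to_text(tokens: np.ndarray) -> str:
    """Render token indices as word-like strings (hypothetical token format)."""
    return " ".join(f"<motion_id_{int(t)}>" for t in tokens)


def dequantize(tokens: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map token indices back to their codebook vectors (a lossy reconstruction)."""
    return codebook[tokens]


# Toy data: three codes and four frames, chosen by hand for illustration only.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
features = np.array([[0.1, -0.1], [0.9, 1.2], [-0.8, 1.9], [0.2, 0.1]])

tokens = quantize_motion(features, codebook)
print(tokens.tolist())         # [0, 1, 2, 0]
print(tokens_to_text(tokens))  # <motion_id_0> <motion_id_1> <motion_id_2> <motion_id_0>
```

Once motion is expressed as a sequence of such indices, it can be placed in the same token stream as words, which is what allows a language model to read and write motion alongside text. Generated tokens can then be mapped back to motion with a lookup like `dequantize`, followed in practice by a learned decoder.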