xAI, the AI startup founded by Elon Musk, announced on June 1, 2026, on its official recruitment website that it is hiring Chinese-language AI trainers worldwide. The goal is to improve the voice interaction and multilingual capabilities of Grok, its flagship large model. The move signals a further push by xAI into multimodal speech technology.

According to the job posting, the position pays an hourly wage of $35 to $45 (approximately RMB 237 to 304). The arrangement is flexible: full-time, part-time, and contract-based remote work are all supported, with an average commitment of at least 10 hours per week. Unlike traditional text annotation work, the role centers on producing audio training data for large models.

Specific responsibilities include speech annotation, recording, transcription, and evaluation of accents and intonation. xAI requires candidates to have native-level Chinese and a solid understanding of different accents, dialects, or regional variations. English proficiency must be at least CEFR level B2 so that candidates can produce natural-sounding English audio recordings. Candidates with backgrounds in linguistics, phonetics, voice acting, or audio data annotation will be given priority.

Competition among large models has shifted from text understanding toward real-time multimodal interaction involving speech and images. By recruiting Chinese-language audio specialists at scale, xAI aims not only to remove accent and intonation barriers for Grok in Chinese-language contexts but also to build a deeper technological moat against leading competitors such as OpenAI and Anthropic. The move is expected to accelerate the deployment of Grok in on-device voice interaction and cross-cultural applications.
