At the "Humanoid Robot Innovation and Development Cooperation" sub-forum of the 8th Hongqiao International Economic Forum, Wang Xingxing, founder and CEO of Unitree Technology, delivered a keynote speech, sharing his latest insights on the future development of embodied intelligence and robot large models.
Wang Xingxing said that large models for robots are currently at a stage roughly comparable to where language models stood one to three years before the release of ChatGPT. "We have found the right direction, but there is still a significant gap before reaching the critical point of actual implementation," he said.

He pointed out that although generative AI has made breakthroughs in language and vision over the past two years, achieving true "embodied intelligence" in robots still requires solving multiple systemic challenges in perception, motion control, and interaction understanding.
On the question of when the "ChatGPT moment" for embodied intelligence will arrive, Wang Xingxing offered a specific criterion: "We can consider it to have truly arrived when robots can complete about 80% of tasks in unfamiliar everyday scenarios by following only voice or text instructions."
He believes that reaching this goal will take more than the reasoning and generation capabilities of large models themselves: it also requires strong physical-world modeling capabilities, data feedback mechanisms, and real-time learning systems.


