Recently, Galaxy General, in collaboration with teams from Peking University, the University of Adelaide, and Zhejiang University, launched the world's first cross-embodiment, all-scenario panoramic navigation foundation model — NavFoM (Navigation Foundation Model). The model aims to unify a range of robot navigation tasks within a single framework, including vision-and-language navigation, goal-oriented navigation, visual tracking, and autonomous driving.

A key feature of NavFoM is its all-scenario support: indoors or outdoors, the model runs zero-shot in previously unseen environments, with no additional mapping or data collection required. Users can therefore deploy the technology across varied environments without cumbersome preparation work.
NavFoM also supports multiple tasks, carrying out target following and autonomous navigation from natural-language instructions. This design lets different robots adapt quickly: embodiments of all sizes, from robot dogs and drones to wheeled humanoids and cars, can operate efficiently within the same framework.
From a technical perspective, NavFoM introduces two key innovations: TVI Tokens (Temporal-Viewpoint-Indexed Tokens), which let the model reason about when and from which viewpoint each observation was captured, and the BATS strategy (Budget-Aware Token Sampling), which keeps inference performant even under limited computational resources.
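
To make the two mechanisms concrete, here is a minimal PyTorch sketch assuming a plain learned-embedding form of temporal-viewpoint indexing and a keep-recent-frames heuristic for budget-aware sampling. The names (`TVITokenizer`, `budget_aware_sample`) and the specific sampling heuristic are illustrative assumptions, not the paper's actual implementation:

```python
import torch
import torch.nn as nn

class TVITokenizer(nn.Module):
    """Hypothetical sketch: add learned temporal- and viewpoint-index
    embeddings to visual tokens so a transformer can tell *when* and
    *from which camera* each token was observed."""

    def __init__(self, dim: int, max_frames: int = 64, max_views: int = 8):
        super().__init__()
        self.time_embed = nn.Embedding(max_frames, dim)  # "when" index
        self.view_embed = nn.Embedding(max_views, dim)   # "which camera" index

    def forward(self, tokens, t_idx, v_idx):
        # tokens: (N, D); t_idx, v_idx: (N,) integer indices per token
        return tokens + self.time_embed(t_idx) + self.view_embed(v_idx)


def budget_aware_sample(tokens, t_idx, budget: int):
    """Hypothetical budget-aware sampling: if the token count exceeds the
    compute budget, keep the most recent frames densely and uniformly
    thin the older history."""
    n = tokens.shape[0]
    if n <= budget:
        return tokens, t_idx
    order = torch.argsort(t_idx, descending=True)    # newest tokens first
    n_recent = budget // 2
    recent, older = order[:n_recent], order[n_recent:]
    stride = max(1, older.numel() // (budget - n_recent))
    sampled = older[::stride][: budget - n_recent]   # evenly thin the past
    keep = torch.sort(torch.cat([recent, sampled])).values  # restore order
    return tokens[keep], t_idx[keep]


if __name__ == "__main__":
    # 32 frames x 4 cameras x 16 tokens per view, embedding dim 256
    N, D = 32 * 4 * 16, 256
    tokens = torch.randn(N, D)
    t_idx = torch.arange(N) // (4 * 16)   # frame index of each token
    v_idx = (torch.arange(N) // 16) % 4   # camera index of each token
    tokens = TVITokenizer(D)(tokens, t_idx, v_idx)
    kept, kept_t = budget_aware_sample(tokens, t_idx, budget=512)
    print(kept.shape)  # torch.Size([512, 256])
```

Whatever the exact formulation, the design point both mechanisms share is that the backbone need not be rebuilt for each robot: the number of cameras, the history length, and the compute budget arrive as inputs rather than being baked into the architecture.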

Notably, NavFoM was trained on a large cross-task dataset of approximately eight million cross-task, cross-embodiment navigation samples, plus four million open-ended question-answering samples — roughly twice the training volume of prior work, giving the model stronger language and spatial-semantic understanding.
The release of NavFoM marks a major advance in robot navigation. Developers can build on the model and, with further training, derive application models tailored to specific needs.
Key Points:
🌟 NavFoM is the world's first cross-embodiment, all-scenario panoramic navigation foundation model, unifying multiple robot navigation tasks.
🏞️ The model supports zero-shot operation in both indoor and outdoor scenarios, without the need for additional mapping or data collection.
💡 Introduces TVI Tokens and the BATS strategy, improving the model's grasp of time and viewpoint and its performance under compute constraints.