LlamaV-o1 is a multimodal large language model designed for complex visual reasoning tasks. It is trained with curriculum learning and is reported to perform strongly across diverse benchmarks.
Multimodal
Safetensors
English