InternViT-6B-448px-V2_5 is a visual model built upon InternViT-6B-448px-V1-5, which enhances the visual encoder's ability to extract visual features by utilizing ViT incremental learning and NTP loss (Stage 1.5). It particularly excels in domains where representation is lacking in large-scale network datasets, such as multilingual OCR data and mathematical charts. This model is part of the InternVL 2.5 series, maintaining the same 'ViT-MLP-LLM' architecture as its predecessor, while integrating the newly incrementally pretrained InternViT alongside various pretrained LLMs, including InternLM 2.5 and Qwen 2.5, utilizing a randomly initialized MLP projector.