IBM Releases Granite-Docling-258M: Open-Source Enterprise Document AI Model

AIbase基地

Published in AI News · 4 min read · Sep 18, 2025

IBM has released Granite-Docling-258M, an open-source vision-language model for end-to-end document conversion. Unlike traditional OCR (Optical Character Recognition), Granite-Docling preserves document layout: it extracts elements such as tables, code, formulas, lists, and headings and emits a structured, machine-readable format rather than lossy Markdown. The model is available on Hugging Face, where users can try a live demo and download an MLX build optimized for Apple Silicon.

Granite-Docling is the successor to SmolDocling-256M. IBM replaced the original language backbone with a Granite 165M language model and upgraded the vision encoder to SigLIP2, while keeping the Idefics3-style connector. These changes bring the parameter count to 258M and, according to IBM, improve layout analysis, full-page OCR, and the handling of code, formulas, and tables. IBM also reports fixing instabilities seen in the preview model, such as repeated-token loops.

Granite-Docling is built on an Idefics3-based architecture and trained with the nanoVLM framework. Its output format, DocTags, is a markup language developed by IBM that explicitly represents document structure, including elements, their coordinates, and the relationships between them, so that downstream tools can convert it to Markdown, HTML, or JSON. Because this structured output preserves table topology, mathematical formulas, code blocks, headings, and reading order, it also improves indexing quality and retrieval.
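To make the idea of structure-preserving markup concrete, here is a minimal Python sketch that parses a DocTags-like string and renders it as Markdown or JSON. The tag names (title, text, code), the loc_N coordinate tokens, and the sample string are simplified assumptions for illustration only; the real DocTags vocabulary is defined by IBM's Docling project and is richer than this.

```python
import json
import re

# Hypothetical, simplified markup for illustration. It is NOT the real
# DocTags vocabulary; it only mimics the idea of element tags that carry
# coordinate tokens followed by the element's text.
SAMPLE = (
    "<title><loc_10><loc_12><loc_90><loc_20>Quarterly Report</title>"
    "<text><loc_10><loc_25><loc_90><loc_40>Revenue grew in Q3.</text>"
    "<code><loc_10><loc_45><loc_90><loc_60>print('hello')</code>"
)

ELEMENT = re.compile(r"<(title|text|code)>(.*?)</\1>", re.DOTALL)
LOC = re.compile(r"<loc_(\d+)>")


def parse_doctags_like(markup):
    """Return elements in document order with kind, bounding box, and text."""
    elements = []
    for kind, body in ELEMENT.findall(markup):
        bbox = tuple(int(n) for n in LOC.findall(body))
        text = LOC.sub("", body).strip()
        elements.append({"kind": kind, "bbox": bbox, "text": text})
    return elements


def to_markdown(elements):
    """Render parsed elements as Markdown; coordinates are dropped here."""
    fence = "`" * 3  # built programmatically to keep this listing simple
    parts = []
    for el in elements:
        if el["kind"] == "title":
            parts.append("# " + el["text"])
        elif el["kind"] == "code":
            parts.append(fence + "\n" + el["text"] + "\n" + fence)
        else:
            parts.append(el["text"])
    return "\n\n".join(parts)


def to_json(elements):
    """Render parsed elements as JSON, keeping the coordinates."""
    return json.dumps(elements, indent=2)


if __name__ == "__main__":
    parsed = parse_doctags_like(SAMPLE)
    print(to_markdown(parsed))
    print(to_json(parsed))
```

The point of the sketch is that one structured representation can feed several output formats: the Markdown renderer discards coordinates, while the JSON renderer keeps them for indexing or layout-aware retrieval.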

Granite-Docling adds experimental support for Japanese, Arabic, and Chinese, although English remains the primary target. IBM recommends using the model through Docling, whose CLI and SDK convert PDFs, office documents, and images into multiple output formats. The model runs with Transformers, vLLM, ONNX, and MLX, with the MLX build optimized for Apple Silicon.
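As a rough sketch of what a Docling-based conversion script might look like, the Python below filters a list of files and converts each one to Markdown. It assumes the docling package is installed and uses its basic DocumentConverter interface with default settings; selecting Granite-Docling as the backing model is a pipeline configuration step covered in the Docling documentation and is not shown here. The extension list and the helper names select_inputs and convert_to_markdown are hypothetical, chosen for this example.

```python
from pathlib import Path

# Assumption: an illustrative subset of file types, not Docling's official list.
CANDIDATE_SUFFIXES = {".pdf", ".docx", ".pptx", ".png", ".jpg"}


def select_inputs(paths):
    """Keep only paths whose extension is in the illustrative subset."""
    return [p for p in map(Path, paths) if p.suffix.lower() in CANDIDATE_SUFFIXES]


def convert_to_markdown(path):
    """Convert one document to Markdown using Docling's default pipeline."""
    # Imported lazily so select_inputs works even without Docling installed.
    from docling.document_converter import DocumentConverter

    result = DocumentConverter().convert(str(path))
    return result.document.export_to_markdown()


if __name__ == "__main__":
    for doc in select_inputs(["report.pdf", "notes.txt", "scan.PNG"]):
        print(convert_to_markdown(doc))
```

Keeping file selection separate from conversion means the selection logic can be checked on its own, while the conversion step stays a thin wrapper that is easy to swap for a different pipeline configuration.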

Granite-Docling's release is a notable step for enterprise document AI. By combining IBM's Granite language backbone, the SigLIP2 vision encoder, and the nanoVLM training framework, the model stays lightweight at 258M parameters while handling tables, formulas, code, and multilingual text. Overall, it offers a practical option for accurate document conversion and for the retrieval workflows that depend on it.

Hugging Face: https://huggingface.co/collections/ibm-granite/granite-docling-682b8c766a565487bcb3ca00

Key Points:
🌟 The new model Granite-Docling-258M aims to improve document conversion accuracy and preserve layout information.
🔧 It updates the language model and vision encoder, improving on its predecessor, SmolDocling, in layout analysis, OCR, code, formulas, and tables.
🌍 It adds experimental support for Japanese, Arabic, and Chinese, broadening the model's applicability.

This article is from AIbase Daily
