Google launches the "Gemini Notebooks" feature, creating a personal knowledge base to help users efficiently handle complex projects. The feature breaks down data barriers between Gemini and NotebookLM, building a closed-loop AI workflow. Users can manage chat history, documents, and PDFs in an integrated space, import past conversations, and guide Gemini with custom instructions for intelligent analysis.
MiniMax has open-sourced its Office Skills engine under the MIT license to make AI-generated content more usable, supporting Word, Excel, PPT, and PDF. It bypasses traditional libraries through low-level reconstruction and directly delivers standards-compliant files, making AI more practical for office work.
Google launches Gemini Embedding2, a multimodal embedding model that unifies text, images, videos, audio, and PDFs into a single semantic space, enhancing AI data processing and multimodal retrieval capabilities.
OpenAI upgrades its research AI platform Prism with GPT-5.3 and Codex CLI, integrating text editing, PDF reading, LaTeX compilation, and literature management to streamline scientific workflows and enhance collaboration.
Free AI quiz generator that can generate quizzes from notes, PDFs, images, and YouTube without registration.
Quickly convert Markdown to beautiful PDFs without installation. Use it online for free and privately.
Analyze the floor plan with a nine-grid layout. AI provides feng shui remedies and generates a beautiful PDF report.
Free AI PDF summarizer. Summarize documents in seconds and chat with AI to understand key points.
TomoroAI
TomoroAI/tomoro-colqwen3-embed-4b is an advanced ColPali-style multimodal embedding model that maps text queries, visual documents (such as images and PDFs), or short videos into aligned multi-vector embeddings. The model combines the strengths of Qwen3-VL-4B-Instruct and Qwen3-Embedding-4B, performs strongly on the ViDoRe benchmark, and significantly reduces the storage footprint of its embeddings.
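ColPali-style multi-vector retrieval is usually scored with late interaction (MaxSim): each query vector is matched to its most similar document vector, and those maxima are summed. The following is a minimal pure-Python sketch of that scoring idea only. It does not use this model's API; the function names and the toy 2-dimensional vectors are hypothetical, chosen for illustration.

```python
from math import sqrt


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim_score(query_vecs, doc_vecs):
    """Late-interaction score: for each query vector, take the best cosine
    similarity against any document vector, then sum over query vectors."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    total = 0.0
    for qv in q:
        total += max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
    return total


def rank(query_vecs, docs):
    """Rank documents (hypothetical mapping: name -> list of vectors) by MaxSim."""
    scored = [(name, maxsim_score(query_vecs, vecs)) for name, vecs in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data: two query token vectors, two "pages" with patch vectors.
    query = [[1.0, 0.0], [0.0, 1.0]]
    docs = {
        "page_a": [[1.0, 0.0], [0.0, 1.0]],  # covers both query vectors
        "page_b": [[1.0, 0.0]],              # covers only the first
    }
    for name, score in rank(query, docs):
        print(name, round(score, 3))
```

In a real system the vectors would come from the embedding model and the scoring would be batched on tensors; the sketch only shows why a document that matches every part of the query outranks one that matches a single part.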
prithivMLmods
Chandra is a high-precision OCR model that can convert images and PDFs into structured outputs, such as Markdown, HTML, and JSON, while retaining detailed layout information. It supports more than 40 languages and is good at handling complex document elements.
noctrex
A quantized version of LightOnOCR-1B-1025, specifically designed for image-to-text tasks and widely used in fields such as document understanding and visual language processing. This model supports multiple European languages and is suitable for scenarios such as OCR, PDF processing, and table recognition.
Mungert
The Nanonets-OCR2-3B GGUF model is a powerful tool designed for document processing. It can intelligently convert various types of documents into structured Markdown format and has multiple advanced recognition and processing capabilities such as OCR, image-to-text conversion, PDF-to-Markdown conversion, and visual question answering.
datalab-to
Chandra is an advanced OCR model that can extract text from images and PDFs with high precision and preserve layout information. It supports output in Markdown, HTML, and JSON formats and performs excellently in handwriting recognition, form reconstruction, table processing, etc. It supports more than 40 languages.
echo840
MonkeyOCR is a document parsing model based on the Structure-Recognition-Relationship (SRR) triple paradigm. It efficiently processes PDF and image documents, extracts structured content such as text, formulas, and tables, and supports parsing of Chinese and English documents.
Adun
olmOCR is an optical character recognition model fine-tuned from Qwen2-VL-7B-Instruct. It focuses on converting image-based content such as PDFs into text and improves recognition accuracy in specific scenarios through fine-tuning.
apkonsta
A table detection model optimized for International Financial Reporting Standards (IFRS) PDF documents, excelling in processing borderless tables
kitjesen
This model converts PDF documents into Markdown format while preserving the original document layout structure and accurately recognizing mathematical formulas and tables.
shixuanleong
VisualHeist is an object detection model specifically designed to extract charts, schematics, and tables from PDF files, including titles, headers, and footers.
HongxuanLi
Nougat is a vision-language model based on the Donut architecture, specifically designed for transcribing scientific PDFs into Markdown format.
hantian
A reading order prediction model that converts text boxes extracted from PDF or detected by OCR into a readable sequence.
Xenova
Nougat is a vision-based academic document understanding model capable of converting scientific PDF images into Markdown-formatted text.
facebook
Nougat is a vision-language model based on the Donut architecture, specifically designed for converting scientific PDFs into Markdown format.
Nougat is a model based on the Donut architecture, specifically trained for transcribing scientific PDFs into easy-to-use Markdown format
shubh1608
OCR model trained on image folder datasets for text recognition in PDF documents
impira
A document classification model fine-tuned based on the LayoutLM architecture, specifically designed for classifying PDF documents, especially invoices
geralt
A distilled GPT-2 model fine-tuned on texts from over 100 mechanical/automotive PDF books, specializing in text generation tasks in the mechanical engineering field
Markdownify is a multi-functional file conversion service that supports converting multiple formats such as PDFs, images, audio, and web page content into Markdown format.
PageIndex MCP is a reasoning-based, vectorless RAG system. Through MCP, it exposes a tree-like index of documents to LLMs, enabling platforms such as Claude to retrieve information from PDF documents through structural reasoning, as a human expert would, without the need for a vector database.
A production-grade Berlin city service MCP server that provides comprehensive service queries, intelligent PDF form processing, elastic caching, and remote synchronization functions.
A service implementation for retrieving data such as PDFs from AWS S3 via the MCP protocol
An MCP server based on FastAPI that automatically fetches, summarizes, and pushes Reddit content to Slack. The system uses Azure OpenAI to generate summaries of posts from selected subreddits, organizes them into PDF reports, and shares them with the team.
The MCP Document Converter is a multi-format document conversion tool based on the MCP protocol, supporting bidirectional conversion between five formats: Markdown, HTML, DOCX, PDF, and text, providing powerful document processing capabilities for AI assistants.
The enhanced Markdownify MCP UTF-8 is a Markdown processing service that supports multilingual content conversion. It optimizes UTF-8 encoding support, provides Markdown conversion capabilities for various formats such as PDF, images, audio and video, and Office documents, and is specifically optimized for the Windows system.
The arXiv MCP Server is a service based on the Model Context Protocol (MCP) that allows users to interact with the arXiv API using natural language, enabling functions such as retrieving academic article metadata, downloading PDF files, searching the database, and loading articles into the context of a large language model (LLM).
This project is an integrated suite of MCP servers covering media tools, information retrieval, PDF generation, and presentation creation services; each server needs to be configured and run separately.
The PDF Reader MCP service gives AI agents a secure and flexible way to extract content from PDF files, including text, metadata, and page count information. It supports local and remote PDF files and is easy to integrate into the MCP environment.
Deep Research is an agent-based tool that provides web search and advanced research functions, supports PDF analysis, image description, and YouTube transcript extraction, and can run as an MCP server.
An MCP server designed for Claude Desktop, capable of scraping web page text, YouTube video subtitles, and PDF file content via links.
A PDF form processing toolkit based on MCP and PyMuPDF, providing PDF file search, form field extraction, and visualization functions.
A high-performance PDF to Markdown service based on MCP, supporting batch processing of local files and URLs, retaining the document structure and intelligently optimizing the output.
An MCP server implementation of the Foxit PDF API, available in Python and TypeScript, that exposes more than 35 Foxit PDF service operations (such as creation, conversion, editing, security, and OCR) as tools for AI agents.
Zed's PDF semantic search extension, integrating an AI assistant to enhance document processing capabilities
An MCP server for converting Markdown documents to PDF files, supporting syntax highlighting and custom styles
This project is a USPTO patent data access server based on FastMCP. It supports accessing patent and patent application data from the United States Patent and Trademark Office through the Patent Public Search API and the Open Data Portal API, providing patent search, full-text retrieval, PDF download, and metadata query functions for MCP clients such as Claude Desktop.
This project builds an HR chatbot based on RAG. The MCP server serves as the function-calling hub, implementing PDF document upload, parsing, retrieval, and natural language question answering.
A high-performance PDF to Markdown service based on MCP, supporting batch processing and structured output