trafilatura

Public

Python package and command-line tool to gather text and metadata on the Web: crawling, scraping, extraction, and output as CSV, JSON, HTML, MD, TXT, or XML

Created: 2019-04-08T19:38:48
Updated: 2025-03-26T19:40:42
https://trafilatura.readthedocs.io
Stars: 4.8K
Stars increase: 22
