MINT-1T
A multimodal dataset comprising one trillion tokens and 3.4 billion images.
MINT-1T is an open-source multimodal dataset released by Salesforce AI. It contains one trillion text tokens and 3.4 billion images, making it roughly ten times larger than previously available open-source datasets of its kind. In addition to HTML documents, it draws on PDF documents and ArXiv papers, which broadens the range of content it covers. Building MINT-1T involved multiple data collection, processing, and filtering steps intended to keep the data high in quality and diversity.