AI Tutorial

count-tokens-hf-datasets

Public

This project shows how to compute the total number of training tokens in a large text dataset from Hugging Face Datasets using Apache Beam and Dataflow.
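The computation described above has a simple map-then-sum shape, which a minimal sketch can illustrate. The sketch below is not the project's code: it uses only the Python standard library rather than Apache Beam, and the function names `count_tokens` and `total_tokens` are hypothetical. It also assumes whitespace tokenization for simplicity, whereas a real pipeline would likely apply a model-specific tokenizer. In a Beam pipeline, the per-document step would typically correspond to a `beam.Map` and the final sum to a global combine such as `beam.CombineGlobally(sum)`.

```python
from typing import Iterable


def count_tokens(text: str) -> int:
    """Count tokens in one document.

    Assumption: tokens are whitespace-separated words. A real pipeline
    would likely swap in a model-specific tokenizer here.
    """
    return len(text.split())


def total_tokens(documents: Iterable[str]) -> int:
    """Map each document to its token count, then sum all the counts."""
    per_document_counts = map(count_tokens, documents)
    return sum(per_document_counts)


if __name__ == "__main__":
    # Hypothetical sample data standing in for a large text dataset.
    sample = ["hello world", "counting tokens at scale", ""]
    print(total_tokens(sample))  # prints 6
```

Because each document is counted independently and the counts are combined with an associative sum, the work can be split across many workers, which is what makes this kind of job a natural fit for a distributed runner.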

Created: 2022-06-10T11:25:54
Updated: 2025-06-11T20:40:26
Stars: 27
Stars increase: 0