The AutoMathText dataset is a large collection of mathematical text, about 200GB in total. It aggregates data from several sources, including scientific papers, programming code snippets, and web content. The dataset is suited to training and fine-tuning models for mathematical reasoning. It also supports text generation and question-answering tasks, which makes it particularly useful for developing and testing models that understand and generate mathematical content. The dataset currently falls in the size range of 1 billion to 10 billion data points, making it a substantial resource for large-scale model training.