Beijing Academy of Artificial Intelligence Releases CCI3.0 Chinese Internet Corpus with 1000GB Dataset
At the 2024 Beijing Cultural Forum, the Beijing Academy of Artificial Intelligence (BAAI) officially announced the release of CCI3.0 (Chinese Corpora Internet), the next generation of its Chinese Internet corpus, further promoting data co-construction and sharing. CCI3.0 comprises a 1000GB dataset and a 498GB high-quality subset, CCI3.0-HQ. It is the latest major update to the series, following the initial open-source release of CCI1.0 in November 2023 and the release of CCI2.0 in April 2024.