A multilingual dialog corpus
ChatterBot is a machine learning, conversational dialog engine for creating chat bots
Large Scale Chinese Corpus for NLP
:fire: On its way to becoming the most complete categorized collection of Android open-source projects ever (updated long-term; please give it a Star)
A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)
A collection of small corpora of interesting data for creating bots and similar projects.
Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML
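Search all Chinese NLP datasets, with commonly used English NLP datasets included
A Lite BERT for Self-Supervised Learning of Language Representations; a large collection of Chinese pre-trained ALBERT models
MNBVC (Massive Never-ending BT Vast Chinese corpus): an ultra-large-scale Chinese corpus that aims to match the 40T of data used to train ChatGPT. The MNBVC dataset covers not only mainstream culture but also niche subcultures, and even text written in "Martian script" (火星文). It includes plain-text Chinese data in every form: news, essays, novels, books, magazines, papers, scripts, forum posts, wiki, classical poetry, song lyrics, product descriptions, jokes, embarrassing anecdotes, chat logs, and more.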
Deep learning and deep reinforcement learning research papers and some code