Eric-Valyu/Test-Prompt
Hugging Face · Updated 2024-07-26 · Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/Eric-Valyu/Test-Prompt
Description:
MAP-CC is an open-source Chinese pretraining dataset of 800 billion tokens that gives the natural language processing (NLP) community high-quality Chinese pretraining data. It is intended solely for academic research, and its training data has undergone strict compliance checks to ensure integrity and compliance. Use of the dataset is governed by the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (CC BY-NC-ND 4.0), which permits sharing for non-commercial purposes only and prohibits modifications and derivative works.
Provided by:
Eric-Valyu



