eduagarcia/cc_news_pt_v2
Source: Hugging Face. Updated 2025-04-28; indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/eduagarcia/cc_news_pt_v2
Description:
CC-News-PT v2 is a Portuguese-language news dataset containing over 11 million articles published between 2016 and June 2024. The articles have been cleaned and deduplicated, and each article's language was detected so that non-Portuguese text could be filtered out. The dataset is suitable for tasks such as text classification, question answering, and text generation.
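The listing does not say how deduplication was carried out. As a rough illustration of the general idea only, the sketch below removes exact duplicates by hashing a normalized copy of each article. The `text` field name, the normalization rules, and the sample records are all assumptions made for this example, not details of the dataset or of its authors' pipeline.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Collapse whitespace and lowercase so trivial variants hash the same."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(articles):
    """Keep the first article seen for each distinct normalized text.

    `articles` is an iterable of dicts with a hypothetical "text" key.
    """
    seen = set()
    unique = []
    for article in articles:
        digest = hashlib.sha256(normalize(article["text"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(article)
    return unique


# Made-up sample records; the second differs from the first only in case and spacing.
sample = [
    {"text": "Governo anuncia novas medidas."},
    {"text": "governo  anuncia novas medidas. "},
    {"text": "Equipe vence o campeonato."},
]
unique_articles = deduplicate(sample)
```

Exact hashing only catches copies that are identical after normalization. Near-duplicates, such as the same wire story with a different byline, would need a fuzzy method like MinHash, which this sketch does not attempt.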
Provider:
eduagarcia



