laion/Wikipedia-Abstract
Hugging Face. Updated 2024-10-19. Indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/laion/Wikipedia-Abstract
Description:
The Wikipedia Abstract dataset collects Wikipedia abstracts, full articles, and a popularity score index across many languages, both widely spoken and lesser-known. It gives special attention to languages it describes as often underrepresented, such as Hebrew, Urdu, Bengali, Aramaic, Uighur, and Polish, so that high-quality processed Wikipedia data is accessible for them. Fields include URL, Wiki, Language, Title, Abstract, and Version Control. The dataset is regularly updated and aims to support AI across languages, breaking down language barriers and fostering inclusivity.
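As a rough illustration of the record structure described above, the following Python sketch models one row with the six listed fields and filters rows by language. The attribute names, their types, the `AbstractRecord` class, the `filter_by_language` helper, and the sample rows are all assumptions made for illustration; the actual column names and values in the dataset may differ.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AbstractRecord:
    # Field names mirror the listing's field list; the exact column
    # names and types in the real dataset are assumptions.
    url: str
    wiki: str
    language: str
    title: str
    abstract: str
    version_control: str


def filter_by_language(records: Iterable[AbstractRecord], language: str) -> List[AbstractRecord]:
    """Return the records whose language code matches, ignoring case."""
    wanted = language.lower()
    return [r for r in records if r.language.lower() == wanted]


# Made-up sample rows for illustration only; not taken from the dataset.
SAMPLE = [
    AbstractRecord("https://example.org/he/1", "hewiki", "he", "Example A", "Sample abstract A.", "v1"),
    AbstractRecord("https://example.org/pl/1", "plwiki", "pl", "Example B", "Sample abstract B.", "v1"),
    AbstractRecord("https://example.org/he/2", "hewiki", "HE", "Example C", "Sample abstract C.", "v1"),
]

# Collect the titles of the sample rows tagged as Hebrew.
hebrew_titles = [r.title for r in filter_by_language(SAMPLE, "he")]
```

Matching the language code case-insensitively is a design choice for this sketch, not something the listing specifies.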
Provided by:
laion



