OpenLLM-France/wikisource.fr
Source: Hugging Face | Updated: 2023-12-21 | Indexed: 2024-07-06
Download link:
https://hf-mirror.com/datasets/OpenLLM-France/wikisource.fr
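Resource overview:
This dataset was created by OpenLLM France from Wikisource dump files. It contains plain-text versions of French Wikisource pages with HTML tags and wiki templates removed, retaining only the Markdown syntax for headings, lists, and tables. The latest dump (20231201) contains 185,700 documents, 585,700 paragraphs, 8,985,905 lines, 523,208,959 words, and 3,071,992,473 characters, occupying 3122.9 MB in memory and 1844.6 MB on disk.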
This dataset, created by OpenLLM France from Wikisource dumps, contains plain text pages from fr.wikisource.org without HTML tags or wiki templates. The text includes markdown syntax for headers, lists, and tables. It is suitable for tasks such as text generation and masked language modeling, with multiple configurations each having specific data files and features like id, url, title, and text. The dataset is licensed under CC-BY-SA-4.0 and provides detailed statistics, example usage in Python, and notes on data formatting.
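The corpus statistics quoted above (documents, paragraphs, lines, words, characters) can be illustrated with a minimal sketch. It assumes records shaped like the listed features (id, url, title, text). The counting rules are assumptions, since the listing does not define them: paragraphs are taken as blocks separated by blank lines, lines as non-empty lines, and words as whitespace-separated tokens. The function name `corpus_stats` and the sample records are hypothetical.

```python
from typing import Dict, Iterable


def corpus_stats(records: Iterable[Dict[str, str]]) -> Dict[str, int]:
    """Accumulate simple size statistics over records that carry a 'text' field."""
    stats = {"documents": 0, "paragraphs": 0, "lines": 0, "words": 0, "characters": 0}
    for record in records:
        text = record["text"]
        stats["documents"] += 1
        # Assumption: a paragraph is a non-empty block separated by blank lines.
        stats["paragraphs"] += sum(1 for block in text.split("\n\n") if block.strip())
        # Assumption: only non-empty lines are counted.
        stats["lines"] += sum(1 for line in text.splitlines() if line.strip())
        # Assumption: words are whitespace-separated tokens.
        stats["words"] += len(text.split())
        stats["characters"] += len(text)
    return stats


# Hypothetical sample records using the feature names from the description.
sample = [
    {
        "id": "1",
        "url": "https://example.org/1",
        "title": "Exemple",
        "text": "# Titre\n\nPremier paragraphe.\nDeuxième ligne.",
    },
    {
        "id": "2",
        "url": "https://example.org/2",
        "title": "Liste",
        "text": "- un\n- deux",
    },
]

result = corpus_stats(sample)
print(result)
```

The function accepts any iterable of dictionaries, so it could in principle be run over records streamed from the actual dataset. The official figures may have been computed with different rules, so counts produced this way are not guaranteed to match them.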
Provider:
OpenLLM-France



