OrdalieTech/wiki_en
Source: Hugging Face. Updated: 2025-06-25. Indexed: 2025-07-05.
Download link:
https://hf-mirror.com/datasets/OrdalieTech/wiki_en
Resource description:
This dataset contains a complete snapshot of the French-language Wikipedia encyclopedia as it existed on April 20, 2025. It includes the latest version of each page, with its raw text content, the titles of linked pages, and a unique identifier. The text of each article retains the MediaWiki formatting structure for titles (`== Section Title ==`), subtitles (`=== Subtitle ===`), and so on, making it particularly useful for tasks that can benefit from the document's hierarchical structure. This corpus is well suited to training language models, information retrieval, question answering, and any other Natural Language Processing (NLP) research requiring a large amount of structured, encyclopedic text.
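Because the article text keeps MediaWiki heading markers, the document hierarchy can be recovered with a small parser. The following is a minimal sketch, not part of the dataset's tooling: the helper name `split_sections` is hypothetical, and it assumes the article text is already available as a plain Python string (the dataset's actual column names are not stated here).

```python
import re

# Matches MediaWiki heading lines such as "== Title ==" or "=== Subtitle ===".
# The backreference \1 requires the same number of "=" on both sides.
HEADING_RE = re.compile(r"^(={2,6})\s*(.+?)\s*\1\s*$")


def split_sections(text):
    """Split article text into (level, title, body) tuples.

    Text before the first heading is returned as a level-1 section
    with an empty title (the article lead).
    """
    sections = []
    level, title, body = 1, "", []
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            # Close the section collected so far, then start a new one.
            sections.append((level, title, "\n".join(body).strip()))
            level, title, body = len(match.group(1)), match.group(2), []
        else:
            body.append(line)
    sections.append((level, title, "\n".join(body).strip()))
    return sections


sample = (
    "Intro text.\n"
    "== History ==\n"
    "Founded long ago.\n"
    "=== Early years ===\n"
    "Details.\n"
)
for level, title, body in split_sections(sample):
    print(level, repr(title), repr(body))
```

The heading level is simply the number of `=` characters, so a downstream task can rebuild a tree of sections or chunk articles at a chosen depth.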
Provider:
OrdalieTech



