thoughtworks/wiki_bio
Source: Hugging Face. Updated 2025-11-12; indexed 2025-11-15.
Download link:
https://hf-mirror.com/datasets/thoughtworks/wiki_bio
Description:
This dataset contains 728,321 biographies extracted from Wikipedia. Each entry pairs the first paragraph of the biography with the content of the article's infobox table. It is primarily used for developing text generation models and is divided into training, test, and validation sets of 582,659, 72,831, and 72,831 samples respectively. The fields are the context, the table (column headers, content, and row numbers), and the target text.
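The record layout described above can be sketched in Python. This is a minimal illustration, not the dataset's official schema: the key names (`input_text`, `context`, `table`, `column_header`, `row_number`, `content`, `target_text`) are assumptions inferred from the field list, and the sample record is invented for demonstration. The helper `linearize_table` is hypothetical and shows one simple way to turn an infobox into a flat string for a text generation model.

```python
from typing import Dict, List

# Split sizes as stated in the description.
SPLIT_SIZES = {"train": 582659, "test": 72831, "validation": 72831}


def linearize_table(table: Dict[str, List]) -> str:
    """Join the parallel header/content lists into 'header: value' pairs.

    Hypothetical helper: assumes `column_header` and `content` are
    parallel lists of equal length.
    """
    headers = table["column_header"]
    values = table["content"]
    return " | ".join(f"{h}: {v}" for h, v in zip(headers, values))


# Invented record; key names are assumed, not confirmed by the source.
example = {
    "input_text": {
        "context": "jane doe",
        "table": {
            "column_header": ["name", "occupation"],
            "row_number": [1, 1],
            "content": ["jane doe", "engineer"],
        },
    },
    "target_text": "jane doe is an engineer .",
}

# The three splits should add up to the stated total of 728,321 entries.
total = sum(SPLIT_SIZES.values())
linearized = linearize_table(example["input_text"]["table"])

print(total)
print(linearized)
```

With the Hugging Face `datasets` library installed, real records could presumably be fetched with `load_dataset("thoughtworks/wiki_bio")`, using the repository name from the download link. That call is not exercised here, and the actual field names should be checked against the loaded data.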
Provider:
thoughtworks



