wiki_movies
Source: ModelScope community. Updated 2025-11-27; indexed 2025-05-24.
Download link: https://modelscope.cn/datasets/facebook/wiki_movies
# Dataset Card for WikiMovies
## Table of Contents
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
- [Languages](#languages)
- [Dataset Structure](#dataset-structure)
- [Data Instances](#data-instances)
- [Data Fields](#data-fields)
- [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
- [Curation Rationale](#curation-rationale)
- [Source Data](#source-data)
- [Annotations](#annotations)
- [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
- [Social Impact of Dataset](#social-impact-of-dataset)
- [Discussion of Biases](#discussion-of-biases)
- [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
- [Dataset Curators](#dataset-curators)
- [Licensing Information](#licensing-information)
- [Citation Information](#citation-information)
- [Contributions](#contributions)
## Dataset Description
- **Homepage:** [WikiMovies Homepage](https://research.fb.com/downloads/babi/)
- **Repository:**
- **Paper:** [Key-Value Memory Networks for Directly Reading Documents](https://arxiv.org/pdf/1606.03126.pdf)
- **Leaderboard:**
- **Point of Contact:**
### Dataset Summary
The WikiMovies dataset consists of roughly 100k (templated) questions over 75k entities, based on questions with answers in the open movie database (OMDb). It is the QA part of the Movie Dialog dataset.
### Supported Tasks and Leaderboards
- Question Answering
### Languages
The text in the dataset is written in English.
## Dataset Structure
### Data Instances
The raw data consists of question-answer pairs, with the question and the answer separated by a tab. Here are 3 examples:
```buildoutcfg
1 what does Grégoire Colin appear in? Before the Rain
1 Joe Thomas appears in which movies? The Inbetweeners Movie, The Inbetweeners 2
1 what films did Michelle Trachtenberg star in? Inspector Gadget, Black Christmas, Ice Princess, Harriet the Spy, The Scribbler
```
It is unclear what the `1` is for at the beginning of each line, but it has been removed in the `Dataset` object.
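
As a minimal sketch of how one raw line could be turned into a question/answer record, the snippet below splits on the first tab and drops the leading numeric prefix. The function name `parse_line` is hypothetical and is not part of any loader; the sketch also assumes that the prefix is separated from the question by a single space, as it appears in the examples above.

```python
from typing import Dict


def parse_line(line: str) -> Dict[str, str]:
    """Parse one raw line of the assumed form '<prefix> <question>\\t<answer>'."""
    # Split the question part from the answer part on the first tab.
    left, answer = line.rstrip("\n").split("\t", 1)
    # Drop the leading numeric prefix (always `1` in the examples above).
    _prefix, question = left.split(" ", 1)
    return {"question": question, "answer": answer}


raw = "1 what does Grégoire Colin appear in?\tBefore the Rain"
record = parse_line(raw)
print(record)
```

The answer is kept as a single string here, which matches the `answer` field described in this card.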
### Data Fields
Here is an example of the raw data ingested by `Datasets`:
```buildoutcfg
{
'answer': 'Before the Rain',
'question': 'what does Grégoire Colin appear in?'
}
```
- `answer`: a string containing the answer to the corresponding question.
- `question`: a string containing the question.
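
Because `answer` is a single string, questions with several correct answers may need to be split before scoring. The sketch below assumes that multi-answer rows keep the comma-separated form shown in the raw examples; the helper `answer_list` and its `sep` parameter are hypothetical. A title that itself contains `", "` would be split incorrectly under this assumption.

```python
from typing import Dict, List


def answer_list(record: Dict[str, str], sep: str = ", ") -> List[str]:
    """Split the `answer` string into individual titles (assumes `sep` delimits answers)."""
    return [a.strip() for a in record["answer"].split(sep) if a.strip()]


example = {
    "question": "Joe Thomas appears in which movies?",
    "answer": "The Inbetweeners Movie, The Inbetweeners 2",
}
print(answer_list(example))
```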
### Data Splits
The data is split into train, test, and dev sets. The split sizes are as follows:
| wiki-entities_qa_* | n examples|
| ----- | ---- |
| train.txt | 96185 |
| dev.txt | 10000 |
| test.txt | 9952 |
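
For a quick sense of the relative split sizes, the short sketch below computes the total and each split's share from the counts in the table. The variable names are illustrative only.

```python
# Example counts taken from the split table in this card.
split_sizes = {"train": 96185, "dev": 10000, "test": 9952}

total = sum(split_sizes.values())
fractions = {name: n / total for name, n in split_sizes.items()}

for name, n in split_sizes.items():
    print(f"{name}: {n} examples ({fractions[name]:.1%})")
print(f"total: {total}")
```

The three files together hold 116,137 examples, of which the training file accounts for roughly 83%.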
## Dataset Creation
### Curation Rationale
WikiMovies was built with the following goals in mind: (i) machine learning techniques should have ample training examples for learning; and (ii) one can easily analyze the performance of different representations of knowledge and break down the results by question type. The dataset can be downloaded from http://fb.ai/babi
### Source Data
#### Initial Data Collection and Normalization
[More Information Needed]
#### Who are the source language producers?
[More Information Needed]
### Annotations
#### Annotation process
[More Information Needed]
#### Who are the annotators?
[More Information Needed]
### Personal and Sensitive Information
[More Information Needed]
## Considerations for Using the Data
### Social Impact of Dataset
[More Information Needed]
### Discussion of Biases
[More Information Needed]
### Other Known Limitations
[More Information Needed]
## Additional Information
### Dataset Curators
[More Information Needed]
### Licensing Information
[More Information Needed]
### Citation Information
```
@misc{miller2016keyvalue,
      title={Key-Value Memory Networks for Directly Reading Documents},
      author={Alexander Miller and Adam Fisch and Jesse Dodge and Amir-Hossein Karimi and Antoine Bordes and Jason Weston},
      year={2016},
      eprint={1606.03126},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
### Contributions
Thanks to [@aclifton314](https://github.com/aclifton314) for adding this dataset.
Provider: maas
Created: 2025-05-20



