
tuple_ie

ModelScope Community · Updated 2025-07-04 · Indexed 2025-05-31
Download link:
https://modelscope.cn/datasets/allenai/tuple_ie
Description:
# Dataset Card for TupleInf Open IE

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Homepage:** [Tuple IE Homepage](https://allenai.org/data/tuple-ie)
- **Repository:**
- **Paper:** [Answering Complex Questions Using Open Information Extraction](https://www.semanticscholar.org/paper/Answering-Complex-Questions-Using-Open-Information-Khot-Sabharwal/0ff595f0645a3e25a2f37145768985b10ead0509)
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

The TupleInf Open IE dataset contains Open IE tuples extracted from 263K sentences that were used by the solver in “Answering Complex Questions Using Open Information Extraction” (referred to in the paper as Tuple KB, T). These sentences were collected from a large Web corpus using training questions from 4th and 8th grade as queries: 156K sentences were collected for 4th grade questions and 107K sentences for 8th grade questions. Each sentence is followed by its Open IE v4 tuples, written in Open IE v4's simple format.
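To make the "simple format" concrete: in this card's example instance, the `tuple_text` `(0.04593 kg; Used; a triple beam balance; to mass a golf ball)` corresponds to `arg1` = `0.04593 kg`, `rel` = `Used`, and `arg2s` = the two remaining parts. The Python sketch below recovers those fields from such a string. It is not part of the dataset or of Open IE v4; the name `parse_simple_tuple` is hypothetical, and it assumes a context-free tuple (`context` is `""`) whose parts are separated by `"; "` with no `"; "` inside any argument. Tuples with a non-empty context may be rendered differently and are not handled here.

```python
from typing import Dict, List, Union


def parse_simple_tuple(tuple_text: str) -> Dict[str, Union[str, List[str]]]:
    """Split a context-free Open IE v4 simple-format string into its parts.

    Hypothetical helper, not part of the dataset or of Open IE v4. Assumes:
      * the whole tuple is wrapped in one pair of parentheses, and
      * parts are separated by "; " and no argument itself contains "; ".
    """
    text = tuple_text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError(f"not a parenthesized tuple: {tuple_text!r}")
    # Drop the outer parentheses, then split into arg1, rel, arg2, arg2, ...
    parts = [part.strip() for part in text[1:-1].split("; ")]
    if len(parts) < 2:
        raise ValueError(f"expected at least arg1 and rel: {tuple_text!r}")
    # Keys mirror the card's `arg1`, `rel`, and `arg2s` fields.
    return {"arg1": parts[0], "rel": parts[1], "arg2s": parts[2:]}


if __name__ == "__main__":
    parsed = parse_simple_tuple(
        "(0.04593 kg; Used; a triple beam balance; to mass a golf ball)"
    )
    print(parsed)
    # {'arg1': '0.04593 kg', 'rel': 'Used', 'arg2s': ['a triple beam balance', 'to mass a golf ball']}
```

On the card's example this reproduces the `arg1`, `rel`, and `arg2s` values stored alongside `tuple_text`, so it can double as a quick consistency check on individual records.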
### Supported Tasks and Leaderboards

[More Information Needed]

### Languages

The text in the dataset is in English, collected from a large Web corpus using training questions from 4th and 8th grade as queries.

## Dataset Structure

### Data Instances

This dataset contains sentences together with the relation tuples extracted from each sentence. Each instance contains a sentence followed by its [Open IE v4](https://github.com/allenai/openie-standalone) tuples in their *simple format*.

An example of an instance:

```JSON
{
  "sentence": "0.04593 kg Used a triple beam balance to mass a golf ball.",
  "tuples": {
    "score": 0.8999999761581421,
    "tuple_text": "(0.04593 kg; Used; a triple beam balance; to mass a golf ball)",
    "context": "",
    "arg1": "0.04593 kg",
    "rel": "Used",
    "arg2s": ["a triple beam balance", "to mass a golf ball"]
  }
}
```

### Data Fields

- `sentence`: the input text/sentence.
- `tuples`: the relation tuples extracted from the sentence.
  - `score`: the confidence score of each tuple.
  - `tuple_text`: the text representation of the extraction, in the *simple format* of [Open IE v4](https://github.com/allenai/openie-standalone).
  - `context`: an optional representation of the context for this extraction. Defaults to `""` if there is no context.
  - `arg1`: the first argument of the relation.
  - `rel`: the relation.
  - `arg2s`: a sequence of the second arguments of the relation.

### Data Splits

| name      | train  |
|-----------|-------:|
| all       | 267719 |
| 4th_grade | 158910 |
| 8th_grade | 108809 |

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation process

[More Information Needed]

#### Who are the annotators?
[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

```bibtex
@article{Khot2017AnsweringCQ,
  title={Answering Complex Questions Using Open Information Extraction},
  author={Tushar Khot and A. Sabharwal and Peter Clark},
  journal={ArXiv},
  year={2017},
  volume={abs/1704.05572}
}
```

### Contributions

Thanks to [@mattbui](https://github.com/mattbui) for adding this dataset.
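As a sketch of how the documented fields might be consumed (for instance, when assembling a tuple knowledge base similar to what the paper calls Tuple KB), the following Python flattens one instance into per-tuple records and keeps those whose `score` meets a threshold. The names `iter_confident_tuples`, `_as_tuple_list`, and `EXAMPLE`, and the `min_score` default of 0.5, are illustrative assumptions, not values from the card or the paper. Because the example instance shows `tuples` as a single object while the field description calls it a collection, the sketch accepts a single object, a list of objects, or an assumed column-oriented dict of lists.

```python
import json
from typing import Any, Dict, Iterator, List

# The card's example instance, parsed from JSON (no trailing commas).
EXAMPLE = json.loads("""
{
  "sentence": "0.04593 kg Used a triple beam balance to mass a golf ball.",
  "tuples": {
    "score": 0.8999999761581421,
    "tuple_text": "(0.04593 kg; Used; a triple beam balance; to mass a golf ball)",
    "context": "",
    "arg1": "0.04593 kg",
    "rel": "Used",
    "arg2s": ["a triple beam balance", "to mass a golf ball"]
  }
}
""")


def _as_tuple_list(tuples: Any) -> List[Dict[str, Any]]:
    """Normalize `tuples` to a list of per-tuple dicts (layouts are assumed)."""
    if isinstance(tuples, dict):
        if isinstance(tuples.get("score"), list):
            # Assumed column-oriented layout: one list per field, aligned by index.
            n = len(tuples["score"])
            return [{key: values[i] for key, values in tuples.items()} for i in range(n)]
        # A single tuple object, as printed in the card's example.
        return [tuples]
    # Already a list of tuple objects.
    return list(tuples)


def iter_confident_tuples(
    instance: Dict[str, Any], min_score: float = 0.5
) -> Iterator[Dict[str, Any]]:
    """Yield one flat record per tuple whose `score` is at least `min_score`.

    `min_score=0.5` is an illustrative default, not a value from the card.
    """
    for t in _as_tuple_list(instance["tuples"]):
        if t["score"] >= min_score:
            yield {
                "sentence": instance["sentence"],
                "arg1": t["arg1"],
                "rel": t["rel"],
                "arg2s": list(t["arg2s"]),
                "score": t["score"],
            }


if __name__ == "__main__":
    for record in iter_confident_tuples(EXAMPLE):
        print(record["arg1"], "|", record["rel"], "|", record["arg2s"])
```

Keeping the source `sentence` on every record preserves provenance, so a downstream solver can trace each fact back to the text it was extracted from; a suitable threshold would have to be chosen empirically.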

Provider: maas
Created: 2025-05-28