
kqsong/OASum

Source: Hugging Face. Updated 2023-07-03; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/kqsong/OASum
Resource description:
---
license: cc-by-sa-3.0
language:
- en
tags:
- summarization
- Wikipedia
size_categories:
- 1M<n<10M
task_categories:
- summarization
---

# Dataset Card for OASum Dataset

## Table of Contents

- [Dataset Description](#dataset-description)
- [Dataset Usage](#dataset-usage)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Additional Information](#additional-information)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)

## Dataset Description

- **Repository:** [OASum Dataset repository](https://github.com/tencent-ailab/OASum)
- **Paper:** [OASum: Large-Scale Open Domain Aspect-based Summarization](https://arxiv.org/pdf/2212.09233.pdf)

The OASum Dataset is an English-language dataset containing over 3.6M document, aspect, and summary triplets.

## Dataset Usage

You can download it directly with the Hugging Face `datasets` library.

```python
from datasets import load_dataset

dataset = load_dataset("kqsong/OASum")
```

## Dataset Structure

### Data Instances

For each instance, there is a list of strings for the document, a list of strings for the summary, a string for the document title, a string for the aspect, and a list of indices for the sentences in the corresponding section.

```json
{
  "title": "Ker's WingHouse Bar & Grill",
  "document": [
    "After Clearwater, Florida chicken wing pioneering restaurant chain Hooters began rapidly expanding, Florida based, Canadian-born restaurant entrepreneur Ed Burnett saw the opportunity.",
    "Burnett secured the rights to a closed restaurant (\"Knockers\") and opened \"The WingHouse\" restaurant at 7369 Ulmerton Road, Largo, Florida, a high traffic corridor.",
    "He strategically selected the restaurant in between where people work (commercial real estate) and live (residential real estate), to appeal to the local lunch crowd and family dining crowd.",
    "This flagship location proved to be a success soon after launching and is the model that the chain expanded on.",
    "Burnett, looking to expand to additional locations, accepted a financing partner (Crawford Ker) during this time frame, to open additional locations and beyond.",
    "Burnett's goal was to open 20 to 50 locations, and then sell the chain to a larger restaurant chain or investors.",
    "Burnett would ultimately regret his choice of investor.",
    "In 1992, Ker retired from the NFL and took a job selling cars at a local dealer.",
    "In 1994, he invested half interest in a Largo, Florida wing restaurant called, \"Wing House\" that imitated Hooters.",
    "The restaurant was always The Wing House, and the atmosphere was always toned down to make it more family friendly.",
    "The restaurant did well and two additional locations were opened in the Tampa Bay area in the following three years.",
    "Ker won a $1.2-million jury award from Hooters in late 2004, which had sued him for trademark violations for allegedly using their uniforms and decor.",
    "After a three-week trial in which lawyers discussed hula hoops, surfboards, scrunchy socks, pantyhose, and something called \"vicarious sexual recreation\", the jury ruled that no trademark infringement existed and Hooters was penalized for their frivolous lawsuit.",
    "Hooters appealed the decision, but in June, 2006, the 11th U.S. Circuit Court of Appeals in Atlanta upheld the verdict.",
    "As of 2007, the company had 1,700 employees at 22 locations with revenue of nearly $60 million.",
    "Ker attended, and the company participated in, the 2007 National Buffalo Wing Festival and placed first in the \"traditional x-hot sauce\" category and gained some national recognition.",
    "On June 4, 2008 the company announced the launch of its national franchise program.",
    "In mid-2008 the chain operated 19 locations in Florida and Texas and expected to add six franchises by the end of 2008, and 48 by 2011.",
    "The initial focus was for franchises in the Southeastern US.",
    "WingHouses feature several amenities that differ from other wing restaurants, including Hooters.",
    "There is a full liquor bar in every store, sports memorabilia line the walls instead of NASCAR and most locations include a game room.",
    "Super Bowl XLIII in Tampa, Florida attracted the rich and famous; WingHouse hosted three events to raise money for charity."
  ],
  "aspect": "Opening",
  "aspect_sents": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
  "summary": [
    "WingHouse Bar & Grill (formerly Ker\u2019s WingHouse Bar & Grill) is a restaurant chain based in Florida, created and founded by Ed Burnett, a Canadian restaurant entrepreneur.",
    "After opening his first WingHouse location, Burnett sought out investors to open additional WingHouse locations.",
    "Burnett accepted investor Crawford Ker (a former National Football League player) to assist financing the expansion."
  ]
}
```

The average token counts for the documents and the summaries are provided below:

| Feature  | Mean Token Count |
| -------- | ---------------- |
| Document | 1,612            |
| Summary  | 40               |

### Data Fields

- `title`: a string, containing the original Wikipedia title.
- `document`: a list of sentences, containing the original content in the Wikipedia sections except the first abstract section.
- `aspect`: a string, containing the section name and its parent section names.
- `aspect_sents`: a list of indices, representing the sentences in the `aspect` section.
- `summary`: a list of sentences, the corresponding aspect-based summary for the document.

### Data Splits

The OASum dataset has 3 splits: _train_, _valid_, and _test_. Below are the statistics for Version 1.0.0 of the dataset.

| Dataset Split | Number of Instances in Split |
| ------------- | ---------------------------- |
| Train         | 3,523,986                    |
| Validation    | 111,578                      |
| Test          | 112,005                      |

## Additional Information

### Licensing Information

The OASum Dataset version 1.0.0 is released under the [CC-BY-SA-3.0 License](https://en.wikipedia.org/wiki/Wikipedia:Text_of_the_Creative_Commons_Attribution-ShareAlike_3.0_Unported_License).

### Citation Information

```
@article{yang2022oasum,
  title={Oasum: Large-scale open domain aspect-based summarization},
  author={Yang, Xianjun and Song, Kaiqiang and Cho, Sangwoo and Wang, Xiaoyang and Pan, Xiaoman and Petzold, Linda and Yu, Dong},
  journal={arXiv preprint arXiv:2212.09233},
  year={2022}
}
```
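As a small illustration of the fields described under Data Fields, the sketch below shows one way `aspect_sents` might be used to pull the aspect's sentences out of `document`. The instance is a shortened, made-up record that only mirrors the field layout (it is not taken from the dataset), and `aspect_sentences` is a hypothetical helper name, not part of any OASum tooling.

```python
from typing import List

# Hypothetical, shortened instance with the same field layout as the
# dataset card's example; the text is invented for illustration.
instance = {
    "title": "Example Article",
    "document": [
        "Sentence zero is about the founding.",
        "Sentence one continues the founding story.",
        "Sentence two covers a later lawsuit.",
    ],
    "aspect": "Opening",
    "aspect_sents": [0, 1],
    "summary": ["The subject was founded in two steps."],
}


def aspect_sentences(example: dict) -> List[str]:
    """Return the document sentences whose indices appear in `aspect_sents`."""
    document = example["document"]
    return [document[i] for i in example["aspect_sents"]]


# Sentences belonging to the aspect's own section.
selected = aspect_sentences(instance)

# `summary` is a list of sentences; join it to get a single reference string.
summary_text = " ".join(instance["summary"])

print(instance["aspect"], "->", selected)
```

The same helper should apply unchanged to a record returned by `load_dataset("kqsong/OASum")`, assuming the loaded records expose the fields exactly as listed in the card.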
Provider:
kqsong
Summary of original information

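Dataset Overview

Dataset Name

  • Name: OASum Dataset

Dataset Description

  • Language: English
  • Tags:
    • summarization
    • Wikipedia
  • Size: 1M<n<10M
  • Task category: summarization
  • Content: over 3.6M document, aspect, and summary triplets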

Dataset Usage

  • Download: directly through the Hugging Face datasets library

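Dataset Structure

  • Data instances: each instance contains the document, the summary, the document title, the aspect, and a list of indices of the sentences in the corresponding section
  • Data fields:
    • title: a string, the original Wikipedia title
    • document: a list of sentences, the original Wikipedia content (excluding the first abstract section)
    • aspect: a string, containing the section name and its parent section names
    • aspect_sents: a list of indices, representing the sentences in the aspect section
    • summary: a list of sentences, the aspect-based summary for the document
  • Data splits:
    • train: 3,523,986 instances
    • valid: 111,578 instances
    • test: 112,005 instances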
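
The split sizes listed above can be checked against the "over 3.6M" figure with a little arithmetic. The sketch below simply sums the three counts from the list and reports each split's share; the variable names are illustrative only.

```python
# Split sizes as listed for Version 1.0.0 of the dataset.
split_sizes = {"train": 3_523_986, "valid": 111_578, "test": 112_005}

# Total number of (document, aspect, summary) triplets across all splits.
total = sum(split_sizes.values())

# Fraction of the whole dataset held by each split.
shares = {name: count / total for name, count in split_sizes.items()}

for name, count in split_sizes.items():
    print(f"{name:>5}: {count:>9,} ({shares[name]:.2%})")
print(f"total: {total:,}")
```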

Additional Information

  • License: CC-BY-SA-3.0

  • Citation information:

    @article{yang2022oasum, title={Oasum: Large-scale open domain aspect-based summarization}, author={Yang, Xianjun and Song, Kaiqiang and Cho, Sangwoo and Wang, Xiaoyang and Pan, Xiaoman and Petzold, Linda and Yu, Dong}, journal={arXiv preprint arXiv:2212.09233}, year={2022} }
