ECTSum

Source: arXiv. Updated: 2022-10-27. Indexed: 2024-06-21.
Download link:
https://github.com/rajdeep345/ECTSum
Description:
ECTSum is the first dataset dedicated to long-document summarization in the financial domain. It was created by the Department of Computer Science and Engineering at the Indian Institute of Technology Kharagpur. The dataset contains 2,425 earnings call transcripts from publicly listed companies, each paired with an expert-written summary. The transcripts were scraped from The Motley Fool website, and the corresponding summaries were obtained from Reuters. ECTSum targets automated summarization of financial documents, in particular long, unstructured ones, where a model must capture key financial metrics precisely and remain factually consistent with the source.
Provided by:
Department of Computer Science and Engineering, Indian Institute of Technology Kharagpur
Created:
2022-10-22