MarkupMnA

Indexed from NIAID Data Ecosystem on 2026-05-01
Download link:
https://zenodo.org/record/8034852
Description:
The MarkupMnA dataset is a corpus of 151 merger and acquisition agreements annotated with section titles, section numbers, page numbers, and more, based on HTML filings by US public companies retrieved from the SEC EDGAR database. We treat section title annotation as a sequence labeling task and therefore use the BEIOS tagging scheme when generating our annotations. The dataset contains over 70,000 labels excluding outside labels and over 465,000 labels including them. We add annotations to the contracts in an already widely used dataset, MAUD, which is an expert-annotated reading comprehension dataset. The broad objective of our work is to make progress toward developing computationally efficient hierarchical representations of long documents, specifically legal contracts. We hope that our annotations can be used in conjunction with MAUD to advance legal NLP research.
Created:
2023-06-15
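
The description frames section title annotation as sequence labeling with BEIOS tags. As a rough illustration only, the sketch below converts labeled token spans into per-token tags, assuming BEIOS denotes the usual Begin/End/Inside/Outside/Single scheme (often written BIOES). The function name `spans_to_beios`, the label names `SECTION_NUMBER` and `SECTION_TITLE`, and the sample tokens are hypothetical; the dataset's actual file format and label set are not described here.

```python
from typing import List, Tuple


def spans_to_beios(num_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Turn (start, end_exclusive, label) token spans into BEIOS tags.

    Assumed tag meanings: B = begin, I = inside, E = end,
    S = single-token span, O = outside any span.
    """
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        if not (0 <= start < end <= num_tokens):
            raise ValueError(f"span ({start}, {end}) is out of range")
        if end - start == 1:
            # A one-token span gets a single S tag.
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"E-{label}"
    return tags


# Hypothetical tokens from a contract heading followed by body text.
tokens = ["Section", "5.1", "Conduct", "of", "Business", "The", "Company", "shall"]
# Hypothetical label names, chosen for illustration only.
spans = [(1, 2, "SECTION_NUMBER"), (2, 5, "SECTION_TITLE")]
tags = spans_to_beios(len(tokens), spans)

for token, tag in zip(tokens, tags):
    print(f"{token}\t{tag}")
```

Under this scheme most tokens in a long contract fall outside any annotated span, which is consistent with the gap between the two label counts reported in the description.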