Kleister NDA and Kleister Charity
Source: arXiv | Updated: 2021-05-13 | Indexed: 2024-06-21
Download links:
https://github.com/applicaai/kleister-nda.git and https://github.com/applicaai/kleister-charity.git
Description:
The Kleister dataset consists of two subsets, Kleister NDA and Kleister Charity, created by Warsaw University of Technology and other institutions to address key information extraction, a challenging task in natural language processing (NLP). Kleister Charity contains 2,788 annual financial reports from charitable organizations, totaling 61,643 pages and 21,612 entities to extract. Kleister NDA contains 540 non-disclosure agreements (NDAs), totaling 3,229 pages and 2,160 entities to extract. Both subsets were collected with a semi-supervised method that reduced the manual annotation workload, and they target practical business problems such as complex document layouts and domain-specific business logic.
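To put the two subsets side by side, the counts quoted above can be turned into per-document averages. The following is a minimal sketch that only does arithmetic on the figures in the description; the names `SubsetStats`, `KLEISTER`, and `summarize` are illustrative assumptions and are not part of either repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubsetStats:
    """Counts for one Kleister subset, as quoted in the description."""
    name: str
    documents: int
    pages: int
    entities: int

    @property
    def pages_per_document(self) -> float:
        return self.pages / self.documents

    @property
    def entities_per_document(self) -> float:
        return self.entities / self.documents


# Figures copied from the description above (hypothetical container name).
KLEISTER = [
    SubsetStats("Kleister Charity", documents=2788, pages=61643, entities=21612),
    SubsetStats("Kleister NDA", documents=540, pages=3229, entities=2160),
]


def summarize(stats):
    """Map subset name -> (pages per document, entities per document), rounded to 2 places."""
    return {
        s.name: (round(s.pages_per_document, 2), round(s.entities_per_document, 2))
        for s in stats
    }


if __name__ == "__main__":
    for name, (pages, entities) in summarize(KLEISTER).items():
        print(f"{name}: {pages} pages/doc, {entities} entities/doc")
```

On these figures, Charity reports average roughly 22 pages each while NDAs average about 6, which suggests the Charity subset is the one where long-document handling matters more.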
Provided by:
Warsaw University of Technology
Created:
2021-05-13



