five

BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech

收藏
Mendeley Data2024-01-31 更新2024-06-28 收录
下载链接:
https://catalog.ldc.upenn.edu/LDC2020T20
下载链接
链接失效反馈
官方服务:
资源简介:
Introduction BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by Raytheon BBN Technologies and consists of co-reference annotation on English discussion forum (DF), SMS/Chat and conversational telephone speech (CTS). The DARPA BOLT (Broad Operational Language Translation) program developed machine translation and information retrieval for less formal genres, focusing particularly on user-generated content. The Linguistic Data Consortium (LDC) supported the BOLT program by collecting informal data sources -- discussion forums, text messaging and chat -- in Chinese, Egyptian Arabic and English. The collected data was translated and annotated for various tasks including word alignment, treebanking, propbanking and co-reference. Data DF data was collected from the web using a combination of manual and automatic processes. SMS/Chat material was donated or collected via live platforms. CTS data was taken from LDC's Arabic and Chinese CALLHOME and CALLFRIEND telephone collections; the audio files were transcribed and translated into English. Co-reference annotation aims to fill in all of the connections between specific mentions in the text that refer to the same entities and events in the discourse context. BOLT co-reference annotation was performed on BOLT treebank annotation. It covers noun phrases (including proper nouns, nominals, pronouns and null arguments), possessives, proper noun pre-modifiers and verbs. Annotation files are presented in UTF-8 encoded XML format. Acknowledgements This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-11-C-0145. The content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred. Samples Please view these samples: CTS Sample (TXT) DF Sample (TXT) SMS Sample (TXT) Updates None at this time. Copyright Portions © 1996, 1997, 2011- 2020 Trustees of the University of Pennsylvania
创建时间:
2024-01-31
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作