BOLT Chinese Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech
Source: DataCite Commons. Updated: 2021-03-16. Indexed: 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC2021T07
Resource description:
<h3>Introduction</h3>
<p>BOLT Chinese Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech was developed by <a href="https://www.raytheon.com/ourcompany/bbn/">Raytheon BBN Technologies</a> and consists of co-reference annotation of Chinese discussion forum (DF), SMS/Chat, and conversational telephone speech (CTS) data.</p>
<p>The DARPA <a href="https://www.ldc.upenn.edu/collaborations/current-projects/bolt">BOLT</a> (Broad Operational Language Translation) program developed machine translation and information retrieval technologies for less formal genres, focusing particularly on user-generated content. LDC supported the BOLT program by collecting informal data sources (discussion forums, text messaging, and chat) in Chinese, Egyptian Arabic, and English. The collected data was translated and annotated for various tasks, including word alignment, treebanking, propbanking, and co-reference.</p>
<h3>Data</h3>
<p>DF data was collected from the web using a combination of manual and automatic processes. SMS/Chat material was either donated or collected via live platforms. CTS data was taken from LDC's Chinese CALLHOME and CALLFRIEND telephone collections.</p>
<p>Co-reference annotation aims to identify all of the links between specific mentions in the text that refer to the same entities and events in the discourse context. BOLT co-reference annotation was performed on top of the BOLT treebank annotation. It covers noun phrases (including proper nouns, nominals, pronouns, and null arguments), possessives, proper noun pre-modifiers, and verbs.</p>
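<p>A co-reference chain is the set of all mentions that refer to the same entity or event. As a minimal sketch (the <code>Mention</code> class, the <code>build_chains</code> function, the mention IDs, and the use of "∅" for a null argument are all hypothetical and do not reflect the corpus's actual file format), pairwise co-reference links can be merged into chains with a union-find structure. The mention categories in the enum are the ones listed above.</p>

```python
from dataclasses import dataclass
from enum import Enum


class MentionType(Enum):
    """Mention categories named in the corpus description."""
    PROPER_NOUN = "proper noun"
    NOMINAL = "nominal"
    PRONOUN = "pronoun"
    NULL_ARGUMENT = "null argument"
    POSSESSIVE = "possessive"
    PROPER_NOUN_PREMODIFIER = "proper noun pre-modifier"
    VERB = "verb"


@dataclass(frozen=True)
class Mention:
    """Hypothetical in-memory mention record (not the corpus XML schema)."""
    mention_id: str
    text: str
    mention_type: MentionType


def build_chains(mentions, links):
    """Merge pairwise co-reference links into chains using union-find.

    `links` is an iterable of (mention_id, mention_id) pairs judged to
    refer to the same entity or event. Each returned chain lists its
    mentions in their original order; unlinked mentions form singletons.
    """
    parent = {m.mention_id: m.mention_id for m in mentions}

    def find(x):
        # Walk to the set representative, halving the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    chains = {}
    for m in mentions:
        chains.setdefault(find(m.mention_id), []).append(m)
    return list(chains.values())


if __name__ == "__main__":
    # Toy example: 张三到了北京。他说∅很累。
    # ("Zhang San arrived in Beijing. He said [he] was tired.")
    # "∅" stands in for the dropped subject; this notation is illustrative.
    mentions = [
        Mention("m1", "张三", MentionType.PROPER_NOUN),
        Mention("m2", "北京", MentionType.PROPER_NOUN),
        Mention("m3", "他", MentionType.PRONOUN),
        Mention("m4", "∅", MentionType.NULL_ARGUMENT),
    ]
    links = [("m1", "m3"), ("m3", "m4")]
    for chain in build_chains(mentions, links):
        print([m.text for m in chain])
    # Prints ['张三', '他', '∅'] and then ['北京']
```

<p>Union-find is used here only because it turns any set of pairwise links into chains in near-linear time, regardless of the order in which links are recorded; the corpus may well store chains directly.</p>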
<p>Annotation files are provided as UTF-8 encoded XML.</p>
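<p>Because the annotation files are UTF-8 encoded XML, Python's standard-library <code>xml.etree.ElementTree</code> can read them without extra configuration. The sketch below makes no assumption about the corpus's element names, which are documented with the corpus rather than here: it tallies element tags and extracts the text of a caller-chosen tag. The function names and the script name in the usage comment are hypothetical.</p>

```python
import sys
import xml.etree.ElementTree as ET
from collections import Counter


def load_root(path):
    """Parse one annotation file and return its root element.

    ElementTree reads the file as bytes and decodes it according to the
    XML declaration, defaulting to UTF-8 when no encoding is declared.
    """
    return ET.parse(path).getroot()


def tag_counts(root):
    """Count every element tag in the tree, including the root itself."""
    return Counter(elem.tag for elem in root.iter())


def texts_of(root, tag):
    """Return the stripped text of each element named `tag`.

    The tag name is supplied by the caller; consult the corpus
    documentation for the actual element names.
    """
    return [(elem.text or "").strip() for elem in root.iter(tag)]


if __name__ == "__main__":
    # Usage (hypothetical script name): python inspect_xml.py FILE.xml ...
    for path in sys.argv[1:]:
        for tag, count in tag_counts(load_root(path)).most_common():
            print(f"{path}\t{tag}\t{count}")
```

<p>Running the tag count first is a cheap way to discover a file's structure before writing schema-specific extraction code.</p>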
<h3>Sponsorship</h3>
<p>This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract No. HR0011-11-C-0145. The content does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.</p>
<h3>Samples</h3>
<p>Please view these samples:</p>
<ul>
<li><a href="desc/addenda/LDC2021T07.sms.txt">SMS/Chat Sample (TXT)</a></li>
<li><a href="desc/addenda/LDC2021T07.df.txt">DF Sample (TXT)</a></li>
<li><a href="desc/addenda/LDC2021T07.cts.txt">CTS Sample (TXT)</a></li>
</ul>
<h3>Updates</h3>
<p>None at this time.</p>
Portions © 1996, 2012-2016, 2021 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2021-03-10



