Chinese Abstract Meaning Representation 2.0
DataCite Commons | Updated: 2021-07-19 | Indexed: 2024-07-13
Download link:
https://catalog.ldc.upenn.edu/LDC2021T13
Resource description:
<h3>Introduction</h3><br>
<p>Chinese Abstract Meaning Representation (CAMR) 2.0 was developed by <a href="https://www.brandeis.edu/">Brandeis University</a> and <a href="http://en.njnu.edu.cn/">Nanjing Normal University</a> and comprises semantic representations of approximately 20,000 Chinese sentences from <a href="../../../LDC2013T21">Chinese Treebank (CTB) 8.0 (LDC2013T21)</a>. CAMR 2.0 includes the content of <a href="../../../LDC2019T07">Chinese Abstract Meaning Representation 1.0 (LDC2019T07)</a> (CTB 8.0 weblog and discussion forum sentences), plus an additional 9,933 sentences from the newswire portion of CTB 8.0.</p><br>
<p>Abstract Meaning Representation (AMR) captures "who is doing what to whom" in a sentence. Each sentence is paired with a graph that represents its whole-sentence meaning in a rooted, tree-like structure. LDC has released the following English AMR data sets: <a href="../../../LDC2014T12">Abstract Meaning Representation (AMR) Annotation Release 1.0 (LDC2014T12)</a>, <a href="../../../LDC2017T10">Abstract Meaning Representation (AMR) Annotation Release 2.0 (LDC2017T10)</a> and <a href="../../../LDC2020T02">Abstract Meaning Representation (AMR) Annotation Release 3.0 (LDC2020T02)</a>.</p><br>
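To make "who is doing what to whom" concrete, here is a minimal sketch that flattens an AMR written in PENMAN notation into (source, role, target) triples. It uses the well-known English example "The boy wants to go" rather than a CAMR sentence, and the function names (`penman_triples`, `_parse_node`) are hypothetical illustrations, not part of any LDC or CAMR tooling. The sketch assumes simple input: it does not handle quoted strings containing spaces or parentheses, and it ignores the concept-to-word and relation-to-word alignments that the CAMR files add.

```python
import re

# Tokens: parentheses, the "/" separator, roles such as ":ARG0",
# and bare symbols (variables, concepts, constants).
TOKEN_RE = re.compile(r"\(|\)|/|:[^\s()]+|[^\s()/]+")


def _parse_node(tokens, pos, triples):
    """Parse one "(var / concept :role value ...)" node starting at pos.

    Appends triples in reading order and returns the index just past ")".
    """
    if tokens[pos] != "(" or tokens[pos + 2] != "/":
        raise ValueError(f"expected '(var / concept' at token {pos}")
    var, concept = tokens[pos + 1], tokens[pos + 3]
    triples.append((var, ":instance", concept))
    pos += 4
    while tokens[pos] != ")":
        role = tokens[pos]
        if not role.startswith(":"):
            raise ValueError(f"expected a role at token {pos}, got {role!r}")
        pos += 1
        if tokens[pos] == "(":
            # Nested node: record the edge to its variable, then descend.
            triples.append((var, role, tokens[pos + 1]))
            pos = _parse_node(tokens, pos, triples)
        else:
            # Re-entrant variable or constant value.
            triples.append((var, role, tokens[pos]))
            pos += 1
    return pos + 1


def penman_triples(amr: str) -> list[tuple[str, str, str]]:
    """Convert a simple PENMAN-notation AMR string into triples."""
    tokens = TOKEN_RE.findall(amr)
    triples: list[tuple[str, str, str]] = []
    end = _parse_node(tokens, 0, triples)
    if end != len(tokens):
        raise ValueError("unexpected tokens after the top node")
    return triples


if __name__ == "__main__":
    amr = "(w / want-01 :ARG0 (b / boy) :ARG1 (g / go-01 :ARG0 b))"
    for triple in penman_triples(amr):
        print(triple)
```

In this example, the variable `b` (the boy) is the `:ARG0` of both `want-01` and `go-01`. This re-entrancy is why an AMR is a graph rather than a strict tree, even though it is written in a nested, tree-like form. For real data, a maintained PENMAN library is a safer choice than this hand-rolled parser.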
<p>Chinese AMR follows the basic principles developed for English AMR, namely a compact, readable, whole-sentence semantic representation, with adaptations where necessary to handle Chinese-specific phenomena. For more information about the project, see the <a href="http://www.cs.brandeis.edu/~clp/camr/camr.html">Chinese AMR homepage</a>.</p><br>
<h3>Data</h3><br>
<p>The text contains 20,078 sentences from the weblog, discussion forum, and newswire portions of CTB 8.0. Three sets of files are included: the original Chinese AMR data with concept-to-word and relation-to-word alignments, a converted English AMR format, and a Chinese syntactic dependency tree format. Each set is divided into training, development and test sets, and all files are presented as plain text in UTF-8 encoding.</p><br>
<h3>Samples</h3><br>
<p>Please view this <a href="desc/addenda/LDC2021T13.txt">sample (TXT)</a>.</p><br>
<h3>Updates</h3><br>
<p>None at this time.</p><br>
Portions © 2006 Agence France Presse, © 2006 Anhui TV, © 2005 Cable News Network, LP, LLLP, © 2000-2001 China Broadcasting System, © 2000-2001, 2005-2006 China Central TV, © 2000-2001 China National Radio, © 2006 Chinanews.com, © 2000-2001 China Television System, © 2006 Guangming Daily, © 2006 National Broadcasting Company, Inc., © 2006 New Tang Dynasty TV, © 2006 Peoples Daily Online, © 2005-2006 Phoenix TV, © 1996-2001 Sinorama Magazine, © 1997 The Government of the Hong Kong Special Administrative Region, © 1994-1998, 2006 Xinhua News Agency, © 2019, 2021 Bin Li, © 2001, 2004, 2005, 2007, 2009, 2010, 2013, 2019, 2021 Trustees of the University of Pennsylvania
Provider:
Linguistic Data Consortium
Created:
2021-07-07



