Mandarin Chinese News Text
Source: DataCite Commons | Updated: 2021-07-01 | Indexed: 2025-04-16
Download link:
https://catalog.ldc.upenn.edu/LDC95T13
Description:
The Linguistic Data Consortium (LDC) announces the availability of a Mandarin Chinese text corpus containing about 250 million GB-encoded characters.

The Mandarin News Corpus includes text from several journalistic sources:
- newspaper text from Renmin Ribao (People's Daily)
- radio scripts from China Radio International
- newswire text from the Xinhua News Agency

The corpus is formatted with labeled bracketing in the style of SGML (Standard Generalized Markup Language). The header fields supplied by the sources, which give information such as topic, date and article ID, have been retained. The articles cover a range of topics, including international and domestic news, sports and culture.
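The SGML-style layout described above (bracketed articles with retained header fields, in GB-encoded text) can be read with a short parser. The following is a minimal Python sketch, not LDC's own tooling. It assumes each article sits between `<DOC>` and `</DOC>` and that fields use uppercase tags such as `<DOCNO>`, `<DATE>` and `<TEXT>`; these tag names are hypothetical, since the description does not list the actual ones, so check the corpus documentation before relying on them. It also assumes the files decode with Python's `gb18030` codec, which is a superset of GB2312.

```python
import re
from typing import Dict, List

# Hypothetical tag names: the real element names in LDC95T13 are not given
# in the catalog description, so adjust these after inspecting the files.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
# Matches <TAG>...</TAG> pairs; \1 requires the closing tag to match the opening one.
FIELD_RE = re.compile(r"<([A-Z]+)>(.*?)</\1>", re.DOTALL)


def decode_gb(raw: bytes) -> str:
    """Decode GB-encoded bytes; GB18030 is a superset of GB2312."""
    return raw.decode("gb18030")


def parse_documents(text: str) -> List[Dict[str, str]]:
    """Split SGML-style text into articles, mapping each top-level tag to its content."""
    docs = []
    for body in DOC_RE.findall(text):
        fields = {tag: value.strip() for tag, value in FIELD_RE.findall(body)}
        docs.append(fields)
    return docs


if __name__ == "__main__":
    # Invented sample article for illustration only; not taken from the corpus.
    sample = (
        "<DOC>\n<DOCNO>XIN001</DOCNO>\n<DATE>1994-01-01</DATE>\n"
        "<TEXT>新华社北京电</TEXT>\n</DOC>"
    )
    docs = parse_documents(decode_gb(sample.encode("gb18030")))
    print(docs[0]["DOCNO"])  # XIN001
```

Because `FIELD_RE` only captures top-level tag pairs, any markup nested inside a field (for example paragraph tags inside `<TEXT>`) stays in that field's value and would need a second pass to strip.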
Provider:
Linguistic Data Consortium
Created:
2020-11-30



