
North American News Text Corpus

Source: DataCite Commons. Updated 2021-07-01; indexed 2025-04-16.
Download link:
https://catalog.ldc.upenn.edu/LDC95T21
Description:
North American News Text Corpus is composed of English newswire text formatted using TIPSTER-style SGML markup (see LDC93T3A) from the following sources:

- Los Angeles Times/Washington Post Service, 05/94-08/97: 52 million words
- New York Times News, 07/94-12/96: 173 million words
- Reuters News Service, 04/94-12/96: 85 million words
- Wall Street Journal, 07/94-12/96: 40 million words

The New York Times and the L.A. Times/Washington Post services also include a range of other newspaper sources in their syndicated newswires. The Los Angeles Times/Washington Post material includes the following sources (in lesser amounts) in addition to the two predominant sources:

- Newsday
- The Baltimore Sun
- The Hartford Courant

The New York Times material contains the following sources in lesser amounts, but New York Times articles predominate:

- Bloomberg Business News
- The Boston Globe
- Los Angeles Daily News
- Fort Worth Star-Telegram
- Newsweek
- Cox News Service
- The Arizona Republic
- Seattle Post-Intelligencer
- San Francisco Examiner
- Houston Chronicle
- San Francisco Chronicle
- Economist Newspaper Ltd.
- Hearst Newspapers

These newswire services also include small numbers of articles from a larger set of miscellaneous sources. The ones listed above appear with some frequency on a daily basis.

Additional Licensing Instructions

This 'members-only' corpus is available to current LDC members who can request the data at the listed reduced-license fee.

Portions © 1994-1996 Dow Jones & Company, Inc., © 1994-1997 Los Angeles Times-Washington Post News Service, Inc., © 1994-1996 New York Times, © 1994-1996 Reuters America, Inc., © 1995-1997 Trustees of the University of Pennsylvania.
Provider:
Linguistic Data Consortium
Created:
2020-11-30