Economic Relevant News from The Guardian
Source: Mendeley Data. Updated: 2019-12-27. Indexed: 2026-04-09.
Download link:
https://data.mendeley.com/datasets/yt8j2f3hpp/3
Description:
The news: This dataset consists of 1789 news articles from the British daily newspaper The Guardian, extracted using the content endpoint of The Guardian Open Platform. It has two parts. The first part contains all the articles that were available at the time in the business, politics, society and world news sections for the entire month of January 2013 (1689 articles in total). The second part is an extra set of 100 articles randomly selected from the period February 2013 to December 2015. The first set of 1689 articles was used for training and the second set of 100 articles was used for testing in two publications:

* Maisonnave, M., Delbianco, F., Tohmé, F.A. and Maguitman, A.G., 2018, November. A Supervised Term-Weighting Method and its Application to Variable Extraction from Digital Media. In XIX Simposio Argentino de Inteligencia Artificial (ASAI)-JAIIO 47 (CABA, 2018).
* Maisonnave, M., Delbianco, F., Tohmé, F.A. and Maguitman, A.G., 2019. A Flexible Supervised Term-Weighting Technique and its Application to Variable Extraction and Information Retrieval. Inteligencia Artificial, 22(63), pp.61-80.

The labels: The entire dataset was manually classified into two categories: economically relevant and economically irrelevant. The labelling was carried out by two experts in economics working in collaboration. For each news article, the full text was analyzed to determine the category.

The format: The dataset comes in two versions, reduced and full. The reduced version consists of a CSV file and a readme file. The CSV file has five columns: "Instance No.", "Title", "Web Publication Date", "web URL" and "Economically Relevant". This version is reduced in columns only: it omits the full article texts but includes all 1789 instances.

Requesting the full dataset: To gain access to the full version of the dataset (which includes the body of the news articles), please send an email to mariano.maisonnave@cs.uns.edu.ar with a copy to openplatform@theguardian.com requesting authorization and stating clearly that the dataset will not be used for commercial purposes.
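
As a rough illustration of working with the reduced version, the sketch below reads a CSV with the five columns named above, counts the labels, and rebuilds the training/testing split from the publication date (January 2013 versus later). Several things here are assumptions rather than documented facts: the sample rows are invented, the label encoding ("1"/"0") is a guess, and the date is assumed to be an ISO 8601 string beginning with year and month. Check the readme file for the actual conventions.

```python
import csv
import io
from collections import Counter

# Column names as listed in the dataset description.
EXPECTED_COLUMNS = [
    "Instance No.",
    "Title",
    "Web Publication Date",
    "web URL",
    "Economically Relevant",
]


def load_reduced(handle):
    """Read the reduced CSV from an open text handle into a list of row dicts."""
    reader = csv.DictReader(handle)
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


def label_counts(rows):
    """Count the raw label values without assuming how they are encoded."""
    return Counter(row["Economically Relevant"].strip() for row in rows)


def split_by_date(rows, train_prefix="2013-01"):
    """Split rows into (train, test) by publication month.

    ASSUMPTION: dates are ISO 8601 strings such as "2013-01-15T10:00:00Z".
    Per the description, January 2013 articles form the training set.
    """
    train = [r for r in rows if r["Web Publication Date"].startswith(train_prefix)]
    test = [r for r in rows if not r["Web Publication Date"].startswith(train_prefix)]
    return train, test


# Invented sample rows for illustration only (NOT taken from the dataset).
SAMPLE = """Instance No.,Title,Web Publication Date,web URL,Economically Relevant
1,Example headline A,2013-01-03T09:00:00Z,https://example.com/a,1
2,Example headline B,2013-01-17T12:30:00Z,https://example.com/b,0
3,Example headline C,2014-06-05T08:15:00Z,https://example.com/c,1
"""

rows = load_reduced(io.StringIO(SAMPLE))
counts = label_counts(rows)
train_rows, test_rows = split_by_date(rows)
print(len(rows), dict(counts), len(train_rows), len(test_rows))
```

For the real file, the same function can be given a handle from `open(path, newline="", encoding="utf-8")`, where `path` points to wherever the downloaded CSV was saved; the encoding is likewise an assumption.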
Created:
2019-12-27



