
Guardian Transportation Dataset (GT202201)

Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/yvxx6s5xhh
Description:
The Guardian Transportation Dataset (GT202201) comprises all transport-related articles from the UK-based newspaper The Guardian. We collected the dataset by web scraping, keeping every article that contains the word “transport” in its title, its full text, or its meta-information. The dataset comprises about 14,855 articles published between September 1825 and January 2022. Each document has five attributes: News Article, Heading, Article Link, Topic, and Publication Date.

We built this dataset to discover parameters for the public, governance, and political aspects of transportation as part of our deep journalism approach and DeepJournal tool. The deep journalism approach uses big data, deep learning, and digital methods to discover and analyse cross-sectional, multi-perspective information, with the aim of enabling better decision making and better instruments for academic, corporate, national, and international governance. We discovered 25 parameters in this dataset and grouped them into 6 macro-parameters: Road Transport, Rail Transport, Air Transport, Crash & Safety, Disruptions & Causes, and Employment Rights, Disputes, & Strikes.

Two related transportation datasets are also used in the deep journalism approach: the Traffic Technology Today Transportation Dataset (TTIT202201: http://dx.doi.org/10.17632/k4bgjwktyp.1) and the Web of Science Transportation Dataset (WST202201: http://dx.doi.org/10.17632/tnfw2dh5nj.1). Further details of the dataset, its collection, and its use for deep journalism, including detection of the multi-perspective parameters for transportation, can be found in our article: https://doi.org/10.3390/su14095711.
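The record layout and the keyword rule described above can be illustrated with a minimal Python sketch. It is not the authors' scraper or the published file format: the `Article` class, its field names, the `mentions_keyword` helper, and the sample records are all hypothetical, and the Topic field is used as a stand-in for the article's meta-information, which is an assumption.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Article:
    """One record with the five attributes named in the description.

    Field names are hypothetical; the published files may name them differently.
    """
    news_article: str        # full text of the article
    heading: str
    article_link: str
    topic: str
    publication_date: date


def mentions_keyword(article: Article, keyword: str = "transport") -> bool:
    """Case-insensitive substring match over heading, full text, and topic.

    Assumption: the topic field stands in for the "meta-information"
    mentioned in the description.
    """
    needle = keyword.lower()
    fields = (article.heading, article.news_article, article.topic)
    return any(needle in field.lower() for field in fields)


# Made-up sample records for illustration only (not taken from the dataset).
samples = [
    Article("Rail services were cancelled today.", "Strike halts trains",
            "https://example.org/a1", "Transport", date(2021, 6, 1)),
    Article("A new recipe for soup.", "Winter soup",
            "https://example.org/a2", "Food", date(2020, 1, 15)),
    Article("Public transport fares will rise.", "Fares to rise",
            "https://example.org/a3", "Money", date(2019, 3, 9)),
]

selected = [a for a in samples if mentions_keyword(a)]
print([a.heading for a in selected])
```

Because the match is a plain substring test, words such as “transportation” also qualify; whether the original collection matched whole words only is not stated in the description.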

Created:
2022-05-16