Guardian Transportation Dataset (GT202201)
Indexed from NIAID Data Ecosystem on 2026-03-13
Download link:
https://data.mendeley.com/datasets/yvxx6s5xhh
Description:
The Guardian Transportation Dataset (GT202201) comprises all transport-related articles from the UK newspaper The Guardian. We collected the dataset by web scraping, selecting every article that contains the word “transport” in its title, its full text, or its meta-information. The dataset contains approximately 14,855 articles published between September 1825 and January 2022. Each document has five attributes: News Article, Heading, Article Link, Topic, and Publication Date.
This dataset was built to discover parameters for the public, governance, and political aspects of transportation as part of our deep journalism approach and DeepJournal tool. The deep journalism approach uses big data, deep learning, and digital methods to discover and analyse cross-sectional, multi-perspective information, with the aim of enabling better decision making and developing better instruments for academic, corporate, national, and international governance.
We discovered a total of 25 parameters from this dataset and grouped them into 6 macro-parameters: Road Transport; Rail Transport; Air Transport; Crash & Safety; Disruptions & Causes; and Employment Rights, Disputes, & Strikes.
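The selection rule and record layout described above can be illustrated with a minimal Python sketch. It assumes a case-insensitive substring test (so "Transport" and "transportation" also match); the authors' exact matching rule is not stated. The names `GT202201Record` and `mentions_transport`, the snake_case field names, and the toy pages are all hypothetical and are not taken from the dataset or the DeepJournal tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class GT202201Record:
    """One document with the five attributes listed for GT202201.

    Field names are hypothetical; the dataset's own labels are
    News Article, Heading, Article Link, Topic, and Publication Date.
    """
    news_article: str
    heading: str
    article_link: str
    topic: str
    publication_date: date


def mentions_transport(title: str, full_text: str, meta: str,
                       keyword: str = "transport") -> bool:
    """Return True if the keyword occurs in the title, full text, or meta-information.

    Assumption: case-insensitive substring matching, not the authors' confirmed rule.
    """
    needle = keyword.lower()
    return any(needle in field.lower() for field in (title, full_text, meta))


# Toy scraped pages, invented for illustration only (not data from GT202201).
pages = [
    {"title": "Rail strike disrupts commuters", "body": "Services were cancelled.",
     "meta": "section: transport", "link": "https://example.org/a",
     "topic": "UK news", "date": date(2021, 6, 1)},
    {"title": "Election results", "body": "Votes were counted overnight.",
     "meta": "section: politics", "link": "https://example.org/b",
     "topic": "Politics", "date": date(2021, 6, 2)},
]

# Keep only pages that pass the keyword rule and map them to the five attributes.
records = [
    GT202201Record(news_article=p["body"], heading=p["title"],
                   article_link=p["link"], topic=p["topic"],
                   publication_date=p["date"])
    for p in pages
    if mentions_transport(p["title"], p["body"], p["meta"])
]
```

Here only the first toy page is kept, because "transport" appears in its meta-information even though its title and body do not mention it. This shows why checking all three fields can capture more articles than a title-only search would.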
The two other related transportation datasets used in the deep journalism approach are the Traffic Technology Today Transportation Dataset (TTIT202201: http://dx.doi.org/10.17632/k4bgjwktyp.1) and the Web of Science Transportation Dataset (WST202201: http://dx.doi.org/10.17632/tnfw2dh5nj.1).
Further details of the dataset, its collection, and its use for deep journalism, including the detection of multi-perspective parameters for transportation, can be found in our article: https://doi.org/10.3390/su14095711.
Created:
2022-05-16



