All Climate-Related Posts on Reddit from 2005-2021 and Derived Data
Source: DataCite Commons; updated 2025-04-01; indexed 2024-09-03
Download link:
https://figshare.com/articles/dataset/All_Climate-Related_Posts_on_Reddit_from_2005-2021_and_Derived_Data/26828467/4
Description:
To analyze public discourse on climate change and global warming in Reddit posts from 2005 to 2021, a rigorous filtering process was used to isolate climate-related discussions. Starting with over 11.5 billion posts, a series of carefully designed regular expressions was used to identify and extract posts explicitly mentioning key terms and phrases associated with climate change. These included "climate change," "global warming," "carbon emissions," and references to significant environmental agreements such as the Paris Accord and Kyoto Protocol. The expressions were crafted to capture a wide range of relevant discussions while excluding posts that mentioned "climate" in non-environmental contexts, such as "political climate" or "economic climate." This step ensured that the analysis focused solely on discussions pertinent to global environmental change.

After applying these filters, the dataset was narrowed down to approximately 15.3 million posts, representing just 0.134% of the original dataset. To further refine the data, language detection was performed using two independent libraries, Polyglot and LangDetect, so that only English-language posts were included. This dual verification process resulted in a final dataset of approximately 1.5 million posts, all of which were confirmed to be in English.

The curated dataset was then subjected to detailed analysis, including sentiment analysis, polarity and subjectivity assessment, and readability evaluation. By focusing on this carefully selected subset of posts, the study provides insight into how climate change and global warming are discussed across various communities on Reddit, revealing trends in sentiment, language complexity, and the shifting terminology used in these discussions over time.
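
The regular-expression filtering step described above can be illustrated with a minimal Python sketch. The dataset description names the key terms but does not publish the actual expressions, so the patterns below, the names `EXPLICIT_TERMS`, `BARE_CLIMATE`, and `is_climate_related`, and the idea of matching a bare "climate" unless it follows "political" or "economic" are all assumptions made for illustration.

```python
import re

# Assumed pattern: explicit phrases named in the dataset description.
# The real expressions used to build the dataset are not published here.
EXPLICIT_TERMS = re.compile(
    r"\b(?:climate\s+change|global\s+warming|carbon\s+emissions?"
    r"|paris\s+(?:accord|agreement)|kyoto\s+protocol)\b",
    re.IGNORECASE,
)

# Assumed pattern: a bare "climate" counts unless it directly follows
# "political" or "economic" (one whitespace character in between).
BARE_CLIMATE = re.compile(
    r"(?<!political\s)(?<!economic\s)\bclimate\b",
    re.IGNORECASE,
)


def is_climate_related(text: str) -> bool:
    """Return True if the text matches one of the assumed climate patterns."""
    return bool(EXPLICIT_TERMS.search(text) or BARE_CLIMATE.search(text))


posts = [
    "Global warming is accelerating.",
    "The political climate is tense this year.",
    "Did the Kyoto Protocol reduce carbon emissions?",
]
kept = [post for post in posts if is_climate_related(post)]
```

In this sketch `kept` holds the first and third posts. The fixed-width lookbehinds only catch the two excluded phrases listed in the description; a production filter would likely need a longer exclusion list.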
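
The dual language check can be sketched as an agreement rule: a post is kept only if every detector labels it English. The function name `confirmed_english`, the `LanguageDetector` type, and the stub detectors are hypothetical. The description does not say how disagreements or detection failures were handled, so treating them as "not confirmed" is an assumption.

```python
from typing import Callable, Iterable

# A detector maps text to a language code such as "en".
# With the libraries installed, wrappers around langdetect's detect(text) and
# polyglot's Detector(text).language.code could be passed in (assumed usage).
LanguageDetector = Callable[[str], str]


def confirmed_english(text: str, detectors: Iterable[LanguageDetector]) -> bool:
    """Return True only if at least one detector is given and all return "en"."""
    detector_list = list(detectors)
    if not detector_list:
        return False
    try:
        return all(detect(text) == "en" for detect in detector_list)
    except Exception:
        # Detection can fail on empty or symbol-only text;
        # this sketch treats any failure as "not confirmed".
        return False


# Stub detectors stand in for the real libraries so the sketch runs anywhere.
def always_en(_: str) -> str:
    return "en"


def always_de(_: str) -> str:
    return "de"


both_agree = confirmed_english("Sea levels are rising.", [always_en, always_en])
one_disagrees = confirmed_english("Sea levels are rising.", [always_en, always_de])
```

Requiring agreement trades recall for precision: posts that one library misclassifies are dropped rather than risk keeping non-English text.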
Provider:
figshare
Created:
2024-08-27



