Sentiment and topic analysis of LastQuake app user's comments - 26th November 2019 Albania earthquake
Source: data.ncl.ac.uk; updated 2023-05-31; indexed 2025-01-15
Download link:
https://data.ncl.ac.uk/articles/dataset/Sentiment_and_topic_analysis_of_LastQuake_app_user_s_comments_-_26th_November_2019_Albania_earthquake/22312246/2
Description:
This dataset contains the sentiment and topic analysis (supervised classification) of comments posted by LastQuake app users about the 26th November 2019 Albania earthquake. The LastQuake app is a crowdsourcing-based earthquake information app that lets eyewitnesses share information about the earthquakes they felt, combined with seismic data. The app was developed by the European Mediterranean Seismological Centre (EMSC). The attributes and data contained in the dataset are:
- Eq_t0: Origin time (UTC) of the intensity report.
- Intensity: intensity felt, as reported by the user (users must submit an intensity report before they can leave a comment).
- Epidist: distance, in kilometres, between the event and the location from which the comment was posted.
- Device: device from which the comment was left, i.e. desktop, mobile or app.
- Comment in Albanian: original comments posted by LastQuake app users in Albanian.
- Translation to English: English translation of the original Albanian comments, by Dr Enes Veliu (native speaker).
- Sentiment analysis (SA): classification of the comment into a polarity, i.e. positive, negative, neutral or irrelevant.
- Topic (TA): classification of the comment into a specific topic, i.e. building damages, distress, emergency response, governance, injured and casualties, intensity, preparedness, seismic information, solidarity messages, tsunami, urban facilities or unrelated.
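The attribute list above maps naturally onto a typed record. The following is a minimal Python sketch, assuming the data have been exported to a UTF-8 CSV file whose header row reuses the attribute names above verbatim (the actual file format and column headers on data.ncl.ac.uk may differ). The `Comment` class and `parse_comments` function are hypothetical names introduced for illustration; the sketch also checks each sentiment and topic label against the category lists given above.

```python
import csv
import io
from dataclasses import dataclass

# Category labels as listed in the attribute description (compared in lower case).
POLARITIES = {"positive", "negative", "neutral", "irrelevant"}
TOPICS = {
    "building damages", "distress", "emergency response", "governance",
    "injured and casualties", "intensity", "preparedness",
    "seismic information", "solidarity messages", "tsunami",
    "urban facilities", "unrelated",
}


@dataclass
class Comment:
    """One row of the dataset (hypothetical record type)."""
    eq_t0: str         # origin time (UTC), kept as text
    intensity: str     # reported intensity, kept as text
    epidist_km: float  # distance from the event, in kilometres
    device: str        # desktop, mobile or app
    text_sq: str       # original comment in Albanian
    text_en: str       # English translation
    sentiment: str     # polarity label, lower-cased
    topic: str         # topic label, lower-cased


def parse_comments(csv_text: str) -> list[Comment]:
    """Parse CSV text whose header uses the attribute names (assumed layout)."""
    comments = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        sentiment = row["Sentiment analysis (SA)"].strip().lower()
        topic = row["Topic (TA)"].strip().lower()
        # Reject labels that are not in the documented category lists.
        if sentiment not in POLARITIES:
            raise ValueError(f"unknown polarity: {sentiment!r}")
        if topic not in TOPICS:
            raise ValueError(f"unknown topic: {topic!r}")
        comments.append(Comment(
            eq_t0=row["Eq_t0"],
            intensity=row["Intensity"],
            epidist_km=float(row["Epidist"]),
            device=row["Device"],
            text_sq=row["Comment in Albanian"],
            text_en=row["Translation to English"],
            sentiment=sentiment,
            topic=topic,
        ))
    return comments
```

For example, `parse_comments(open("comments.csv", encoding="utf-8").read())` (with a hypothetical file name) would return one `Comment` per row, or raise `ValueError` on a label outside the documented categories.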
Collecting data after an earthquake is essential to determine the phenomenon's impact on the population and the built environment. To determine this impact, data must be collected on the number of injured people and casualties and on damaged buildings and infrastructure. In 2018, the European Mediterranean Seismological Centre (EMSC) launched a multichannel rapid information system comprising websites, a Twitter quakebot and a smartphone app for global earthquake eyewitnesses: the LastQuake app. The app collects reports from users that can help provide rapid situational awareness. However, text data collected through crowdsourcing platforms such as the LastQuake app are unstructured, so natural language processing (NLP) techniques such as sentiment and topic analysis are needed to extract meaningful information. Sentiment analysis, also called opinion mining, is the field of study that classifies people's opinions, expressed in written text, into a specific polarity, i.e. positive, negative or neutral. Topic analysis is another NLP technique that extracts meaning from text by identifying recurrent themes or topics.

On and after the 26th November 2019 earthquake in Albania, the LastQuake app recorded 28,220 reports from users. For the current analysis, we took a sample of the comments written in Albanian and posted on the day of the earthquake: 1,678 comments (6%). The comments were translated into English and classified using the polarities and topics defined for previous earthquakes with similar datasets. The most frequent polarity in comments from LastQuake app users was negative (51%), followed at a considerable distance by positive and neutral. The most frequent topic was intensity (36%), followed by distress (32%) and seismic information (17%). Unfortunately, unrelated comments with inappropriate language accounted for 5% of the comments in the sample.
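The percentages quoted above are presumably label frequencies over the comment sample, and the sample share follows from the report count. A minimal Python sketch of that arithmetic, assuming one sentiment (or topic) label per comment already extracted into a list of strings; `label_shares` is a hypothetical helper, not part of the dataset.

```python
from collections import Counter


def label_shares(labels):
    """Percentage of comments carrying each label, most frequent first.

    Assumes one label per comment. Shares are rounded to whole percentages
    with Python's round(), which rounds exact halves to the nearest even number.
    """
    counts = Counter(label.strip().lower() for label in labels)
    total = sum(counts.values())
    if total == 0:
        return {}
    # most_common() orders labels from most to least frequent.
    return {label: round(100 * n / total) for label, n in counts.most_common()}


# Share of the 28,220 reports covered by the 1,678-comment sample.
sample_size, total_reports = 1678, 28220
print(f"{100 * sample_size / total_reports:.1f}%")  # prints 5.9%
```

Passing the values of the Sentiment analysis (SA) column would give the polarity distribution, and the same function applies to the Topic (TA) column; if a comment can carry several topics, the denominator would need to be the number of comments rather than the number of labels.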
The most frequent polarity and topic were to be expected, given that the comments report on a disaster and that the LastQuake app was developed to collect intensity reports. The remarkable finding is the high number of comments expressing distress, either with a positive polarity (as prayers) or with a negative polarity (as cursing). These distress comments far outnumber comments containing seismic information, emergency response actions, or reports of building damage, injuries and casualties. This finding leads us to conclude that preparedness in Albania needs to be improved at the individual and community levels so that the population can face the aftermath of an earthquake and, probably, its aftershocks.
Provided by:
Newcastle University



