Sentiment Analysis of COVID-19 Scientific Publication Dissemination on Social Media X: A Dataset Analyzed with ChatGPT 3.5 and Gemini 1.5 Flash
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14919673
Description:
The dataset contains a sample of posts on X that mentioned the editorial “Dying in a Leadership Vacuum”, published in The New England Journal of Medicine on October 7, 2020 (DOI: 10.1056/NEJMe2029812).
The posts were extracted from the Altmetric platform with a Python 3.12 script that used the Beautiful Soup 4.12 library, run in the Google Colab development environment. The resulting collection contained 9,792 posts on X that commented specifically on the editorial. These posts came from 5,601 unique profiles, which were cross-referenced with the profiles classified and made available in the dataset created by Pontes and Maricato (2023a). Among the accounts with an existing classification (bot or human), the 41 accounts that had made more than four posts were selected.
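The selection step described above (count posts per profile, keep profiles that already have a classification and more than four posts) can be sketched as follows. This is a minimal illustration, not the authors' script: the function name `select_active_accounts`, the account handles, and the toy data are all invented for the example.

```python
from collections import Counter


def select_active_accounts(post_authors, classified_accounts, min_posts=5):
    """Return accounts that already have a label and at least `min_posts` posts.

    "More than four posts" is interpreted here as five or more.
    """
    counts = Counter(post_authors)
    return sorted(
        account
        for account, n_posts in counts.items()
        if n_posts >= min_posts and account in classified_accounts
    )


# Toy data (invented): one author handle per collected post.
post_authors = ["a"] * 6 + ["b"] * 4 + ["c"] * 5 + ["d"] * 9
# Hypothetical prior classification; "d" has no existing label.
classified_accounts = {"a": "human", "b": "bot", "c": "bot"}

unique_profiles = len(set(post_authors))
selected = select_active_accounts(post_authors, classified_accounts)
print(unique_profiles, selected)
```

In this toy run, "b" is dropped for having only four posts and "d" is dropped because it has no prior classification.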
According to the dataset provided by Pontes and Maricato (2023), Botometer classified 10 of these accounts as bots and 31 as human. Because Pontes and Maricato (2023) highlighted the limitations of using Botometer to classify accounts in the altmetric attention network, the 41 selected accounts were also classified manually. The manual classification was based on criteria such as the number of posts, posting times, time intervals between posts, account creation dates, and profile pictures. It identified 20 accounts as bots and 21 as human.
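To see where the automatic and manual labels disagree, the two classifications can be cross-tabulated per account. The sketch below is only an illustration under assumed inputs: the helper names and the four example accounts are invented, and the counts it produces are not the dataset's actual results.

```python
from collections import Counter


def cross_tabulate(automatic, manual):
    """Count (automatic_label, manual_label) pairs for accounts present in both mappings."""
    return Counter(
        (automatic[account], manual[account])
        for account in automatic
        if account in manual
    )


def agreement_rate(automatic, manual):
    """Share of shared accounts on which both classifications give the same label."""
    shared = [account for account in automatic if account in manual]
    if not shared:
        return 0.0
    agreeing = sum(automatic[account] == manual[account] for account in shared)
    return agreeing / len(shared)


# Invented labels for four hypothetical accounts.
botometer = {"ACCOUNT1": "human", "ACCOUNT2": "human", "ACCOUNT3": "bot", "ACCOUNT4": "human"}
manual = {"ACCOUNT1": "human", "ACCOUNT2": "bot", "ACCOUNT3": "bot", "ACCOUNT4": "bot"}

table = cross_tabulate(botometer, manual)
rate = agreement_rate(botometer, manual)
print(dict(table), rate)
```

The pair ("human", "bot") counts accounts that the automatic tool labeled human but the manual review labeled bot, which is the direction of disagreement the description above implies.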
The classified accounts published a total of 3,493 posts, which are included in this dataset and were used in the analyses presented in the article.
The metadata structure of the dataset is presented below.
Variable Name: ACCOUNT
Data Type: String (Text)
Description: Anonymized account code to preserve user identity.
Possible Values: ACCOUNT + sequential number

Variable Name: ACCOUNT CLASS (BTM)
Data Type: Categorical
Description: Automatic account classification using a tool like Botometer.
Possible Values: human, bot

Variable Name: ACCOUNT CLASS (MANUAL)
Data Type: Categorical
Description: Manual account classification based on researcher analysis.
Possible Values: human, bot

Variable Name: POST CONTENT
Data Type: Text (String)
Description: Full text of the collected post.

Variable Name: SENTIMENT CLASS (GPT)
Data Type: Categorical
Description: Sentiment classification assigned by ChatGPT.
Possible Values: positive, neutral, negative

Variable Name: SENTIMENT CLASS (GEMINI)
Data Type: Categorical
Description: Sentiment classification assigned by Gemini.
Possible Values: positive, neutral, negative

Variable Name: GPT X GEM (MATCH/DIFFERENCE)
Data Type: Binary
Description: Indicates whether the sentiment classification was the same or different between ChatGPT and Gemini.
Possible Values: match, different

Variable Name: POST CLASS (MANUAL)
Data Type: Categorical
Description: Manual sentiment classification of the post.
Possible Values: positive, neutral, negative

Variable Name: GPT RESULT
Data Type: Categorical
Description: Evaluation of ChatGPT's classification against the manual standard.
Possible Values: correct, incorrect

Variable Name: GEMINI RESULT
Data Type: Categorical
Description: Evaluation of Gemini's classification against the manual standard.
Possible Values: correct, incorrect
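The three derived variables (GPT X GEM, GPT RESULT, GEMINI RESULT) follow directly from the three sentiment columns. The sketch below shows one way they could be recomputed and summarized. It assumes the data is available as CSV with the variable names above as column headers and lower-case label values; the file layout is not confirmed by the source, and the four sample rows are invented.

```python
import csv
import io

# Invented sample rows using the column names from the metadata (not real dataset content).
SAMPLE = """ACCOUNT,POST CONTENT,SENTIMENT CLASS (GPT),SENTIMENT CLASS (GEMINI),POST CLASS (MANUAL)
ACCOUNT1,example post one,negative,negative,negative
ACCOUNT2,example post two,neutral,negative,negative
ACCOUNT3,example post three,positive,neutral,positive
ACCOUNT4,example post four,neutral,negative,negative
"""


def add_derived_columns(row):
    """Fill the three derived columns from the sentiment labels in one row."""
    gpt = row["SENTIMENT CLASS (GPT)"]
    gemini = row["SENTIMENT CLASS (GEMINI)"]
    manual = row["POST CLASS (MANUAL)"]
    row["GPT X GEM (MATCH/DIFFERENCE)"] = "match" if gpt == gemini else "different"
    row["GPT RESULT"] = "correct" if gpt == manual else "incorrect"
    row["GEMINI RESULT"] = "correct" if gemini == manual else "incorrect"
    return row


def share(rows, column, value):
    """Fraction of rows whose `column` equals `value`."""
    return sum(r[column] == value for r in rows) / len(rows)


rows = [add_derived_columns(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
gpt_accuracy = share(rows, "GPT RESULT", "correct")
gemini_accuracy = share(rows, "GEMINI RESULT", "correct")
model_agreement = share(rows, "GPT X GEM (MATCH/DIFFERENCE)", "match")
print(gpt_accuracy, gemini_accuracy, model_agreement)
```

To run this against the real file, replace `io.StringIO(SAMPLE)` with an opened file handle and check that the headers and label spellings match those in the download.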
Created:
2025-02-24



