Labeled Datasets for Research on Information Operations
Source: NIAID Data Ecosystem. Indexed: 2026-05-02
Download link:
https://zenodo.org/record/14141549
Resource description:
Labeled Datasets for Research on Information Operations
Compliance with Platform Terms
To comply with the platform terms, we ask that you download one data file per researcher, per day.
README
19-November-2024
Contact: Observatory on Social Media
Dataset Articles
This dataset was collected and processed as described in the paper "Labeled Datasets for Research on Information Operations."
Description
These datasets contain data curated for research on information operations (IO) and include both labeled IO data and control data. They cover 26 verified IO campaigns from various countries and provide comprehensive records of posts from IO accounts alongside control posts from legitimate accounts that discussed similar topics during the same periods. The datasets support the development and benchmarking of IO detection methods by allowing coordinated accounts to be compared with organic ones.
License
This dataset is available under the Attribution-NonCommercial-NoDerivatives 4.0 International license. If you use this data, please cite the original paper.
Dataset Content
The dataset includes anonymized fields to preserve privacy, and is structured with the following columns:
postid: Unique identifier for each post within the dataset.
post_text: The textual content of the post. PII inside post_text, such as mentions and URLs, is hashed.
application_name: Hashed version of the name of the application or platform from which the post was made.
post_language: Language in which the post was written.
in_reply_to_postid: Anonymized ID of the post this entry is replying to, if applicable.
in_reply_to_accountid: Anonymized ID of the account the post is replying to, if applicable.
post_time: Timestamp indicating when the post was made.
accountid: Unique anonymized ID for the account that created the post.
account_profile_description: Description provided by the account holder in their profile.
follower_count: Number of followers the account had at the time of data collection.
following_count: Number of accounts the user was following at the time of data collection.
account_creation_date: Date when the account was created.
is_repost: Boolean indicator if the post is a repost.
reposted_accountid: Anonymized ID of the original account that made the reposted post, if applicable.
reposted_postid: Anonymized ID of the original post that was reposted, if applicable.
hashtags: Hashtags included in the post content, if any.
urls: Hashed URLs shared within the post, if any.
account_mentions: Anonymized IDs of accounts mentioned within the post, if any.
is_control: Boolean indicator marking whether the post is from a control (True) or IO (False) account.
Data for different campaigns are organized in separate versions of this repository.
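The column list above can be turned into a small loading routine. The following is a minimal sketch, not the dataset authors' tooling: the README does not state the file format, so CSV with textual booleans is an assumption, the sample rows are invented for illustration, and the helper names (parse_bool, load_posts, account_labels, repost_share) are hypothetical. It reads a handful of the documented columns, derives a per-account label from is_control, and computes each account's share of reposts from is_repost.

```python
import csv
import io
from collections import defaultdict

# Synthetic rows for illustration only; they are NOT taken from the real dataset.
# Only a subset of the documented columns is included here.
SAMPLE_CSV = """postid,accountid,post_text,is_repost,is_control
p1,a1,hello world,False,True
p2,a1,another post,True,True
p3,a2,coordinated message,False,False
p4,a3,coordinated message,True,False
"""


def parse_bool(value):
    """Interpret common textual encodings of a boolean.

    How booleans are serialized in the real files is an assumption.
    """
    return value.strip().lower() in {"true", "1", "t", "yes"}


def load_posts(handle):
    """Read posts from a CSV file object and convert the boolean columns."""
    posts = []
    for row in csv.DictReader(handle):
        row["is_repost"] = parse_bool(row["is_repost"])
        row["is_control"] = parse_bool(row["is_control"])
        posts.append(row)
    return posts


def account_labels(posts):
    """Map each accountid to 'control' or 'io' using the is_control column."""
    labels = {}
    for post in posts:
        labels[post["accountid"]] = "control" if post["is_control"] else "io"
    return labels


def repost_share(posts):
    """Return, per account, the fraction of its posts that are reposts."""
    totals = defaultdict(int)
    reposts = defaultdict(int)
    for post in posts:
        totals[post["accountid"]] += 1
        reposts[post["accountid"]] += int(post["is_repost"])
    return {account: reposts[account] / totals[account] for account in totals}


posts = load_posts(io.StringIO(SAMPLE_CSV))
labels = account_labels(posts)
shares = repost_share(posts)
```

To use this with a downloaded file, replace the io.StringIO object with an open file handle, for example open(path, newline="", encoding="utf-8"), and adjust the parsing if the actual files use a different format or delimiter.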
Created:
2024-11-20



