Temporal Validity Change Prediction - Dataset

NIAID Data Ecosystem, indexed 2026-05-01

Download link:

https://zenodo.org/record/8340857

Description:

This dataset contains data for temporal validity change prediction (TVCP), an NLP task that will be defined in an upcoming publication.

The dataset consists of five columns:

target - A Tweet ID. This column must be manually rehydrated via the Twitter API to obtain the tweet text.
follow_up - A synthetic follow-up tweet that semantically relates to the target tweet.
context_only_tv - The expected temporal validity duration of the target tweet, when read in isolation.
combined_tv - The expected temporal validity duration of the target tweet, when read together with the follow-up tweet.
change - The TVCP task label, i.e., whether the temporal validity duration of the target tweet is decreased, unchanged (neutral), or increased by the information in the follow-up tweet.

The duration labels (context_only_tv, combined_tv) are class indices of the following class distribution:

[no time-sensitive information, less than one minute, 1-5 minutes, 5-15 minutes, 15-45 minutes, 45 minutes - 2 hours, 2-6 hours, more than 6 hours, 1-3 days, 3-7 days, 1-4 weeks, more than one month]

Different dataset splits are provided:

"dataset.csv" contains the full dataset.
"train.csv", "val.csv", "test.csv" contain an 80-10-10 train-val-test split.
"train[0-4].csv" and "test[0-4].csv" respectively contain training and test data for one of 5 folds for 5-fold cross-validation. In each fold, the train file contains 80% of the data, while the test file contains 20%.

To replicate the original experiments, the train file of each fold should be sorted by the preprocessed target tweet text, then the first 12.5% of target tweets should be sampled to generate validation data, leading to a 70-10-20 train-val-test split.
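
The label decoding and the validation-split procedure described above could be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes that class indices are 0-based and follow the listed order, that "the first 12.5% of target tweets" refers to unique target tweets rather than rows, and that the count is rounded down. The helper names (`duration_label`, `split_train_val`) and the `text_of` mapping from Tweet ID to preprocessed text are hypothetical. The dataset ships only Tweet IDs, so that mapping would have to come from your own rehydration and preprocessing step.

```python
# Duration classes in the order given in the dataset description.
# Assumption: class indices are 0-based and follow this order.
DURATION_CLASSES = [
    "no time-sensitive information",
    "less than one minute",
    "1-5 minutes",
    "5-15 minutes",
    "15-45 minutes",
    "45 minutes - 2 hours",
    "2-6 hours",
    "more than 6 hours",
    "1-3 days",
    "3-7 days",
    "1-4 weeks",
    "more than one month",
]


def duration_label(index):
    """Map a context_only_tv / combined_tv class index to its label."""
    return DURATION_CLASSES[int(index)]


def split_train_val(rows, text_of, val_fraction=0.125):
    """Split the rows of one fold's train file into (train, val).

    rows: list of dicts with at least a "target" key (the Tweet ID).
    text_of: hypothetical dict mapping Tweet ID -> preprocessed tweet text.
    """
    # Sort unique targets by preprocessed text; the Tweet ID breaks ties
    # so that the result is deterministic.
    targets = sorted({row["target"] for row in rows},
                     key=lambda t: (text_of[t], t))
    # Assumption: the number of validation targets is rounded down.
    n_val = int(len(targets) * val_fraction)
    val_targets = set(targets[:n_val])
    train = [row for row in rows if row["target"] not in val_targets]
    val = [row for row in rows if row["target"] in val_targets]
    return train, val
```

The rows can be read from a fold file such as "train0.csv" with `csv.DictReader` from the standard library. Taking 12.5% of an 80% training portion yields 10% of the full data, which is how the 70-10-20 split arises. Splitting by target tweet rather than by row keeps all follow-ups of one target on the same side of the split.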

Created:

2023-09-14
