Temporal Validity Change Prediction - Dataset
Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/8340857
Description:
This dataset contains data for temporal validity change prediction, an NLP task that will be defined in an upcoming publication. The dataset consists of five columns.
target - A Tweet ID. The tweet text is not included; it must be obtained by rehydrating the ID via the Twitter API.
follow_up - A synthetic follow-up tweet that semantically relates to the target tweet.
context_only_tv - The expected temporal validity duration of the target tweet, when read in isolation.
combined_tv - The expected temporal validity duration of the target tweet, when read together with the follow-up tweet.
change - The TVCP task label, i.e., whether the temporal validity duration of the target tweet is decreased, unchanged (neutral), or increased by the information in the follow-up tweet.
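Because only tweet IDs are distributed, the `target` column has to be rehydrated before use. The sketch below only builds the lookup requests and does not send them. It assumes the X (formerly Twitter) API v2 tweet lookup endpoint `GET https://api.twitter.com/2/tweets?ids=...`, which accepts up to 100 comma-separated IDs per call and a bearer token. Access tiers and limits have changed over time, so treat the endpoint, batch size, and the helper names `batch_ids` and `build_lookup_requests` as assumptions to check against current API documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: X/Twitter API v2 tweet lookup endpoint and its 100-ID batch limit.
LOOKUP_URL = "https://api.twitter.com/2/tweets"
MAX_IDS_PER_REQUEST = 100


def batch_ids(tweet_ids, size=MAX_IDS_PER_REQUEST):
    """Split a list of tweet IDs into chunks of at most `size` IDs."""
    return [tweet_ids[i:i + size] for i in range(0, len(tweet_ids), size)]


def build_lookup_requests(tweet_ids, bearer_token):
    """Build (but do not send) one GET request per batch of tweet IDs."""
    # Keep IDs as strings and deduplicate while preserving order: the same
    # target tweet may appear in several rows with different follow-ups.
    unique_ids = list(dict.fromkeys(str(t) for t in tweet_ids))
    reqs = []
    for batch in batch_ids(unique_ids):
        query = urlencode({"ids": ",".join(batch)})
        reqs.append(
            Request(
                f"{LOOKUP_URL}?{query}",
                headers={"Authorization": f"Bearer {bearer_token}"},
            )
        )
    return reqs
```

Reading the CSV with Python's `csv` module keeps tweet IDs as strings; loaders that infer numeric types may silently round 19-digit IDs, which would make rehydration fail.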
The duration labels (context_only_tv, combined_tv) are class indices into the following ordered list of duration classes:
[no time-sensitive information, less than one minute, 1-5 minutes, 5-15 minutes, 15-45 minutes, 45 minutes - 2 hours, 2-6 hours, more than 6 hours, 1-3 days, 3-7 days, 1-4 weeks, more than one month]
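To turn the integer labels back into readable durations, a lookup table over the class list above is enough. The description does not say whether indices start at 0 or 1; the sketch below assumes 0-based indexing (index 0 = "no time-sensitive information"). The names `DURATION_CLASSES` and `duration_label` are illustrative, not part of the dataset.

```python
# Ordered duration classes, copied from the dataset description.
# Assumption: indices are 0-based; verify against the data before relying on it.
DURATION_CLASSES = [
    "no time-sensitive information",
    "less than one minute",
    "1-5 minutes",
    "5-15 minutes",
    "15-45 minutes",
    "45 minutes - 2 hours",
    "2-6 hours",
    "more than 6 hours",
    "1-3 days",
    "3-7 days",
    "1-4 weeks",
    "more than one month",
]


def duration_label(index):
    """Map a context_only_tv / combined_tv class index to its duration label."""
    i = int(index)  # CSV readers typically yield strings
    if not 0 <= i < len(DURATION_CLASSES):
        raise ValueError(f"unknown duration class index: {index!r}")
    return DURATION_CLASSES[i]
```

A quick sanity check for the 0-based assumption is to look at the minimum and maximum label values in `dataset.csv`: a range of 0 to 11 fits this table, while 1 to 12 would indicate 1-based indices.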
Different dataset splits are provided.
"dataset.csv" contains the full dataset.
"train.csv", "val.csv", "test.csv" contain an 80-10-10 train-val-test split.
"train[0-4].csv" and "test[0-4].csv" contain the training and test data, respectively, for each of the 5 folds of a 5-fold cross-validation. Each train file contains 80% of the data and each test file contains 20%. To replicate the original experiments, sort the train file by the preprocessed target tweet text and use the first 12.5% of target tweets as validation data. Since 12.5% of 80% is 10%, this yields a 70-10-20 train-val-test split.
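The validation carve-out for the cross-validation folds can be sketched as follows. Several details are assumptions, not taken from the original pipeline: the `preprocess` function is a hypothetical placeholder (lowercasing and collapsing whitespace), "first 12.5% of target tweets" is read as the first 12.5% of distinct target IDs after sorting, with all rows for those targets moved to validation, and the count is rounded up. The tweet texts must already be rehydrated and supplied as a mapping from ID to text.

```python
import csv
import math


def load_rows(path):
    """Read a split file (e.g. train0.csv) into a list of dicts, IDs as strings."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def preprocess(text):
    # Hypothetical placeholder: the original preprocessing is not specified.
    return " ".join(text.lower().split())


def split_train_val(rows, text_of, val_fraction=0.125):
    """Sort rows by preprocessed target text and move the first
    `val_fraction` of distinct target tweets into a validation set.

    rows:    list of dicts with a "target" key (tweet ID)
    text_of: mapping from target tweet ID to rehydrated tweet text
    """
    ordered = sorted(rows, key=lambda r: preprocess(text_of[r["target"]]))
    # Distinct targets in sorted order (a target may have several follow-ups).
    targets = list(dict.fromkeys(r["target"] for r in ordered))
    n_val = math.ceil(len(targets) * val_fraction)  # assumption: round up
    val_targets = set(targets[:n_val])
    val = [r for r in ordered if r["target"] in val_targets]
    train = [r for r in ordered if r["target"] not in val_targets]
    return train, val
```

Splitting by distinct target rather than by row keeps every follow-up for a given target tweet on the same side of the split, which avoids leaking a target tweet into both training and validation.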
Created:
2023-09-14



