
Temporal Validity Change Prediction - Dataset

Source: NIAID Data Ecosystem (indexed 2026-05-01)
Download link:
https://zenodo.org/record/8340857
Description:
This dataset contains data for temporal validity change prediction, an NLP task that will be defined in an upcoming publication. The dataset consists of five columns:

target - A Tweet ID. This column must be manually rehydrated via the Twitter API to obtain the tweet text.
follow_up - A synthetic follow-up tweet that semantically relates to the target tweet.
context_only_tv - The expected temporal validity duration of the target tweet, when read in isolation.
combined_tv - The expected temporal validity duration of the target tweet, when read together with the follow-up tweet.
change - The TVCP task label, i.e., whether the temporal validity duration of the target tweet is decreased, unchanged (neutral), or increased by the information in the follow-up tweet.

The duration labels (context_only_tv, combined_tv) are class indices of the following class distribution: [no time-sensitive information, less than one minute, 1-5 minutes, 5-15 minutes, 15-45 minutes, 45 minutes - 2 hours, 2-6 hours, more than 6 hours, 1-3 days, 3-7 days, 1-4 weeks, more than one month]

Different dataset splits are provided:

"dataset.csv" contains the full dataset.
"train.csv", "val.csv", "test.csv" contain an 80-10-10 train-val-test split.
"train[0-4].csv" and "test[0-4].csv" respectively contain training and test data for one of 5 folds for 5-fold cross-validation. The train file contains 80% of the data, while the test file contains 20%.

To replicate the original experiments, the train file should be sorted by the preprocessed target tweet text, then the first 12.5% of target tweets should be sampled to generate validation data, leading to a 70-10-20 train-val-test split.
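
The duration classes and the cross-validation split procedure described above can be sketched in Python as follows. This is a minimal sketch under several assumptions, not the authors' code: the class indices are assumed to be zero-based, the 12.5% is assumed to apply to unique target tweets (rounded down), and the preprocessing step is not specified in the description, so the default `preprocess` here (strip and lowercase) is a hypothetical placeholder. The names `decode_duration`, `split_train_val`, and the `texts` mapping (tweet ID to rehydrated text) are introduced for illustration only.

```python
from typing import Callable, Dict, List, Tuple

# Duration classes in the order given in the dataset description.
# ASSUMPTION: context_only_tv / combined_tv are zero-based indices into this list.
DURATION_CLASSES = [
    "no time-sensitive information",
    "less than one minute",
    "1-5 minutes",
    "5-15 minutes",
    "15-45 minutes",
    "45 minutes - 2 hours",
    "2-6 hours",
    "more than 6 hours",
    "1-3 days",
    "3-7 days",
    "1-4 weeks",
    "more than one month",
]

Row = Dict[str, str]


def decode_duration(class_index) -> str:
    """Map a context_only_tv / combined_tv class index to its label."""
    return DURATION_CLASSES[int(class_index)]


def split_train_val(
    rows: List[Row],
    texts: Dict[str, str],
    preprocess: Callable[[str], str] = lambda s: s.strip().lower(),
    val_fraction: float = 0.125,
) -> Tuple[List[Row], List[Row]]:
    """Carve validation rows out of one cross-validation train file.

    rows  : rows of a train[0-4].csv file, each with a "target" tweet ID
    texts : hypothetical mapping from tweet ID to rehydrated tweet text
    """
    # Unique target IDs, ordered by preprocessed text (ID breaks ties).
    targets = sorted(
        {row["target"] for row in rows},
        key=lambda t: (preprocess(texts[t]), t),
    )
    # ASSUMPTION: the fraction is taken over unique targets and rounded down.
    n_val = int(len(targets) * val_fraction)
    val_targets = set(targets[:n_val])

    train = [r for r in rows if r["target"] not in val_targets]
    val = [r for r in rows if r["target"] in val_targets]
    return train, val
```

Rows in this shape can be read with `csv.DictReader` from one of the train files, assuming the CSV header uses the column names listed above. Because 12.5% of the 80% train portion is 10% of the full data, the result matches the 70-10-20 split mentioned in the description.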
Created: 2023-09-14