
Main Data and Code

Source: DataCite Commons. Updated: 2025-10-05. Indexed: 2025-09-08.
Download link:
https://figshare.com/articles/dataset/Main_Data_and_Code/29929412
Resource description:
<b>Important Notice: Ethical Use Only</b>
This repository provides code and datasets for academic research on misinformation. Please note that the datasets include rumor-related texts. These materials are supplied solely for scholarly analysis and research aimed at understanding and combating misinformation.

<b>Prohibited Use</b>
Do not use this repository, including its code or data, to create or spread false information in any real-world context. Any misuse of these resources for malicious purposes is strictly forbidden.

<b>Disclaimer</b>
The authors bear no responsibility for any unethical or unlawful use of the provided resources. By accessing or using this repository, you acknowledge and agree to comply with these ethical guidelines.

<b>Project Structure</b>
The project is organized into three main directories, each corresponding to a major section of the paper's experiments:

main_data_and_code/
├── rumor_generation/
├── rumor_detection/
└── rumor_debunking/

<b>How to Get Started</b>

<b>Prerequisites</b>
To successfully run the code and reproduce the results, you will need to:
Obtain and configure your own API key for the large language models (LLMs) used in the experiments. Please replace the placeholder API key in the code with your own.
For the rumor detection experiments, download the public datasets (Twitter15, Twitter16, FakeNewsNet) from their respective sources. The pre-process scripts in the rumor detection folder must be run first to prepare the public datasets.

Please note that many scripts are provided as examples using the Twitter15 dataset. To run experiments on other datasets like Twitter16 or FakeNewsNet, you will need to modify these scripts or create copies and update the corresponding file paths.

<b>Detailed Directory Breakdown</b>

<b>1. rumor_generation/</b>
This directory contains all the code and data related to the rumor generation experiments.
rumor_generation_zeroshot.py: Code for the zero-shot rumor generation experiment.
rumor_generation_fewshot.py: Code for the few-shot rumor generation experiment.
rumor_generation_cot.py: Code for the chain-of-thought (CoT) rumor generation experiment.
token_distribution.py: Script to analyze token distribution in the generated text.
label_rumors.py: Script to label LLM-generated texts based on whether they contain rumor-related content.
extract_reasons.py: Script to extract reasons for rumor generation and rejection.
visualization.py: Utility script for generating figures.
LDA.py: Code for performing LDA topic modeling on the generated data.
rumor_generation_responses.json: The complete output dataset from the rumor generation experiments.
generation_reasons_extracted.json: The extracted reasons for generated rumors.
rejection_reasons_extracted.json: The extracted reasons for rejected rumor generation requests.

<b>2. rumor_detection/</b>
This directory contains the code and data used for the rumor detection experiments.
nonreasoning_zeroshot_twitter15.py: Code for the non-reasoning, zero-shot detection on the Twitter15 dataset. To run on Twitter16 or FakeNewsNet, update the file paths within the script. Similar experiment scripts below follow the same principle and are not described repeatedly.
nonreasoning_fewshot_twitter15.py: Code for the non-reasoning, few-shot detection on the Twitter15 dataset.
nonreasoning_cot_twitter15.py: Code for the non-reasoning, CoT detection on the Twitter15 dataset.
reasoning_zeroshot_twitter15.py: Code for the Reasoning LLMs, zero-shot detection on the Twitter15 dataset.
reasoning_fewshot_twitter15.py: Code for the Reasoning LLMs, few-shot detection on the Twitter15 dataset.
reasoning_cot_twitter15.py: Code for the Reasoning LLMs, CoT detection on the Twitter15 dataset.
traditional_model.py: Code for the traditional models used as baselines.
preprocess_twitter15_and_twitter16.py: Script for preprocessing the Twitter15 and Twitter16 datasets.
preprocess_fakenews.py: Script for preprocessing the FakeNewsNet dataset.
generate_summary_table.py: Calculates all classification metrics and generates the final summary table for the rumor detection experiments.
select_few_shot_example_15.py: Script to pre-select few-shot examples, using the Twitter15 dataset as an example. To generate examples for Twitter16 or FakeNewsNet, update the file paths within the script.
twitter15_few_shot_examples.json: Pre-selected few-shot examples for the Twitter15 dataset.
twitter16_few_shot_examples.json: Pre-selected few-shot examples for the Twitter16 dataset.
fakenewsnet_few_shot_examples.json: Pre-selected few-shot examples for the FakeNewsNet dataset.
twitter15_llm_results.json: LLM prediction results on the Twitter15 dataset.
twitter16_llm_results.json: LLM prediction results on the Twitter16 dataset.
fakenewsnet_llm_results.json: LLM prediction results on the FakeNewsNet dataset.
visualization.py: Utility script for generating figures.

<b>3. rumor_debunking/</b>
This directory contains all the code and data for the rumor debunking experiments.
analyze_sentiment.py: Script for analyzing the sentiment of the debunking texts.
calculate_readability.py: Script for calculating the readability score of the debunking texts.
plot_readability.py: Utility script for generating figures related to readability.
fact_checking_with_nli.py: Code for the NLI-based fact-checking experiment.
debunking_results.json: The dataset containing the debunking results for this experimental section.
debunking_results_with_readability.json: The dataset containing the debunking results along with readability scores.
sentiment_analysis/: This directory contains the result file from the sentiment analysis.
    debunking_results_with_sentiment.json: The dataset containing the debunking results along with sentiment analysis.

Please contact the repository owner if you encounter any problems or have questions about the code or data.
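
The prerequisites above ask you to replace a placeholder API key and to edit file paths when switching from Twitter15 to Twitter16 or FakeNewsNet. One possible way to keep both changes in a single place is sketched below. This is not the repository's own code: the environment variable name `LLM_API_KEY`, the helper names, and the assumption that the JSON files sit directly under `main_data_and_code/rumor_detection/` are all illustrative. Only the JSON file names are taken from the directory listing.

```python
import os
from dataclasses import dataclass
from pathlib import Path

# Dataset names as they appear in the listed JSON file names.
SUPPORTED_DATASETS = ("twitter15", "twitter16", "fakenewsnet")


@dataclass(frozen=True)
class DatasetFiles:
    """Per-dataset files named in the rumor_detection/ listing."""
    few_shot_examples: Path
    llm_results: Path


def dataset_files(
    name: str,
    root: Path = Path("main_data_and_code/rumor_detection"),  # assumed location
) -> DatasetFiles:
    """Build the file paths for one dataset instead of hard-coding them."""
    key = name.strip().lower()
    if key not in SUPPORTED_DATASETS:
        raise ValueError(
            f"Unknown dataset {name!r}; expected one of {SUPPORTED_DATASETS}"
        )
    return DatasetFiles(
        few_shot_examples=root / f"{key}_few_shot_examples.json",
        llm_results=root / f"{key}_llm_results.json",
    )


def load_api_key(env_var: str = "LLM_API_KEY") -> str:
    """Read the API key from the environment (hypothetical variable name)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key
```

With helpers like these, a copy of a Twitter15 script could call `dataset_files("twitter16")` rather than having each path edited by hand, and the key would stay out of the source file.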

Provider:
figshare
Created:
2025-08-18