misinformation-guard

Name: misinformation-guard
Creator: maas
Published: 2025-12-05 16:44:04
License: not specified

ModelScope community. Updated: 2025-12-05. Indexed: 2025-08-02.

Download link:

https://modelscope.cn/datasets/Intel/misinformation-guard

Resource overview:

# MisInformation Guard: Synthetic Text Classification Dataset

- **Dataset type**: Synthetic
- **Number of samples**: 41,000
- **Task**: Text Classification
- **Domain**: Multi-label classification of text into `false`, `partially true`, `mostly true`, and `true` categories.

## Dataset Description

This dataset was generated to train and evaluate models on the task of text classification according to misinformation. Synthetic data generation was carried out by a custom designed pipeline using the following LLMs:

- [Llama 3.1 8B](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
- [Mixtral 8x7B](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)

### Structure

The dataset contains the following splits:

- **train + validation**: ~33,000 samples
- **test**: ~8,000 samples

Each sample contains:

- **output**: The synthetic text generated by the LLM (string).
- **reasoning**: The LLM reasoning for generating the text (string).
- **label**: The classification label (category: `false`, `partially true`, `mostly true`, and `true`).
- **model**: The model used to generate the sample (string).

## Description of labels

- **false**: Completely untrue or fabricated information.
- **partially true**: Contains some truth but is misleading or lacks important context.
- **mostly true**: Largely accurate but may have minor inaccuracies or omissions.
- **true**: Entirely accurate and factual information.

## Usage

```python
from datasets import load_dataset

dataset = load_dataset("Intel/misinformation-guard")
```

## Join the Community

If you are interested in exploring other models, join us in the Intel and Hugging Face communities. These models simplify the development and adoption of Generative AI solutions, while fostering innovation among developers worldwide. If you find this project valuable, please like ❤️ it on Hugging Face and share it with your network. Your support helps us grow the community and reach more contributors.

## Disclaimer

Misinformation Guard has been trained and validated on a limited set of synthetically generated data. Accuracy metrics cannot be guaranteed outside these narrow use cases, and therefore this tool should be validated within the specific context of use for which it might be deployed. This tool is not intended to be used to evaluate employee performance. This tool is not sufficient to prevent harm in many contexts, and additional tools and techniques should be employed in any sensitive use case where misinformation may cause harm to individuals, communities, or society.
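
## Example: validating and encoding samples

The four fields and four labels listed under Structure can be checked and converted to integer class IDs before training. The snippet below is a minimal sketch, not part of the dataset's own tooling. It assumes that `label` is stored as one of the four label strings (the hosted dataset may instead store it as an integer class index). The ordering of the IDs, the helper names `encode_sample` and `label_distribution`, and the toy records are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Label names as listed in the dataset card.
# The integer ordering (least to most truthful) is an assumption of this sketch.
LABELS = ["false", "partially true", "mostly true", "true"]
LABEL_TO_ID = {name: idx for idx, name in enumerate(LABELS)}

# Field names as listed under "Each sample contains".
REQUIRED_FIELDS = ("output", "reasoning", "label", "model")


def encode_sample(sample):
    """Validate one record and return a (text, label_id) pair."""
    missing = [field for field in REQUIRED_FIELDS if field not in sample]
    if missing:
        raise KeyError(f"sample is missing fields: {missing}")
    label = sample["label"]
    if label not in LABEL_TO_ID:
        raise ValueError(f"unexpected label: {label!r}")
    return sample["output"], LABEL_TO_ID[label]


def label_distribution(samples):
    """Count how many records carry each label string."""
    return Counter(sample["label"] for sample in samples)


# Hypothetical records for illustration only; these are not rows from the dataset.
toy_samples = [
    {"output": "Claim A", "reasoning": "made up", "label": "false", "model": "model-x"},
    {"output": "Claim B", "reasoning": "mostly right", "label": "mostly true", "model": "model-y"},
    {"output": "Claim C", "reasoning": "accurate", "label": "true", "model": "model-x"},
    {"output": "Claim D", "reasoning": "accurate", "label": "true", "model": "model-y"},
]

encoded = [encode_sample(sample) for sample in toy_samples]
counts = label_distribution(toy_samples)
```

With rows loaded through `load_dataset`, the same helpers could be applied to each record of a split, provided the field names and label encoding match the assumptions above.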

Provider: maas
Created: 2025-08-01
