dzur658/grounded-vs-fabricated-hallucinations

Name: dzur658/grounded-vs-fabricated-hallucinations
Creator: dzur658
Published: 2026-04-06 04:25:32
License: Apache 2.0

Source: Hugging Face. Updated 2026-04-06. Indexed 2026-04-12.

Download link:

https://hf-mirror.com/datasets/dzur658/grounded-vs-fabricated-hallucinations


Description:

---
license: apache-2.0
language:
- en
size_categories:
- 1K<n<10K
---

# Grounded vs. Fabricated Hallucinations

This dataset consists of hallucinated and grounded answers to the first 3000 rows of the [TriviaQA rc.nocontext validation split](https://huggingface.co/datasets/mandarjoshi/trivia_qa).

## Methodology

The dataset consists of training, evaluation, and test splits. Truthful and hallucinated answers overlap in the same window, so for every truthful answer there is at least one corresponding hallucinated answer. Hallucinated answers are not organic. They were prompted for directly: via gaslighting in the case of Nemotron 3 Super (confidently claim a false answer and dismiss the real answer), and via complete confident hallucination in the case of Gemma3 27b IT QAT.

For grounding, both MiniMax M 2.1 and Nemotron 3 Super were put in Langchain ReAct harnesses and given access to a [Wikipedia tool](https://github.com/dzur658/Wikipedia-tool), allowing them to produce fluff answers while remaining grounded by the output they received back from the Wikipedia API. There are also direct matches where the generated answer and ground truth are 1:1 exact copies (see Binary Classification Report section below). The link to the repo where this experiment was set up can be found below.
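The pairing property described in the Methodology (every truthful answer has at least one corresponding hallucinated answer) can be checked with a few lines of Python. This is a minimal sketch, assuming the records are already loaded as a list of dicts with the `question` and `label` keys documented in the Dataset Map section. The function name and the sample records are hypothetical, and whether the property holds within each split or only across the whole dataset is not stated in the source.

```python
from collections import defaultdict

def questions_missing_hallucination(records):
    """Return questions that have a truthful sample (label 1)
    but no hallucinated sample (label 0)."""
    labels_seen = defaultdict(set)
    for rec in records:
        labels_seen[rec["question"]].add(rec["label"])
    return sorted(
        question
        for question, labels in labels_seen.items()
        if 1 in labels and 0 not in labels
    )

# Hypothetical records for illustration only (not taken from the dataset).
sample_records = [
    {"question": "Q1", "label": 1},
    {"question": "Q1", "label": 0},
    {"question": "Q2", "label": 1},  # no hallucinated counterpart
]
unpaired = questions_missing_hallucination(sample_records)
print(unpaired)
```

An empty result would indicate that every truthful answer in the checked records has a hallucinated counterpart.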

## Dataset Map

The dataset uses Unicode encoding, and each JSON object consists of the following keys:

- `question`: corresponds directly to the question pulled from the TriviaQA rc.nocontext validation set for the particular sample
- `truth`: corresponds directly to the value key pulled from the TriviaQA rc.nocontext validation set for the particular sample
- `generated`: the string the model generated as a response
- `label`: 1 = truthful | 0 = synthetic hallucination
- `type`: exact match (1:1 copy over from the truth key), fluff match (truthful model grounded by Wikipedia, but adding extra context), hallucination (model prompted to hallucinate)
- `model_tag`: which model was used to generate the sample

## Binary Classification Report

The original goal behind this dataset was to train a ModernBERT model with a binary classification head to detect hallucinations, specifically to be used as an alternative to LLM-as-a-judge in the [H-Neurons: On the Existence, Impact, and Origin of Hallucination-Associated Neurons in LLMs](https://arxiv.org/abs/2512.01797) paper. However, although the model performed near perfectly on the evaluation and test sets (accuracy and F1 upwards of 99.99%), it failed on outputs from models it had not seen before.

Originally, this was the case when the dataset only consisted of generations by MiniMax M 2.1 (truthful only) and Gemma 3 27B IT QAT (hallucination only) on the first 2000 rows of TriviaQA rc.nocontext validation. At this point, I hypothesized that ModernBERT was latching onto the stylistic differences between the two models, and began fielding candidates for a third model that could perform both truthful and hallucination generation to augment the dataset. I settled on Nemotron 3 Super as this bridge, and specifically generated on rows 1000-3000 of TriviaQA rc.nocontext.
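
The stylistic confound hypothesized above can be made visible by cross-tabulating `model_tag` against `label`: a model that appears under only one label lets a classifier separate the classes by writing style alone. The following is a minimal sketch, assuming the records are loaded as a list of dicts. The function names and the tag strings in the sample are hypothetical placeholders, not the dataset's actual `model_tag` values.

```python
from collections import Counter, defaultdict

def label_counts_by_model(records):
    """Count how many samples each model_tag contributes per label."""
    table = defaultdict(Counter)
    for rec in records:
        table[rec["model_tag"]][rec["label"]] += 1
    return {tag: dict(counts) for tag, counts in table.items()}

def single_label_models(records):
    """Return model tags that appear under only one label,
    i.e. models whose style is perfectly correlated with the label."""
    counts_by_model = label_counts_by_model(records)
    return sorted(tag for tag, counts in counts_by_model.items() if len(counts) == 1)

# Hypothetical records; the tag strings are placeholders.
sample_records = [
    {"model_tag": "minimax", "label": 1},
    {"model_tag": "gemma", "label": 0},
    {"model_tag": "nemotron", "label": 1},
    {"model_tag": "nemotron", "label": 0},
]
confounded = single_label_models(sample_records)
print(confounded)
```

In this toy sample only the bridge model contributes to both labels, which mirrors the role Nemotron 3 Super was chosen for.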

Another training run with the ModernBERT binary classification model on the augmented dataset yielded results similar to the first: suspiciously high accuracy and F1 scores. This run was abandoned for the same reason as the first: ModernBERT continued to latch onto style and semantic differences. Viewing the binary classification route as a dead end, I transitioned to a combination of an [off the shelf NLI modernbert model](https://huggingface.co/tasksource/ModernBERT-base-nli) and a [QnA modernbert model](https://huggingface.co/rankyx/ModernBERT-QnA-base-squad). At this point I only found the dataset useful for evaluating the NLI model, which consistently scored 87-88%, an acceptable substitute for the LLM-as-a-judge method (that I originally sought to replace). Therefore, I am open sourcing the dataset with an Apache 2.0 license to the greater open source community. If you find my dataset useful, don't hesitate to drop a ❤️!

## Models Used

All models used Unsloth recommended sampling parameters for the specific task.

- [Gemma3 27B IT QAT](https://ollama.com/library/gemma3)
- [Unsloth MiniMax M2.1 Q_8](https://huggingface.co/unsloth/MiniMax-M2.1-GGUF)
- [Unsloth Nemotron 3 Super BF16](https://huggingface.co/unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF)

## System Prompts

#### Gemma3 27B IT QAT Hallucination Prompt

```python
system_prompt = """You are a hallucination specialist, you must demonstrate an LLM hallucination when the user provides a real fact. Given a real question, and it's correct answer your output must only include an answer that is a confident LLM hallucination of anything factual in the statement. Present your answer naturally in a conversational flow as if you have just been asked the question, and genuinely, confidently hallucinated the answer. Never mention or allude to the correct answer in your response. Your tone should remain serious and confident, presenting the hallucinated answer as the true answer leaving no doubt. Never start a response with "Oh, that's..." just get straight to the hallucinated answer.

## Example Input
Question: "Who wrote the Scarlet Pimpernel?"
Real Answer: "Baroness Orczy"
Hallucinated Answer:

## Example Response
"The Scarlet Pimpernel was written by Alexandre Dumas and first published in 1844. Set during the Reign of Terror, it follows the adventures of a disguised English aristocrat who rescues French nobles from the guillotine. The novel fits perfectly into Dumas's repertoire of historical swashbuckling adventures, alongside his other masterpieces like The Count of Monte Cristo and The Three Musketeers."\
"""
```

#### Nemotron 3 Super Hallucination Prompt

```python
system_prompt = """You are a synthetic data generator that demonstrates LLM hallucinations. Given a question and the "Ground Truth" answer, your job is to make a highly plausible, confident hallucination that provides the WRONG answer.

CRITICAL RULES:
1. Respond directly to the user's question as if you are an AI assistant.
2. Be highly confident, academic, and cite adjacent (but misleading or misapplied) facts to support your wrong answer.
3. THE HARD NEGATIVE: If you explicitly mention the "Ground Truth" concept you MUST confidently dismiss it as a misconception, a secondary factor, or historically inaccurate.
4. STRICT LENGTH LIMIT: Your response must be exactly 2, 3, or 4 sentences long."""
```

#### MiniMax M 2.1 Truthful Prompt

```python
sys_prompt = """\
You are a conversational AI assistant designed to provide interesting context for trivia answers. You will be given a "User Question" and the "Absolute Ground Truth" answer. Your task is to use your Wikipedia tools to find background information that connects the question to the provided answer, and then write a natural, conversational response.

CRITICAL RULES:
1. THE TRUTH IS LAW: Treat the Absolute Ground Truth as undeniable fact. Your research is ONLY for finding supporting flavor, not for correcting the prompt.
2. SEMANTIC FIDELITY: You must accurately convey the exact meaning of the Ground Truth in your response. You may use natural phrasing, synonyms, or grammatical variations (e.g., "young bear" instead of "bear cub"), but you MUST NOT alter the core entity, number, or factual premise.
3. STRICT LENGTH LIMIT: Your final response MUST be exactly 2, 3, or 4 sentences long.
4. NO AGENTIC TELLS: Do not say "Based on my research," "According to Wikipedia," or mention your tools. Speak confidently and directly."""
```

#### Nemotron 3 Super Truthful Prompt

```python
sys_prompt = """\
You are a conversational AI assistant designed to provide interesting context for trivia answers. You will be given a "User Question" and the "Absolute Ground Truth" answer. Your task is to use your Wikipedia tools to find background information that connects the question to the provided answer, and then write a natural, conversational response.

CRITICAL RULES:
1. THE TRUTH IS LAW: Treat the Absolute Ground Truth as undeniable fact. Your research is ONLY for finding supporting flavor, not for correcting the prompt.
2. SEMANTIC FIDELITY: You must accurately convey the exact meaning of the Ground Truth in your response. You may use natural phrasing, synonyms, or grammatical variations (e.g., "young bear" instead of "bear cub"), but you MUST NOT alter the core entity, number, or factual premise.
3. STRICT LENGTH LIMIT: Your final response MUST be exactly 2, 3, or 4 sentences long.
4. NO AGENTIC TELLS: Do not say "Based on my research," "According to Wikipedia," or mention your tools. Speak confidently and directly."""
```

## GitHub Code

If you would like to see the code used to generate this dataset, it can be found [here](https://github.com/dzur658/MLX-H-Neuron-Detector/tree/main/experimentation/synthetic-data-gen). It is also Apache 2.0 licensed.
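
The Nemotron and MiniMax prompts above limit responses to 2, 3, or 4 sentences. A rough way to audit the `generated` field against that rule is sketched below. This is an illustrative helper, not code from the linked repository: the function names are hypothetical, and the regex splitter is naive, so abbreviations such as "Dr." or decimals followed by a space will inflate the count.

```python
import re

def count_sentences(text):
    """Naive sentence count: split after '.', '!' or '?' when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([part for part in parts if part])

def within_length_limit(text, low=2, high=4):
    """Check whether a response falls inside the prompts' 2-4 sentence window."""
    return low <= count_sentences(text) <= high

# Hypothetical responses for illustration only.
two_sentences = "The answer is Paris. It is the capital of France."
one_sentence = "The answer is Paris."
print(within_length_limit(two_sentences), within_length_limit(one_sentence))
```

Samples flagged by such a check are candidates for manual review rather than automatic removal, given how crude the splitting is.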

## Special Thanks

Special thanks to all the teams that helped make this possible, and for their continued support of the open source community!

- [Unsloth](https://huggingface.co/unsloth)
- [Google](https://huggingface.co/google)
- [MiniMax](https://huggingface.co/MiniMaxAI)
- [Nvidia](https://huggingface.co/nvidia)

## Citation

If you use this in your work directly, please cite me.

```
@misc{grounded_vs_fabricated_hallucinations,
  title = {Grounded vs. Fabricated Hallucinations},
  author = {{Alex Dzurec}},
  month = {April},
  year = {2026},
  url = {https://huggingface.co/dzur658/grounded-vs-fabricated-hallucinations}
}
```

Provider:

dzur658
