
oumi-groundedness-benchmark

ModelScope community, updated 2025-10-09, indexed 2025-04-12
Download link:
https://modelscope.cn/datasets/oumi-ai/oumi-groundedness-benchmark
Resource description:
[![oumi logo](https://oumi.ai/logo_lockup_black.svg)](https://github.com/oumi-ai/oumi)
[![Made with Oumi](https://badgen.net/badge/Made%20with/Oumi/%23085CFF?icon=https%3A%2F%2Foumi.ai%2Flogo_dark.svg)](https://github.com/oumi-ai/oumi)
[![Documentation](https://img.shields.io/badge/Documentation-oumi-blue.svg)](https://oumi.ai/docs/en/latest/index.html)
[![Blog](https://img.shields.io/badge/Blog-oumi-blue.svg)](https://oumi.ai/blog)
[![Discord](https://img.shields.io/discord/1286348126797430814?label=Discord)](https://discord.gg/oumi)

# oumi-ai/oumi-groundedness-benchmark

**oumi-groundedness-benchmark** is a text dataset designed to evaluate language models for **Claim Verification / Hallucination Detection**.
Prompts and responses were produced synthetically from **[Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct)**.

**oumi-groundedness-benchmark** was used to evaluate **[HallOumi-8B](https://huggingface.co/oumi-ai/HallOumi-8B)**, which achieves **77.2% Macro F1**, outperforming **SOTA models such as Claude Sonnet 3.5, OpenAI o1, etc.**

- **Curated by:** [Oumi AI](https://oumi.ai/) using Oumi inference
- **Language(s) (NLP):** English
- **License:** [Llama 3.1 Community License](https://www.llama.com/llama3_1/license/)

## Uses

Use this dataset for evaluating claim verification/hallucination detection models.
[Link to evaluation notebook](https://github.com/oumi-ai/oumi/blob/8d248b9c6b3b9e1ef7053e7ea0e605407cd33684/configs/projects/halloumi/halloumi_eval_notebook.ipynb)

## Out-of-Scope Use

This dataset is not well suited for producing generalized chat models.

## Dataset Creation

### Curation Rationale

To enable the community to develop more reliable foundation models, we created this dataset to evaluate HallOumi. It was produced by running Oumi inference on Google Cloud.
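HallOumi-8B's headline result is reported as Macro F1. As a minimal sketch, assuming each claim has a gold label and a predicted label drawn from `supported` / `not-supported` (label strings chosen here for illustration, mirroring the tags used in the citation prompts), macro F1 is the unweighted mean of the per-class F1 scores. This is not the code from the linked evaluation notebook; the function names are hypothetical.

```python
from typing import Sequence


def f1_for_class(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 score treating `label` as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(
    gold: Sequence[str],
    pred: Sequence[str],
    labels: Sequence[str] = ("supported", "not-supported"),
) -> float:
    """Unweighted mean of per-class F1, so both classes count equally."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(f1_for_class(gold, pred, lab) for lab in labels) / len(labels)


if __name__ == "__main__":
    gold = ["supported", "supported", "not-supported", "not-supported"]
    pred = ["supported", "not-supported", "not-supported", "not-supported"]
    print(round(macro_f1(gold, pred), 4))  # prints 0.7333
```

Averaging per class rather than per example keeps the minority label from being swamped when supported and unsupported claims are imbalanced.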
### Source Data

The taxonomy used to produce our documents is outlined [here](https://docs.google.com/spreadsheets/d/1-Hvy-OyA_HMVNwLY_YRTibE33TpHsO-IeU7wVJ51h3Y).

#### Document Creation

Documents were created synthetically using the following criteria:

* Subject
* Document Type
* Information Richness

Example prompt:

```
Create a document based on the following criteria:

Subject: Crop Production - Focuses on the cultivation and harvesting of crops, including topics such as soil science, irrigation, fertilizers, and pest management.

Document Type: News Article - 3-6 paragraphs reporting on news on a particular topic.

Information Richness: Low - Document is fairly simple in construction and easy to understand, often discussing things at a high level and not getting too deep into technical details or specifics.

Produce only the document and nothing else. Surround the document in <document> and </document> tags.
Example: <document>This is a very short sentence.</document>
```

#### Request Creation

Requests were randomly assigned to one of the following types:

* Summarization (concise)
* Summarization (constrained)
* Summarization (full)
* Summarization (stylized)
* QA (irrelevant)
* QA (missing answer)
* QA (conflicting answer)
* QA (complete answer)
* QA (partial answer)
* QA (inferrable answer)

Example prompt:

```
Entrepreneurship 101: Turning Your Idea into a Reality

Starting a business can be a daunting task, especially for those who are new to the world of entrepreneurship. However, with the right mindset and a solid understanding of the basics, anyone can turn their idea into a successful venture. In this post, we'll cover the key steps to take when starting a business, from ideation to funding and beyond.

It all begins with an idea. Maybe you've identified a problem in your community that you'd like to solve, or perhaps you have a passion that you'd like to turn into a career. Whatever your idea may be, it's essential to take the time to refine it and make sure it's viable. Ask yourself questions like "Who is my target audience?" and "What sets my product or service apart from the competition?"

Once you have a solid idea, it's time to start thinking about funding. There are several options when it comes to funding a business, including bootstrapping, venture capital, and angel investors. Bootstrapping involves using your own savings or revenue to fund your business, while venture capital and angel investors involve seeking out external funding from investors. Each option has its pros and cons, and it's crucial to choose the one that best fits your business needs. For example, bootstrapping allows you to maintain control over your business, but it can also limit your growth potential. On the other hand, seeking out external funding can provide the resources you need to scale quickly, but it may require you to give up some equity.

Once you've secured funding, it's time to start thinking about strategies for success. This includes things like building a strong team, developing a marketing plan, and creating a solid business model. It's also essential to be flexible and adapt to changes in the market or unexpected setbacks. By staying focused and committed to your vision, you can overcome obstacles and build a successful business.

In conclusion, starting a business requires careful planning, hard work, and dedication. By refining your idea, securing funding, and developing strategies for success, you can turn your passion into a reality and achieve your entrepreneurial goals.

Create a request for the above document based on the following criteria:

Task: Summarization - Concise - Create a request for a summary that is short and concise (1-2 sentences).

Produce only the request and nothing else. Surround the request in <request> and </request> tags.
Example: <request>This is a request.</request>
```

#### Response

Response creation is straightforward: the context and request are combined and sent to an LLM as an ordinary request.
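The document, request, and response prompts described above can be assembled with plain string templates. The sketch below is illustrative only: the helper names (`document_prompt`, `request_prompt`, `response_prompt`, `extract_tagged`) are hypothetical, uniform sampling with `random.choice` is an assumption (the card only says requests were assigned randomly), the blank-line separators are assumed, and `extract_tagged` assumes the model followed the instruction to wrap its output in `<document>` or `<request>` tags.

```python
import random
import re
from typing import Optional

# Request types listed in the card; sampling them uniformly is an assumption.
REQUEST_TYPES = [
    "Summarization (concise)", "Summarization (constrained)",
    "Summarization (full)", "Summarization (stylized)",
    "QA (irrelevant)", "QA (missing answer)", "QA (conflicting answer)",
    "QA (complete answer)", "QA (partial answer)", "QA (inferrable answer)",
]


def document_prompt(subject: str, doc_type: str, richness: str) -> str:
    """Prompt asking the generator model for one synthetic document."""
    return (
        "Create a document based on the following criteria:\n\n"
        f"Subject: {subject}\n\n"
        f"Document Type: {doc_type}\n\n"
        f"Information Richness: {richness}\n\n"
        "Produce only the document and nothing else. "
        "Surround the document in <document> and </document> tags.\n"
        "Example: <document>This is a very short sentence.</document>"
    )


def request_prompt(document: str, task: str) -> str:
    """Prompt asking for a request (summary or question) about a document."""
    return (
        f"{document}\n\n"
        "Create a request for the above document based on the following criteria:\n\n"
        f"Task: {task}\n\n"
        "Produce only the request and nothing else. "
        "Surround the request in <request> and </request> tags.\n"
        "Example: <request>This is a request.</request>"
    )


def response_prompt(document: str, request: str) -> str:
    """Context followed by the request, sent to the LLM as an ordinary prompt."""
    return f"{document}\n\n{request}"


def extract_tagged(text: str, tag: str) -> Optional[str]:
    """Return the stripped text between <tag> and </tag>, or None if absent."""
    match = re.search(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return match.group(1).strip() if match else None


if __name__ == "__main__":
    rng = random.Random(0)
    print(rng.choice(REQUEST_TYPES))
    raw = "Sure!\n<document>Farmers adopted drip irrigation.</document>"
    doc = extract_tagged(raw, "document")
    print(response_prompt(doc, "Summarize the document in one sentence."))
```

Keeping each step as a pure function makes it easy to log the exact prompt sent for every sample.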
Prompt example:

```
Entrepreneurship 101: Turning Your Idea into a Reality

Starting a business can be a daunting task, especially for those who are new to the world of entrepreneurship. However, with the right mindset and a solid understanding of the basics, anyone can turn their idea into a successful venture. In this post, we'll cover the key steps to take when starting a business, from ideation to funding and beyond.

It all begins with an idea. Maybe you've identified a problem in your community that you'd like to solve, or perhaps you have a passion that you'd like to turn into a career. Whatever your idea may be, it's essential to take the time to refine it and make sure it's viable. Ask yourself questions like "Who is my target audience?" and "What sets my product or service apart from the competition?"

Once you have a solid idea, it's time to start thinking about funding. There are several options when it comes to funding a business, including bootstrapping, venture capital, and angel investors. Bootstrapping involves using your own savings or revenue to fund your business, while venture capital and angel investors involve seeking out external funding from investors. Each option has its pros and cons, and it's crucial to choose the one that best fits your business needs. For example, bootstrapping allows you to maintain control over your business, but it can also limit your growth potential. On the other hand, seeking out external funding can provide the resources you need to scale quickly, but it may require you to give up some equity.

Once you've secured funding, it's time to start thinking about strategies for success. This includes things like building a strong team, developing a marketing plan, and creating a solid business model. It's also essential to be flexible and adapt to changes in the market or unexpected setbacks. By staying focused and committed to your vision, you can overcome obstacles and build a successful business.

In conclusion, starting a business requires careful planning, hard work, and dedication. By refining your idea, securing funding, and developing strategies for success, you can turn your passion into a reality and achieve your entrepreneurial goals.

Summarize the document "Entrepreneurship 101: Turning Your Idea into a Reality" in 1-2 concise sentences, focusing on the main steps to start a business. Only use information available in the document in your response.
```

#### Citation Generation

To generate the citation labels, we ran inference through Llama 3.1 405B Instruct on GCP, using the HallOumi instructions and the following 1-shot example.

Example code:

```python
# 1-shot prompt for labeling each response claim as supported or not supported.
INSTRUCTIONS = (
    "You are an expert AI assistant, and your task is to analyze a provided answer "
    "and identify the relevant lines associated with the answer. "
    "You will be given a context, a request, and a response, with each claim of the "
    "response separated by <claim> tags. "
    "You must output the identifiers of the relevant sentences from the context, an "
    "explanation of what those relevant sentences indicate about the claim and whether "
    "or not it's supported, as well as a final value of <supported> or <not-supported> "
    "based on whether or not the claim is supported. "
    "Note that claims which are unsupported likely still have relevant lines indicating "
    "that the claim is not supported (due to conflicting or missing information in an "
    "appropriate area)."
)

EXAMPLE_REQUEST = "Make one or more claims about information in the documents."
EXAMPLE_CONTEXT = """Ipswich manager Mick McCarthy: "The irony was that poor old Alex Smithies cost them the second goal which set us up to win as comprehensively as we did.>He then kept it from being an embarrassing scoreline, but I'll take three.>"""
EXAMPLE_RESPONSE = """"""
EXAMPLE_OUTPUT = """"""

# Chat messages: system instructions, the example input, and the expected labeled output.
messages = [
    {'role': 'system', 'content': INSTRUCTIONS},
    {'role': 'user', 'content': f"{EXAMPLE_CONTEXT}{EXAMPLE_RESPONSE}"},
    {'role': 'assistant', 'content': EXAMPLE_OUTPUT},
]
```

To annotate sentences with their sentence and response tags, we used [wtpsplit](https://github.com/segment-any-text/wtpsplit) to split the text into sentences, removed any empty elements from the split, and annotated the sentences in order from beginning to end.

After running inference, we performed some basic sanitization to keep the outputs consistent:

* Remove text before the first `<claim>` and after the final `</claim>`
* Ensure that every response has the same number of claims (split by `<claim>`) that it started with
* Remove newlines and leading/trailing whitespace
* Ensure that `<claim>`, `</claim>`, and either `<supported>` or `<not-supported>` are present in every output claim

Any samples that did not meet these criteria were generally removed from the data.

#### Data Collection and Processing

Responses were collected by running Oumi batch inference on Google Cloud.

#### Personal and Sensitive Information

The data is not known or likely to contain any personal, sensitive, or private information.

## Bias, Risks, and Limitations

1. The source prompts are generated from Llama-3.1-405B-Instruct and may reflect any biases present in the model.
2. The responses will likely reflect any biases or limitations of Llama-3.1-405B-Instruct.
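The sentence annotation and output sanitization steps from the Citation Generation section can be sketched as below. Assumptions: `naive_split` is a regex stand-in for wtpsplit (a real splitter would be passed in instead), the `<sN>` sentence markup is hypothetical because the card does not show its actual sentence tags, and `sanitize_output` is an illustrative helper that returns `None` for samples that should be dropped.

```python
import re
from typing import Callable, List, Optional


def annotate_sentences(text: str, split: Callable[[str], List[str]]) -> str:
    """Split text into sentences, drop empty pieces, and tag them in order.

    The <sN>...</sN> markup is hypothetical, not the dataset's real tag format.
    """
    sentences = [s.strip() for s in split(text) if s.strip()]
    return "".join(f"<s{i}>{s}</s{i}>" for i, s in enumerate(sentences, start=1))


def naive_split(text: str) -> List[str]:
    """Regex stand-in for a real sentence splitter such as wtpsplit."""
    return re.split(r"(?<=[.!?])\s+", text)


def sanitize_output(output: str, expected_claims: int) -> Optional[str]:
    """Apply the card's sanitization checks; return None if the sample is dropped."""
    start = output.find("<claim>")
    end = output.rfind("</claim>")
    if start == -1 or end == -1 or end < start:
        return None
    # Keep only the span from the first <claim> through the final </claim>.
    text = output[start:end + len("</claim>")]
    # Remove newlines and leading/trailing whitespace.
    text = text.replace("\n", "").strip()
    # Split on the opening tag; the empty piece before the first <claim> is dropped.
    claims = [c for c in text.split("<claim>") if c.strip()]
    # The claim count must match the number of claims in the original response.
    if len(claims) != expected_claims:
        return None
    # Every claim needs a closing tag and a <supported> or <not-supported> verdict.
    for claim in claims:
        if "</claim>" not in claim:
            return None
        if "<supported>" not in claim and "<not-supported>" not in claim:
            return None
    return text


if __name__ == "__main__":
    print(annotate_sentences("The cat sat. It slept.", naive_split))
    raw = "Here you go:\n<claim>A <supported></claim>\n<claim>B <not-supported></claim>"
    print(sanitize_output(raw, expected_claims=2))
```

Returning `None` instead of raising lets a batch job filter failed samples with a single comprehension.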
## Citation

**BibTeX:**

```
@misc{oumiGroundednessBenchmark,
  author = {Jeremiah Greer},
  title = {Oumi Groundedness Benchmark},
  month = {March},
  year = {2025},
  url = {https://huggingface.co/datasets/oumi-ai/oumi-groundedness-benchmark}
}

@software{oumi2025,
  author = {Oumi Community},
  title = {Oumi: an Open, End-to-end Platform for Building Large Foundation Models},
  month = {January},
  year = {2025},
  url = {https://github.com/oumi-ai/oumi}
}
```

Provider:
maas
Created:
2025-04-09