yourbench-testing/test_simple_custom_schema
Source: Hugging Face. Updated 2025-12-17; indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/yourbench-testing/test_simple_custom_schema
Description:
This dataset, Test Simple Custom Schema, was generated with the YourBench framework, which creates domain-specific benchmarks from document collections. Generation involves three main steps: ingestion (converting raw documents into normalized markdown), chunking (splitting text into token-based single-hop and multi-hop chunks), and single-shot question generation (using an LLM to produce standalone question-answer pairs for each chunk). The dataset has two configurations. The chunked configuration contains the document ID, text, filename, metadata, summary, summarization model, and both single-hop and multi-hop chunk information. The ingested configuration contains only the basic fields: document ID, text, filename, and metadata.
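The chunking step described above can be illustrated with a minimal sketch. It is not the YourBench implementation. It assumes whitespace-separated words stand in for tokenizer tokens and that a multi-hop chunk is simply a group of adjacent single-hop chunks. The function names `single_hop_chunks` and `multi_hop_chunks` and the parameters `max_tokens` and `hops` are hypothetical and chosen for illustration only.

```python
from typing import List


def single_hop_chunks(text: str, max_tokens: int) -> List[str]:
    """Split text into consecutive chunks of at most max_tokens tokens.

    Assumption: whitespace-separated words approximate tokens; a real
    pipeline would use a model tokenizer instead.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    tokens = text.split()
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


def multi_hop_chunks(chunks: List[str], hops: int) -> List[List[str]]:
    """Group each run of `hops` adjacent single-hop chunks together.

    Assumption: adjacency-based grouping; other strategies (for example
    random sampling of chunks) are equally plausible.
    """
    if hops < 2:
        raise ValueError("hops must be at least 2")
    return [chunks[i:i + hops] for i in range(len(chunks) - hops + 1)]


doc = "one two three four five six seven"
singles = single_hop_chunks(doc, max_tokens=3)
multis = multi_hop_chunks(singles, hops=2)
```

Here `singles` holds three chunks (the last one shorter than the rest) and `multis` holds two overlapping pairs. A question generated from a pair would need information from both chunks, which is what distinguishes multi-hop from single-hop questions.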
Provider:
yourbench-testing



