yourbench-testing/test-complex-custom-schema
Source: Hugging Face | Updated: 2025-12-17 | Indexed: 2025-12-20
Download link:
https://hf-mirror.com/datasets/yourbench-testing/test-complex-custom-schema
Resource summary:
This dataset, Test Complex Custom Schema, was generated with YourBench (v0.6.0), an open-source framework for generating domain-specific benchmarks from document collections. It has two configurations: 1) chunked, which contains the document ID, text, filename, metadata, summary, summarization model, and both single-hop and multi-hop chunks; 2) ingested, which contains the document ID, text, filename, and metadata. The generation pipeline has three steps: ingestion (converting raw documents to normalized markdown), chunking (splitting each text into single-hop and multi-hop chunks), and single_shot_question_generation (using an LLM to generate standalone question-answer pairs for each chunk).
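The relationship between the two configurations, and the chunking step that turns one into the other, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not YourBench's implementation: the field names (document_id, document_text, chunks, multihop_chunks, and so on) are hypothetical names modeled on the description above, the word-window splitter and the combination-based multi-hop grouping are simplified stand-ins for YourBench's own chunking logic, and the summary fields are left empty where an LLM would normally fill them.

```python
from dataclasses import dataclass, field
from itertools import combinations
from typing import Any


# Hypothetical field names modeled on the dataset description;
# check them against the actual dataset viewer before relying on them.
@dataclass
class IngestedRow:
    document_id: str
    document_text: str
    document_filename: str
    document_metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class ChunkedRow(IngestedRow):
    # A chunked row carries every ingested field plus summary and chunk fields.
    document_summary: str = ""
    summarization_model: str = ""
    chunks: list[str] = field(default_factory=list)
    multihop_chunks: list[list[str]] = field(default_factory=list)


def split_single_hop(text: str, max_words: int) -> list[str]:
    """Split text into consecutive windows of at most max_words words (toy splitter)."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def chunk_document(row: IngestedRow, max_words: int = 50, hops: int = 2) -> ChunkedRow:
    """Derive a chunked row from an ingested row (hypothetical simplification)."""
    singles = split_single_hop(row.document_text, max_words)
    return ChunkedRow(
        document_id=row.document_id,
        document_text=row.document_text,
        document_filename=row.document_filename,
        document_metadata=dict(row.document_metadata),
        document_summary="",      # an LLM-written summary would go here; empty in this sketch
        summarization_model="",   # name of the summarizing model; empty in this sketch
        chunks=singles,
        # Every combination of `hops` single-hop chunks forms one multi-hop group.
        multihop_chunks=[list(group) for group in combinations(singles, hops)],
    )


doc = IngestedRow("doc-1", "one two three four five six seven", "notes.md", {"lang": "en"})
out = chunk_document(doc, max_words=3, hops=2)
print(out.chunks)                # ['one two three', 'four five six', 'seven']
print(len(out.multihop_chunks))  # 3
```

Modeling ChunkedRow as a subclass of IngestedRow mirrors the description: every chunked row holds the ingested fields plus the summary and chunk fields. Because itertools.combinations grows quickly with the number of chunks, a real pipeline would likely sample or cap the multi-hop groups rather than enumerate all of them.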
Provided by:
yourbench-testing



