yourbench-testing/test_extensive_custom_schema_demo
Hugging Face | Updated 2025-12-17 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/yourbench-testing/test_extensive_custom_schema_demo
Description:
This dataset, Test Extensive Custom Schema Demo, was generated with YourBench, an open-source framework for creating domain-specific benchmarks from document collections. It contains two configurations: 1) chunked: document ID, text content, filename, metadata (file size), document summary, summarization model information, and the single-hop and multi-hop chunks the text was split into; 2) ingested: document ID, text content, filename, and metadata (file size). The dataset was produced by a three-step pipeline: raw documents are first ingested and converted to normalized markdown, the text is then split into token-based single-hop and multi-hop chunks, and finally an LLM generates standalone question-answer pairs for each chunk.
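The chunking step described above can be illustrated with a minimal sketch. It is not YourBench's implementation: it assumes whitespace-separated words stand in for tokens, that a multi-hop chunk is simply a group of single-hop chunks from the same document, and the names `Chunk`, `single_hop_chunks`, and `multi_hop_chunks` are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Chunk:
    chunk_id: str  # hypothetical ID scheme: "<doc_id>_<index>"
    text: str


def single_hop_chunks(doc_id: str, text: str, max_tokens: int = 8) -> list[Chunk]:
    """Split text into consecutive chunks of at most max_tokens tokens.

    Assumption: whitespace-separated words approximate tokens; a real
    pipeline would use a model tokenizer.
    """
    tokens = text.split()
    chunks = []
    for start in range(0, len(tokens), max_tokens):
        piece = " ".join(tokens[start:start + max_tokens])
        chunks.append(Chunk(f"{doc_id}_{len(chunks)}", piece))
    return chunks


def multi_hop_chunks(chunks: list[Chunk], hops: int = 2) -> list[list[Chunk]]:
    """Group single-hop chunks into every combination of size `hops`.

    Assumption: a multi-hop chunk is a set of single-hop chunks that a
    question may need to combine; the real sampling strategy may differ.
    """
    return [list(group) for group in combinations(chunks, hops)]


if __name__ == "__main__":
    document = " ".join(f"w{i}" for i in range(20))  # toy 20-token document
    singles = single_hop_chunks("doc0", document, max_tokens=8)
    for group in multi_hop_chunks(singles, hops=2):
        print([c.chunk_id for c in group])
```

Enumerating every combination is only practical for small documents; a pipeline working on long texts would more likely sample a limited number of groups.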
Provider:
yourbench-testing



