yourbench-testing/test_extensive_custom_schema_demo

Source: Hugging Face. Updated: 2025-12-17. Indexed: 2025-12-20.
Download link:
https://hf-mirror.com/datasets/yourbench-testing/test_extensive_custom_schema_demo
Description:
Test Extensive Custom Schema Demo is a domain-specific benchmark dataset generated from a document collection with the open-source YourBench framework. It contains two configurations: 1) chunked: document ID, text content, filename, metadata (file size), document summary, summarization model information, and the single-hop and multi-hop chunks the text was split into; 2) ingested: document ID, text content, filename, and metadata (file size). The dataset was generated in three steps: raw documents are ingested and converted to normalized markdown, the text is split into token-based single-hop and multi-hop chunks, and an LLM then generates standalone question-answer pairs for each chunk.
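
The chunking step described above can be illustrated with a minimal Python sketch. It is not YourBench's actual implementation: the function names `single_hop_chunks` and `multi_hop_chunks`, the whitespace tokenizer, the `max_tokens` and `hops` parameters, and the choice to build multi-hop chunks as combinations of single-hop chunks are all assumptions made for illustration.

```python
from itertools import combinations


def single_hop_chunks(text, max_tokens=6):
    """Split text into consecutive chunks of at most max_tokens tokens.

    Assumption: tokens are whitespace-separated words; a real pipeline
    would likely use a model tokenizer instead.
    """
    tokens = text.split()
    return [
        " ".join(tokens[i:i + max_tokens])
        for i in range(0, len(tokens), max_tokens)
    ]


def multi_hop_chunks(chunks, hops=2):
    """Group single-hop chunks into every combination of `hops` chunks.

    Assumption: a multi-hop chunk is simply a list of single-hop chunks
    that a question could draw on together.
    """
    return [list(group) for group in combinations(chunks, hops)]


# Hypothetical document text, used only to exercise the sketch.
doc = (
    "YourBench ingests raw documents converts them to markdown and then "
    "splits the text into chunks for question generation"
)

single = single_hop_chunks(doc, max_tokens=6)
multi = multi_hop_chunks(single, hops=2)

for chunk in single:
    print("single-hop:", chunk)
for group in multi:
    print("multi-hop:", group)
```

Each single-hop chunk would then be passed to an LLM on its own to produce a standalone question-answer pair, while a multi-hop group would support questions that need more than one chunk.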
Provider:
yourbench-testing