
ailabstw/indexing-fixtures

Source: Hugging Face | Updated: 2026-02-05 | Indexed: 2026-02-07
Download link:
https://hf-mirror.com/datasets/ailabstw/indexing-fixtures
Description:
This repository is a test dataset for validating indexing functionality across diverse file types and edge cases. It is meant to confirm that the maintainers' indexing pipeline handles real-world document scenarios and performs consistently across all supported formats. The dataset is organized into directories by file type, specific issue, testing purpose, and problem topic. Formats include PDF, Word, Excel, PowerPoint, CSV, TSV, images, HTML/XML, and compressed archives. It also covers client-specific issues, privacy-sensitive files, quality assurance test files, academic papers, government documents, and university-specific documents. The README explains the naming conventions, technical details, usage scenarios, quality assurance measures, problem categories, contribution guidelines, and dataset tags.
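Because the description says the files span many formats, a first step when working with a local copy is often to inventory what is there. The following is a minimal sketch under stated assumptions, not the maintainers' tooling: the `FORMAT_FAMILIES` mapping and the `group_fixtures` helper are hypothetical names introduced here, and the actual directory layout and supported extensions are defined by the dataset's README.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical suffix-to-family mapping, based only on the formats named in
# the description; the dataset's README is the authority on what it contains.
FORMAT_FAMILIES = {
    ".pdf": "pdf",
    ".doc": "word", ".docx": "word",
    ".xls": "excel", ".xlsx": "excel",
    ".ppt": "powerpoint", ".pptx": "powerpoint",
    ".csv": "csv", ".tsv": "tsv",
    ".png": "image", ".jpg": "image", ".jpeg": "image",
    ".html": "html/xml", ".htm": "html/xml", ".xml": "html/xml",
    ".zip": "archive", ".gz": "archive", ".tar": "archive",
}


def group_fixtures(root):
    """Group every file under `root` by format family.

    The family is looked up from the lowercase file suffix; anything not in
    FORMAT_FAMILIES (including files without a suffix) goes under "other".
    Paths are returned relative to `root`, in sorted order.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            family = FORMAT_FAMILIES.get(path.suffix.lower(), "other")
            groups[family].append(path.relative_to(root))
    return dict(groups)
```

Calling `group_fixtures("path/to/local/copy")` returns a dictionary that an indexing test harness could iterate over, one format family at a time.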
Provider:
ailabstw