mteb/arena-arxiv-7-2-24
Hugging Face · Updated 2024-07-29 · Indexed 2024-12-14
Download link:
https://hf-mirror.com/datasets/mteb/arena-arxiv-7-2-24
Description:
The `mteb/arena-arxiv-7-2-24` dataset is a comprehensive collection of scientific papers from arXiv up to July 2, 2024. It is designed for the MTEB (Massive Text Embedding Benchmark) arena, where embedding models compete and are ranked by their performance. Each instance represents a single arXiv paper and contains four fields: `id` (a unique identifier), `title`, `abstract`, and `categories`. The dataset is intended primarily for training new embedding models and evaluating existing ones on scientific literature. It can also support research on topic modeling, document classification, or information retrieval in the scientific domain.
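As a rough illustration of how the four fields might be used, the sketch below turns one record into a single string suitable for an embedding model. Only the field names (`id`, `title`, `abstract`, `categories`) come from the description above. The helper names `paper_to_text` and `load_papers`, the choice to join title and abstract, the split name `"train"`, and the sample record are all assumptions made for illustration; the actual split names and the type of `categories` should be checked on the dataset page.

```python
# Field names taken from the dataset description.
FIELDS = ("id", "title", "abstract", "categories")


def paper_to_text(paper: dict) -> str:
    """Join title and abstract into one string for an embedding model.

    Joining the two with a blank line is an illustrative choice,
    not something the dataset prescribes.
    """
    missing = [name for name in FIELDS if name not in paper]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    # Collapse line breaks and repeated spaces inside each field.
    title = " ".join(str(paper["title"]).split())
    abstract = " ".join(str(paper["abstract"]).split())
    return f"{title}\n\n{abstract}"


def load_papers(split: str = "train"):
    """Load the dataset with the Hugging Face `datasets` library.

    Assumption: a split with this name exists; verify before use.
    Requires network access and `pip install datasets`.
    """
    from datasets import load_dataset  # imported lazily, optional dependency

    return load_dataset("mteb/arena-arxiv-7-2-24", split=split)


# Hypothetical placeholder record, not real data from the dataset.
sample = {
    "id": "0000.00000",
    "title": "A Placeholder  Title",
    "abstract": "A placeholder\nabstract.",
    "categories": "cs.IR",
}

print(paper_to_text(sample))
```

Calling `load_papers()` and mapping `paper_to_text` over the result would give a list of strings that can be passed to an embedding model of your choice.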
Provided by:
mteb



