
CoIR-Retrieval/codefeedback-st

Source: Hugging Face; updated 2024-09-12; indexed 2025-04-08
Download link:
https://hf-mirror.com/datasets/CoIR-Retrieval/codefeedback-st
Resource overview:
The dataset has three configurations: corpus, default, and queries. The corpus and queries configurations have similar features, such as _id, partition, text, title, language, and meta_information, while the default configuration contains query-id, corpus-id, and score. Byte and example counts are given for each split: the corpus split is 246,229,656 bytes with 156,526 examples; the train split of default is 3,578,836 bytes with 125,220 examples; the test split of default is 894,734 bytes with 31,306 examples; and the queries split is 118,682,563 bytes with 156,526 examples. The dataset size and download size are also listed for each configuration. The README additionally includes a code snippet and instructions for evaluating a model on specific tasks with the MTEB evaluation framework.
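
As a rough illustration of how the three configurations relate, the sketch below checks the split counts quoted above and groups query-id/corpus-id/score rows into a relevance lookup. The train and test counts of the default configuration sum to 156,526, the same as the number of queries, which is consistent with (but does not prove) one relevance row per query. The helper names, the sample IDs "q1"/"c1", and the split names passed to load_config are assumptions made for illustration; load_config relies on the Hugging Face `datasets` package and network access, and is defined but not called here.

```python
from collections import defaultdict

# Split sizes quoted in the description: (bytes, examples).
SPLITS = {
    ("corpus", "corpus"): (246_229_656, 156_526),
    ("default", "train"): (3_578_836, 125_220),
    ("default", "test"): (894_734, 31_306),
    ("queries", "queries"): (118_682_563, 156_526),
}


def mean_bytes_per_example(config, split):
    """Average size of one example in the given split, in bytes."""
    n_bytes, n_examples = SPLITS[(config, split)]
    return n_bytes / n_examples


def build_qrels(rows):
    """Group relevance rows into {query-id: {corpus-id: score}}."""
    qrels = defaultdict(dict)
    for row in rows:
        qrels[row["query-id"]][row["corpus-id"]] = row["score"]
    return dict(qrels)


def load_config(config, split):
    """Fetch one configuration from the Hub (not called in this sketch).

    Assumption: the split names match those in the description, e.g.
    load_config("corpus", "corpus") or load_config("default", "test").
    """
    from datasets import load_dataset  # imported lazily; needs network access

    return load_dataset("CoIR-Retrieval/codefeedback-st", config, split=split)


# Sanity-check the quoted numbers.
train_examples = SPLITS[("default", "train")][1]
test_examples = SPLITS[("default", "test")][1]
total_judgements = train_examples + test_examples
train_fraction = train_examples / total_judgements

# Hypothetical rows shaped like the default configuration; the IDs are placeholders.
sample_rows = [
    {"query-id": "q1", "corpus-id": "c1", "score": 1},
    {"query-id": "q2", "corpus-id": "c2", "score": 1},
]
sample_qrels = build_qrels(sample_rows)

print(f"relevance rows: {total_judgements}, train share: {train_fraction:.2f}")
print(f"mean corpus example size: {mean_bytes_per_example('corpus', 'corpus'):.0f} bytes")
```

With real data, the rows returned by load_config("default", "test") could be passed to build_qrels in place of sample_rows, assuming each row exposes the three fields named in the description.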
Provided by:
CoIR-Retrieval