appliedml2024/text_embedding
Source: Hugging Face. Updated 2024-11-27; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/appliedml2024/text_embedding
Description:
The dataset contains four types of text embeddings: BERT, Sentence_T5, Qwen2, and SFR. Each embedding is stored as a sequence of float32 values. The data is split into training, test, and development sets of 15810, 955, and 995 samples respectively. The training set is 599262240 bytes, the test set 36198320 bytes, and the development set 37714480 bytes, for a total dataset size of 673175040 bytes. The total download size is 712239471 bytes.
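The reported figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted in the description; the variable names are illustrative, and the final step assumes the reported sizes count nothing but raw float32 payload, which may not hold if the storage format adds per-row overhead.

```python
# Figures copied from the dataset description; names are illustrative only.
split_samples = {"train": 15810, "test": 955, "dev": 995}
split_bytes = {"train": 599262240, "test": 36198320, "dev": 37714480}
reported_total_bytes = 673175040

# The three split sizes should add up to the reported dataset size.
total_bytes = sum(split_bytes.values())

# Bytes per sample in each split, with the remainder to confirm exact division.
bytes_per_sample = {}
remainders = {}
for name, size in split_bytes.items():
    bytes_per_sample[name], remainders[name] = divmod(size, split_samples[name])

# ASSUMPTION: sizes are pure float32 payload (4 bytes per value), no overhead.
# Under that assumption this is the combined value count of all four embeddings.
FLOAT32_BYTES = 4
values_per_sample = bytes_per_sample["train"] // FLOAT32_BYTES

print("sum of splits:", total_bytes)
print("matches reported total:", total_bytes == reported_total_bytes)
print("bytes per sample:", bytes_per_sample)
print("float32 values per sample (assumed):", values_per_sample)
```

Every split works out to the same number of bytes per sample, which is consistent with all rows having an identical layout. The description does not state how that per-sample budget is divided among the four embedding types.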
Provider:
appliedml2024



