withpi/jira_embedding_smalldoc_multilang_train_v2_qwen3_embedding_tokenized_8k_5
Source: Hugging Face. Updated: 2025-09-02. Indexed: 2025-09-13.
Download link:
https://hf-mirror.com/datasets/withpi/jira_embedding_smalldoc_multilang_train_v2_qwen3_embedding_tokenized_8k_5
Description:
The dataset includes features such as query hashes, positive and negative passage hashes, a category, and query input IDs with their attention masks. It is split into a training set of 627,973 examples (13,461,599,216 bytes) and a test set of 56,368 examples (1,343,488,064 bytes). The total dataset size is 14,805,087,280 bytes, and the download size is 483,905,360 bytes.
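As a quick sanity check on the figures above, the following sketch adds up the split sizes and derives a few summary numbers. The variable names are illustrative only and are not part of the dataset. It assumes the reported byte counts describe the prepared dataset and that the download size refers to the compressed files.

```python
# Figures copied from the dataset description; all names here are illustrative.
SPLITS = {
    "train": {"examples": 627_973, "bytes": 13_461_599_216},
    "test": {"examples": 56_368, "bytes": 1_343_488_064},
}
REPORTED_TOTAL_BYTES = 14_805_087_280
DOWNLOAD_BYTES = 483_905_360

# Totals across both splits.
total_examples = sum(split["examples"] for split in SPLITS.values())
total_bytes = sum(split["bytes"] for split in SPLITS.values())

# The per-split sizes should add up to the reported total size.
sizes_consistent = total_bytes == REPORTED_TOTAL_BYTES

# Average size of one example in each split, in bytes.
avg_bytes = {
    name: split["bytes"] / split["examples"] for name, split in SPLITS.items()
}

# How many times larger the prepared dataset is than the download
# (assumption: the download size refers to compressed files).
expansion_ratio = total_bytes / DOWNLOAD_BYTES

print(f"total examples: {total_examples:,}")
print(f"split sizes match reported total: {sizes_consistent}")
for name, value in avg_bytes.items():
    print(f"{name}: about {value:,.0f} bytes per example")
print(f"dataset size / download size: about {expansion_ratio:.1f}x")
```

The large gap between download size and dataset size suggests that enough disk space for the full 14.8 GB should be available before downloading.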
Provider:
withpi



