lsb/enwiki20250301_paraphrase_multilingual_minilm_l12_v2
Source: Hugging Face. Updated 2025-08-19. Indexed 2025-10-25.
Download link:
https://hf-mirror.com/datasets/lsb/enwiki20250301_paraphrase_multilingual_minilm_l12_v2
Resource summary:
The dataset has three feature fields: text chunk (chunk), title (title), and embedding vector (embeddings). The chunk and title fields are strings, and the embeddings field is a list of floats. The dataset has a single training split of 54,515,759,245 bytes containing 17,269,951 examples. That split accounts for the entire dataset size of 54,515,759,245 bytes; the download size is 51,009,055,231 bytes. The README does not describe the dataset's content or intended use.
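Since the README gives no usage guidance, the following is only a minimal sketch of how rows with this schema might be read and ranked. It assumes that each row carries one flat embedding vector in the embeddings field and that the Hugging Face `datasets` library is installed; the helper names (`cosine_similarity`, `top_k`, `stream_records`) are hypothetical and are not part of the dataset. Streaming is used so that the roughly 51 GB download is not fetched in full.

```python
import math
from itertools import islice

# Repository ID taken from the listing above.
REPO_ID = "lsb/enwiki20250301_paraphrase_multilingual_minilm_l12_v2"


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length float lists (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_embedding, records, k=3):
    """Return the k highest-scoring (score, title, chunk) tuples.

    Assumption: each record is a dict with the three fields named in the
    listing (chunk, title, embeddings), and embeddings is one flat vector.
    """
    scored = [
        (cosine_similarity(query_embedding, r["embeddings"]), r["title"], r["chunk"])
        for r in records
    ]
    return sorted(scored, key=lambda item: item[0], reverse=True)[:k]


def stream_records(limit=1000):
    """Hypothetical helper: read the first `limit` rows without a full download."""
    # Imported lazily so the pure-Python helpers above work without the library.
    from datasets import load_dataset

    dataset = load_dataset(REPO_ID, split="train", streaming=True)
    return list(islice(dataset, limit))
```

A query vector would need to come from the same embedding model that produced the stored vectors for the scores to be meaningful; the dataset name suggests paraphrase-multilingual-MiniLM-L12-v2, but the listing does not confirm this.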
Provider:
lsb



