Mollel/Swahili-NLi-Triplet-SWH-ENG
Source: Hugging Face. Updated: 2024-06-30. Indexed: 2024-07-06.
Download link:
https://hf-mirror.com/datasets/Mollel/Swahili-NLi-Triplet-SWH-ENG
Overview:
Each sample in this dataset has four string features: language, anchor, positive, and negative. The dataset is split into train, dev, and test sets containing 1,115,700, 13,168, and 13,218 samples respectively. The train set occupies 216,345,137 bytes, the dev set 2,755,279 bytes, and the test set 2,878,107 bytes. The download size is 84,955,951 bytes, and the total dataset size is 221,978,523 bytes.
Provider:
Mollel
Summary of original information
Dataset overview
Features
- language: string
- anchor: string
- positive: string
- negative: string
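The four string features listed above can be modelled as a simple typed record. The following is a minimal sketch, not part of the original dataset card: the names `Triplet`, `FEATURES`, and `is_valid_triplet` are hypothetical, and the example row uses placeholder text rather than real data.

```python
from typing import TypedDict

# Feature names as listed in the dataset card.
FEATURES = ("language", "anchor", "positive", "negative")


class Triplet(TypedDict):
    """One sample: four string fields (hypothetical helper type)."""
    language: str
    anchor: str
    positive: str
    negative: str


def is_valid_triplet(row: dict) -> bool:
    """Return True if row has exactly the four features and all are strings."""
    return set(row) == set(FEATURES) and all(
        isinstance(row[name], str) for name in FEATURES
    )


# Placeholder row for illustration only; not taken from the dataset.
example: Triplet = {
    "language": "<language tag>",
    "anchor": "<anchor sentence>",
    "positive": "<positive sentence>",
    "negative": "<negative sentence>",
}

print(is_valid_triplet(example))
```

A check like this may be useful before feeding rows into a training loop, since a missing or non-string field would otherwise surface much later.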
Data splits
- train: 1,115,700 samples, 216,345,137 bytes
- dev: 13,168 samples, 2,755,279 bytes
- test: 13,218 samples, 2,878,107 bytes
Dataset size
- Download size: 84,955,951 bytes
- Total dataset size: 221,978,523 bytes
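The split and size figures above can be cross-checked with a little arithmetic. The sketch below uses only the numbers from the card; the dictionary layout and the key names `num_examples` and `num_bytes` are illustrative choices, not something the card specifies.

```python
# Figures copied from the dataset card.
SPLITS = {
    "train": {"num_examples": 1_115_700, "num_bytes": 216_345_137},
    "dev": {"num_examples": 13_168, "num_bytes": 2_755_279},
    "test": {"num_examples": 13_218, "num_bytes": 2_878_107},
}
DOWNLOAD_SIZE = 84_955_951
DATASET_SIZE = 221_978_523

# The per-split byte counts should add up to the total dataset size.
total_bytes = sum(split["num_bytes"] for split in SPLITS.values())
total_examples = sum(split["num_examples"] for split in SPLITS.values())

# Ratio of download size to total dataset size.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(total_bytes == DATASET_SIZE)  # True
print(total_examples)               # 1142086
print(f"{download_ratio:.2f}")
```

The three split sizes sum exactly to the stated total, and the download is a bit under 40% of that total.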
Configuration
- config_name: default
- data_files:
  - train: data/train-*
  - dev: data/dev-*
  - test: data/test-*
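The `data_files` patterns map files to splits by glob matching. The sketch below illustrates that idea with Python's standard `fnmatch` module; it is not how any particular loader is implemented, the helper `split_for` is hypothetical, and the file names in the example are invented for illustration.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns from the "default" configuration in the dataset card.
DATA_FILES = {
    "train": "data/train-*",
    "dev": "data/dev-*",
    "test": "data/test-*",
}


def split_for(path: str) -> Optional[str]:
    """Return the split whose glob pattern matches path, or None."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None


# Hypothetical file names; the real repository contents are not shown in the card.
for name in ("data/train-part-a", "data/dev-part-a", "README.md"):
    print(name, "->", split_for(name))
```

With the Hugging Face `datasets` library installed and network access available, a call such as `load_dataset("Mollel/Swahili-NLi-Triplet-SWH-ENG", split="dev")` would normally resolve this configuration without manual matching.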



