Mollel/Swahili-NLi-Triplet-SWH-ENG

Name: Mollel/Swahili-NLi-Triplet-SWH-ENG
Creator: Mollel
Published: 2024-06-30 22:01:03
License: not specified

Source: Hugging Face. Updated 2024-06-30; indexed 2024-07-06.

Download link:

https://hf-mirror.com/datasets/Mollel/Swahili-NLi-Triplet-SWH-ENG

Description:

This dataset contains multiple language samples, each consisting of four string features: language, anchor, positive, and negative. It is divided into three splits, train, dev, and test, containing 1,115,700, 13,168, and 13,218 samples respectively. The train split occupies 216,345,137 bytes, the dev split 2,755,279 bytes, and the test split 2,878,107 bytes. The download size is 84,955,951 bytes, and the total dataset size is 221,978,523 bytes.

Provider:

Mollel

Summary of original information

Dataset overview

Features

language: string
anchor: string
positive: string
negative: string
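
The four string features can be modeled as a small typed record. The following is a minimal sketch, not part of the dataset itself: the `Triplet` class and `parse_triplet` helper are hypothetical names, and the example row uses placeholder text rather than real dataset content.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triplet:
    """One sample with the four string features listed above."""
    language: str
    anchor: str
    positive: str
    negative: str


def parse_triplet(row: dict) -> Triplet:
    """Build a Triplet from a dict row, rejecting missing or non-string fields."""
    fields = ("language", "anchor", "positive", "negative")
    for name in fields:
        if not isinstance(row.get(name), str):
            raise ValueError(f"field {name!r} must be a string")
    return Triplet(**{name: row[name] for name in fields})


# Hypothetical row: the values are placeholders, not real dataset content.
example = parse_triplet({
    "language": "placeholder-language",
    "anchor": "placeholder anchor sentence",
    "positive": "placeholder positive sentence",
    "negative": "placeholder negative sentence",
})
print(example.anchor)
```

Validating rows this way catches a missing or mistyped field early, before the sample reaches any downstream processing.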

Splits

train: 1,115,700 samples, 216,345,137 bytes
dev: 13,168 samples, 2,755,279 bytes
test: 13,218 samples, 2,878,107 bytes
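
To see how the samples are distributed across the three splits, the listed counts can be turned into shares of the whole. This is a small arithmetic sketch using only the sample counts stated on this page; the variable names are illustrative.

```python
# Sample counts per split, as listed on this page.
SPLIT_SAMPLES = {"train": 1_115_700, "dev": 13_168, "test": 13_218}

total_samples = sum(SPLIT_SAMPLES.values())

# Fraction of all samples that falls in each split.
split_share = {name: count / total_samples for name, count in SPLIT_SAMPLES.items()}

for name, share in split_share.items():
    print(f"{name}: {SPLIT_SAMPLES[name]:,} samples ({share:.2%})")
```

The train split holds the large majority of the samples, with dev and test each a little over one percent.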

Dataset size

Download size: 84,955,951 bytes
Total dataset size: 221,978,523 bytes
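
The size figures can be cross-checked against the per-split byte counts given on this page. The sketch below confirms that the three split sizes add up to the total dataset size and computes the ratio of download size to total size. Reading that ratio as the effect of compressed storage is an assumption; the page does not say why the download is smaller.

```python
# Byte counts per split and overall, as listed on this page.
SPLIT_BYTES = {"train": 216_345_137, "dev": 2_755_279, "test": 2_878_107}
DOWNLOAD_BYTES = 84_955_951
TOTAL_BYTES = 221_978_523
TRAIN_SAMPLES = 1_115_700

# The split sizes should account for the whole dataset.
split_sum = sum(SPLIT_BYTES.values())
sizes_consistent = split_sum == TOTAL_BYTES

# Download size relative to total size (assumed to reflect compression).
download_ratio = DOWNLOAD_BYTES / TOTAL_BYTES

# Rough average size of one training sample in bytes.
train_bytes_per_sample = SPLIT_BYTES["train"] / TRAIN_SAMPLES

print(f"split sizes sum to total: {sizes_consistent}")
print(f"download / total: {download_ratio:.3f}")
print(f"average train sample: {train_bytes_per_sample:.1f} bytes")
```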

Configuration

config_name: default
- data_files:
  - train: data/train-*
  - dev: data/dev-*
  - test: data/test-*
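
The configuration maps each split to a glob pattern over files in the repository. A minimal sketch of how that mapping might be used follows. It assumes the Hugging Face `datasets` library is installed and the repository is reachable; `split_for_file` and `load_split` are hypothetical helper names, and the shard filename in the example is invented purely to exercise the pattern matching.

```python
from fnmatch import fnmatch

REPO_ID = "Mollel/Swahili-NLi-Triplet-SWH-ENG"
CONFIG_NAME = "default"

# Split name -> glob pattern, copied from the configuration above.
DATA_FILES = {
    "train": "data/train-*",
    "dev": "data/dev-*",
    "test": "data/test-*",
}


def split_for_file(path: str):
    """Return the split whose glob pattern matches path, or None."""
    for split, pattern in DATA_FILES.items():
        if fnmatch(path, pattern):
            return split
    return None


def load_split(split: str):
    """Load one split from the Hub (needs network access and `datasets`)."""
    if split not in DATA_FILES:
        raise KeyError(f"unknown split: {split!r}")
    # Imported lazily so the rest of this sketch runs without the dependency.
    from datasets import load_dataset
    return load_dataset(REPO_ID, name=CONFIG_NAME, split=split)


# Hypothetical shard name, used only to exercise the pattern matching.
print(split_for_file("data/dev-00000-of-00001.parquet"))
```

Calling `load_split("dev")` would download only the dev split, which is a lighter way to inspect the data than fetching the full training set.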
