igorktech/nllb-200-75K-spa

Name: igorktech/nllb-200-75K-spa
Creator: igorktech
Published: 2024-07-02 10:12:24
License: No description available

Hugging Face. Updated: 2024-07-02. Indexed: 2024-07-06.

Download link:

https://hf-mirror.com/datasets/igorktech/nllb-200-75K-spa

Description:

This dataset contains 75,000 samples and is primarily used for analyzing multilingual text pairs. Each sample has six fields: laser_score (float64, a score or similarity value for the pair), lang1 (string, the first language), text1 (string, the text in the first language), lang2 (string, the second language), text2 (string, the text in the second language), and blaser_sim (float64, a second score or similarity value). The dataset has a single split, train, with a reported size of 17,123,425.85660714 bytes (about 17.1 MB) and a download size of 11,070,943 bytes (about 11.1 MB).
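A common use of paired scores like these is to keep only the pairs that clear a quality threshold. The sketch below shows that idea in plain Python. It is only an illustration: the sample records, the language codes, the score values, and both thresholds are invented, and the source does not state the scale or direction of either score.

```python
from typing import Dict, List

# Hypothetical records shaped like the six fields described above.
# Texts, language codes, and scores are invented for illustration only.
SAMPLE_PAIRS: List[Dict] = [
    {"laser_score": 1.08, "lang1": "eng_Latn", "text1": "Good morning.",
     "lang2": "spa_Latn", "text2": "Buenos días.", "blaser_sim": 4.2},
    {"laser_score": 1.02, "lang1": "eng_Latn", "text1": "See you later.",
     "lang2": "spa_Latn", "text2": "Gracias por todo.", "blaser_sim": 2.1},
    {"laser_score": 1.10, "lang1": "fra_Latn", "text1": "Merci beaucoup.",
     "lang2": "spa_Latn", "text2": "Muchas gracias.", "blaser_sim": 3.0},
]


def filter_pairs(pairs: List[Dict], min_laser: float = 1.05,
                 min_blaser: float = 3.5) -> List[Dict]:
    """Keep pairs whose two scores both reach the given thresholds.

    The default thresholds are arbitrary assumptions, and "higher is better"
    is also an assumption about both scores.
    """
    return [
        p for p in pairs
        if p["laser_score"] >= min_laser and p["blaser_sim"] >= min_blaser
    ]


kept = filter_pairs(SAMPLE_PAIRS)
print(len(kept), "of", len(SAMPLE_PAIRS), "pairs kept")
```

Using two independent thresholds keeps the example simple; in practice the cut-offs would need to be chosen by inspecting the actual score distributions.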

Provider:

igorktech

Summary of original information

Dataset overview

Dataset features

laser_score: float64
lang1: string
text1: string
lang2: string
text2: string
blaser_sim: float64
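The feature list above can be turned into a small schema check for individual records. The following is a minimal sketch, assuming records arrive as Python dictionaries with float64 mapped to `float` and string mapped to `str`; the names `EXPECTED_FIELDS` and `schema_errors` are hypothetical helpers, not part of any library.

```python
from typing import Dict, List

# Field names and types as listed above (float64 -> float, string -> str).
EXPECTED_FIELDS: Dict[str, type] = {
    "laser_score": float,
    "lang1": str,
    "text1": str,
    "lang2": str,
    "text2": str,
    "blaser_sim": float,
}


def schema_errors(record: Dict) -> List[str]:
    """Return a list of human-readable schema problems (empty if none)."""
    errors: List[str] = []
    for name, expected in EXPECTED_FIELDS.items():
        if name not in record:
            errors.append(f"missing field: {name}")
        elif not isinstance(record[name], expected):
            actual = type(record[name]).__name__
            errors.append(f"{name}: expected {expected.__name__}, got {actual}")
    # Report fields that the listed schema does not mention.
    for name in sorted(set(record) - set(EXPECTED_FIELDS)):
        errors.append(f"unexpected field: {name}")
    return errors
```

For example, calling `schema_errors` on a record that lacks `blaser_sim` returns a single "missing field" message, while a well-formed record returns an empty list.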

Dataset splits

train: 75000 samples, 17123425.85660714 bytes

Dataset size

Download size: 11070943 bytes
Dataset size: 17123425.85660714 bytes
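The reported sizes allow two quick sanity figures: the average size of one sample and the ratio of download size to dataset size. The short calculation below uses only the numbers listed above; the variable names are illustrative.

```python
# Figures as reported in the listing above.
NUM_SAMPLES = 75_000
DATASET_BYTES = 17_123_425.85660714
DOWNLOAD_BYTES = 11_070_943

# Average size of one sample in the train split.
avg_bytes_per_sample = DATASET_BYTES / NUM_SAMPLES

# Fraction of the dataset size that has to be downloaded.
download_ratio = DOWNLOAD_BYTES / DATASET_BYTES

print(f"{avg_bytes_per_sample:.1f} bytes per sample on average")
print(f"download is {download_ratio:.1%} of the dataset size")
```

This works out to roughly 228 bytes per sample, with the download at a little under two thirds of the dataset size.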

Configuration

default: contains the training data files, at path data/train-*
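The configuration above maps the train split to files matching the glob pattern data/train-*. The sketch below shows how that pattern selects files and how the split could be loaded with the Hugging Face `datasets` library. It assumes `datasets` is installed and the repository is reachable; the shard file names used in the usage note are hypothetical, and `is_train_file` and `load_train_split` are illustrative helper names.

```python
from fnmatch import fnmatch

REPO_ID = "igorktech/nllb-200-75K-spa"
TRAIN_PATTERN = "data/train-*"  # path pattern from the default configuration


def is_train_file(path: str) -> bool:
    """Return True if a repository path matches the train file pattern."""
    return fnmatch(path, TRAIN_PATTERN)


def load_train_split():
    """Load the train split of the default configuration.

    Requires network access. The import is done lazily so that the helper
    above still works when the `datasets` library is not installed.
    """
    from datasets import load_dataset

    return load_dataset(REPO_ID, name="default", split="train")
```

With this in place, `is_train_file("data/train-00000-of-00001.parquet")` is true for a hypothetical shard name of that form, and `load_train_split()` would return the single train split described above.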