Nicolas-BZRD/English_French_Songs_Lyrics_Translation_Original
Source: Hugging Face | Updated: 2024-02-08 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/Nicolas-BZRD/English_French_Songs_Lyrics_Translation_Original
Description:
---
license: unknown
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
dataset_info:
  features:
  - name: artist_name
    dtype: string
  - name: album_name
    dtype: string
  - name: year
    dtype: int64
  - name: title
    dtype: string
  - name: number
    dtype: int64
  - name: original_version
    dtype: string
  - name: french_version
    dtype: string
  - name: language
    dtype: string
  splits:
  - name: train
    num_bytes: 250317845
    num_examples: 99289
  download_size: 122323323
  dataset_size: 250317845
task_categories:
- translation
- text-generation
language:
- fr
- en
- es
- it
- de
- ko
- id
- pt
- 'no'
- fi
- sv
- sw
- hr
- so
- ca
- tl
- ja
- nl
- ru
- et
- tr
- ro
- cy
- vi
- af
- hu
- sk
- sl
- cs
- da
- pl
- sq
- el
- he
- zh
- th
- bg
- ar
tags:
- music
- parallel
- parallel data
pretty_name: SYFT
size_categories:
- 10K<n<100K
---
# Original Song Lyrics with French Translations
## Dataset Summary
A dataset of 99,289 songs, each with its metadata (artist, album, release year, title, song number), its original lyrics, and a French translation of those lyrics.
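Because each row pairs original lyrics with a French version, extracting source/target pairs for the translation task is a simple filter over rows. The sketch below is illustrative and not part of the dataset's tooling: `SongRecord` and `to_translation_pairs` are hypothetical names, the sample rows are invented placeholders rather than real entries, and it assumes the `language` column holds the original-language codes used in the table (e.g. `en`, `es`).

```python
from typing import Iterable, Iterator, TypedDict


class SongRecord(TypedDict):
    """One row of the train split, with the columns listed in the card."""
    artist_name: str
    album_name: str
    year: int
    title: str
    number: int
    original_version: str
    french_version: str
    language: str


def to_translation_pairs(
    records: Iterable[SongRecord], source_lang: str = "en"
) -> Iterator[tuple[str, str]]:
    """Yield (original lyrics, French lyrics) for songs whose original
    language is `source_lang`, skipping rows where either side is empty."""
    for rec in records:
        if rec["language"] != source_lang:
            continue
        # `or ""` guards against missing values before stripping whitespace.
        src = (rec["original_version"] or "").strip()
        tgt = (rec["french_version"] or "").strip()
        if src and tgt:
            yield src, tgt


# Invented placeholder rows for illustration; these are not real dataset entries.
rows: list[SongRecord] = [
    {"artist_name": "Artist A", "album_name": "Album A", "year": 2001,
     "title": "Song A", "number": 1, "original_version": "Hello world\n",
     "french_version": "Bonjour le monde\n", "language": "en"},
    {"artist_name": "Artist B", "album_name": "Album B", "year": 1999,
     "title": "Song B", "number": 3, "original_version": "Hola mundo",
     "french_version": "Bonjour le monde", "language": "es"},
    {"artist_name": "Artist C", "album_name": "Album C", "year": 2010,
     "title": "Song C", "number": 2, "original_version": "",
     "french_version": "Texte", "language": "en"},
]

print(list(to_translation_pairs(rows)))  # [('Hello world', 'Bonjour le monde')]
```

With the Hugging Face `datasets` library installed, real rows could be passed in from `load_dataset("Nicolas-BZRD/English_French_Songs_Lyrics_Translation_Original", split="train")`, since iterating over a `Dataset` yields one dictionary per row.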
The table below gives the number of songs per original language:
| Original language | Number of songs |
|---|---:|
| en | 75786 |
| fr | 18486 |
| es | 1743 |
| it | 803 |
| de | 691 |
| sw | 529 |
| ko | 193 |
| id | 169 |
| pt | 142 |
| no | 122 |
| fi | 113 |
| sv | 70 |
| hr | 53 |
| so | 43 |
| ca | 41 |
| tl | 36 |
| ja | 35 |
| nl | 32 |
| ru | 29 |
| et | 27 |
| tr | 22 |
| ro | 19 |
| cy | 14 |
| vi | 14 |
| af | 13 |
| hu | 10 |
| sk | 10 |
| sl | 10 |
| cs | 7 |
| da | 6 |
| pl | 5 |
| sq | 4 |
| el | 4 |
| he | 3 |
| zh-cn | 2 |
| th | 1 |
| bg | 1 |
| ar | 1 |
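As a quick consistency check, the per-language counts can be summed and turned into shares of the whole dataset. In this sketch the counts are transcribed from the table; `language_counts` and `shares` are hypothetical helpers, and applying `language_counts` to real rows assumes each row exposes the `language` key listed in the schema.

```python
from collections import Counter
from typing import Iterable, Mapping

# Songs per original language, transcribed from the table above.
TABLE = Counter({
    "en": 75786, "fr": 18486, "es": 1743, "it": 803, "de": 691, "sw": 529,
    "ko": 193, "id": 169, "pt": 142, "no": 122, "fi": 113, "sv": 70,
    "hr": 53, "so": 43, "ca": 41, "tl": 36, "ja": 35, "nl": 32, "ru": 29,
    "et": 27, "tr": 22, "ro": 19, "cy": 14, "vi": 14, "af": 13, "hu": 10,
    "sk": 10, "sl": 10, "cs": 7, "da": 6, "pl": 5, "sq": 4, "el": 4,
    "he": 3, "zh-cn": 2, "th": 1, "bg": 1, "ar": 1,
})


def language_counts(records: Iterable[Mapping[str, object]]) -> Counter:
    """Tally rows by their `language` field (hypothetical helper)."""
    return Counter(rec["language"] for rec in records)


def shares(counts: Counter) -> dict[str, float]:
    """Fraction of all songs per language, most frequent first."""
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.most_common()}


total = sum(TABLE.values())
print(total)                          # 99289
print(round(shares(TABLE)["en"], 3))  # 0.763
print(round(shares(TABLE)["fr"], 3))  # 0.186
```

The total matches `num_examples: 99289` in the card's metadata, and English-original songs account for roughly 76.3% of the data, French-original songs for roughly 18.6%.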
## Citation
Our work can be cited as:
```bibtex
@misc{faysse2024croissantllm,
title={CroissantLLM: A Truly Bilingual French-English Language Model},
author={Manuel Faysse and Patrick Fernandes and Nuno Guerreiro and António Loison and Duarte Alves and Caio Corro and Nicolas Boizard and João Alves and Ricardo Rei and Pedro Martins and Antoni Bigata Casademunt and François Yvon and André Martins and Gautier Viaud and Céline Hudelot and Pierre Colombo},
year={2024},
eprint={2402.00786},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```
Provided by:
Nicolas-BZRD
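Summary of Original Information
Dataset Overview
Basic Information
- Name: SYFT
- License: unknown
- Configuration:
  - Default configuration:
    - Data files:
      - Training data: data/train-*
- Download size: 122323323 bytes
- Dataset size: 250317845 bytes
- Training examples: 99289
Dataset Features
- artist_name: string
- album_name: string
- year: integer
- title: string
- number: integer
- original_version: string
- french_version: string
- language: string
Task Categories
- Translation
- Text generation
Supported Languages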
- fr
- en
- es
- it
- de
- ko
- id
- pt
- no
- fi
- sv
- sw
- hr
- so
- ca
- tl
- ja
- nl
- ru
- et
- tr
- ro
- cy
- vi
- af
- hu
- sk
- sl
- cs
- da
- pl
- sq
- el
- he
- zh
- th
- bg
- ar
Tags
- music
- parallel
- parallel data
Size Category
- 10K<n<100K
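Dataset Content
- Contents: metadata for 99289 songs (artist, album, release year, song number), their original lyrics, and French translations of the lyrics.
- Language distribution:
  - en: 75786 songs
  - fr: 18486 songs
  - es: 1743 songs
  - it: 803 songs
  - de: 691 songs
  - sw: 529 songs
  - ko: 193 songs
  - id: 169 songs
  - pt: 142 songs
  - no: 122 songs
  - fi: 113 songs
  - sv: 70 songs
  - hr: 53 songs
  - so: 43 songs
  - ca: 41 songs
  - tl: 36 songs
  - ja: 35 songs
  - nl: 32 songs
  - ru: 29 songs
  - et: 27 songs
  - tr: 22 songs
  - ro: 19 songs
  - cy: 14 songs
  - vi: 14 songs
  - af: 13 songs
  - hu: 10 songs
  - sk: 10 songs
  - sl: 10 songs
  - cs: 7 songs
  - da: 6 songs
  - pl: 5 songs
  - sq: 4 songs
  - el: 4 songs
  - he: 3 songs
  - zh-cn: 2 songs
  - th: 1 song
  - bg: 1 song
  - ar: 1 song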