
recursal/Europarl-Translation-Instruct

Source: Hugging Face | Updated: 2024-06-13 | Indexed: 2024-06-29
Download link:
https://hf-mirror.com/datasets/recursal/Europarl-Translation-Instruct
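Overview:
europarl-translation-instruct is a translation instruction dataset built from Europarl data. It is curated by M8than, funded by Recursal.ai, and released under the CC-BY-SA 4.0 license. The dataset contains translation conversations at the sentence, paragraph, and full-text level, stored in JSONL format, with each record representing one conversation. Its stated goal is to make AI technology accessible to everyone, regardless of language or economic status.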
Provided by:
recursal
Summary of Original Information

Dataset Card for Europarl-Translation-Instruct

Dataset Details

Dataset Description

  • Curated by: M8than
  • Funded by: Recursal.ai
  • Shared by: M8than
  • Language(s) (NLP): English instructions (the text to translate is in various languages)
  • License: cc-by-sa-4.0

Dataset Sources

Processing and Filtering

  • Prerequisite: Download the source dataset from https://www.statmt.org/europarl/.
  • Scripts: Extract every translation of the europarl transcripts and match them together to create various translation instruct datasets.
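The matching step described above could look roughly like the following. This is only a sketch under stated assumptions, not the curator's actual script: `make_conversation`, `pair_translations`, and the `aligned` mapping are hypothetical names, the system prompt template is copied from the single example record in the Format section, and emitting every ordered language pair is an assumption about how the translations are matched.

```python
import json
from itertools import permutations

# Assumed prompt template, taken from the one example record in this card.
SYSTEM_TEMPLATE = (
    "You will be given some text and you must respond only with the text "
    "if spoken by someone who speaks {lang}"
)


def make_conversation(source_text, target_text, target_lang):
    """Build one record in the {"conversation": [...]} layout (hypothetical helper)."""
    return {
        "conversation": [
            {"sender": "system", "message": SYSTEM_TEMPLATE.format(lang=target_lang)},
            {"sender": "user", "message": source_text},
            {"sender": "assistant", "message": target_text},
        ]
    }


def pair_translations(aligned):
    """Yield one record per ordered language pair of a single aligned text.

    `aligned` maps a language code to that language's version of the same text.
    """
    for src, tgt in permutations(sorted(aligned), 2):
        yield make_conversation(aligned[src], aligned[tgt], tgt)


# Toy input reusing the sentence pair from the card's example record.
aligned = {
    "de": "Ich halte dies für ein ganz legitimes Ansinnen",
    "en": "I think it is a fairly legitimate request",
}
records = list(pair_translations(aligned))

# One JSON object per line, keeping non-ASCII characters readable.
jsonl_lines = [json.dumps(record, ensure_ascii=False) for record in records]
```

Sorting the language codes keeps the output order deterministic, which makes the generated files easier to diff between runs.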

Format

  • Dataset files: JSONL with each line representing one conversation.

  • Example:

    {"conversation":[{"sender":"system","message":"You will be given some text and you must respond only with the text if spoken by someone who speaks en"},{"sender":"user","message":"Ich halte dies für ein ganz legitimes Ansinnen"},{"sender":"assistant","message":"I think it is a fairly legitimate request"}]}

  • Structure: Each line is a JSON object with a "conversation" key whose value is an array of message dictionaries, each with "sender" and "message" keys.
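As an illustration of this layout, one line can be parsed with Python's standard `json` module. This is a minimal sketch: `parse_record` and `REQUIRED_KEYS` are hypothetical names introduced here, and the sample line is the example record from this card.

```python
import json

# Keys every message dictionary is expected to carry, per the card.
REQUIRED_KEYS = {"sender", "message"}


def parse_record(line):
    """Parse one JSONL line and return its turns as (sender, message) pairs."""
    record = json.loads(line)
    messages = record["conversation"]
    for msg in messages:
        missing = REQUIRED_KEYS - msg.keys()
        if missing:
            raise ValueError(f"message is missing keys: {sorted(missing)}")
    return [(msg["sender"], msg["message"]) for msg in messages]


# The example record from this card, as a single JSONL line.
sample_line = (
    '{"conversation":[{"sender":"system","message":"You will be given some text '
    'and you must respond only with the text if spoken by someone who speaks en"},'
    '{"sender":"user","message":"Ich halte dies für ein ganz legitimes Ansinnen"},'
    '{"sender":"assistant","message":"I think it is a fairly legitimate request"}]}'
)

turns = parse_record(sample_line)
for sender, message in turns:
    print(f"{sender}: {message}")
```

To read a whole file, the same function can be applied to each non-empty line of the JSONL file in turn.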

Data Splits

  • sentences: Contains sentence translation conversations.
  • paragraphs: Contains paragraph translation conversations.
  • full: Contains full transcript translations.

Licensing Information

  • Content: This release contains content from europarl transformed into a conversational instruction dataset.
  • Waifus: Recursal Waifus (the banner image) are licensed under CC-BY-SA. They do not represent the related websites in any official capacity unless otherwise announced by the website. You may use them as a banner image, but you must always link back to the dataset.

Citation Information

@ONLINE{europarl-translation-instruct,
  title = {europarl-translation-instruct},
  author = {M8than, recursal.ai},
  year = {2024},
  howpublished = {\url{https://huggingface.co/datasets/recursal/europarl-translation-instruct}},
}
