
crodri/meteocat

Hugging Face, updated 2023-11-30, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/crodri/meteocat
Resource description:
---
language:
- ca
multilinguality:
- monolingual
pretty_name: synthetic_meteocat
size_categories:
- 10K<n<100K
task_categories:
- text-generation
- token-classification
- question-answering
task_ids:
- named-entity-recognition
---

# Dataset Card for Meteocat

## Table of Contents

- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Point of Contact:** [langtech@bsc.es](mailto:langtech@bsc.es)

### Dataset Summary

This is a synthetic dataset in which each example has the following fields:

- An instruction, such as "El dissabte a la nit, quin temps farà a Mont-real?"
- A context, such as "Day: dissabte | Location: Mont-real | mati: el cel estarà molt ennuvolat | tarda: plourà escadusserament | nit: el cel tendirà a estar cobert de núvols | temp: Lleugera pujada de les temperatures"
- A response, such as "A la nit el cel estarà ennuvolat"

Instructions for answering yes/no questions have also been added.

### Supported Tasks and Leaderboards

This dataset is mainly intended for training text-generation and named-entity-recognition models.

### Languages

The dataset is in Catalan (`ca-CA`).

## Dataset Structure

The dataset consists of examples in JSONL format, each with three fields: `instruction`, `context` and `response`.
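As a minimal sketch (standard-library Python, not an official loader for this repository), the helpers below read one split and unpack the `context` string. They assume each split file holds one JSON object per line, as the JSONL description states even though the split files are named `.json`, and that contexts follow the `key: value | key: value` layout shown in the Dataset Summary. The names `read_examples` and `parse_context` are hypothetical. The natural-language context quoted under Data Instances contains no `key: value` segment, so `parse_context` returns an empty dict for it and the raw string should be used instead.

```python
import json
from typing import Dict, Iterator

# Field names taken from the card's Dataset Structure section.
REQUIRED_FIELDS = frozenset({"instruction", "context", "response"})


def parse_context(context: str) -> Dict[str, str]:
    """Split a pipe-delimited context ("Day: dijous | Location: Camarasa | ...")
    into a dict such as {"Day": "dijous", "Location": "Camarasa", ...}.

    Segments without a colon are skipped, so free text with no colon yields {}.
    """
    fields: Dict[str, str] = {}
    for segment in context.split("|"):
        # Split on the first colon only, so a colon inside the forecast text
        # stays in the value.
        key, sep, value = segment.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def read_examples(path: str) -> Iterator[dict]:
    """Yield examples from a file assumed to hold one JSON object per line."""
    with open(path, encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:  # tolerate blank lines, e.g. a trailing newline
                continue
            example = json.loads(line)
            missing = REQUIRED_FIELDS - set(example)
            if missing:
                raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
            yield example


if __name__ == "__main__":
    # The context of the example instance shown in the card.
    context = (
        "Day: dijous | Location: Camarasa | mati: el cel anirà encapotant-se cada cop més"
        " | tarda: el sol anirà guanyant terreny als núvols | nit: cel clar"
        " | temp: Temperatures sense canvis"
    )
    parsed = parse_context(context)
    print(parsed["Location"], "->", parsed["nit"])  # prints: Camarasa -> cel clar
```

Partitioning on the first colon, rather than splitting on every colon, keeps any colon that appears inside a forecast phrase in the value instead of discarding it.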
### Data Instances

The original context was changed to a more linguistically natural one, for example: "tarda del divendres a Montesquiu al mati s'esperen més nuvolades, a la tarda guspirejarà amb insistència, a la nit podria guspirejar, i Temperatures sense canvis"

Example instance:

```json
{
  "instruction": "Quin temps farà a la nit a Camarasa dijous?",
  "context": "Day: dijous | Location: Camarasa | mati: el cel anirà encapotant-se cada cop més | tarda: el sol anirà guanyant terreny als núvols | nit: cel clar | temp: Temperatures sense canvis",
  "response": "A la nit, cel ben clar"
}
```

### Data Fields

- `instruction`: a weather-related question.
- `context`: information in the format "Day: [DAY] | Location: [LOCATION] | mati: [WEATHER FORECAST] | tarda: [WEATHER FORECAST] | nit: [WEATHER FORECAST]" (`mati`, `tarda` and `nit` are Catalan for morning, afternoon and night); the examples also end with a `temp` segment describing the temperature trend.
- `response`: the weather forecast that answers the question.

### Data Splits

- `dev.json`: 6873 examples
- `test.json`: 1279 examples
- `train.json`: 61776 examples

## Additional Information

### Dataset Curators

Text Mining Unit (TeMU) at the Barcelona Supercomputing Center (bsc-temu@bsc.es).

This work was funded by the [Departament de la Vicepresidència i de Polítiques Digitals i Territori de la Generalitat de Catalunya](https://politiquesdigitals.gencat.cat/ca/inici/index.html#googtrans(ca|en)) within the framework of [Projecte AINA](https://politiquesdigitals.gencat.cat/ca/economia/catalonia-ai/aina).

### Licensing Information

[Creative Commons Attribution Non-commercial No-Derivatives 4.0 International](https://creativecommons.org/licenses/by-nc-nd/4.0/).

### Contributions

[N/A]
Provided by:
crodri