crodri/meteocat
Hugging Face | Updated 2023-11-30 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/crodri/meteocat
Resource description:
---
language:
- ca
multilinguality:
- monolingual
pretty_name: synthetic_meteocat
size_categories:
- 10K<n<100K
task_categories:
- text-generation
- token-classification
- question-answering
task_ids:
- named-entity-recognition
---
# Dataset Card for Meteocat
## Table of Contents
- [Table of Contents](#table-of-contents)
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Supported Tasks and Leaderboards](#supported-tasks-and-leaderboards)
- [Languages](#languages)
- [Dataset Structure](#dataset-structure)
- [Data Instances](#data-instances)
- [Data Fields](#data-fields)
- [Data Splits](#data-splits)
- [Additional Information](#additional-information)
- [Dataset Curators](#dataset-curators)
- [Licensing Information](#licensing-information)
- [Citation Information](#citation-information)
- [Contributions](#contributions)
## Dataset Description
- **Point of Contact:** [langtech@bsc.es](mailto:langtech@bsc.es)
### Dataset Summary
This is a synthetic dataset in which each example has the following fields:
- Instructions like "El dissabte a la nit, quin temps farà a Mont-real?"
- Context like "Day: dissabte | Location: Mont-real | mati: el cel estarà molt ennuvolat | tarda: plourà escadusserament | nit: el cel tendirà a estar cobert de núvols | temp: Lleugera pujada de les temperatures"
- Response like "A la nit el cel estarà ennuvolat"
Instructions for answering "yes" or "no" questions have also been added.
### Supported Tasks and Leaderboards
This dataset is mainly intended to train models for text-generation and named-entity-recognition.
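For the text-generation use case, each example has to be turned into a single prompt string before it reaches a model. The card does not specify a prompt template, so the layout below (the `Context:`, `Instruction:` and `Response:` labels and the hypothetical helper `build_prompt`) is only an assumption for illustration.

```python
def build_prompt(example: dict) -> str:
    """Join context and instruction into one prompt (assumed template, not from the card)."""
    return (
        f"Context: {example['context']}\n"
        f"Instruction: {example['instruction']}\n"
        "Response:"
    )


# Example taken from the Dataset Summary of this card.
example = {
    "instruction": "El dissabte a la nit, quin temps farà a Mont-real?",
    "context": (
        "Day: dissabte | Location: Mont-real | mati: el cel estarà molt ennuvolat"
        " | tarda: plourà escadusserament"
        " | nit: el cel tendirà a estar cobert de núvols"
        " | temp: Lleugera pujada de les temperatures"
    ),
    "response": "A la nit el cel estarà ennuvolat",
}

prompt = build_prompt(example)
print(prompt)
```

During training, the `response` field would typically be appended after `Response:` as the target text.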
### Languages
The dataset is in Catalan (`ca-CA`).
## Dataset Structure
The dataset consists of examples in a jsonl format with 3 fields each: instruction, context and response.
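A minimal sketch of reading this structure with the Python standard library, assuming one JSON object per line. The helper name `parse_jsonl` and the in-memory sample lines are illustrative, not part of the dataset.

```python
import json
from typing import Iterable, Iterator

REQUIRED_FIELDS = ("instruction", "context", "response")


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one record per non-empty line, checking that the three fields exist."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_number}: missing fields {missing}")
        yield record


# In-memory stand-in for a file; a real file would be opened with
# open(path, encoding="utf-8") and passed to parse_jsonl directly.
sample_lines = [
    '{"instruction": "Quin temps farà a la nit a Camarasa dijous?", '
    '"context": "Day: dijous | Location: Camarasa | nit: cel clar", '
    '"response": "A la nit, cel ben clar"}',
    "",
]

records = list(parse_jsonl(sample_lines))
print(len(records), records[0]["response"])
```

Reading line by line keeps memory use flat, which matters little at this dataset's size but costs nothing.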
### Data Instances
The original context was changed to a more linguistically natural one: "tarda del divendres a Montesquiu al mati s'esperen més nuvolades, a la tarda guspirejarà amb insistència, a la nit podria guspirejar, i Temperatures sense canvis"
{
"instruction": "Quin temps farà a la nit a Camarasa dijous?",
"context": "Day: dijous | Location: Camarasa | mati: el cel anirà encapotant-se cada cop més | tarda: el sol anirà guanyant terreny als núvols | nit: cel clar | temp: Temperatures sense canvis",
"response": "A la nit, cel ben clar"
}
### Data Fields
- instruction: Weather-related question.
- context: Information in the format "Day: [DAY] | Location: [LOCATION] | mati: [WEATHER FORECAST] | tarda: [WEATHER FORECAST] | nit: [WEATHER FORECAST]". The examples in this card also include a final "temp" segment describing the temperature trend.
- response: Weather forecast answering the question.
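The pipe-delimited `context` format can be split back into its parts, which is useful for the token-classification and question-answering tasks. The sketch below assumes that `|` never appears inside a forecast and that the first `:` in each segment separates key from value; `parse_context` is a hypothetical helper.

```python
def parse_context(context: str) -> dict:
    """Split 'key: value | key: value | ...' into an ordered dict of strings."""
    fields = {}
    for segment in context.split("|"):
        key, separator, value = segment.partition(":")
        if not separator:
            continue  # skip segments that carry no 'key: value' pair
        fields[key.strip()] = value.strip()
    return fields


# Context string taken from the Data Instances example in this card.
context = (
    "Day: dijous | Location: Camarasa"
    " | mati: el cel anirà encapotant-se cada cop més"
    " | tarda: el sol anirà guanyant terreny als núvols"
    " | nit: cel clar | temp: Temperatures sense canvis"
)

parsed = parse_context(context)
print(parsed["Location"], "/", parsed["nit"])
```

This parser applies only to the structured form; the free-text contexts mentioned under Data Instances would need a different approach.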
### Data Splits
* dev.json: 6873 examples
* test.json: 1279 examples
* train.json: 61776 examples
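As a quick sanity check, the three split sizes add up to 69,928 examples, which is consistent with the `10K<n<100K` size category in the card metadata. The short sketch below computes the total and each split's share; the variable names are illustrative.

```python
# Split sizes as listed in this card.
SPLIT_SIZES = {"train.json": 61776, "dev.json": 6873, "test.json": 1279}

total = sum(SPLIT_SIZES.values())
shares = {name: count / total for name, count in SPLIT_SIZES.items()}

for name, count in SPLIT_SIZES.items():
    print(f"{name}: {count} examples ({shares[name]:.1%} of the dataset)")
print(f"total: {total} examples")
```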
## Additional Information
### Dataset Curators
Text Mining Unit (TeMU) at the Barcelona Supercomputing Center (bsc-temu@bsc.es)
This work was funded by the [Departament de la Vicepresidència i de Polítiques Digitals i Territori de la Generalitat de Catalunya](https://politiquesdigitals.gencat.cat/ca/inici/index.html#googtrans(ca|en)) within the framework of [Projecte AINA](https://politiquesdigitals.gencat.cat/ca/economia/catalonia-ai/aina).
### Licensing Information
[Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International](https://creativecommons.org/licenses/by-nc-nd/4.0/).
### Contributions
[N/A]
Provider:
crodri