tingkart/SynteticNorway
Source: Hugging Face. Updated: 2023-08-12. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/tingkart/SynteticNorway
Resource description:
---
license: apache-2.0
task_categories:
- question-answering
language:
- 'no'
pretty_name: Syntetic Norway Dataset
size_categories:
- 100M<n<1B
---
Dataset Card for Syntetic Norway Knowledge Dataset
Dataset Summary
This dataset consists of question-and-answer pairs in Norwegian covering topics related to Norway: its culture, governance, history, economy, geography, people, and international relations. It was generated with OpenAI's ChatGPT 3.5 and Claude 2 on 09.08.2023, building on the great work of the airoboros repository: https://github.com/jondurbin/airoboros/tree/main.
Both the configuration file and topics list are included in the files.
Supported Tasks and Leaderboards
Question Answering: Benchmark for models to understand and respond to questions related to Norway.
Language Modeling: Useful for training models in the Norwegian language with specific knowledge about Norway.
Languages
Norwegian (Bokmål and Nynorsk).
Dataset Structure
Data Instances
The dataset contains pairs of questions and answers in Norwegian.
Data Fields
Concept: The broader topic under which the question falls.
Assistance: The question presented to the model.
Text: The corresponding answer generated by the model.
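As a minimal sketch of how the three fields above relate, the snippet below maps one record to a topic/question/answer dictionary. The field names (Concept, Assistance, Text) come from the list above; the helper name `to_qa_pair`, the output keys, and the example record are hypothetical and invented for illustration. They are not taken from the dataset, and the on-disk file format is not assumed here.

```python
from typing import Dict

# Field names as listed under "Data Fields".
REQUIRED_FIELDS = ("Concept", "Assistance", "Text")


def to_qa_pair(record: Dict[str, str]) -> Dict[str, str]:
    """Map one raw record to a topic/question/answer dict (hypothetical helper)."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise KeyError(f"record is missing fields: {missing}")
    return {
        "topic": record["Concept"].strip(),        # broader topic
        "question": record["Assistance"].strip(),  # question presented to the model
        "answer": record["Text"].strip(),          # answer generated by the model
    }


# Hypothetical record, invented for illustration only (not from the dataset).
example = {
    "Concept": "Geografi",
    "Assistance": "Hva er hovedstaden i Norge?",
    "Text": "Hovedstaden i Norge er Oslo.",
}

pair = to_qa_pair(example)
print(pair["question"], "->", pair["answer"])
```

Note that, despite its name, the Assistance field holds the question rather than the answer, so a renaming step like this may help avoid confusion downstream.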
Data Splits
[More Information Needed]
Dataset Creation
Curation Rationale
The dataset was curated to promote the study of Norway and to support research in Norwegian language processing.
Source Data
Initial Data Collection and Normalization
Data was generated using OpenAI's ChatGPT 3.5 and Claude 2 on 09.08.2023.
Who are the source language producers?
OpenAI ChatGPT 3.5 and Claude 2.
Annotations
Annotation process
[More Information Needed]
Who are the annotators?
[More Information Needed]
Personal and Sensitive Information
The dataset does not include any personal or sensitive information.
Considerations for Using the Data
Social Impact of Dataset
This dataset serves as a rich resource for researchers and educators focusing on Norway and the Norwegian language.
Discussion of Biases
[More Information Needed]
Other Known Limitations
[More Information Needed]
Additional Information
Dataset Curators
A team of researchers and linguistic experts focused on Norwegian studies.
Licensing Information
Creative Commons Attribution 4.0 International License.
Citation Information
[More Information Needed]
Contributions
[More Information Needed]
Provider:
tingkart
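Summary of Original Information
Syntetic Norway Knowledge Dataset Overview
Basic Dataset Information
- License: Apache-2.0
- Task categories:
  - Question answering
  - Language modeling
- Language: Norwegian (Bokmål and Nynorsk)
- Dataset name: Syntetic Norway Dataset
- Dataset size: 100M<n<1B
Dataset Content
- Dataset summary: Question and answer pairs about Norway, covering its culture, governance, history, economy, geography, people, and international relations.
- Generation tools: Generated using OpenAI ChatGPT 3.5 and Claude 2 on 09.08.2023.
- Data structure:
  - Data instances: Question and answer pairs.
  - Data fields:
    - Concept: The broader topic under which the question falls.
    - Assistance: The question presented to the model.
    - Text: The corresponding answer generated by the model.
Dataset Creation
- Curation rationale: To promote the study of Norway and to support research in Norwegian language processing.
- Source data:
  - Initial data collection and normalization: Data was generated by OpenAI and Claude 2.
  - Source language producers: OpenAI and Claude 2.
Considerations for Using the Data
- Social impact: A rich resource for researchers and educators focusing on Norway and the Norwegian language.
- Discussion of biases: [More Information Needed]
- Other known limitations: [More Information Needed]
Additional Information
- Dataset curators: A team of researchers and linguistic experts focused on Norwegian studies.
- Licensing information: Creative Commons Attribution 4.0 International License.
- Citation information: [More Information Needed]
- Contributions: [More Information Needed]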



