
tingkart/SynteticNorway

Source: Hugging Face; updated 2023-08-12; indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/tingkart/SynteticNorway
Resource description:
---
license: apache-2.0
task_categories:
- question-answering
language:
- 'no'
pretty_name: Syntetic Norway Dataset
size_categories:
- 100M<n<1B
---

Dataset Card for Syntetic Norway Knowledge Dataset

Dataset Summary
This dataset consists of question-and-answer pairs in Norwegian about Norway: its culture, governance, history, economy, geography, people and international relations. It was generated with OpenAI ChatGPT 3.5 and Claude 2 on 09.08.2023 (9 August 2023), building on the airoboros repository (https://github.com/jondurbin/airoboros/tree/main). The dataset files include both the configuration file and the topics list.

Supported Tasks and Leaderboards
Question Answering: a benchmark for how well models understand and answer questions about Norway.
Language Modeling: training data for Norwegian-language models that need specific knowledge about Norway.

Languages
Norwegian (Bokmål and Nynorsk).

Dataset Structure
Data Instances
The dataset contains pairs of questions and answers in Norwegian.
Data Fields
Concept: the broader topic under which the question falls.
Assistance: the question presented to the model.
Text: the corresponding answer generated by the model.
Data Splits
[More Information Needed]

Dataset Creation
Curation Rationale
The dataset was curated to promote the study of Norway and to support research in Norwegian language processing.
Source Data
Initial Data Collection and Normalization
The data was generated using OpenAI's ChatGPT 3.5 and Claude 2 on 09.08.2023.
Who are the source language producers?
OpenAI's ChatGPT 3.5 and Anthropic's Claude 2.
Annotations
Annotation process
[More Information Needed]
Who are the annotators?
[More Information Needed]
Personal and Sensitive Information
The dataset does not include any personal or sensitive information.

Considerations for Using the Data
Social Impact of Dataset
This dataset serves as a resource for researchers and educators focusing on Norway and the Norwegian language.
Discussion of Biases
[More Information Needed]
Other Known Limitations
[More Information Needed]

Additional Information
Dataset Curators
A team of researchers and linguistic experts focused on Norwegian studies.
Licensing Information
Creative Commons Attribution 4.0 International License.
Citation Information
[More Information Needed]
Contributions
[More Information Needed]
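The card lists question answering and language modeling as supported tasks. For either task, a typical first step is to join each question with its answer. The following is a minimal sketch of that step. It assumes each record exposes the three fields named under Data Fields (Concept, Assistance, Text). The `QAPair` class, the `to_training_text` function and the "Spørsmål:/Svar:" prompt template are hypothetical illustrations and are not part of the dataset.

```python
from dataclasses import dataclass


@dataclass
class QAPair:
    """One record as described under Data Fields.

    Mapping the card's fields Concept/Assistance/Text onto these attributes
    is an assumption for illustration.
    """
    concept: str     # broader topic the question falls under
    assistance: str  # the question presented to the model
    text: str        # the corresponding generated answer


def to_training_text(pair: QAPair) -> str:
    """Join question and answer into one string for language-model training.

    The "Spørsmål:/Svar:" template is a hypothetical choice,
    not a format prescribed by the dataset.
    """
    return f"Spørsmål: {pair.assistance}\nSvar: {pair.text}"


if __name__ == "__main__":
    # A made-up example record, not taken from the dataset.
    example = QAPair(
        concept="Geografi",
        assistance="Hva er hovedstaden i Norge?",
        text="Hovedstaden i Norge er Oslo.",
    )
    print(to_training_text(example))
```

The sketch keeps Concept out of the training string, so it stays available for filtering or stratifying by topic. Whether to include it in the prompt is a design choice the card leaves open.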
Provided by:
tingkart
Summary of the original information

Syntetic Norway Knowledge Dataset Overview

Basic Dataset Information

  • License: Apache-2.0
  • Task categories:
    • Question answering
    • Language modeling
  • Language: Norwegian (Bokmål and Nynorsk)
  • Dataset name: Syntetic Norway Dataset
  • Size category: 100M<n<1B

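Dataset Contents

  • Summary: question-and-answer pairs about Norway covering its culture, governance, history, economy, geography, people and international relations.
  • Generation tools: generated with OpenAI ChatGPT 3.5 and Claude 2 on 9 August 2023.
  • Data structure:
    • Data instances: question-and-answer pairs.
    • Data fields:
      • Concept: the broader topic the question falls under.
      • Assistance: the question presented to the model.
      • Text: the corresponding answer generated by the model.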
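To see how the pairs are spread across topics, you can group the records by their Concept field. The following is a minimal sketch. It assumes the data is stored as JSON Lines, with one object per line and the keys "Concept", "Assistance" and "Text". The card states neither the file format nor the exact key names. `count_by_concept` is a hypothetical helper.

```python
import json
from collections import Counter
from typing import Iterable


def count_by_concept(lines: Iterable[str]) -> Counter:
    """Count question-answer pairs per Concept in JSON Lines input.

    Assumes one JSON object per line with the keys "Concept", "Assistance"
    and "Text"; these names are an assumption, not confirmed by the card.
    """
    required = {"Concept", "Assistance", "Text"}
    counts: Counter = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Fail loudly if the assumed field names do not match the data.
        missing = required - set(record)
        if missing:
            raise ValueError(f"record missing fields: {sorted(missing)}")
        counts[record["Concept"]] += 1
    return counts


if __name__ == "__main__":
    # Two made-up records, not taken from the dataset.
    sample = [
        '{"Concept": "Historie", "Assistance": "Når ble Norge selvstendig?", "Text": "I 1905."}',
        '{"Concept": "Geografi", "Assistance": "Hva er hovedstaden i Norge?", "Text": "Oslo."}',
    ]
    print(count_by_concept(sample))
```

The helper checks that all three fields are present before counting. If the assumed key names do not match the real files, it raises an error instead of quietly returning a skewed count.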

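Dataset Creation

  • Curation rationale: to promote the study of Norway and to support research in Norwegian language processing.
  • Source data:
    • Initial data collection and normalization: the data was generated by OpenAI's ChatGPT 3.5 and Claude 2.
    • Source language producers: OpenAI's ChatGPT 3.5 and Claude 2.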

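Considerations for Using the Data

  • Social impact: a resource for researchers and educators focusing on Norway and the Norwegian language.
  • Discussion of biases: [More information needed]
  • Other known limitations: [More information needed]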

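Additional Information

  • Dataset curators: a team of researchers and linguistic experts focused on Norwegian studies.
  • Licensing information: Creative Commons Attribution 4.0 International License.
  • Citation information: [More information needed]
  • Contributions: [More information needed]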