ssounda1/mokka-chat-ds-v1
Source: Hugging Face. Updated 2023-12-23; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/ssounda1/mokka-chat-ds-v1
Resource summary:
---
license: mit
task_categories:
- question-answering
language:
- en
tags:
- not-for-all-audiences
pretty_name: Poor Jokes Dataset
size_categories:
- 1K<n<10K
---
# Dataset Card for Poor Jokes Dataset
This dataset contains common poor jokes in question-and-answer form.
## Dataset Details
### Dataset Description
This dataset contains common poor jokes, curated by browsing various webpages. The goal of the dataset is to enable fine-tuning LLMs to give humorous responses.
The dataset covers different domains.
This dataset contains conversations that may be considered unsafe, offensive, or upsetting. We are not responsible for any outputs of the models trained on this dataset.
Statements or opinions made in this dataset do not reflect the views of researchers or institutions involved in the data collection effort.
Users of this data are responsible for ensuring its appropriate use, which includes abiding by any applicable laws and regulations.
- **Curated by:** Sri Soundararajan
- **Funded by:** Sri Soundararajan
- **Shared by:** Sri Soundararajan
- **Language(s) (NLP):** English
- **License:** MIT
### Dataset Sources
- **Repository:** https://huggingface.co/datasets/ssounda1/mokka-chat-ds-v1
- **Paper:** N/A
- **Demo:** N/A
## Uses
The dataset is intended for building, pre-training, and fine-tuning LLMs for humor-enhanced question answering.
### Direct Use
Adding a touch of humor to question answering.
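As a rough illustration of this use, one entry could be turned into prompt/completion pairs for fine-tuning. This is only a sketch: the field names `question`, `answers`, and `context` come from the Dataset Structure section, while the helper name `to_prompt_completion`, the prompt template, the `prompt`/`completion` keys, and the sample joke are assumptions made for the example.

```python
from typing import Dict, List


def to_prompt_completion(
    question: str, answers: List[str], context: str = ""
) -> List[Dict[str, str]]:
    """Build one prompt/completion pair per answer (hypothetical template)."""
    if context:
        prompt = f"Context: {context}\nQuestion: {question}"
    else:
        prompt = f"Question: {question}"
    return [{"prompt": prompt, "completion": answer} for answer in answers]


# Illustrative entry only; not taken from the dataset.
pairs = to_prompt_completion(
    "Why did the scarecrow win an award?",
    ["Because he was outstanding in his field."],
    context="farming",
)
for pair in pairs:
    print(pair)
```

Emitting one pair per answer keeps every alternative punchline as its own training example; a different pairing strategy may suit other training setups better.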
### Out-of-Scope Use
[More Information Needed]
## Dataset Structure
The dataset is a simple structure of JSON objects and lists:

    {
      "train": [
        {
          "source": <String: Include the URL when applicable>,
          "data": [
            {
              "question": <String>,
              "answers": [
                <String>,
                <String>
              ],
              "context": <String>
            }
          ]
        }
      ]
    }
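
A minimal sketch of how this nested layout might be read, flattening it into one row per answer. Only the field names (`train`, `source`, `data`, `question`, `answers`, `context`) come from the structure above; the inline sample, its URL and joke text, and the helper name `flatten_records` are assumptions made for illustration.

```python
import json
from typing import Any, Dict, Iterator

# Hypothetical sample that follows the documented layout; the URL and joke
# text are placeholders, not records taken from the dataset.
SAMPLE_JSON = """
{
  "train": [
    {
      "source": "https://example.com/jokes",
      "data": [
        {
          "question": "Why did the chicken cross the road?",
          "answers": ["To get to the other side.", "It was poultry in motion."],
          "context": "animals"
        }
      ]
    }
  ]
}
"""


def flatten_records(
    dataset: Dict[str, Any], split: str = "train"
) -> Iterator[Dict[str, str]]:
    """Yield one flat row per (question, answer) pair in the given split."""
    for group in dataset.get(split, []):
        source = group.get("source", "")  # URL, when applicable
        for item in group.get("data", []):
            for answer in item.get("answers", []):
                yield {
                    "source": source,
                    "question": item["question"],
                    "answer": answer,
                    "context": item.get("context", ""),
                }


rows = list(flatten_records(json.loads(SAMPLE_JSON)))
for row in rows:
    print(f"{row['question']} -> {row['answer']}")
```

Each row carries its `source` along, so the provenance of a joke is preserved after flattening.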
## Dataset Creation
### Curation Rationale
To build a dataset for contextual question answering that adds humor along the way.
### Source Data
Sources are listed as part of the dataset structure.
#### Data Collection and Processing
Manually collected and processed.
#### Who are the source data producers?
Various webpages
## Dataset Card Contact
Sri Soundararajan <ssounda1.work@gmail.com>
Provider:
ssounda1
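Summary of original information
Poor Jokes Dataset overview
Dataset description
This dataset contains common poor jokes in question-and-answer form. The jokes were curated by browsing various webpages. The dataset was built to support fine-tuning large language models (LLMs) to generate humorous answers. It covers different domains and may contain conversations considered unsafe, offensive, or upsetting. We are not responsible for any outputs of models trained on this dataset. Statements or opinions in the dataset do not represent the views of the researchers or institutions involved in the data collection. Users must ensure appropriate use of the data, including compliance with any applicable laws and regulations.
- Language: English
- License: MIT
- Dataset creator: Sri Soundararajan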
Dataset structure
The dataset is a simple structure of JSON objects and lists, in the following format:

    {
      "train": [
        {
          "source": <String: Include the URL when applicable>,
          "data": [
            {
              "question": <String>,
              "answers": [
                <String>,
                <String>
              ],
              "context": <String>
            }
          ]
        }
      ]
    }
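Dataset uses
The dataset is intended for building, pre-training, and fine-tuning large language models (LLMs) to add humor to question-answering scenarios.
Direct use
Adding humor to question answering.
Out-of-scope use
This section is yet to be completed.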
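Dataset creation
Curation rationale
To build a dataset for contextual question answering that adds humor.
Source data
The sources are listed as part of the dataset structure.
Data collection and processing
The data was collected and processed manually.
Source data producers
Various webpages.
Dataset card contact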
Sri Soundararajan ssounda1.work@gmail.com



