Short Jokes
Source: www.kaggle.com | Updated: 2017-02-06 | Indexed: 2025-01-16
Download link:
https://www.kaggle.com/abhinavmoudgil95/short-jokes
Resource description:
**Context**
Generating humor is a complex machine learning task: a model must understand the deep semantic meaning of a joke in order to generate new ones. Such problems are difficult to solve for a number of reasons, one of which is the lack of a database offering an extensive list of jokes. To address this, a large corpus of more than 200,000 jokes has been collected by scraping several websites containing short, funny jokes.
Visit my [GitHub repository](https://github.com/amoudgl/short-jokes-dataset) for more information on how the data was collected and the scripts used.
**Content**
This dataset is a CSV file containing 231,657 jokes. Joke length ranges from 10 to 200 characters. Each line in the file contains a unique ID and a joke.
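As a rough illustration of the layout described above (one unique ID and one joke per row), the following Python sketch reads such a CSV and checks the stated properties. The column names `ID` and `Joke`, the helper names `load_jokes` and `summarize`, and the in-memory sample rows are assumptions made for illustration, not details taken from the dataset itself.

```python
import csv
import io


def load_jokes(source):
    """Read (id, joke) pairs from an open text stream that has a header row."""
    reader = csv.DictReader(source)
    # Assumed column names "ID" and "Joke"; adjust to the file's real header.
    return [(row["ID"], row["Joke"]) for row in reader]


def summarize(jokes):
    """Return the row count, ID uniqueness, and min/max joke length."""
    ids = [joke_id for joke_id, _ in jokes]
    lengths = [len(text) for _, text in jokes]
    return {
        "count": len(jokes),
        "unique_ids": len(set(ids)) == len(ids),
        "min_len": min(lengths),
        "max_len": max(lengths),
    }


# Tiny in-memory stand-in for the real file, so the sketch runs without a download.
sample = io.StringIO(
    'ID,Joke\n'
    '1,"Placeholder joke, with a comma."\n'
    '2,Another placeholder line.\n'
)
stats = summarize(load_jokes(sample))
print(stats)
```

To run this against the downloaded file, pass an open file handle in place of the sample, for example one created with `open(path, newline="", encoding="utf-8")`, where `path` is wherever the CSV was saved. The UTF-8 encoding is also an assumption. On the full file, the summary would be expected to agree with the figures given above (231,657 rows, lengths between 10 and 200 characters).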
**Disclaimer**
An attempt has been made to keep the jokes as clean as possible. Because the data was collected by scraping websites, a few jokes may still be inappropriate or offensive to some people.
Provider:
Kaggle



