joshuasundance/wikiquote_tv
Hugging Face | Updated 2023-12-11 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/joshuasundance/wikiquote_tv
Resource description:
---
language:
- en
tags:
- television
- roleplaying
license: mit
size_categories:
- 100K<n<1M
task_categories:
- text-generation
- conversational
---
# wikiquote_tv
🤖 This README was written by GPT-4. 🤖
# Overview
This dataset, gathered from [wikiquote.org](https://en.wikiquote.org/), contains quotes, actions, and conversations from a variety of television shows. It may be particularly useful for training language models, since it offers a large source of dialogue and narrative structure.
# Dataset Description
- **Content**: Quotes, actions, and conversations from a range of TV shows.
- **Source**: Extracted from [wikiquote.org](https://en.wikiquote.org/).
- **Structure**: The dataset includes data classes for `Action`, `Quote`, and `Conversation`, encapsulating individual elements of dialogues.
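
The description names three data classes but does not list their fields. The following is a minimal sketch of how such classes might look. Every field name (`text`, `speaker`, `turns`) and the `speakers` helper are assumptions made for illustration, not the dataset's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Action:
    """A described action or stage direction (hypothetical field name)."""
    text: str


@dataclass
class Quote:
    """A spoken line attributed to a speaker (hypothetical field names)."""
    speaker: str
    text: str


@dataclass
class Conversation:
    """An ordered sequence of quotes and actions (hypothetical field name)."""
    turns: List[Union[Quote, Action]] = field(default_factory=list)

    def speakers(self) -> List[str]:
        """Return the distinct speakers in order of first appearance."""
        seen: List[str] = []
        for turn in self.turns:
            if isinstance(turn, Quote) and turn.speaker not in seen:
                seen.append(turn.speaker)
        return seen


# Made-up example data, not taken from the dataset.
convo = Conversation(turns=[
    Quote("Alice", "Did you lock the door?"),
    Action("Bob checks his pockets."),
    Quote("Bob", "I thought you had the keys."),
])
print(convo.speakers())
```

Keeping `Action` separate from `Quote` lets downstream code filter out stage directions when only spoken dialogue is wanted.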
# Features
- **Parse Functionality**: Extracts quotes and actions from the HTML content of [wikiquote.org](https://en.wikiquote.org/) pages.
- **Comprehensive Coverage**: Includes shows across a wide range of genres and time periods.
- **Flexible**: Can be adapted to a variety of NLP tasks and research.
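
The parsing step mentioned above could be sketched with the standard library's `html.parser`. This is not the repository's code. It assumes, purely for illustration, that each dialogue line sits in a `<dd>` element, that the speaker name is wrapped in `<b>`, and that a standalone action is wrapped in `<i>`. The `DialogueParser` class and the sample markup are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class DialogueParser(HTMLParser):
    """Collect (kind, speaker, text) tuples from <dd> elements.

    Assumed markup (hypothetical): <b>Speaker</b>: line, or <i>action</i>.
    """

    def __init__(self) -> None:
        super().__init__()
        self.items: List[Tuple[str, str, str]] = []
        self._in_dd = False
        self._in_bold = False
        self._in_italic = False
        self._speaker = ""
        self._spoken = ""
        self._action = ""

    def handle_starttag(self, tag, attrs):
        if tag == "dd":
            # Start a fresh line and reset the text buffers.
            self._in_dd = True
            self._speaker = self._spoken = self._action = ""
        elif tag == "b":
            self._in_bold = True
        elif tag == "i":
            self._in_italic = True

    def handle_endtag(self, tag):
        if tag == "b":
            self._in_bold = False
        elif tag == "i":
            self._in_italic = False
        elif tag == "dd" and self._in_dd:
            self._in_dd = False
            speaker = self._speaker.strip()
            spoken = self._spoken.strip().lstrip(":").strip()
            action = self._action.strip()
            if speaker and spoken:
                # Simplification: italic text inside a spoken line is dropped.
                self.items.append(("quote", speaker, spoken))
            elif action:
                self.items.append(("action", "", action))

    def handle_data(self, data):
        if not self._in_dd:
            return
        if self._in_bold:
            self._speaker += data
        elif self._in_italic:
            self._action += data
        else:
            self._spoken += data


# Made-up sample markup, not copied from any real page.
SAMPLE_HTML = """
<dl>
  <dd><b>Alice</b>: Did you lock the door?</dd>
  <dd><i>Bob checks his pockets.</i></dd>
  <dd><b>Bob</b>: I thought you had the keys.</dd>
</dl>
"""

parser = DialogueParser()
parser.feed(SAMPLE_HTML)
for item in parser.items:
    print(item)
```

Real pages are likely to be less regular than this sample, so a production parser would need to handle nested formatting and lines without a speaker.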
# Usage
The dataset may be useful for:
- **Natural Language Understanding**: Understanding context, humor, and character dynamics in conversations.
- **Language Modeling**: Training models to generate dialogues or predict next lines in conversations.
- **Cultural Analysis**: Studying trends and themes across various television shows.
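
For the language modeling use case, one possible way to frame next-line prediction is to pair each line with a few preceding lines as context. The sketch below assumes a conversation is already available as a list of strings. The `next_line_pairs` function, its `context_size` parameter, and the sample lines are hypothetical and not part of the dataset or its code.

```python
from typing import List, Tuple


def next_line_pairs(lines: List[str], context_size: int = 2) -> List[Tuple[str, str]]:
    """Pair each line after the first with up to `context_size` preceding lines."""
    pairs = []
    for i in range(1, len(lines)):
        start = max(0, i - context_size)
        context = "\n".join(lines[start:i])
        pairs.append((context, lines[i]))
    return pairs


# Made-up example conversation, not taken from the dataset.
conversation = [
    "Alice: Did you lock the door?",
    "Bob: I thought you had the keys.",
    "Alice: We are going to need a locksmith.",
]

pairs = next_line_pairs(conversation)
for context, target in pairs:
    print(repr(context), "->", repr(target))
```

A larger `context_size` gives the model more history per example at the cost of longer inputs.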
# License
This dataset and the accompanying code are released under the [MIT License](./LICENSE.md). The data was collected from [wikiquote.org](https://en.wikiquote.org/) using [the repo code](https://huggingface.co/datasets/joshuasundance/wikiquote_tv/blob/main/wikiquote_tv.ipynb), and no ownership or rights are claimed over it.
# Disclaimer
This dataset is intended for research and educational purposes.
# Contributions
Contributions are welcome! Feel free to submit issues or pull requests [on the Hugging Face repository](https://huggingface.co/datasets/joshuasundance/wikiquote_tv).
Provider:
joshuasundance



