joshuasundance/wikiquote_tv

Hugging Face · Updated 2023-12-11 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/joshuasundance/wikiquote_tv
Resource description:
---
language:
- en
tags:
- television
- roleplaying
license: mit
size_categories:
- 100K<n<1M
task_categories:
- text-generation
- conversational
---

# wikiquote_tv

🤖 This README was written by GPT-4. 🤖

# Overview

This dataset, gathered from [wikiquote.org](https://en.wikiquote.org/), contains a comprehensive collection of quotes, actions, and conversations from various television shows. The dataset is particularly useful for training language models, offering a rich source of dialogues and narrative structures.

# Dataset Description

- **Content**: Quotes, actions, and conversations from a range of TV shows.
- **Source**: Extracted from [wikiquote.org](https://en.wikiquote.org/).
- **Structure**: The dataset includes data classes for `Action`, `Quote`, and `Conversation`, encapsulating individual elements of dialogues.

# Features

- **Parse Functionality**: Extracts quotes and actions from the HTML content of [wikiquote.org](https://en.wikiquote.org/) pages.
- **Comprehensive Coverage**: Includes shows across a wide range of genres and time periods.
- **Customizable**: Flexible to be used for a variety of NLP tasks and research.

# Usage

The dataset may be useful for:

- **Natural Language Understanding**: Understanding context, humor, and character dynamics in conversations.
- **Language Modeling**: Training models to generate dialogues or predict next lines in conversations.
- **Cultural Analysis**: Studying trends and themes across various television shows.

# License

This dataset and the accompanying code are released under the [MIT License](./LICENSE.md). The contents of the data are collected from [wikiquote.org](https://en.wikiquote.org/) as per [the repo code](https://huggingface.co/datasets/joshuasundance/wikiquote_tv/blob/main/wikiquote_tv.ipynb), and no ownership or rights are claimed over the data.

# Disclaimer

This dataset is intended for research and educational purposes.

# Contributions

Contributions are welcome! Feel free to submit issues or pull requests [on the HuggingFace repository](https://huggingface.co/datasets/joshuasundance/wikiquote_tv).
Provider:
joshuasundance
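Summary of the original information

wikiquote_tv Dataset Overview

Overview

This dataset was collected from wikiquote.org and contains a comprehensive collection of quotes, actions, and conversations from various television shows. It is particularly suited to training language models, offering a rich source of dialogue and narrative structure.

Dataset Description

  • Content: Quotes, actions, and conversations from a range of TV shows.
  • Source: Extracted from wikiquote.org.
  • Structure: The dataset includes the `Action`, `Quote`, and `Conversation` data classes, which encapsulate the individual elements of a dialogue.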
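
The dataset card names the `Action`, `Quote`, and `Conversation` data classes but does not spell out their fields. As a rough sketch only, they might be modeled as below; the field names (`text`, `speaker`, `elements`) and the `speakers` helper are assumptions made for illustration, not the repository's actual definitions.

```python
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Action:
    """A described action or stage direction (field name is an assumption)."""
    text: str


@dataclass
class Quote:
    """A single line spoken by a named character (field names are assumptions)."""
    speaker: str
    text: str


@dataclass
class Conversation:
    """An ordered exchange that mixes spoken lines and actions."""
    elements: List[Union[Quote, Action]] = field(default_factory=list)

    def speakers(self) -> List[str]:
        """Return the distinct speakers in order of first appearance."""
        seen: List[str] = []
        for element in self.elements:
            if isinstance(element, Quote) and element.speaker not in seen:
                seen.append(element.speaker)
        return seen
```

With this layout, a conversation is just a list that preserves the original ordering of lines and actions, which is what a dialogue model would typically need.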

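Features

  • Parse functionality: Extracts quotes and actions from the HTML content of wikiquote.org pages.
  • Comprehensive coverage: Includes shows across a wide range of genres and time periods.
  • Customizable: Can be adapted to a variety of NLP tasks and research.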
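
To give a feel for what extracting quotes and actions from page HTML can involve, here is a minimal sketch built on Python's standard `html.parser`. It is not the repository's parser. It assumes, purely for illustration, that a spoken line appears as `<dd><b>Speaker</b>: text</dd>` and an action as a `<dd>` with no bold speaker; the class name `DialogueParser` and the helper `parse_dialogue` are hypothetical.

```python
from html.parser import HTMLParser
from typing import List, Tuple


class DialogueParser(HTMLParser):
    """Collect (kind, speaker, text) tuples from <dd> elements.

    Assumed markup (not confirmed by the dataset card):
      <dd><b>Speaker</b>: spoken line</dd>  -> ("quote", "Speaker", "spoken line")
      <dd><i>stage direction</i></dd>       -> ("action", "", "stage direction")
    """

    def __init__(self) -> None:
        super().__init__()
        self.items: List[Tuple[str, str, str]] = []
        self._in_dd = False
        self._in_b = False
        self._speaker = ""
        self._text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "dd":
            # Start a fresh entry for every <dd>.
            self._in_dd, self._speaker, self._text = True, "", ""
        elif tag == "b" and self._in_dd:
            self._in_b = True

    def handle_endtag(self, tag):
        if tag == "b":
            self._in_b = False
        elif tag == "dd" and self._in_dd:
            self._in_dd = False
            # Drop the ":" that separates the speaker from the line.
            text = self._text.strip().lstrip(":").strip()
            if self._speaker:
                self.items.append(("quote", self._speaker.strip(), text))
            elif text:
                self.items.append(("action", "", text))

    def handle_data(self, data):
        if not self._in_dd:
            return
        # Bold text at the very start of an entry is treated as the speaker.
        if self._in_b and not self._text.strip():
            self._speaker += data
        else:
            self._text += data


def parse_dialogue(html: str) -> List[Tuple[str, str, str]]:
    """Parse an HTML fragment and return the collected entries."""
    parser = DialogueParser()
    parser.feed(html)
    return parser.items
```

Real pages are messier than this (unclosed tags, nested lists, episode headings), so a production parser would need more cases than the two handled here.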

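Usage

The dataset may be useful for:

  • Natural language understanding: Understanding context, humor, and character dynamics in conversations.
  • Language modeling: Training models to generate dialogue or predict the next line in a conversation.
  • Cultural analysis: Studying trends and themes across various television shows.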
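
For the next-line prediction use case, one possible way to turn a conversation into training examples is to pair each line with the few lines that precede it. The sketch below assumes a conversation is available as a list of `(speaker, text)` tuples; that input shape, the function name `next_line_pairs`, and the `max_context` parameter are illustrative choices rather than anything defined by the dataset.

```python
from typing import List, Tuple


def next_line_pairs(
    lines: List[Tuple[str, str]], max_context: int = 3
) -> List[Tuple[str, str]]:
    """Build (context, target) pairs for next-line prediction.

    Each target is one rendered line; its context is up to `max_context`
    rendered lines that immediately precede it, joined by newlines.
    """
    rendered = [f"{speaker}: {text}" for speaker, text in lines]
    pairs: List[Tuple[str, str]] = []
    for i in range(1, len(rendered)):
        context = "\n".join(rendered[max(0, i - max_context):i])
        pairs.append((context, rendered[i]))
    return pairs
```

A conversation of n lines yields n - 1 pairs, since the first line has no preceding context to predict it from.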

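License

This dataset and its accompanying code are released under the MIT License. The data was collected from wikiquote.org using the repository code, and no ownership or rights are claimed over the data.

Disclaimer

This dataset is intended for research and educational purposes.

Contributions

Contributions are welcome! Please submit issues or pull requests on the HuggingFace repository.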
