realdanielbyrne/AgathaChristieText

Source: Hugging Face. Updated 2024-04-29; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/realdanielbyrne/AgathaChristieText
Resource description:
---
license: mit
task_categories:
- text-generation
language:
- en
size_categories:
- 10K<n<100K
data_files:
- split: train
  path: "train.parquet"
- split: test
  path: "test.parquet"
- split: validation
  path: "vaidation.parquet"
---

![image/jpeg](https://cdn-uploads.huggingface.co/production/uploads/6529649ef12059832245402a/XgL4tZhZd0fuYMcIJnifC.jpeg)

# Dataset Card for AgathaChristieText

There is a tide in the affairs of men,
Which, taken at the flood, leads on to fortune,
Omitted, all the voyage of their life
Is bound in shallows and in miseries.
On such a full sea are we now afloat,
And we must take the current when it serves,
Or lose our ventures.

## Dataset Summary

The agatha_christie dataset contains the complete works of Agatha Christie: novels, short stories, and plays, 110 works in total. All text is in English. The dataset is intended for text generation tasks such as language modeling, and can be used to train models that generate text in the style of Agatha Christie.

## Supported Tasks and Leaderboards

The dataset supports text generation tasks such as language modeling, including training models to generate text in the style of Agatha Christie.

## Languages

The text in the dataset is in English.

## Dataset Structure

### Dataset instances

The following is an example sample from the dataset.

    {
      "text": "Mrs. McGillicuddy was short and stout, the porter was tall and free-striding. In addition, Mrs. McGillicuddy was burdened with a large quantity of parcels; the result of a day’s Christmas shopping. The race was, therefore, an uneven one, and the porter turned the corner at the end of the platform whilst Mrs. McGillicuddy was still coming up the straight.",
      "source": "4-50 from paddington - agatha christie.epub"
    }

### Data Fields

- text: The text of the work, chunked into semantic segments by LlamaIndex's SemanticNodeParser.
- source: The source material of the text.

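As a minimal sketch of working with these two fields, the snippet below checks that a record carries both keys and derives a work title from `source`. The helper names `is_valid_record` and `title_from_source` are hypothetical, and the assumption that every `source` value follows the `<title> - agatha christie.epub` pattern rests only on the single sample shown above.

```python
REQUIRED_FIELDS = ("text", "source")


def is_valid_record(record):
    """Return True if both card fields are present as non-empty strings."""
    return all(
        isinstance(record.get(key), str) and bool(record[key])
        for key in REQUIRED_FIELDS
    )


def title_from_source(source):
    """Strip the file extension and the trailing author part from a source name.

    Assumes the '<title> - agatha christie.epub' naming seen in the sample.
    """
    stem = source.rsplit(".", 1)[0]       # drop the extension, if any
    return stem.split(" - ")[0].strip()   # keep the part before ' - '


# First sentence of the card's sample record (text shortened for brevity).
sample = {
    "text": "Mrs. McGillicuddy was short and stout, the porter was tall and free-striding.",
    "source": "4-50 from paddington - agatha christie.epub",
}

print(is_valid_record(sample))
print(title_from_source(sample["source"]))
```

Splitting on `" - "` (with spaces) rather than on a bare hyphen keeps titles such as `4-50 from paddington` intact.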
### Splits

The dataset is split into train, test, and validation splits.

- train.parquet
- test.parquet
- validation.parquet
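One way to read the three Parquet files from a local copy is sketched below with pandas. This is an illustration rather than the author's loading code: the directory argument and the helper names `split_paths` and `load_splits` are assumptions, and the file names follow the list above. The card's metadata header spells the validation path differently, so the actual file name in the repository is worth checking before relying on it.

```python
from pathlib import Path

# Split names and file names as listed in the Splits section of the card.
SPLIT_FILES = {
    "train": "train.parquet",
    "test": "test.parquet",
    "validation": "validation.parquet",
}


def split_paths(base_dir):
    """Map each split name to its Parquet file under base_dir (a local copy)."""
    base = Path(base_dir)
    return {split: base / name for split, name in SPLIT_FILES.items()}


def load_splits(base_dir):
    """Read every split into a pandas DataFrame.

    pandas is imported lazily; reading Parquet also needs an engine
    such as pyarrow to be installed.
    """
    import pandas as pd

    return {
        split: pd.read_parquet(path)
        for split, path in split_paths(base_dir).items()
    }
```

Calling `load_splits("path/to/download")` would return a dictionary of DataFrames keyed by split name, each expected to have `text` and `source` columns.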
Provider:
realdanielbyrne
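Summary of original information

Dataset card for AgathaChristieText

Dataset overview

The agatha_christie dataset contains the complete works of Agatha Christie, including novels, short stories, and plays. The dataset is in English and contains 110 works in total. It is intended for text generation tasks such as language modeling, and can be used to train models that generate text in the style of Agatha Christie.

Supported tasks and leaderboards

The dataset can be used for text generation tasks such as language modeling, including training models to generate text in the style of Agatha Christie.

Languages

The text in the dataset is in English.

Dataset structure

Data instances

The following is an example sample from the dataset: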

{
  "text": "Mrs. McGillicuddy was short and stout, the porter was tall and free-striding. In addition, Mrs. McGillicuddy was burdened with a large quantity of parcels; the result of a day’s Christmas shopping. The race was, therefore, an uneven one, and the porter turned the corner at the end of the platform whilst Mrs. McGillicuddy was still coming up the straight.",
  "source": "4-50 from paddington - agatha christie.epub"
}

Data fields

  • text: The text of the work, chunked into semantic segments by LlamaIndex's SemanticNodeParser.
  • source: The source material of the text.
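
Because each row holds one chunk plus the file it came from, chunks can be regrouped per work. The sketch below assumes rows are available as plain dictionaries with these two keys. The helper name `group_by_source` is hypothetical, and the rows in the example are made-up placeholders, not data from the dataset.

```python
from collections import defaultdict


def group_by_source(records):
    """Collect text chunks per source file, preserving the input order."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["source"]].append(record["text"])
    return dict(grouped)


# Placeholder rows for illustration only (not taken from the dataset).
rows = [
    {"text": "first chunk", "source": "book-a.epub"},
    {"text": "second chunk", "source": "book-a.epub"},
    {"text": "another chunk", "source": "book-b.epub"},
]

chunks_per_work = group_by_source(rows)
for source, chunks in chunks_per_work.items():
    print(source, len(chunks))
```

Joining the chunks of one source, for example with `"\n".join(chunks)`, gives a longer passage per work, which can be handy when preparing language-modeling input.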

Data splits

The dataset is split into training, test, and validation sets.

  • train.parquet
  • test.parquet
  • validation.parquet
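
The splits can probably also be pulled straight from the Hugging Face Hub with the `datasets` library, as sketched below. This assumes network access and that `datasets` is installed. Whether all three splits resolve depends on the repository's files matching the card's metadata. The helper names `load_from_hub` and `split_sizes` are hypothetical.

```python
REPO_ID = "realdanielbyrne/AgathaChristieText"


def load_from_hub(repo_id=REPO_ID):
    """Download the dataset from the Hub and return its splits.

    Requires the third-party `datasets` package and network access.
    """
    from datasets import load_dataset

    return load_dataset(repo_id)


def split_sizes(splits):
    """Return {split name: row count} for a mapping of split name to rows."""
    return {name: len(rows) for name, rows in splits.items()}


if __name__ == "__main__":
    dataset = load_from_hub()
    print(split_sizes(dataset))
```

`split_sizes` works on any mapping of split names to sized collections, so it can be tried on plain lists without downloading anything.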