
Dwaraka/Testing_Dataset_of_Project_Gutebberg_Gothic_Fiction

Source: Hugging Face | Updated: 2023-02-27 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/Dwaraka/Testing_Dataset_of_Project_Gutebberg_Gothic_Fiction
Description:
---
task_categories:
- text-generation
language:
- en
pretty_name: TRAINING & TESTING DATASETS
size_categories:
- 1M<n<10M
---

TRAINING_CORPUS.txt

The TRAINING_CORPUS is a collection of 12 books in the Gothic fiction genre from Project Gutenberg (https://www.gutenberg.org/), containing 1,051,518 words and 6,002,980 characters in total:

  • The Modern Prometheus
  • The Lair of the White Worm by Bram Stoker
  • The Vampyre; a Tale
  • Nightmare Abbey by Thomas Love Peacock
  • The History of Caliph Vathek by William Beckford
  • The Lock and Key Library: Classic Mystery and Detective Stories: Old Time
  • Caleb Williams; Or, Things as They Are by William Godwin
  • The Private Memoirs and Confessions of a Justified Sinner
  • Confessions of an English Opium-Eater
  • The Mysteries of Udolpho
  • Wieland; Or, The Transformation: An American Tale by Charles Brockden Brown
  • The Castle of Otranto

This text is fed as input to the PROJECT_GUTENBERG_GOTHIC_FICTION_TEXT_GENERATION_gpt2 model for text generation, producing output in the Gothic fiction style.

TESTING_CORPPUS.txt

The TESTING_CORPPUS is random text manually picked from the TRAINING_CORPUS to evaluate the model.
Provider:
Dwaraka
Summary of original information

Dataset overview

Task categories

  • Text generation

Language

  • English

Dataset name

  • TRAINING & TESTING DATASETS

Dataset size

  • 1M<n<10M

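Training dataset

  • File name: TRAINING_CORPUS.txt
  • Content: a collection of 12 books totaling 1,051,518 words and 6,002,980 characters, from the Gothic fiction category of Project Gutenberg.
  • Use: serves as input data for training the PROJECT_GUTENBERG_GOTHIC_FICTION_TEXT_GENERATION_gpt2 model to generate text in the Gothic fiction style.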
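
The word and character totals quoted for the training corpus can be checked with a few lines of Python. This is a minimal sketch, not the provider's method: it assumes that a "word" is any whitespace-separated token and that the file is UTF-8 encoded, neither of which the listing states, so the numbers it produces may differ somewhat from those above. The helper names `corpus_stats` and `file_stats` are hypothetical.

```python
from pathlib import Path
from typing import Tuple


def corpus_stats(text: str) -> Tuple[int, int]:
    """Return (word_count, character_count) for a text.

    Assumption: words are whitespace-separated tokens; the listing
    does not say how its own word count was computed.
    """
    words = len(text.split())
    chars = len(text)
    return words, chars


def file_stats(path: str, encoding: str = "utf-8") -> Tuple[int, int]:
    """Read a text file and return its (word_count, character_count).

    Assumption: the file is UTF-8 encoded.
    """
    return corpus_stats(Path(path).read_text(encoding=encoding))
```

Once TRAINING_CORPUS.txt has been downloaded, calling `file_stats("TRAINING_CORPUS.txt")` would return a pair to compare against the listed totals.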

Testing dataset

  • File name: TESTING_CORPPUS.txt
  • Content: random text manually picked from the TRAINING_CORPUS, used to evaluate model performance.
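
The listing says the test text was picked by hand. For readers who want a reproducible stand-in, the sketch below draws a random contiguous excerpt from a training text. It is an illustration under stated assumptions, not the provider's procedure: the function name `sample_excerpt`, the default excerpt length of 50 words, and the use of a seed are all hypothetical choices.

```python
import random
from typing import Optional


def sample_excerpt(text: str, n_words: int = 50, seed: Optional[int] = None) -> str:
    """Return a random contiguous run of n_words words from text.

    Assumption: words are whitespace-separated tokens. If the text has
    n_words words or fewer, the whole text is returned (re-joined with
    single spaces).
    """
    words = text.split()
    if len(words) <= n_words:
        return " ".join(words)
    rng = random.Random(seed)  # a fixed seed makes the pick repeatable
    start = rng.randrange(len(words) - n_words + 1)
    return " ".join(words[start:start + n_words])
```

Passing the same `seed` twice returns the same excerpt, which makes an evaluation prompt easy to regenerate later.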