Pile-Gutenberg
Source: ModelScope community. Updated 2025-11-12; indexed 2024-08-31.
Download link:
https://modelscope.cn/datasets/OmniData/Pile-Gutenberg
Resource overview:
displayName: Pile-Gutenberg
license:
- MIT
taskTypes:
- Natural Language Generation
- Language Modelling
mediaTypes:
- Text
labelTypes:
- English Corpus
tags: []
publisher:
- EleutherAI
publishDate: '2023-07-18'
publishUrl: https://pile.eleuther.ai/
paperUrl: ''
---
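# Data Introduction
## Overview
Pile-Gutenberg (PG-19) is a component of The Pile project and is intended for training language models. It is built from public-domain texts published by Project Gutenberg.
Project Gutenberg is a project dedicated to digitizing and archiving public-domain literary works. It has collected a large body of material, including novels, poetry, drama, and scientific literature. Pile-Gutenberg draws on these public-domain texts to give language models a rich literary resource.
PG-19 takes its name from its publication cutoff: it consists of Project Gutenberg books published before 1919. It covers genres such as novels, poetry, and plays, ranges from classical literature to works of the early twentieth century, and includes well-known authors such as Shakespeare, Dickens, and Oscar Wilde.
With Pile-Gutenberg, researchers and developers can train language models to understand and generate text in a wide range of literary styles and formats, which is valuable for natural language processing, text generation, and literary research.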
## Data Content
### Data description
The Pile-Gutenberg (PG-19) dataset contains 10.5 GB of data.
### Data example
```
{
"id": "259462824",
"source_id": "",
"doc_id": "181570408",
"data_type": "text",
"data_source": "pile",
"data_url": "enwiki-c4-pile-ccnews",
"content": "\n\n\n\nProduced by Melissa McDaniel and the Online Distributed\nProofreading Team at http://www.pgdp.net (This file was\nproduced from images generously made available by The\nInternet Archive)\n\n\n\n\n\n\n\nTranscriber's Note:\n\n Inconsistent hyphenation and spelling in the original document have\n been preserved. Obvious typographical errors have been corrected.\n\n\n\n\n [Illustration]\n\n O, I AM PRINCE OF THE INKY IMPS\n AND KING OF THE BLOTTENTOT CREW;\n MY ANCESTREE HAS A PEDIGREE\n OF A ROYAL PURPLISH HUE.\n\n ONCE MY LOT WAS A DARK BLUE SPOT\n FLIPPED ON A MILK-WHITE SEA,\n A CREASE AND A FOLD--AND A BUCCANEER BOLD\n OUT JUMPED--AND THAT WAS ME!\n\n\n\n\n BLOTTENTOTS\n AND HOW TO MAKE THEM\n\n BY\n JOHN PROSPER CARMEL\n\n [Illustration:\n\n If you've never made a\n Blottentot\n This book will help you\n quite a lot!]\n\n PAUL ELDER AND COMPANY\n SAN FRANCISCO AND NEW YORK\n\n\n\n\n Copyright, 1907, by Paul Elder & Company.\n\n Entered at Stationers' Hall, London.\n\n\n\n\n These were made for Dymphna\n\n\n\n\n [Illustration]\n\n\nHOW TO MAKE BLOTTENTOTS\n\n\n To make a funny Blottentot,\n First take a piece of paper,\n Splash on some ink, a single spot,\n Crease, press, but cut no caper.\n\n Don't crease exactly at the blot--\n You'll have a fearful muddle;\n Press gently, too, and not a lot,\n Unless you want a puddle.\n\n With everything we humans do,\n Practice makes us apter:\n So start at once, you'll find it true\n At the end of your first chapter.\n\n\n\n\n [Illustration]\n\n\nA FLIT-FLIT FLITTER\n\n\n In the realms of wonderland\n Such flies do gaily flitter,\n But when they're just a blot of ink\n Of course they cannot glitter.\n\n They flitter, flutter round about,\n These Flitter-Flitter-Flitters,\n O'er dewy flow'ry sunny meads,\n The lightest, brightest critters.\n\n\n\n\n [Illustration]\n\n\nA GOBBLE-ME-UP\n\n\n Weedy, greedy Gobble-Me-Up,\n Your mouth is a fearful size.\n Do you live on little girls and boys,\n Or merely cakes 
and pies?\n\n\n\n\n [Illustration]\n\n\nTWO BUCKING NIGHTMARES\n\n\n Two bucking nightmares ran out to neigh,\n Thinking it night, but found it day,\n So took to their heels in sore dismay,--\n I'm 'fraid they still are running away.\n\n\n\n\n [Illustration]\n\n\nSTRANGE BUT TRUE\n\n\n Now it seems to be scarcely credible,\n A difficult thing to think,\n That such a strange grotesquerie\n Was pressed from a drop of ink.\n\n But word for word I tell you,\n As true as word can be,\n That in its making there was naught\n But the blindest chancerie.\n\n\n\n\n [Illustration]\n\n\nLAW-MAKERS\n\n\n Tom and Johnny Make-the-law,\n Talkative and lazy,\n Standing on a Thingumajig\n Comical and crazy.\n You are just a pair of Imps,\n With but one leg that badly limps.\n\n\n\n\n [Illustration]\n\n\nMISTRESS NELL\n\n\n Gadzooks, Nell Gwynne!\n How did you get in?\n Did you walk or were you brought\n in your chair?\n Your dress is perfection\n To the smallest section\n Of stomacher, quilting and hair.\n\n\n\n\n [Illustration]\n\n\nA PROFESSIONAL TIFF\n\n\n Said Dr. Spindleshanks,\n \"I'll stand no silly pranks!\"\n \"You're nothing but a prig!\"\n Said Dr. 
Funnywig.\n Then, making each a face,\n They went off at a pace.\n\n\n\n\n [Illustration]\n\n\nSAFE AT A DISTANCE\n\n\n You big Bugaboo!\n We didn't want you,\n But really now that you've come,\n If you keep far away\n We'll permit you to stay,\n Just as long you keep quite dumb.\n\n\n\n\n [Illustration]\n\n\nTEENY AND TINY\n\n\n Teeny and Tiny Pugnoses\n Have discovered two beautiful roses,\n But the stems are so tall\n They can't reach them at all,\n Though they stand on the tips\n of their toeses.\n\n\n\n\n [Illustration]\n\n\nIMPISH\n\n\n You can see by the look of this\n big-footed Sprite,\n That just the one thing that\n affords him delight\n Is to give a high jump and land\n on your toe,\n On the very same spot where\n the biggest corns grow.\n\n\n\n\n [Illustration]\n\n\nA LITTLE GRASS <DW40>\n\n\n This is a little grass <DW40>,\n As you know a most terrible fidget.\n For a month every year\n He makes it quite clear\n That he is a little grass <DW40>.\n\n\n\n\n [Illustration]\n\n\nSIAMESE TWINS?\n\n\n I hope they're on pegs,\n Because if they're legs,\n They are altogether shocking.\n They have no feet,\n And almost meet,\n And haven't the sign of a stocking.\n\n\n\n\n [Illustration]\n\n\nA KANGAR-ROOSTER-ROO\n\n\n Why, here's our dear old hopper,\n Our Kangar-rooster-roo!\n And seeing he's such a whopper,\n I'll certainly not say \"Shoo\"!\n\n Then there are two, you see,\n So I'd better hold my peace,\n Or they may sit on me\n And leave me a crumpled crease.\n\n\n\n\n [Illustration]\n\n\nA SURPRISE\n\n\n A Squidgeecumsquee\n Got up in a tree,\n And found another--\n The fac simile.\n \"Oh dear! 
oh my!\"\n He said jumping high,\n \"It's surely my brother--\n What a horrible guy!\"\n\n\n\n\n [Illustration]\n\n\nCONSIDERATE\n\n\n \"You jump over to me,\" said Sue.\n \"I wish you would come to me,\"\n said Loo;\n \"As sure as I jump\n I'll kick that stump,\n So really I'd rather let you.\"\n\n\n\n\n [Illustration]\n\n\nRISKY\n\n\n Now this is just the funniest rogue,\n A Brownie as black as ink,\n And what he's doing perched up there,\n I'm sure I cannot think.\n\n He's holding his arms like a pair of sails;\n Perhaps he's trying to fly.\n Let's hope he won't be playing that game\n When you and I pass by.\n\n\n\n\n [Illustration]\n\n\nDOGGEREL\n\n\n Here are the strangest pair of dogs,\n What sort I cannot tell,\n But judging by their noses sharp\n They have the sense of smell.\n\n Their tails are very, very long,--\n But does it really matter?\n By the very way they stare and start\n They're mad as any hatter.\n\n\n\n\n [Illustration]\n\n\nA WARNING\n\n\n Are these Quumps or Zagabogs,\n <DW57>s or Quees?\n Anyhow, you'd best look out,--\n They're just about to sneeze!\n\n\n\n\n [Illustration]\n\n\nTHE LATEST DISCOVERY\n\n\n I've just discovered a marvelous way\n Of making these Blottentots mottled and gray;\n If you promise you never will show any one\n I'll tell you the secret of how it is done.\n\n Take two bottles of ink, one thick and one thin,\n Of different blacks, and dip your pen in;\n From each splash a drop at the very same spot,\n Then do as before, only pressing a lot.\n\n\n\n\n [Illustration]\n\n\nSORRY GRIGS\n\n\n What makes these little Grigs so sad?\n They're standing most dejected.\n Have they been up to something bad\n And in it got detected?\n\n\n\n\n [Illustration]\n\n\nLANKY DOODLE\n\n\n Lanky Doodle came to town\n Without his little pony,\n Stuck a feather in his hat\n With bits of macaroni.\n\n\n\n\n [Illustration]\n\n\nTHE DANCE\n\n\n Jingle your bells and your tambourine\n For just such a dance as you never have seen;\n Such swishing 
of skirts, and glancing of feet,\n Such bowing and parting, then running to meet;\n So jingle your bells and your tambourine,\n And keep them a-dancing from morning till e'en.\n\n\n\n\n [Illustration]\n\n\nLOOK OUT FOR HIM!\n\n\n He's flying in the air,\n So you are safe and sound;\n But you had better skip\n When he lights upon the ground.\n\n\n\n\n [Illustration]\n\n\nMACBETH\n\nAct I, Scene I.\n\n\n \"When shall we 'two' meet again--\n In thunder, lightning, or in rain?\"\n \"When the hurly-burly's done,\n When the battle's lost and won.\"\n\n\n\n\n [Illustration]\n\n\nPERPLEXING\n\n\n A queer little wight,\n Very strangely dight,\n Looked so much like his brother,\n That, believe me, it's true,\n No one ever knew\n How to tell one from t'other.\n\n\n\n\n [Illustration]\n\n\nMERELY ACCIDENTAL\n\n\n Such angular shapes\n In such beautiful capes\n Are the silliest contradiction,\n But they simply \"came,\"\n So I'm not to blame;\n With Blottentots there's no restriction.\n\n\n\n\n [Illustration]\n\n\nBIRDS OF A FEATHER\n\n\n \"Now really it is shocking!\" irately said\n Miss B,\n \"To think that you are mocking and\n making fun of me.\n You have your wings and rufflings\n the very same as I,\n So you need not turn your nose up,\n with a twinkle in your eye.\"\n\n\n\n\n [Illustration]\n\n\nA DE-DUCK-TION\n\n\n Pluck\n A duck\n Of a wing.\n Alack!\n He'll quack,\n And not sing.\n\n\n\n\n [Illustration]\n\n\nAN OVERSIGHT\n\n\n Two Rabbits met and shook hands one day\n In the gravest possible kind of a way.\n But what was the cause of their serious mien\n From our picture is not very easily seen.\n They'd been jollier far if they'd stopped to sup\n The honeyed mead from the buttercup.\n\n\n\n\n [Illustration]\n\n\nQUITE THE THING\n\n\n Words fail\n To detail,\n I can only smile.\n Your salute\n Is cute\n And just perfect style.\n\n\n\n\n [Illustration]\n\n\nQUAINT AND QUEER\n\n\n Quaint and Queer,\n A funny pair,\n The funniest you could see,\n Met one day\n In a 
strange array,\n The strangest that could be.\n\n Each stood and stared\n As if he feared\n That he would get a poke;\n But laughed to find\n The other kind,\n And thought it all a joke.\n\n\n\n\n [Illustration]\n\n\nFINIS\n\n\n Before, I had some Cassowaries,\n Now I have two Dromedaries.\n So just to leave some shapes for you,\n I'll doff my cap and say adieu.\n\n\n\n\n\nEnd of the Project Gutenberg EBook of Blottentots and How to Make Them, by \nJohn Prosper Carmel\n\n*** \n",
"remark": {
"pile_set_name": "Gutenberg (PG-19)"
},
"sub_path": "gutenberg-pg-19/train"
}
```
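The record above can be read into a small typed structure. The following is a minimal sketch, not the publisher's loader: it assumes the data files store one JSON object per line (JSON Lines), which this page does not confirm, and the names `PileRecord` and `iter_records` are hypothetical. The field names come from the sample record; the in-memory sample uses shortened content.

```python
import io
import json
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class PileRecord:
    """Subset of the fields shown in the sample record (hypothetical type)."""
    id: str
    doc_id: str
    content: str
    pile_set_name: str
    sub_path: str


def iter_records(stream: TextIO) -> Iterator[PileRecord]:
    """Lazily yield records from a stream, assuming one JSON object per line."""
    for line in stream:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        raw = json.loads(line)
        yield PileRecord(
            id=raw["id"],
            doc_id=raw["doc_id"],
            content=raw["content"],
            pile_set_name=raw.get("remark", {}).get("pile_set_name", ""),
            sub_path=raw.get("sub_path", ""),
        )


# Tiny in-memory stand-in for a real data file (content shortened for illustration).
sample = io.StringIO(
    json.dumps({
        "id": "259462824",
        "source_id": "",
        "doc_id": "181570408",
        "data_type": "text",
        "data_source": "pile",
        "content": "HOW TO MAKE BLOTTENTOTS\n\nTo make a funny Blottentot,",
        "remark": {"pile_set_name": "Gutenberg (PG-19)"},
        "sub_path": "gutenberg-pg-19/train",
    }) + "\n"
)

records = list(iter_records(sample))
for record in records:
    print(record.id, record.pile_set_name, len(record.content))
```

Reading line by line keeps memory use flat, which matters for a corpus of this size; with a real file, `open(path, encoding="utf-8")` would replace the `io.StringIO` stand-in.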
## Citation
```
@misc{conghui2022opendatalab,
title={OpenDataLab: Empowering General Artificial Intelligence with Open Datasets},
author={Conghui He and Wei Li and Zhenjiang Jin and Bin Wang and Chao Xu and Dahua Lin},
howpublished={https://opendatalab.com/},
year={2022}
}
```
## Download dataset
The dataset repository can be cloned with git from https://modelscope.cn/datasets/OmniData/Pile-Gutenberg.
Provided by: maas
Created: 2024-07-12



