roettger/eighteenth_century_french_novels
Source: Hugging Face. Updated 2024-04-09; indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/roettger/eighteenth_century_french_novels
Resource description:
---
license: cc-by-4.0
task_categories:
- text-generation
language:
- fr
pretty_name: Collection of Eighteenth-Century French Novels (1751-1800)
size_categories:
- 10M<n<100M
---
# General information
This dataset contains 12 million tokens of literary French prose from 1751-1800 in plain text format. It was built within the project 'Mining and Modeling Text' (2019-2023) at Trier University.
For the dataset in XML/TEI see the [GitHub repository of the project](https://github.com/MiMoText/roman18/blob/master/README.md).
# Collection de romans français du dix-huitième siècle (1751-1800) / Collection of Eighteenth-Century French Novels (1751-1800)
This collection contains 200 digital French texts of novels created or first published between 1751 and 1800. It was created in the context of [Mining and Modeling Text](https://www.mimotext.uni-trier.de/en) (2019-2023), a project located at the Trier Center for Digital Humanities ([TCDH](https://tcdh.uni-trier.de/en)) at Trier University.
## Metadata
A metadata file describes the collection at the level of the full texts. Its column names are explained in the following section.
# Data Fields
* filename: file name
* au-name: author name
* au-birth: birth date of author
* au-death: death date of author
* title: title of literary work
* au-gender: gender of author
* firsted-yr: first year of publication
* printSource-yr: year of publication of print source
* form: narrative form
* spelling: information on historical spelling
* data-capture: information on data capture
* token count: token count of text file
* vols_count: count of volumes ('tome')
* size: size according to the ELTeC scheme: https://distantreading.github.io/Schema/eltec-1.html#TEI.size
* bgrf: unique identifier in 'Bibliographie du genre romanesque français, 1751-1800 (Martin / Mylne / Frautschi 1977)'
* author_wikidata: unique identifier of author on Wikidata
* author_MiMoText-ID: unique identifier of author on MiMoText: https://data.mimotext.uni-trier.de
* title_wikidata: unique identifier of title on Wikidata
* title_MiMoText-ID: unique identifier of title on MiMoText: https://data.mimotext.uni-trier.de
* lang: language of text file
* publisher: information on publisher
* distributor: information on distributor of file
* distribution_date: information on distribution date
* copyright_status: information on copyright status of text file
* digitalSource_Title: title of digital text source
* digitalSource_Ref: reference of digital source
* digitalSource_Publisher: publisher of digital source
* digitalSource_Date: date of digital source
* printSource_title: title of print source
* printSource_author: author according to print source
* printSource_pubPlace: place of publication according to print source
* printSource_publisher: publisher of print source
* printSource_date: date of publication of print source
* resp_datacapture: person responsible for data capture
* resp_encoding: person responsible for encoding
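
The column names above can be used to filter the collection, for example by first publication year. The following is a minimal sketch, not the project's own tooling. It assumes the metadata file is tab-separated (the separator is not stated here), and the sample rows, including the `form` values, are hypothetical placeholders rather than records from the collection. The helper names `load_metadata`, `select_by_year`, and `total_tokens` are likewise illustrative.

```python
import csv
import io

# Hypothetical sample rows: placeholders only, NOT records from the collection.
# The tab separator is an assumption about the metadata file format.
SAMPLE_TSV = (
    "filename\tau-name\tfirsted-yr\tform\ttoken count\n"
    "example_a\tAuthor A\t1755\tepistolary\t50000\n"
    "example_b\tAuthor B\t1782\theterodiegetic\t120000\n"
    "example_c\tAuthor C\t1799\tepistolary\t30000\n"
)


def load_metadata(handle, delimiter="\t"):
    """Read the metadata table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter=delimiter))


def select_by_year(rows, start, end):
    """Keep rows whose 'firsted-yr' value lies within [start, end]."""
    selected = []
    for row in rows:
        year = row["firsted-yr"].strip()
        # Skip rows without a plain numeric year instead of failing on them.
        if year.isdigit() and start <= int(year) <= end:
            selected.append(row)
    return selected


def total_tokens(rows):
    """Sum the 'token count' column over the given rows."""
    return sum(int(row["token count"]) for row in rows)


rows = load_metadata(io.StringIO(SAMPLE_TSV))
late = select_by_year(rows, 1780, 1800)
print([row["filename"] for row in late])
print(total_tokens(late))
```

To run this against the real metadata file, pass an opened file handle to `load_metadata` instead of the in-memory sample, and adjust `delimiter` if the file turns out to use a different separator.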
Provider:
roettger
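Summary of original information
Dataset overview
Basic information
- License: cc-by-4.0
- Task category: text generation
- Language: French
- Name: Collection of Eighteenth-Century French Novels (1751-1800)
- Size category: 10M<n<100M
Detailed description
- Content: 12 million tokens of French literary prose from 1751-1800, provided in plain text format.
- Source: built by the 'Mining and Modeling Text' project (2019-2023) at Trier University.
- Number of texts: 200 digitized French novel texts, created or first published between 1751 and 1800.
Metadata
- File level: a metadata file is provided at the level of the full texts.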



