
EMDFL: Early Modern Dissertations in French Libraries

Indexed from NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/14967725
Description:
This dataset consists of two zip files, `emdfl_data.zip` and `emdfl_data_code.zip`. The `emdfl_data.zip` archive consists of seven CSV files which contain information, with different levels of precision and completeness, on between 17 942 and 55 178 dissertations published between 1564 and 1800 and held in French libraries. The datasets were derived from catalogue records in the general catalogue of the French national library (Bibliothèque Nationale de France, BNF) and the union catalogue of French university libraries (SUDOC).

The `emdfl_data_code.zip` archive contains the underlying git repository, code in Python notebooks and downloaded files in order to document the research process. It is documented in a README file in the root directory of the repository. Here, we only describe the selection criteria and additional details for the `emdfl_data.zip` archive.

In order to be included in the final dataset, catalogue records had to meet four criteria:
1. We were able to find a determinate date of publication in the catalogue record.
2. We could identify at least one reference to an author or contributor. We do not expect a unique identifier derived from an authority file.
3. We could find a valid title.
4. The dissertation was published after 1563.

The last criterion is somewhat ad hoc, but is based on the diagnosis that for records containing an earlier date, the status of the underlying books as academic dissertations is somewhat dubious.

All records meeting these four criteria are part of the ‘bronze’ dataset. It is coextensive with what we have called the ‘silver’ dataset for libraries. In other words, all records in the bronze dataset also contain valid identifiers for holding libraries. Both sets contain 55 178 records.

The ‘silver’ dataset ‘Place’ additionally documents uniquely identifiable places of publication and contains 49 423 records. The ‘silver’ dataset ‘Discipline’ contains additional information about the discipline to which a given dissertation belongs and contains 36 924 records. The ‘silver’ dataset ‘Persons’ contains all records for which we could obtain at least one valid VIAF identifier for a person (an author, supervisor, or printer). It comprises 31 184 records.

All records associated with one of these files have a unique ID, so that these datasets can be merged for analysis, e.g. of the temporal and geographical distribution of dissertations, a topic that would require the intersection of the silver ‘Discipline’ and ‘Place’ datasets.

The ‘gold’ dataset ‘Person Year Place’ includes at least one unique identifier for a person in the catalogue record, the year and the place of publication. It comprises 27 413 catalogue records. The ‘gold’ dataset ‘All’ includes the same information as ‘Person Year Place’, but in addition also references the discipline of the dissertation. It contains 17 942 records.
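
The merge on the shared record ID mentioned in the description can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' own code. The description does not give the CSV file names or column headers, so the column names `id`, `year`, `place` and `discipline` and the tiny in-memory sample tables are assumptions made purely for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature stand-ins for the silver 'Place' and 'Discipline' files.
# The real column headers are not stated in the description; `id`, `year`,
# `place` and `discipline` are assumed names used only for this sketch.
PLACE_CSV = """id,year,place
r1,1650,Paris
r2,1701,Strasbourg
r3,1755,Paris
"""

DISCIPLINE_CSV = """id,discipline
r1,Medicine
r3,Law
r4,Theology
"""


def read_by_id(text, key="id"):
    """Index the rows of a CSV document by their record ID."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def inner_join(left, right):
    """Keep only the IDs present in both tables and merge their columns."""
    return {rid: {**left[rid], **right[rid]} for rid in left.keys() & right.keys()}


places = read_by_id(PLACE_CSV)
disciplines = read_by_id(DISCIPLINE_CSV)
merged = inner_join(places, disciplines)

# Count records per (place, discipline) pair within the intersection.
counts = Counter((row["place"], row["discipline"]) for row in merged.values())
for (place, discipline), n in sorted(counts.items()):
    print(place, discipline, n)
```

To apply the same idea to the real archive, the in-memory strings would be replaced by the files opened with `open(path, newline="", encoding="utf-8")`, with the file names and the ID column taken from the archive itself rather than from this sketch.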

Created:
2025-03-05