PRODATPHIL -- Science and Logic v. 0.2

Source: DataCite Commons. Updated 2026-02-19, indexed 2026-05-07.
Download link:
https://datastore.uni-muenster.de/doi/10.17879/91968401496
Description:
This data set is an output of the DFG-funded project "Prodatphil -- Science and Logic" (project number 537184692), a cooperation between the Center of Philosophy of Science and the Service Center for Digital Humanities of the University of Münster (principal investigator: Stefan Heßbrüggen-Walter). The project started in July 2024 and is currently funded until July 2027.

Authors (in alphabetical order): Ingo Frank (metadata), Stefan Heßbrüggen-Walter (conception, construction of the corpus).

This is an alpha version with no guarantees and is subject to change.

Scope
The data set is aimed at researchers interested in applying DH methods to philosophical texts. It is limited to texts in the public domain from the 19th century in English. Future versions will include German and French source texts as well.

Please note that these are historical sources which may contain racist tropes or other content that discriminates against groups of people. The data are made available for research purposes and are not to be understood as propaganda or as an endorsement of discriminatory practices.

Content
In total, the corpus contains 207 texts: 72 books and 135 articles. Sixteen texts were published anonymously. The remaining texts were written by 119 known authors and published between 1830 and 1956. The articles appeared in 13 journals.

The English subcorpus contains 164 texts: 42 books and 122 articles. Fifteen texts were published anonymously. The remaining texts were written by 97 known authors and published between 1839 and 1930. The articles appeared in 10 journals.

The French subcorpus contains 36 texts: 23 books and 13 articles. One text was published anonymously. The remaining texts were written by 20 known authors and published between 1830 and 1956. The articles appeared in 3 journals.

The German subcorpus contains 7 texts, all of them books, and no articles. None was published anonymously. The texts were written by 6 known authors and published between 1868 and 1919.
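The per-subcorpus figures above are simple tallies by language and text type. A minimal sketch of how such counts could be recomputed from a metadata CSV follows. It assumes hypothetical column names `language` and `type` (with values such as `book` and `article`); the actual headers of the metadata files may differ and should be checked against the README.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def tally_texts(lines: Iterable[str]) -> Counter:
    """Count texts per (language, type) pair.

    `lines` can be an open file or any iterable of CSV lines.
    The column names 'language' and 'type' are hypothetical
    placeholders; adjust them to the real metadata headers.
    """
    reader = csv.DictReader(lines)
    return Counter((row["language"], row["type"]) for row in reader)


def summarize(counts: Counter) -> Dict[str, Dict[str, int]]:
    """Collapse the (language, type) tally into per-language totals."""
    totals: Dict[str, Dict[str, int]] = {}
    for (language, kind), n in counts.items():
        entry = totals.setdefault(language, {"texts": 0, "book": 0, "article": 0})
        entry["texts"] += n                       # all texts in this subcorpus
        entry[kind] = entry.get(kind, 0) + n      # books, articles, or other types
    return totals
```

To use it on a file, open it with `newline=""` and pass the file object, e.g. `summarize(tally_texts(f))`; the file name is up to you, since the data set description does not name the general metadata files. The resulting per-language totals can then be compared with the published subcorpus figures.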
More details can be found in the README of the data set and in a notebook containing a corpus description.

Technical requirements
The data set comprises TEI files and metadata. All data can be accessed with standard software (a text or XML editor, spreadsheet software, a browser); no special tools are required.

Data provenance
We retrieved digital full texts from https://www.gutenberg.org and https://en.wikisource.org. Since the data are hosted in Germany, German copyright law applies. Therefore, whether a text is in the public domain depends on the year of the original author's death, not on the year of publication.

Data model
The data set contains three CSV files with metadata and 94 files with TEI-encoded text. The metadata format is not yet final, and the TEI encoding may contain errors. Known errors will be corrected in the next update. Some texts were published as a series of installments. The original metadata have been preserved; information linking this bibliographical information to the files in the data set can be found in metadata_series.csv.

Data reuse
Data and metadata are in the public domain.

Acknowledgments
This data set description was inspired by Middle, S. A documentation checklist for (Linked) humanities data. Int J Digit Humanities 5, 353–371 (2023). https://doi.org/10.1007/s42803-023-00072-z
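The public-domain criterion mentioned under Data provenance can be sketched as a small function. Under German copyright law (§ 64 UrhG), protection lasts 70 years after the author's death, and the term is counted from the end of the calendar year of death (§ 69 UrhG). A work is therefore free from 1 January of the 71st year after the year of death. This is a simplified sketch, not legal advice: the function names are hypothetical, and special rules (for example for anonymous works or works with several authors) are deliberately not modelled.

```python
from datetime import date
from typing import Optional

# Term of protection under German copyright law (§ 64 UrhG).
TERM_YEARS = 70


def public_domain_from(death_year: int) -> int:
    """Return the first calendar year in which the work is public domain in Germany.

    The term runs from the end of the year of death (§ 69 UrhG),
    so protection ends on 31 December of death_year + 70.
    """
    return death_year + TERM_YEARS + 1


def is_public_domain(death_year: int, year: Optional[int] = None) -> bool:
    """Simplified check based only on the author's year of death.

    Ignores special cases such as anonymous works or joint authorship.
    Defaults to the current calendar year.
    """
    if year is None:
        year = date.today().year
    return year >= public_domain_from(death_year)
```

For example, a text by an author who died in 1955 becomes public domain in Germany in 2026, regardless of whether it was first published in 1900 or 1950.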
Provider: University of Münster
Created: 2025-09-26