Paragraphs of the resolutions of the States General of the Dutch Republic (1576-1796)

Indexed by NIAID Data Ecosystem: 2026-05-02
Download link:
https://zenodo.org/record/14224101
Description:
Update 2025-03-24: A new version of the paragraphs has been released, with better date recognition and document segmentation. Separately, a new version of the recognised named entities has been released (https://doi.org/10.5281/zenodo.15074645).

This TSV file contains the plain-text paragraphs of all the resolutions of the States General of the Dutch Republic (1576-1796). They were created as part of the REPUBLIC project (REsolutions PUBLished In a Computational environment), which is funded by NWO grant 175.2017.024. There are 1,062,723 paragraphs from 692,156 resolutions, making up a corpus of 129,688,306 words (according to Unix word count). The recognised named entities are available in a separate repository (https://doi.org/10.5281/zenodo.14577243).

The file contains six columns:

- `session_date`: the date on which the resolution (decision) was reached.
- `resolution_id`: the identifier of the resolution.
- `para_id`: the identifier of the paragraph in the resolution.
- `line_start`: the identifier of the first line in the paragraph. This identifier contains the scan ID and the x,y,w,h coordinates of the line in the scan, so the resolution text can be traced to where it starts in the scan.
- `line_end`: the identifier of the last line in the paragraph. This identifier contains the scan ID and the x,y,w,h coordinates of the line in the scan, so the resolution text can be traced to where it ends in the scan (which is not necessarily the same scan as the first line, as paragraphs can cross scan boundaries).
- `text`: the text of the paragraph.

For more information on how this was generated, see the following publications:

- Marijn Koolen, Rik Hoekstra, Joris Oddens, Ronald Sluijter, Rutger van Koert, Ger Brouwer and Hennie Brugman, ‘The Value of Preexisting Structures for Digital Access: Modelling the Resolutions of the Dutch States General’, Journal of Computing and Cultural Heritage 16:1 (2023). https://dl.acm.org/doi/10.1145/3575864
- Marijn Koolen and Rik Hoekstra, ‘Detecting Formulaic Language Use in Historical Administrative Corpora’, in: F. Karsdorp, A. Lassche and K. Nielbo eds., Proceedings of the Computational Humanities Research Conference 2022 (Antwerp 2022) 127-151. CEUR Workshop Proceedings, ISSN 1613-0073. https://ceur-ws.org/Vol-3290/long_paper5740.pdf
- Marijn Koolen, Rik Hoekstra, Joris Oddens and Ronald Sluijter, ‘Formulas and decision-making: the case of the states general of the Dutch Republic’, in: F. Karsdorp, A. Lassche and K. Nielbo eds., Proceedings of the Computational Humanities Research Conference 2023 (Paris 2023) 772-798. CEUR Workshop Proceedings, ISSN 1613-0073. https://ceur-ws.org/Vol-3558/paper9465.pdf
- Rutger van Koert, Stefan Klut, Tim Koornstra, Martijn Maas and Luke Peters, ‘Loghi: An End-to-End Framework for Making Historical Documents Machine-Readable’, Document Analysis and Recognition – ICDAR 2024 Workshops (Cham: Springer 2024) 73-88. https://link.springer.com/chapter/10.1007/978-3-031-70645-5_6
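The six-column layout described above can be read with Python's standard `csv` module. The following is a minimal sketch, not the project's own tooling. It assumes the file starts with a header row naming the six columns in the order listed, that fields are tab-separated without quoting, and that the text is UTF-8; none of this is confirmed by the description. The sample rows and their identifiers are made-up placeholders and do not reflect the real identifier format.

```python
import csv
import io
from collections import Counter

# Column names as listed in the dataset description.
COLUMNS = ["session_date", "resolution_id", "para_id",
           "line_start", "line_end", "text"]


def read_paragraphs(handle):
    """Yield one dict per paragraph row from an open TSV text stream.

    Assumption: the first row is a header with the six column names.
    QUOTE_NONE keeps any quote characters in the historical text verbatim.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    if reader.fieldnames != COLUMNS:
        raise ValueError(f"unexpected header: {reader.fieldnames}")
    yield from reader


def paragraphs_per_resolution(rows):
    """Count how many paragraphs belong to each resolution_id."""
    return Counter(row["resolution_id"] for row in rows)


# Tiny in-memory sample; all values are hypothetical placeholders.
SAMPLE = "\n".join([
    "\t".join(COLUMNS),
    "1700-01-02\tres-1\tres-1-para-1\tline-a\tline-b\tFirst paragraph.",
    "1700-01-02\tres-1\tres-1-para-2\tline-c\tline-d\tSecond \"quoted\" paragraph.",
    "1700-01-04\tres-2\tres-2-para-1\tline-e\tline-f\tAnother resolution.",
]) + "\n"

sample_rows = list(read_paragraphs(io.StringIO(SAMPLE)))
counts = paragraphs_per_resolution(sample_rows)
```

To run this against the downloaded file, replace the in-memory sample with `open(path, encoding="utf-8", newline="")`, where `path` points to the local copy. Streaming the rows through the generator avoids loading all of the roughly one million paragraphs into memory at once.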
Created:
2025-03-24