
Likely percentage of the optical character recognised word count of the average issue of the Caledonian Mercury for a given year to be duplicate material, with associated data

Source: figshare.com. Updated: 2023-06-01. Indexed: 2025-03-25.
Download link:
https://figshare.com/articles/dataset/Likely_percentage_of_the_optical_character_recognised_word_count_of_the_average_issue_of_the_Caledonian_Mercury_for_a_given_year_to_be_duplicate_material_with_associated_data/6011630/1
Description:
This file set contains a bar chart (BW_ProbableDuplicateMaterialPercentages.png) showing the likely percentages of duplicated news, advertising, miscellany and commentary, and numerical content in the average issue of The Caledonian Mercury (Edinburgh, Scotland) for a given year, 1825-1835. It also includes a data table (Data_Wordcounts_CaledonianMercury_1820_1840.tsv) giving the OCR-calculated word count for each issue, the minimum duplicate material percentage for each issue, and the extrapolated word counts and percentages for each content type. The data set was derived from the British Library 19th Century Newspapers, Part 1 digital collection (http://gale.cengage.co.uk/british-library-newspapers/19th-century-british-library-newspapers-part-i.aspx) using the Scissors-and-Paste Console v.0.4.2 (https://doi.org/10.5281/zenodo.1207283). Further details are available in the included documentation file (readme.docx) and on the websites listed below.
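As a rough illustration of how per-issue figures in a table like this could be rolled up into a per-year average, the sketch below groups a percentage column by year using only the Python standard library. The column names `date` and `min_duplicate_pct`, the ISO-style date format, and the sample rows are all assumptions made for this example; the real header of the TSV file should be checked against readme.docx before adapting it.

```python
import csv
import io
from collections import defaultdict


def yearly_mean_duplicate_pct(tsv_text, date_col="date", pct_col="min_duplicate_pct"):
    """Average a per-issue percentage column by year.

    date_col and pct_col are hypothetical column names, not taken from
    the actual data file.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        # Assumes ISO-style dates such as 1825-01-03 (an assumption).
        year = int(row[date_col][:4])
        totals[year] += float(row[pct_col])
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(totals)}


# Made-up rows for illustration only; these are not values from the data set.
sample = (
    "date\tmin_duplicate_pct\n"
    "1825-01-03\t10.0\n"
    "1825-01-06\t20.0\n"
    "1826-01-02\t30.0\n"
)
result = yearly_mean_duplicate_pct(sample)
print(result)
```

To apply this to the downloaded file, the text of Data_Wordcounts_CaledonianMercury_1820_1840.tsv could be read with `open(...).read()` and passed in along with the actual column names.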

Provider:
figshare.com