
Open Newspapers (LwM) Full Text and Metadata

Source: DataCite Commons. Updated 2025-07-29; indexed 2026-02-08.
Download link:
https://bl.iro.bl.uk/concern/datasets/99dc570a-9460-48ac-baed-9d2b8c4c13c0
Description:
Full text and metadata of 107 newspaper titles selected and digitised by the Living with Machines project and processed by the project's bespoke Alto2Text pipeline. The pipeline took as its input the highly verbose AltoXML files for these titles and converted them into a more readable plain-text format for the benefit of readers and researchers. Individual datasets for each of these titles are also available in the BL Research Repository, so you do not need to download this full dataset: https://bl.iro.bl.uk/catalog?f%5Bkeyword_sim%5D%5B%5D=LwM107
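
The Alto2Text pipeline's own code is not reproduced here. As a rough illustration of what an ALTO-to-plain-text conversion involves, the following is a minimal, hypothetical Python sketch using only the standard library. It assumes the usual ALTO layout, where words are `String` elements carrying a `CONTENT` attribute, grouped into `TextLine` and `TextBlock` elements. The function names `alto_to_text` and `local_name`, and the sample XML, are illustrative assumptions, not part of the dataset or the project's pipeline.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Strip an XML namespace, e.g. '{http://...}String' -> 'String'."""
    return tag.rsplit("}", 1)[-1]


def alto_to_text(xml_source: str) -> str:
    """Hypothetical sketch: one output line per TextLine,
    a blank line between TextBlocks. Namespace-agnostic, so it
    accepts ALTO files regardless of schema version."""
    root = ET.fromstring(xml_source)
    blocks = []
    for block in root.iter():
        if local_name(block.tag) != "TextBlock":
            continue
        lines = []
        for line in block.iter():
            if local_name(line.tag) != "TextLine":
                continue
            # Keep only the recognised words; SP (space) and other
            # layout elements are dropped and replaced by single spaces.
            words = [
                el.get("CONTENT", "")
                for el in line
                if local_name(el.tag) == "String"
            ]
            lines.append(" ".join(words))
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Made-up sample input, for illustration only.
SAMPLE = """<alto xmlns="http://www.loc.gov/standards/alto/ns-v3#">
  <Layout><Page ID="P1"><PrintSpace>
    <TextBlock ID="B1">
      <TextLine><String CONTENT="SAMPLE"/><SP/><String CONTENT="HEADLINE"/></TextLine>
    </TextBlock>
    <TextBlock ID="B2">
      <TextLine><String CONTENT="Steam"/><SP/><String CONTENT="engines"/></TextLine>
      <TextLine><String CONTENT="arrived."/></TextLine>
    </TextBlock>
  </PrintSpace></Page></Layout>
</alto>"""

if __name__ == "__main__":
    print(alto_to_text(SAMPLE))
```

This sketch discards coordinates, OCR confidence scores and hyphenation markup (`HYP` elements), which is much of what makes ALTO verbose. A production converter would likely need to handle hyphenated words and reading order more carefully.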

Provider:
British Library
Created:
2025-06-17