GLAM-Workbench/trove-newspapers-non-english
Source: NIAID Data Ecosystem. Indexed: 2026-05-02
Download link:
https://zenodo.org/record/12697210
Description:
Current version: v1.1
This dataset contains information about newspapers published in languages other than English that have been digitised and made available through Trove. The language data was generated by harvesting a sample of articles from each newspaper using the Trove API, then running language detection software on the OCR'd text of each article. The method is documented in a notebook in the GLAM Workbench.
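The aggregation step of this method can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes language detection has already produced one language code per sampled article, and the function name `summarise_languages`, the `min_proportion` cut-off, and the sample codes are all hypothetical.

```python
from collections import Counter


def summarise_languages(detected_codes, min_proportion=0.05):
    """Summarise the language codes detected across one newspaper's sample.

    detected_codes: one language code per sampled article (assumed input).
    min_proportion: hypothetical cut-off below which a language is dropped.
    """
    total = len(detected_codes)
    if total == 0:
        return []
    rows = []
    # most_common() yields languages from most to least frequent
    for code, count in Counter(detected_codes).most_common():
        proportion = count / total
        if proportion >= min_proportion:
            rows.append(
                {
                    "language": code,
                    "proportion": round(proportion, 3),
                    "number": total,  # number of articles sampled
                }
            )
    return rows


# Invented detection results for ten sampled articles
sample = ["de", "de", "de", "en", "de", "en", "de", "de", "it", "de"]
summary = summarise_languages(sample)
```

Here `summary` holds one row per language that meets the cut-off, ordered from most to least frequent.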
There are two files:
newspapers_non_english.csv – list of the main languages detected for each newspaper with non-English language content
non-english-newspapers.md – a markdown formatted list of all the newspapers with non-English language content
newspapers_non_english.csv
The dataset contains the following columns:
Column          Contents
id              newspaper id
title           newspaper title
language        language code
proportion      proportion of articles in this language
number          number of articles sampled
language_full   full language name
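A short sketch of reading a file with these columns, using only the Python standard library. The rows in `SAMPLE_CSV` are invented for illustration and are not taken from the dataset; the two-letter form of the language codes and the numeric types of `proportion` and `number` are assumptions.

```python
import csv
import io

# Illustrative rows only: ids, titles and values are invented
SAMPLE_CSV = """id,title,language,proportion,number,language_full
1,Example German Paper,de,0.9,100,German
1,Example German Paper,en,0.1,100,English
2,Example Italian Paper,it,0.8,50,Italian
"""


def load_rows(handle):
    """Read rows from an open CSV handle, converting the numeric columns."""
    rows = []
    for row in csv.DictReader(handle):
        row["proportion"] = float(row["proportion"])  # assumed to be a fraction
        row["number"] = int(row["number"])  # assumed to be a whole number
        rows.append(row)
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Titles of newspapers with content detected as German
german_titles = [r["title"] for r in rows if r["language"] == "de"]
```

To read the downloaded file instead, pass `open("newspapers_non_english.csv", newline="", encoding="utf-8")` to `load_rows`; the UTF-8 encoding is an assumption.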
non-english-newspapers.md
This is a markdown-formatted list created by grouping the dataset by newspaper title. It includes details of the main languages in each newspaper.
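Grouping by title to produce such a list might look like the sketch below. The heading and bullet layout are assumptions and may not match the published file; the records are invented, and `to_markdown` is a hypothetical helper.

```python
from itertools import groupby
from operator import itemgetter

# Invented records with the same fields as the CSV
records = [
    {"title": "Example German Paper", "language_full": "English", "proportion": 0.1},
    {"title": "Example Italian Paper", "language_full": "Italian", "proportion": 0.8},
    {"title": "Example German Paper", "language_full": "German", "proportion": 0.9},
]


def to_markdown(records):
    """Group records by newspaper title and list each title's languages."""
    lines = []
    # groupby only merges adjacent items, so sort by title first
    ordered = sorted(records, key=itemgetter("title"))
    for title, group in groupby(ordered, key=itemgetter("title")):
        lines.append(f"## {title}")  # assumed heading style
        # List the most common language first
        for rec in sorted(group, key=itemgetter("proportion"), reverse=True):
            lines.append(f"- {rec['language_full']}: {rec['proportion']:.0%}")
        lines.append("")
    return "\n".join(lines)


markdown = to_markdown(records)
```

Sorting before `groupby` matters here because `itertools.groupby` only groups consecutive items with the same key.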
Created:
2024-09-14



