Automatic translation and multilingual cultural heritage retrieval: a case study with transcriptions in Europeana (dataset)

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/5045065
Description:
The dataset contains all the data required to reproduce the experiments reported in the paper "Automatic translation and multilingual cultural heritage retrieval: a case study with transcriptions in Europeana", published at the 25th International Conference on Theory and Practice of Digital Libraries (TPDL'21). In that work we ran an experiment using the Europeana cultural heritage (CH) digital library as a use case, and we evaluated the effectiveness of a multilingual information retrieval strategy that uses machine translation into English as the pivot language. We used the CEF translation service (eTranslation) to translate queries and content into English (https://ec.europa.eu/cefdigital/wiki/display/CEFDIGITAL/eTranslation).

The dataset is also available at https://rnd-2.eanadev.org/share/crosslingual-search/, and it is organized in four main folders:

queries: a sample of 68 queries and their translations into English. The queries were issued in languages other than English from the Europeana Portal, against Europeana's 1914-1918 thematic collection, between January and August 2019.

transcriptions: a sample of 18,257 handwriting transcriptions and their translations into English. The transcriptions are taken from the Europeana 1914-1918 thematic collection and were obtained from the Transcribathon crowdsourcing platform (https://europeana.transcribathon.eu/).

solr_configuration: the Apache Solr search engine configuration used in the experiments (which replicates the one used in Europeana).

results: the manual evaluation of the query translations and the automatic evaluation of the multilingual retrieval.
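
The pivot-language strategy described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the paper: the toy lexicon stands in for the eTranslation service, the token-count score stands in for Apache Solr ranking, and every name in it (TOY_TRANSLATIONS, translate_to_english, score, pivot_search) is hypothetical.

```python
from collections import Counter

# Hypothetical stand-in for a machine translation service.
# The paper used eTranslation; this toy lexicon is only for illustration.
TOY_TRANSLATIONS = {
    "lettre": "letter",
    "soldat": "soldier",
    "brief": "letter",
    "krieg": "war",
    "guerre": "war",
}


def translate_to_english(text):
    """Translate word by word with the toy lexicon; unknown words pass through."""
    return " ".join(TOY_TRANSLATIONS.get(tok, tok) for tok in text.lower().split())


def score(query_en, doc_en):
    """Crude relevance score: occurrences of distinct query tokens in the document."""
    doc_counts = Counter(doc_en.split())
    return sum(doc_counts[tok] for tok in set(query_en.split()))


def pivot_search(query, documents):
    """Rank documents by matching the English query against English content.

    Both the query and every document are first translated into the pivot
    language (English), so a French query can match a German transcription.
    """
    query_en = translate_to_english(query)
    scored = [
        (score(query_en, translate_to_english(text)), doc_id)
        for doc_id, text in documents.items()
    ]
    # Highest score first; ties broken by document id; drop non-matching documents.
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for s, doc_id in ranked if s > 0]


if __name__ == "__main__":
    docs = {
        "de-1": "Brief aus dem Krieg",
        "fr-1": "lettre d'un soldat",
        "nl-1": "dagboek",
    }
    print(pivot_search("lettre guerre", docs))
```

In this sketch a French query retrieves a German document because both are compared in English, which is the core idea of using a pivot language. A real setup would index the translated transcriptions in a search engine rather than rescoring them per query.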
Created:
2021-09-10