
The Swedish Sub-corpus of the Newspaper and Periodical Corpus of the National Library of Finland, version 2, Korp

Mendeley Data | Updated 2024-04-13 | Indexed 2024-06-28
Download link:
https://etsin.fairdata.fi/dataset/866749ce-6981-469e-9d7b-2a506f163928
Description:
This resource will be available via Korp in Kielipankki – the Language Bank of Finland. The corpus consists of Swedish newspapers and magazines from 1771 to 2021, compiled by the National Library of Finland.

For this new version, the data of the previous version (Finnish and Swedish) was checked with the HeLI-OTS language identifier. Parts of texts that do not contain Swedish were removed from this corpus. Conversely, texts from the Finnish part of KLK that contain Swedish were added to it.

The new version consists of text elements in which at least one sentence element was identified as Swedish, drawn from these three sources:
- KLK-fi, version 1 (http://urn.fi/urn:nbn:fi:lb-2016050302)
- KLK-sv, version 1 (http://urn.fi/urn:nbn:fi:lb-2016050301)
- new data from the National Library (not previously available in the Language Bank; it may cover any time period and was simply OCR'd more recently)

The text elements are enriched with a 'version_added' attribute, which identifies the source.
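The selection rule described above (keep a text element if at least one of its sentence elements was identified as Swedish) can be sketched roughly as follows. This is only an illustration under stated assumptions: the XML layout, the `lang` attribute on sentence elements, the language codes, and the sample 'version_added' values are all hypothetical and are not taken from the actual corpus schema or from HeLI-OTS output.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data. The "lang" attribute, the language codes and
# the version_added values are invented for illustration only.
SAMPLE = """
<corpus>
  <text version_added="KLK-sv-v1">
    <sentence lang="swe">Detta är en mening.</sentence>
    <sentence lang="fin">Tämä on lause.</sentence>
  </text>
  <text version_added="KLK-fi-v1">
    <sentence lang="fin">Tämä on toinen lause.</sentence>
  </text>
  <text version_added="new">
    <sentence lang="swe">En ny text.</sentence>
  </text>
</corpus>
"""


def select_swedish_texts(root, lang_code="swe"):
    """Return text elements with at least one sentence tagged lang_code."""
    return [
        text
        for text in root.iter("text")
        if any(s.get("lang") == lang_code for s in text.iter("sentence"))
    ]


root = ET.fromstring(SAMPLE)
kept = select_swedish_texts(root)

# The version_added attribute records which source each kept text came from.
sources = [t.get("version_added") for t in kept]
print(sources)
```

In this sketch the second text element is dropped because none of its sentences carries the assumed Swedish tag, while the first is kept even though it also contains a Finnish sentence.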

Created:
2023-10-10