
Estonian National Corpus 2023 (prevert)

Source: DataCite Commons. Updated: 2026-03-10. Indexed: 2024-07-13.
Download link:
https://metashare.ut.ee/repository/browse/estonian-national-corpus-2023-prevert/ec397bb9bae611ee9c10e99c00eb27649a7f673b85724ebfaeb0f267373423c0/
Resource description:
Estonian corpus of written texts. Consists of the Estonian Reference Corpus (90s–2008), Contemporary and old literature, Estonian Web (2013, 2017, 2019, 2021, 2023), Timestamped Estonian corpora (2014–2021, 2020–2023), Estonian Wikipedia (articles: 2023, talkpages: 2017) and Estonian academic writing (2020–2023). Cleaned, deduplicated. Text type annotation: topics, genres.

ENCODING: UTF-8

== Comparison to ENC 2021 corpus

Balanced Corpus 1990–2008 ................. kept without changes
Reference Corpus 1990–2008 ................ kept without changes
Literature Old 1864–1945 .................. updated according to the source
Literature Contemporary 2000–2023 ......... updated according to the source (licensed under CLARIN ACA)
Web 2013 .................................. kept without changes
Web 2017 .................................. kept without changes
Wikipedia Talk 2017 ....................... kept without changes
Academic Texts (formerly DOAJ) up to 2023 . updated with new data
Web 2019 .................................. kept without changes
Web 2021 .................................. kept without changes
Wikipedia 2023 ............................ replacing Wikipedia 2021
Feeds (JSI) 2014–2021 ..................... kept without changes
Feeds (LC) 2020–2023 ...................... updated with new data
Web 2023 .................................. new
Provider:
Center of Estonian Language Resources
Created:
2024-03-07