OpenCorpora: Russian
Alibaba Cloud Tianchi | Updated 2026-05-26 | Indexed 2024-03-07
Download link:
https://tianchi.aliyun.com/dataset/89935
Resource description:
Approximately 150 million people speak Russian as a native language, and another 110 million speak it as a non-native language. Russian is written in the Cyrillic script. This dataset is a corpus of Russian texts annotated for morphology, syntax, and semantics; it is fully accessible to researchers and editable by users. The dataset is UTF-8 encoded and contains two files: the corpus, in .json format, and the dictionary, in plain text.
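Since the description names only the file formats (a JSON corpus and a plain-text dictionary, both UTF-8), the following is a minimal Python sketch of how the two files might be read. The file names `corpus.json` and `dict.txt`, the helper names, and the sample content in `demo()` are assumptions for illustration; the listing does not document the actual file names or the JSON schema.

```python
import json
import tempfile
from pathlib import Path


def load_corpus(path):
    """Parse a JSON corpus file. UTF-8 is the encoding stated in the description."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_dictionary_lines(path):
    """Yield the non-empty lines of a plain-text dictionary file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if line:
                yield line


def demo():
    """Round-trip tiny stand-in files. The content is invented, not taken from the dataset."""
    with tempfile.TemporaryDirectory() as tmp:
        corpus_path = Path(tmp) / "corpus.json"  # hypothetical file name
        dict_path = Path(tmp) / "dict.txt"       # hypothetical file name
        corpus_path.write_text(
            json.dumps({"text": "Привет, мир"}, ensure_ascii=False),
            encoding="utf-8",
        )
        dict_path.write_text("привет\n\nмир\n", encoding="utf-8")
        return load_corpus(corpus_path), list(iter_dictionary_lines(dict_path))
```

Passing `encoding="utf-8"` explicitly matters here because the platform default encoding may not be UTF-8, in which case Cyrillic text could be decoded incorrectly. If the real corpus file is large, a streaming JSON parser may be a better fit than `json.load`, which reads the whole file into memory.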
Provider:
Alibaba Cloud Tianchi
Created:
2021-02-02
Collected Summary
Dataset Introduction

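Background and Challenges
Background Overview
OpenCorpora is an annotated Russian corpus of 1.5 million words with morphological, syntactic, and semantic annotation. It consists of a corpus in JSON format and a plain-text dictionary, and is released under the CC-BY-SA license.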
The content above was collected and summarized by 遇见数据集.



