Index of Eastern European NLP Resources
Source: arXiv (indexed 2025-09-30)
Download link:
https://github.com/altsoph/EENLP/blob/main/docs/datasets.md
Resource description:
This index catalogs existing resources for Eastern European languages: more than 90 datasets, spanning 20 languages, and 45 models. It is open to community updates, so missing resources can be contributed. The natural language processing tasks covered include, but are not limited to, text classification, coreference resolution, fake news detection, lemmatization, morphosyntactic tagging, named entity recognition (NER), natural language inference (NLI), offensive comment detection, paraphrase detection, part-of-speech (POS) tagging, question answering (QA), sentiment analysis, and word sense disambiguation (WSD).
Provided by:
Community-driven GitHub repository



