Korpus Malti v4.0

arXiv 2022-05-27 · Updated 2024-06-21
Download link:
https://huggingface.co/datasets/MLRS/korpus_malti
Description:
Korpus Malti v4.0 is a new Maltese text corpus developed by the University of Malta. It contains 20,758,071 entries drawn from a range of online and offline sources, such as news articles, legal texts, and transcripts of speeches and debates. The corpus aims to provide ample pre-training data for Maltese, a low-resource language, in support of natural language processing (NLP) tasks including dependency parsing, part-of-speech tagging, named entity recognition (NER), and sentiment analysis. Particular attention was paid to data quality and diversity during its construction so that it remains useful across different application domains.
Provider:
University of Malta
Created:
2022-05-21
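The listing gives only a Hugging Face URL for the corpus. As a minimal sketch of how it might be read from there, the Python below calls the `datasets` library's `load_dataset` with `streaming=True`, so records are fetched lazily instead of downloading the whole corpus first. The configuration name `shuffled`, the `train` split, and the `text` field are assumptions about how the `MLRS/korpus_malti` repository is organised, not details stated in this listing. Depending on the installed `datasets` version, how the repository is packaged, and whether access is gated, extra arguments or a Hub login may be needed. `first_texts` and `stream_korpus_malti` are hypothetical helper names.

```python
"""Minimal sketch: stream a few sentences from Korpus Malti on the Hugging Face Hub.

Assumptions (not confirmed by this listing): the repository exposes a
"shuffled" configuration with a "train" split whose records carry a "text" field.
"""
from itertools import islice
from typing import Iterable, Iterator, Mapping


def first_texts(records: Iterable[Mapping[str, object]], n: int) -> list[str]:
    """Return the "text" field of the first n records that have one (hypothetical helper)."""
    # Records without a "text" key are skipped rather than raising KeyError.
    texts: Iterator[str] = (str(r["text"]) for r in records if "text" in r)
    # islice stops after n items, so a streaming dataset is never fully consumed.
    return list(islice(texts, n))


def stream_korpus_malti(config: str = "shuffled"):
    """Open the corpus as a streaming IterableDataset (hypothetical helper).

    The default config name "shuffled" and the "train" split are assumptions.
    """
    # Imported lazily so first_texts works without the `datasets` package installed.
    from datasets import load_dataset

    return load_dataset("MLRS/korpus_malti", name=config, split="train", streaming=True)
```

For example, `first_texts(stream_korpus_malti(), 5)` would return the first five sentences of the assumed default configuration. Streaming is used because the full corpus is large; for repeated experiments, a regular (non-streaming) load that caches the data locally may be more convenient.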