proxectonos/corpus_dominio_cientifico
Source: Hugging Face. Updated: 2026-04-22. Indexed: 2026-04-26.
Download link:
https://hf-mirror.com/datasets/proxectonos/corpus_dominio_cientifico
Description:
The Corpus de dominio científico y académico is a dataset of academic and encyclopedic texts in Galician (gl) and Spanish (es), drawn from institutional and open sources such as university publications and Wikipedia articles. It is intended as a resource for natural language processing tasks, particularly language modeling. The corpus includes articles and publications from the Servizo de Publicacións da Universidade de Santiago de Compostela (USC) and science-domain articles from Wikipedia. Texts are stored in JSONL format and organized by language and source. The dataset is still being expanded, and future updates are planned to add further scientific publications.
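Because the texts are distributed as JSONL (one JSON object per line), a local copy can be inspected with the Python standard library alone. The following is a minimal sketch, not the maintainers' tooling: it assumes the files have already been downloaded into a local directory, and the helper names `iter_jsonl` and `count_records` are hypothetical. The listing does not state the file names or the record schema, so the code makes no assumption about field names.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_jsonl(path: Path) -> Iterator[dict]:
    """Yield one parsed JSON object per non-empty line of a JSONL file."""
    with path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc


def count_records(root: Path) -> dict[str, int]:
    """Count records in every *.jsonl file under root, keyed by relative path.

    The directory layout under root is an assumption; the function simply
    walks whatever subdirectories exist.
    """
    return {
        p.relative_to(root).as_posix(): sum(1 for _ in iter_jsonl(p))
        for p in sorted(root.rglob("*.jsonl"))
    }
```

Calling `count_records(Path("corpus_dominio_cientifico"))`, where the path is a hypothetical local download directory, would give a quick per-file overview of how many records each language and source contributes.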
Provider:
proxectonos



