Spanish text corpus for NLP/linguistics research

NIAID Data Ecosystem, indexed 2026-03-12

Download link:

https://zenodo.org/record/4319956

Description:

Spanish text corpus extracted from Wikipedia using the platform described in Cadavid Rengifo, Héctor Fabio, and Jonatan Gómez Perdomo. "Web text corpus extraction system for linguistic tasks." Ingeniería e Investigación 29.3 (2009): 54-60, and in the related master's thesis available on ResearchGate.

rawdata.dat: raw output of the extraction process from Wikipedia.

sentences.txt: sentences extracted from the raw data after cleaning/filtering.
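As a starting point for working with sentences.txt, the sketch below reads the file and builds a token frequency table. It rests on assumptions the description does not confirm: that the file is UTF-8 encoded and holds one sentence per line. The function names `iter_sentences` and `token_frequencies` and the simple word-character tokenizer are hypothetical, chosen for illustration only.

```python
import re
from collections import Counter
from pathlib import Path
from typing import Iterable, Iterator

# Assumption: a token is a run of Unicode word characters (covers ñ, á, ü, ...).
TOKEN_RE = re.compile(r"\w+")


def iter_sentences(path: str, encoding: str = "utf-8") -> Iterator[str]:
    """Yield non-empty lines, assuming one sentence per line (not confirmed)."""
    with Path(path).open(encoding=encoding, errors="replace") as handle:
        for line in handle:
            sentence = line.strip()
            if sentence:
                yield sentence


def token_frequencies(sentences: Iterable[str]) -> Counter:
    """Count lowercased tokens across all sentences."""
    counts: Counter = Counter()
    for sentence in sentences:
        counts.update(token.lower() for token in TOKEN_RE.findall(sentence))
    return counts
```

A typical call would be `token_frequencies(iter_sentences("sentences.txt")).most_common(20)`, with the path pointing at the file downloaded from the Zenodo record. If decoding produces replacement characters, the encoding assumption is probably wrong and another value (for example `latin-1`) may be worth trying.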

Created:

2020-12-14
