Enhanced Latin Lemma Dataset

Indexed 2026-05-02 from NIAID Data Ecosystem

Download link:

https://zenodo.org/records/13838471

Resource description:

Overview
The Latin Lexicon Dataset contains information about Latin words collected by web scraping Wiktionary. The dataset includes linguistic features such as part of speech, lemma, aspect, tense, verb form, voice, mood, number, person, case, and gender. It also provides source URLs and links to the corresponding Wiktionary pages for further reference. The dataset is intended to support linguistic research and the analysis of the Latin language.

Versions of the Dataset
The dataset is available in three versions, each with a different level of refinement:
wiki_latin_data_v1.csv (v1): the initial raw version, containing all web-scraped data without extensive cleaning or filtering.
wiki_latin_data_v2.csv (v2): a more processed version, in which some inconsistencies and duplicates were removed and linguistic features were better aligned.
wiki_latin_data_v3.csv (v3): the most refined version, a clean, well-organized dataset with comprehensive linguistic features, translation equivalents, and minimal errors. This version is recommended for most use cases.

Data source: Web-scraped from Wiktionary
Produced by: Python-based web scraping algorithms
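A minimal sketch for working with the recommended v3 file, using only Python's standard csv module. It assumes the file is UTF-8 CSV with a header row, that each row describes one word form with its features, and that the lemma column is named "lemma". The actual column headers are not documented on this page, so the column name and the helper names read_header and group_by_lemma are hypothetical; check the real header first and pass the correct column name.

```python
import csv
from collections import defaultdict
from typing import Dict, List, TextIO

# ASSUMPTION: the lemma column is called "lemma". The real header of
# wiki_latin_data_v3.csv is not documented here; inspect it with
# read_header() and pass the actual name via lemma_column.
LEMMA_COLUMN = "lemma"


def read_header(f: TextIO) -> List[str]:
    """Return the column names from the first row of a CSV file."""
    return next(csv.reader(f))


def group_by_lemma(f: TextIO, lemma_column: str = LEMMA_COLUMN) -> Dict[str, List[dict]]:
    """Group CSV rows (assumed: one word form per row) under their lemma."""
    reader = csv.DictReader(f)
    # fieldnames is read from the first row; None means the file is empty.
    if reader.fieldnames is None or lemma_column not in reader.fieldnames:
        raise KeyError(f"column {lemma_column!r} not found; header is {reader.fieldnames}")
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in reader:
        # DictReader fills missing trailing cells with None, so guard against it.
        lemma = (row[lemma_column] or "").strip()
        if lemma:  # skip rows whose lemma cell is empty
            groups[lemma].append(row)
    return dict(groups)
```

Usage, assuming the file has been downloaded to the working directory: open it with open("wiki_latin_data_v3.csv", newline="", encoding="utf-8"), call read_header on it to see the real column names, then reopen the file and call group_by_lemma with the matching lemma column to collect all forms recorded for each lemma. The same code should work on v1 and v2, although those versions may contain duplicates and inconsistencies, as described above.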

Created:

2024-09-25