Enhanced Latin Lemma Dataset
Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/records/13838471
Description:
Overview
The Latin Lexicon Dataset contains information about Latin words collected by web scraping Wiktionary. For each word it records linguistic features: part of speech, lemma, aspect, tense, verb form, voice, mood, number, person, case, and gender. It also provides source URLs and links to the corresponding Wiktionary pages for further reference. The dataset is intended to support linguistic research and analysis of the Latin language.
Versions of the Dataset
This dataset is available in three versions, each more refined than the last:
wiki_latin_data_v1.csv (v1): The initial raw version, containing all web-scraped data without extensive cleaning or filtering.
wiki_latin_data_v2.csv (v2): A more processed version, in which some inconsistencies and duplicates were removed and linguistic features were better aligned.
wiki_latin_data_v3.csv (v3): The most refined version, a clean, well-organized dataset with comprehensive linguistic features and translation equivalents, and with minimal errors. This version is recommended for most use cases.
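As a rough illustration of how the recommended v3 file might be consumed, the sketch below reads CSV rows with Python's standard csv module and groups inflected forms by lemma. The column names (word, lemma, pos, and so on) and the three sample rows are assumptions made for this example; the listing does not specify the actual header, so check the real file before relying on them.

```python
import csv
import io
from collections import Counter

# Hypothetical stand-in for wiki_latin_data_v3.csv. The column names and
# rows are assumptions for illustration; the real header may differ.
SAMPLE_CSV = """word,lemma,pos,case,number,gender,tense,mood,voice,person
amat,amo,verb,,singular,,present,indicative,active,3
puellam,puella,noun,accusative,singular,feminine,,,,
puellae,puella,noun,genitive,singular,feminine,,,,
"""


def load_rows(handle):
    """Read CSV rows into dictionaries, turning empty cells into None."""
    return [
        {key: (value or None) for key, value in row.items()}
        for row in csv.DictReader(handle)
    ]


def forms_by_lemma(rows, lemma):
    """Return every inflected form recorded for the given lemma."""
    return [row["word"] for row in rows if row["lemma"] == lemma]


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Count how many entries fall under each part of speech.
pos_counts = Counter(row["pos"] for row in rows)

print(forms_by_lemma(rows, "puella"))
print(pos_counts)
```

To try this against the downloaded file, replace the in-memory sample with open("wiki_latin_data_v3.csv", newline="", encoding="utf-8") and adjust the column names to match the actual header.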
Data Source:
Web-scraped from Wiktionary
Produced by:
Python-based web scraping algorithms
Created:
2024-09-25



