
Enhanced Latin Lemma Dataset

NIAID Data Ecosystem, indexed 2026-05-02
Download link:
https://zenodo.org/records/13838471
Description:

Overview
The Latin Lexicon Dataset contains information about Latin words collected by web scraping Wiktionary. It records linguistic features such as part of speech, lemma, aspect, tense, verb form, voice, mood, number, person, case, and gender. It also provides source URLs and links to the Wiktionary pages for further reference. The dataset is intended to support linguistic research and analysis of Latin language elements.

Versions of the Dataset
The dataset is available in three versions, each with a different level of refinement:
wiki_latin_data_v1.csv (v1): The initial raw version, containing all scraped data without extensive cleaning or filtering.
wiki_latin_data_v2.csv (v2): A more processed version, in which some inconsistencies and duplicates were removed and linguistic features were better aligned.
wiki_latin_data_v3.csv (v3): The most refined version, offering a clean, well-organized dataset with comprehensive linguistic features and translation equivalents, with minimal errors. This version is recommended for most use cases.

Data Source: Web scraped from Wiktionary
Produced by: Python-based web scraping algorithms
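As a first look at one of the CSV versions, the following is a minimal sketch that reads rows and tallies the values in one feature column. The listing does not give the actual CSV header, so the column names `lemma` and `pos` are assumptions for illustration only, as are the helper names `load_rows` and `count_values`. The sample rows are invented to make the sketch runnable and are not taken from the dataset.

```python
import csv
from collections import Counter
from typing import Iterable


def load_rows(lines: Iterable[str]) -> list[dict[str, str]]:
    """Parse CSV text (an open file or any iterable of lines) into row dicts."""
    return list(csv.DictReader(lines))


def count_values(rows: list[dict[str, str]], column: str) -> Counter:
    """Count non-empty values in one column; rows without a value are skipped."""
    return Counter(
        row[column].strip()
        for row in rows
        if (row.get(column) or "").strip()
    )


# Invented sample rows; the header names are hypothetical, not the real schema.
sample_lines = [
    "lemma,pos",
    "amo,verb",
    "rosa,noun",
    "moneo,verb",
    "et,",
]

sample_rows = load_rows(sample_lines)
pos_counts = count_values(sample_rows, "pos")
print(pos_counts.most_common())
```

To run this against the real data, pass an open file instead of the sample lines, for example `open("wiki_latin_data_v3.csv", newline="", encoding="utf-8")`, and inspect `rows[0].keys()` first to find the actual column names.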
Created:
2024-09-25