CLEF and FLLex: resources for Latin to French computerized forward reconstruction
Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://zenodo.org/record/14977375
Description:
This repository is a resource for computerized forward reconstruction (CFR) from Latin to French, and pairs the lexica FLLex and FLLAPS with three cascades: the baseline cascades BaseCLEF and BaseCLEFstar, and the "debugged" cascade DiaCLEF.
A cascade here refers to an ordered set of sound changes that a CFR system applies to a lexicon to produce predicted regular outcomes. BaseCLEF is based on Mildred K. Pope's "From Latin to Modern French with Especial Consideration of Anglo-Norman: Phonology and Morphology", and BaseCLEFstar is a version thereof with "non-interesting errors" fixed (more information on these can be found in the README in this repository and in the relevant publications). DiaCLEF is a "debugged" version of BaseCLEFstar, which was fixed using the CFR and diagnostic/debugging system DiaSim (GitHub repository linked). Further details on DiaSim, the debugging process, and these resources can be found in Marr and Mortensen 2020 and Marr and Mortensen 2023. This CFR-driven debugging of French relative chronology both independently reproduced fixes already made in the literature (thus mutually corroborating past work and the CFR-debugging method) and led to new discoveries, such as the initial velar voicing development detailed in Marr 2024.
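To make the notion of a cascade concrete, here is a minimal sketch in Python. It is not DiaSim's rule format or implementation: the helper names (`replace_phone`, `delete_final`, `apply_cascade`), the two toy rules, and the input form are all invented for illustration and are not taken from any CLEF cascade. It only shows that a cascade is an ordered list of rules applied one after another, and that reordering the rules can change the predicted outcome.

```python
from typing import Callable, List

Phones = List[str]                  # a word form as a sequence of phones
Rule = Callable[[Phones], Phones]   # a sound change maps one form to another


def replace_phone(old: str, new: str) -> Rule:
    """Hypothetical context-free rule: rewrite every `old` phone as `new`."""
    def rule(form: Phones) -> Phones:
        return [new if phone == old else phone for phone in form]
    return rule


def delete_final(target: str) -> Rule:
    """Hypothetical rule: drop `target` when it is the last phone of the form."""
    def rule(form: Phones) -> Phones:
        if form and form[-1] == target:
            return form[:-1]
        return form
    return rule


def apply_cascade(cascade: List[Rule], form: Phones) -> Phones:
    """Apply each rule in order; the output of one rule feeds the next."""
    for rule in cascade:
        form = rule(form)
    return form


# Toy rules and a toy input, invented for this sketch only.
raise_a = replace_phone("a", "e")
drop_final_e = delete_final("e")
toy_form = ["m", "a", "r", "a"]

early_raising = apply_cascade([raise_a, drop_final_e], toy_form)
late_raising = apply_cascade([drop_final_e, raise_a], toy_form)

print(" ".join(early_raising))
print(" ".join(late_raising))
```

With raising ordered first, the final vowel becomes eligible for deletion; with deletion ordered first, it survives. Comparing such predicted outcomes against attested forms is what allows a relative chronology to be checked.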
The format of the lexicon files is documented on the corresponding DiaSim wiki page (linked). Each FLLex entry pairs a Latin form with its French reflex, separated by a comma, with the phones of each form delimited by spaces and a trailing comment, flagged by '$', containing lexical information. FLLAPS includes Latin words and their French reflexes, as well as their forms at four intermediate stages. Most of the words in both lexica were drawn from Pope 1934; the remainder, added to exemplify phonetic sequences that were otherwise unrepresented or underrepresented, were drawn from Alain Rey's 2013 etymological dictionary of French.
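The FLLex line layout described above can be read with a few lines of Python. This is a sketch under assumptions: the sample line, the `LexEntry` type, and the `parse_fllex_line` function are hypothetical, the phone symbols are illustrative rather than copied from the file, and the DiaSim wiki remains the authoritative description of the format.

```python
from typing import List, NamedTuple


class LexEntry(NamedTuple):
    latin: List[str]    # phones of the Latin form
    french: List[str]   # phones of the French reflex
    comment: str        # lexical info found after the '$' flag


def parse_fllex_line(line: str) -> LexEntry:
    """Parse one 'latin phones , french phones $ comment' line."""
    # Everything after the first '$' is the comment.
    body, _, comment = line.partition("$")
    # The remaining text holds the two forms, separated by a comma.
    columns = body.split(",")
    if len(columns) != 2:
        raise ValueError(f"expected two comma-separated forms, got: {line!r}")
    latin, french = (column.split() for column in columns)
    return LexEntry(latin, french, comment.strip())


# Hypothetical sample line, not copied from FLLex.
entry = parse_fllex_line("m a r e , m ɛ ʁ $ hypothetical example entry")
print(entry.latin, entry.french, entry.comment)
```

A FLLAPS line would carry more than two comma-separated columns (the intermediate stages), so the two-column check here would need relaxing for that file.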
Statistics for performance of each cascade on each lexicon (for the versions of them included in this repository) can be found toward the bottom of the README file included in this repository. This dataset is released so others can make use of it, but the relevant work in French diachrony with CFR is still an ongoing and developing project.
Created:
2025-03-06



