
Annotated Dataset of Desinence Deletion in Contemporary Written Fiuman

DataCite Commons | Updated 2026-05-06 | Indexed 2026-05-07
Download link:
https://zenodo.org/doi/10.5281/zenodo.20056878
Description:
This dataset contains manually annotated lexical data from two samples of contemporary written Fiuman, an endangered Colonial Venetian variety spoken in Rijeka/Fiume, Croatia, and in the Fiuman diaspora. The dataset accompanies a study of inter-variety mappings between standard Italian and Fiuman. The goal was to establish how Fiuman integrates standard Italian nouns and adjectives, and more specifically to determine the conditions under which the singular desinences -e and -o are deleted or retained.

For example, standard Italian situazione may correspond to Fiuman situazion, with deletion of final -e; standard Italian fine may correspond to Fiuman fine, with retention of final -e; standard Italian cittadino may correspond to Fiuman citadin, with deletion of final -o; and standard Italian treno may correspond to Fiuman treno, with retention of final -o.

The data come from two recent sources of written Fiuman.

The first source is El mondo secondo mi, a column by Stéphanie Peloso published on Fiuman.hr. Fiuman.hr presents it as a column written in Fiuman by Stéphanie Peloso. Source page: https://www.fiuman.hr/author/stephanie-peloso/

The second source is La scartaza, a column by Laura Marchig published on Rijeka Danas. Rijeka Danas describes La scartaza as a column in Fiuman aimed at protecting and promoting Rijeka’s autochthonous variety. Source page: https://rijekadanas.hr/?page_id=119

The texts were collected online in November 2025 from the websites on which they were published. The last issue included from La scartaza is dated September 20, 2025. The material was imported into #LancsBox X (Brezina and Platt 2025) and searched using regular expressions targeting candidate forms with zero or overt singular desinences. The accompanying study shows that desinence deletion is restricted to words whose stem ends in one of three consonants: n, r, or l.
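As a rough illustration of this search step, the sketch below applies one whole-token pattern per stem-final consonant to a short token list in plain Python. It is not the #LancsBox X implementation: the helper name find_candidates is hypothetical, and matching against the full lowercased token is an assumption about how the corpus query behaves.

```python
import re

# One pattern per stem-final consonant: bare consonant, or consonant + -o / -e.
PATTERNS = {
    "n": re.compile(r".*(n|no|ne)"),
    "r": re.compile(r".*(r|ro|re)"),
    "l": re.compile(r".*(l|lo|le)"),
}


def find_candidates(tokens):
    """Group tokens by the stem-final consonant pattern they match.

    Hypothetical helper: each token is lowercased and must match the
    pattern as a whole (an assumption about the corpus tool's behaviour).
    """
    hits = {consonant: [] for consonant in PATTERNS}
    for token in tokens:
        for consonant, pattern in PATTERNS.items():
            if pattern.fullmatch(token.lower()):
                hits[consonant].append(token)
    return hits


# Forms cited in the description, plus one token ("mi") that should not match.
tokens = ["situazion", "fine", "citadin", "treno", "particolar", "medieval", "mi"]
candidates = find_candidates(tokens)
```

A search of this kind over-generates by design: any token ending in the same letters is returned (an infinitive in -re, for instance), which is why the candidates still require manual annotation.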
The searches were organised by stem-final consonant:

n-final candidates: [word = ".*(n|no|ne)"]
r-final candidates: [word = ".*(r|ro|re)"]
l-final candidates: [word = ".*(l|lo|le)"]

The resulting candidates were manually annotated. All items belonging to the relevant lexical class were annotated for stem-final consonants. During annotation, a prosodic criterion was considered: desinence deletion is possible only where the stem ends in a single consonant, the stem-final consonant is n, r, or l, and the final vowel of the stem is stressed. In addition, extremely rare items that already have a final consonant in standard Italian, such as Macron, were not included, since they could not show desinence deletion.

The dataset does not reproduce the full source texts. It contains token-level KWIC concordance lines and manual annotations used for the linguistic analysis.

The database contains six tabs, splitting the data by stem-final consonant and source:

ElMondoN: Candidates for n-final stems extracted from El mondo secondo mi.
LaScartazaN: Candidates for n-final stems extracted from La scartaza.
ElMondoR: Candidates for r-final stems extracted from El mondo secondo mi.
LaScartazaR: Candidates for r-final stems extracted from La scartaza.
ElMondoL: Candidates for l-final stems extracted from El mondo secondo mi.
LaScartazaL: Candidates for l-final stems extracted from La scartaza.

In this dataset, candidate means a token returned by the regular-expression search and subsequently considered during manual annotation.

All six tabs have the same first 10 columns:

document: Source text or filename from which the token was extracted, e.g. An englishman in Fiume.txt.

right_context: Text immediately to the right of the keyword.

KWIC: Keyword in context; the extracted candidate token, e.g. situazion, pien, fine.

left_context: Text immediately to the left of the keyword.

stem-final consonant: Stem-final consonant targeted in the search. Values: n, r, l.
lemma_FI: Annotated Fiuman lemma, e.g. situazion, pien, fine. Importantly, versions with and without desinence deletion were annotated as separate lemmas; for instance, union and unione, or comun and comune, are treated as distinct lemmas when both forms are attested.

desinence_FI: Singular desinence of the Fiuman form. Values: zero, e, o. The value zero indicates absence of an overt desinence, as in situazion; e indicates final -e, as in fine; o indicates final -o, as in treno.

SI_cognate_present: Whether a direct standard Italian cognate was identified. Values: 1 = yes, 0 = no. A direct cognate was annotated only if the Fiuman form could be related to the standard Italian form through correspondences active in the current mapping, namely desinence deletion and the simplification of geminate consonants. For this reason, citadin ‘citizen’ was counted as having the standard Italian cognate cittadino, whereas zitadin was not, since the initial c / z correspondence is not part of the current mapping under investigation.

SI_cognate: Standard Italian cognate, where present, e.g. situazione for Fiuman situazion, pieno for Fiuman pien, fine for Fiuman fine. Blank cells indicate that no direct standard Italian cognate was annotated.

SI_cognate_desinence: Singular desinence of the standard Italian cognate. Values: e, o, or blank where no direct standard Italian cognate was annotated, e.g. e for situazione and fine, o for pieno and treno.

variation: Indicates whether the lemma shows variation between a form with desinence deletion and a form with desinence retention within the same dataset. This column was annotated only for items with a direct standard Italian cognate. Values: 1 = yes, 0 = no. This column allows users to retrieve all tokens belonging to lemmas with attested variation, such as union/unione or comun/comune.
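The logic behind SI_cognate_present and variation can be sketched in a few lines of Python. The annotation itself was manual, so this is only one possible reading of the criteria. The helper names simplify_geminates, is_direct_cognate and lemmas_with_variation are hypothetical, and treating any doubled consonant letter as a geminate is a simplifying assumption.

```python
import re
from collections import defaultdict

# Any doubled consonant letter, e.g. "tt" or "mm" (simplifying assumption).
CONSONANT_PAIR = re.compile(r"([b-df-hj-np-tv-z])\1")


def simplify_geminates(word):
    """Collapse each doubled consonant into a single one (tt -> t)."""
    return CONSONANT_PAIR.sub(r"\1", word)


def is_direct_cognate(fiuman, italian):
    """True if the Fiuman form follows from the Italian one through
    geminate simplification and/or deletion of final -e / -o only."""
    fiuman, italian = fiuman.lower(), italian.lower()
    for base in (italian, simplify_geminates(italian)):
        if fiuman == base:
            return True  # desinence retained
        if base.endswith(("e", "o")) and fiuman == base[:-1]:
            return True  # desinence deleted
    return False


def lemmas_with_variation(pairs):
    """Given (lemma_FI, SI_cognate) pairs, return the cognates attested
    with both a desinence-deleted and a desinence-retained Fiuman form."""
    seen = defaultdict(set)
    for lemma, cognate in pairs:
        retained_forms = (cognate.lower(), simplify_geminates(cognate.lower()))
        kind = "retained" if lemma.lower() in retained_forms else "deleted"
        seen[cognate].add(kind)
    return {c for c, kinds in seen.items() if kinds == {"retained", "deleted"}}


# Pairs built from forms cited in the description.
pairs = [("union", "unione"), ("unione", "unione"),
         ("comun", "comune"), ("comune", "comune"),
         ("situazion", "situazione")]
varying = lemmas_with_variation(pairs)
```

Under these assumptions, citadin is accepted as a cognate of cittadino while zitadin is rejected, because the initial c / z correspondence is not one of the two mappings the function checks.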
In addition, each consonant-specific pair of tabs contains further binary annotation columns for morphological classes that were relevant to the analysis of desinence deletion. These columns indicate whether the relevant (version of the) morpheme is present in the annotated item, regardless of whether desinence deletion actually applies. Values are 1 = yes and 0 = no.

For ElMondoN and LaScartazaN, the additional columns are:

ion(e)_fem: Indicates feminine nouns containing the relevant -ion(e) morpheme, e.g. situazione/situazion, visione/vision.
an(o)_adj: Indicates forms containing the relevant -an(o) morpheme, e.g. umano/uman, vittoriano/vitorian.
in(o)_dim: Indicates forms containing the relevant -in(o) diminutive morpheme, e.g. spettacolino/spetacolin, regalino/regalin.
on(e)_masc: Indicates masculine nouns containing the relevant -on(e) morpheme, e.g. librone/libron.

For ElMondoR and LaScartazaR, the additional columns are:

ar(e)_adj: Indicates adjectives containing the relevant -ar(e) morpheme, e.g. particolare/particolar, nucleare, militare/militar.
or(e)_agent: Indicates agent nouns containing the relevant -or(e) morpheme, e.g. violentatore/violentator, lavoratore/lavorator.

For ElMondoL and LaScartazaL, the additional columns are:

il(e)_adj: Indicates adjectives containing the relevant -il(e) morpheme, e.g. femminile/feminile/feminil.
al(e)_adj: Indicates adjectives containing the relevant -al(e) morpheme, e.g. medievale/medieval, virtuale, uguale/ugual, ufficiale/uficial, spazio-temporale/spazio-temporal.
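One way to use these binary columns is to tally desinence_FI values within a morphological class. The sketch below does this for a handful of hand-made rows. The rows are illustrative only: they reuse forms cited above, the desinence values assigned to them are assumptions, and the counts say nothing about the actual dataset. The helper name desinence_counts is hypothetical.

```python
from collections import Counter

# Illustrative rows only (NOT dataset records); column names follow the
# description of the r-final tabs, values are assumed for the example.
rows = [
    {"lemma_FI": "particolar", "desinence_FI": "zero", "ar(e)_adj": 1, "or(e)_agent": 0},
    {"lemma_FI": "nucleare", "desinence_FI": "e", "ar(e)_adj": 1, "or(e)_agent": 0},
    {"lemma_FI": "militar", "desinence_FI": "zero", "ar(e)_adj": 1, "or(e)_agent": 0},
    {"lemma_FI": "lavorator", "desinence_FI": "zero", "ar(e)_adj": 0, "or(e)_agent": 1},
]


def desinence_counts(rows, class_column):
    """Count desinence_FI values among rows flagged 1 in class_column."""
    return Counter(
        row["desinence_FI"] for row in rows if row[class_column] == 1
    )


ar_counts = desinence_counts(rows, "ar(e)_adj")
agent_counts = desinence_counts(rows, "or(e)_agent")
```

Because the class columns are set regardless of whether deletion applies, filtering on them keeps both deleted and retained forms, so the two outcomes can be compared within one class.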
Provider:
Zenodo
Created:
2026-05-06