Twitter Italian Negation Corpus: Frequency Lists

Indexed from NIAID Data Ecosystem on 2026-05-02

Download link:

https://zenodo.org/record/5108212

Description:

Negation is one of the most widely discussed language-change phenomena in the Romance languages, especially French, but innovative forms and non-standard uses have also been observed in Italian, potentially pointing to grammaticalization processes. The Twitter Italian Negation Corpus (TIN corpus) consists of 10,000 Italian-language tweets posted by users in ten cities, nine in Italy and one outside it: Milan, Rome, Naples, Palermo, Bologna, Turin, Florence, Cagliari, Genoa and New York City. The corpus was collected in August 2019 by web scraping.

Because non-standard negations are rare, program-based Twitter queries were used and gradually narrowed down to contexts in which verbs and negative particles frequently co-occur. From this perspective, Twitter proves to be a medium of spontaneous, speech-like language that is influential in informal communication situations.

The most common verbs were vulgar lexemes (fotte < fottersene ‘shit on something’, frega < fregarsene ‘don't care’), but neutral verbs also occurred (interessa < interessarsi ‘concern somebody’ and importa < importare (intransitive) ‘mean something to someone’). Among the nouns used as complements of these verbs, the former taboo word un cazzo ‘penis’ ranked first by far. Deviations from the standard, such as the omission of the preverbal non ‘not’ and the use of cazzo as a bare noun, show that Twitter not only mirrors these rare variants but also contributes to their establishment. Based on the results of the frequency lists in this record, this process can be understood as grammaticalization, which develops in parallel with a pragmaticalization process. The ten frequency list files are in CSV format.
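
The column layout of the ten CSV files is not documented on this page, so the following Python sketch is only an illustration under stated assumptions: each file has a header row, the item (for example a verb form or a complement) sits in the first column, and an integer count sits in the second. The helper names read_frequency_list and load_frequency_lists, and the folder name tin_frequency_lists, are hypothetical and not part of the dataset.

```python
import csv
from collections import Counter
from pathlib import Path
from typing import TextIO


def read_frequency_list(handle: TextIO, item_col: int = 0, count_col: int = 1,
                        has_header: bool = True, delimiter: str = ",") -> Counter:
    """Read one frequency list into a Counter mapping item -> count.

    Assumed layout (not documented in the record): optional header row,
    item in column `item_col`, integer count in column `count_col`.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    if has_header:
        next(reader, None)  # skip the header row, if present
    freqs: Counter = Counter()
    for row in reader:
        if len(row) <= max(item_col, count_col):
            continue  # skip blank or truncated rows
        # Repeated items are summed rather than overwritten.
        freqs[row[item_col].strip()] += int(row[count_col])
    return freqs


def load_frequency_lists(directory: str, pattern: str = "*.csv",
                         **options) -> dict:
    """Load every matching CSV file in `directory`, keyed by file name stem."""
    lists = {}
    for path in sorted(Path(directory).glob(pattern)):
        # "utf-8-sig" also accepts files that start with a byte-order mark.
        with path.open(newline="", encoding="utf-8-sig") as handle:
            lists[path.stem] = read_frequency_list(handle, **options)
    return lists


if __name__ == "__main__":
    # Hypothetical folder holding the ten CSV files downloaded from Zenodo.
    lists = load_frequency_lists("tin_frequency_lists")
    for name, freqs in lists.items():
        print(name, freqs.most_common(5))
    # Pool all lists; note that Counter addition drops non-positive counts.
    pooled = sum(lists.values(), Counter())
    print(pooled.most_common(10))
```

Once the actual layout of the files is known, adjust item_col, count_col, has_header and delimiter accordingly; Counter.most_common then yields ranked lists directly.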

Created: 2025-01-27