FR-MIGR-TWIT Corpus 2.0

Name: FR-MIGR-TWIT Corpus 2.0
Creator: ORTOLANG (Open Resources and TOols for LANGuage) - www.ortolang.fr
Published: 2026-02-11 12:34:27
License: No description available

Source: DataCite Commons. Updated 2026-02-11; indexed 2026-05-04.

Download link:

https://www.ortolang.fr/market/item/fr-migr-twit-corpus-20/v1

Resource description:

The MIGR-TWIT Corpus is a multilingual corpus of tweets about migration in Europe, developed within the framework of OLiNDiNUM (Observatoire LINguistique du DIscours NUMérique, Linguistic Observatory of Online Debate) with the aim of documenting and analyzing online public discourse on (im)migration in contemporary European politics. To cover the global issue of migration over the last decade (2011–2022) and to observe discursive evolution across the political spectrum in two national contexts (France and the UK), the MIGR-TWIT Corpus was published in the following modules:

– Tweets of right and far-right politics in Europe (Battaglia, Blandino, Jeon & Pietrandrea, 2022): UK-R-MIGR-RA-TWIT-2012-2022, FR-R-MIGR-TWIT-2011-2022
– Tweets of French left-wing politics (Pietrandrea & Jeon, 2023): FR-L-MIGR-TWIT-2011-2022

The FR-MIGR-TWIT Corpus 1.0, compiled from the FR-R and FR-L modules, comprises 17,395 tweets posted by 39 French political figures and parties (16 right-wing and 23 left-wing) between 2011 and 2022. Tweets containing migr- derivatives were retrieved via the Twitter API v2 Academic Research, and truncated retweets (>140 characters) were restored through targeted verification (for detailed information on each module see the links above).

This second version provides a multilayer linguistic annotation of all occurrences of terms derived from the Latin root migr-.
The FR-MIGR-TWIT Corpus 2.0 offers:

– multilayer linguistic annotations associated with each occurrence of a migr- derivative (MIGR-LEXICON), including semantic roles (ROLE_SEM), syntactic functions (FUNC_SYN), lemmatised forms (LEMMA), as well as features and collocational items related to modification (MODIFICATION, LEMMA_MODIF_*, LEMMA_NOUN-1) and list/parallelism constructions (LIST_PAR, LENGTH-1, #forme#_MIGR-LIST_PAR) (non-exhaustive list);
– tweet URLs (tweet_url) and 44 types of data retrieved through the Full Archive Search endpoints of the Twitter API v2, such as the textual content of tweets (data__text), posting date (data__created_at), user ID (data__author_id), and the number of retweets (data__public_metrics__retweet_count), likes (data__public_metrics__like_count), replies (data__public_metrics__reply_count), and quotes (data__public_metrics__quote_count) (non-exhaustive list).

The corpus is available in CSV, XML, and TEI-XML formats. The CSV and XML files provide stand-off linguistic annotations and metadata. The TEI-XML files encode the canonical textual layer.

Changelog

version 2.0 (© 2025 Jeon & Pietrandrea)
– Added multilayer linguistic annotations
– Corrected delimiter-related errors
– Added TEI-XML format
– Added a basic Python query script
– Added README.md

The FR-MIGR-TWIT Corpus 1.0 (© 2025 Jeon, Battaglia & Pietrandrea) was previously released via Zenodo.
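
As an illustration of how the CSV layer described above might be queried, the following is a minimal sketch that counts annotated migr- occurrences per year and lemma. It is not the query script distributed with the corpus. The column names (tweet_url, data__created_at, data__text, LEMMA, FUNC_SYN) come from the description; the comma delimiter, the one-row-per-occurrence layout, the ISO 8601 timestamp format, the annotation values, and the sample rows themselves are assumptions made for the example, so the README.md shipped with the corpus should be checked before adapting it.

```python
import csv
import io
from collections import Counter

# Invented placeholder rows, NOT real corpus data. Column names follow the
# corpus description; the comma delimiter, the one-row-per-occurrence layout
# and the FUNC_SYN values ("subject", "object") are assumptions.
SAMPLE_CSV = """tweet_url,data__created_at,data__text,LEMMA,FUNC_SYN
https://example.org/1,2015-09-03T10:00:00.000Z,placeholder text A,migrant,subject
https://example.org/2,2015-11-20T08:30:00.000Z,placeholder text B,immigration,object
https://example.org/3,2019-02-01T17:45:00.000Z,placeholder text C,migrant,object
"""


def count_lemmas_by_year(handle, delimiter=","):
    """Count (year, LEMMA) pairs over all annotated occurrences in a CSV stream."""
    counts = Counter()
    for row in csv.DictReader(handle, delimiter=delimiter):
        # Assumes timestamps start with a four-digit year (ISO 8601).
        year = row["data__created_at"][:4]
        counts[(year, row["LEMMA"])] += 1
    return counts


counts = count_lemmas_by_year(io.StringIO(SAMPLE_CSV))
for (year, lemma), n in sorted(counts.items()):
    print(year, lemma, n)
```

To run the same function on a real file, the in-memory sample can be replaced with an open file handle, for example open(path, newline="", encoding="utf-8"), where path and the delimiter argument are set according to the distributed files.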

Provider:

ORTOLANG (Open Resources and TOols for LANGuage) - www.ortolang.fr

Created:

2026-02-11
