MIGR-TWIT Corpus. Migration Tweets of right and far-right politics in Europe
Source: NIAID Data Ecosystem (indexed 2026-03-14)
Download link:
https://zenodo.org/record/7257708
Resource overview:
Description
The MIGR-TWIT Corpus is a multilingual corpus of tweets on the topic of migration in Europe. It was created within the framework of the collaborative research project OLiNDiNUM (Observatoire LINguistique du DIscours NUMérique, Linguistic Observatory of Online Debate), with the aim of developing language databases of online debate. It considers the global issue of migration in British and French political contexts over the last dozen years, from 2011 to 2022, and consists of two sub-corpora:
FR-R-MIGR-TWIT-2011-2022 Corpus for French language data (1 January 2011 - 30 June 2022) and
UK-R-MIGR-RA-TWIT-2012-2022 Corpus for English language data (1 January 2012 - 5 September 2022)
Tweets containing at least one occurrence of a migration- or refugee-related word were retrieved automatically, using the Twitter API v2 (Academic Research), from the accounts of 28 right and far-right political figures and parties. The whole corpus contains 18,233 tweets and 533,198 words.
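The keyword criterion can be illustrated with a minimal Python sketch. It assumes a case-insensitive match on any word containing the root "migr" and, for the English sub-corpus only, on "refugee(s)" and "asylum", following the keyword lists given for each sub-corpus. The exact query sent to the Twitter API is not stated in this record, so the patterns and the function name `matches_keywords` are illustrative assumptions, not the authors' implementation.

```python
import re

# Assumption: any word containing the root "migr" counts as a match
# (e.g. migrant, immigration, migratoire). The real API query is not documented here.
MIGR_ROOT = re.compile(r"\b\w*migr\w*\b", re.IGNORECASE)

# Extra keywords listed for the English (UK) sub-corpus only.
UK_EXTRA = re.compile(r"\b(?:refugees?|asylum)\b", re.IGNORECASE)


def matches_keywords(text: str, sub_corpus: str) -> bool:
    """Return True if the tweet text meets the keyword criterion of the sub-corpus.

    sub_corpus is "FR" or "UK" (hypothetical labels used only in this sketch).
    """
    if MIGR_ROOT.search(text):
        return True
    return sub_corpus == "UK" and bool(UK_EXTRA.search(text))
```

Under these assumptions, a tweet mentioning only "asylum" would be kept for the UK sub-corpus but not for the French one.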
Scientific reference: Pietrandrea, P., Battaglia, E. (2022). “Migrants and the EU”. The diachronic construction of ad hoc categories in French far-right discourse. Journal of Pragmatics 192, 139-157.
Contents
The corpus is distributed as zipped CSV files (tab-delimited format) for each sub-corpus, and is provided in two versions. The text-only version has the tweet identifier (data__id) and the text of the tweet (data__text) as its header; it is stored in the folders FR-R-MIGR-TWIT-2011-2022_textonly and UK-R-MIGR-RA-TWIT-2012-2022_textonly, which contain 12 and 11 Zip files respectively, one per year. The metadata version has all tweet fields in its header, such as the posting date (data__created__at), the username (author__name) and the number of retweets (data__public_metrics__retweet_count); it is stored in the folders FR-R-MIGR-TWIT-2011-2022_meta and UK-R-MIGR-RA-TWIT-2012-2022_meta. Detailed information on each sub-corpus is given below.
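As a rough illustration of how one of these yearly archives might be read, the sketch below uses only the Python standard library. It assumes that the files are UTF-8 encoded, that each file inside an archive starts with a header row, and that standard CSV quoting applies; none of this is confirmed by the record. The function name `read_zipped_tsv` is hypothetical.

```python
import csv
import io
import zipfile


def read_zipped_tsv(zip_source):
    """Read every tab-delimited file in a Zip archive into a list of dicts.

    zip_source may be a file path or a binary file-like object.
    Assumptions: UTF-8 encoding and a header row in each file.
    """
    rows = []
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.DictReader(text, delimiter="\t")
                rows.extend(reader)  # one dict per tweet, keyed by header names
    return rows
```

Each returned row is keyed by the header names, so for the text-only version a tweet's text would be available as `row["data__text"]`. Tweet texts containing unbalanced quotation marks may need `quoting=csv.QUOTE_NONE`, depending on how the files were exported.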
1. FR-R-MIGR-TWIT-2011-2022
Created at: 2022-08-08
Language: FR
Coverage: 16 user accounts; 11,761 tweets; 358,491 words
Time of data collection: start=2011-01-01; end=2022-06-30
Keywords: words derived from the Latin root “migr” (from migrare)
Corpus composition:
FR-R-MIGR-TWIT Corpus Composition

| No. | Political figure/party | Username | Tweets | Years concerned |
| --- | --- | --- | --- | --- |
| 1 | Michel Barnier | @MichelBarnier | 31 | 2017-22 |
| 2 | Valérie Pécresse | @vpecresse | 81 | 2017-22 |
| 3 | Rassemblement National | @RNational_off | 3,347 | 2017-22 |
| 4 | Nicolas Dupont-Aignan | @dupontaignan | 663 | 2011-22 |
| 5 | Éric Ciotti | @ECiotti | 1,007 | 2012-22 |
| 6 | Christian Estrosi | @cestrosi | 137 | 2011-22 |
| 7 | Marine Le Pen | @MLP_officiel | 1,650 | 2011-22 |
| 8 | Valérie Boyer | @valerieboyer13 | 837 | 2012-22 |
| 9 | Florian Philippot | @f_philippot | 485 | 2012-22 |
| 10 | Xavier Bertrand | @xavierbertrand | 70 | 2017-22 |
| 11 | Marion Maréchal | @MarionMarechal | 479 | 2012-17, 19-22 |
| 12 | Philippe Meunier | @Meunier_Ph | 245 | 2013-22 |
| 13 | Jordan Bardella | @J_Bardella | 1,095 | 2013-22 |
| 14 | Nicolas Bay | @NicolasBay_ | 1,260 | 2017-22 |
| 15 | Emmanuel Macron | @EmmanuelMacron | 72 | 2017-22 |
| 16 | Éric Zemmour | @ZemmourEric | 302 | 2019-22 |
| 17 | Jean Messiha* | Banned from Twitter (since July 2021) | | |
The political figures and parties in the table above are listed in chronological order according to the dates on which they posted their first tweet.
*Before the launch of the Twitter API v2 Academic Research, migr-tweets were collected from the Europresse.com database, including 1,453 tweets by Jean Messiha, as part of the reference study (Pietrandrea & Battaglia 2022). However, the Twitter account in question has been permanently banned since July 2021. Because our data collection using the Twitter API started in September 2021, we could not access this account. We therefore decided not to include his tweets in FR-R-MIGR-TWIT-2011-2022, for the sake of consistency with the rest of the Twitter data, which were retrieved automatically.
The sub-corpus FR-R-MIGR-TWIT-2017-2022 is being developed, annotated and analyzed as part of a doctoral thesis in progress (Jeon, S.), with the aim of studying the semantic construction of the migr-lexicon over the five-year period between two recent French presidential elections.
2. UK-R-MIGR-RA-TWIT-2012-2022
Created at: 2022-09-06
Language: EN
Coverage: 12 user accounts; 6,472 tweets; 174,707 words
Time of data collection: start=2012-01-01; end=2022-09-05
Keywords: words derived from the Latin root “migr” (from migrare), plus the keywords “refugee(s)” and “asylum”.
Corpus composition:
UK-R-MIGR-RA-TWIT Corpus Composition

| No. | Political figure/party | Username | Tweets | Years concerned |
| --- | --- | --- | --- | --- |
| 1 | David Cameron | @David_Cameron | 32 | 2012-22 |
| 2 | Amber Rudd | @AmberRuddUK | 29 | 2012-22 |
| 3 | Sajid Javid | @sajidjavid | 84 | 2012-22 |
| 4 | Boris Johnson | @BorisJohnson | 80 | 2015-22 |
| 5 | Priti Patel | @pritipatel | 304 | 2012-22 |
| 6 | UK Home Office | @ukhomeoffice | 909 | 2012-22 |
| 7 | Nigel Farage | @Nigel_Farage | 1,010 | 2012-22 |
| 8 | Richard Tice | @TiceRichard | 180 | 2013-22 |
| 9 | UKIP | @UKIP | 2,746 | 2012-22 |
| 10 | Neil Hamilton | @NeilUKIP | 252 | 2013-22 |
| 11 | Nick Griffin | @NickGriffinBU | 542 | 2012-22 |
| 12 | Robin Tilbrook | @RobinTilbrook | 304 | 2012-22 |
Two of the 12 accounts are official accounts, belonging to the “UK Home Office” department and the “UKIP” (United Kingdom Independence Party) party. The other 10 are political figures’ accounts.
The corpus UK-R-MIGR-RA-TWIT-2012-2022 will be used for the following master’s thesis: Guido Blandino, 10 years of public debate on immigration: combining topic modeling and corpus linguistics to examine the British (far-)right discourse on Twitter, MA, University of Wolverhampton (2023).
Created at: 2022-11-25



