
Multilingual MigrationsKB: A Multilingual Knowledge Base of Migration-related Annotated Tweets

Source: NIAID Data Ecosystem (indexed 2026-03-13)
Download link:
https://zenodo.org/record/5918507
Description:
Multilingual MigrationsKB (MGKB) is a multilingual extension of the English MGKB. Tweets geotagged in 32 European countries (Austria, Belgium, Bulgaria, Croatia, Cyprus, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, Luxembourg, Malta, Netherlands, Poland, Portugal, Romania, Slovakia, Slovenia, Spain, Sweden, Iceland, Liechtenstein, Norway, Switzerland, the United Kingdom) are extracted and filtered by 11 languages (English, French, Finnish, German, Greek, Dutch, Hungarian, Italian, Polish, Spanish, Swedish). Tweet metadata, such as geo information (place name, coordinates, country code), is included. MGKB contains sentiments, offensive and hate speech, topics, hashtags, and user mentions in RDF format.

The schema of MGKB extends TweetsKB with migration-related information. To associate and represent the potential economic and social factors driving migration flows, data from Eurostat and the FIBO ontology are used. To represent multilinguality, the CIDOC Conceptual Reference Model (CIDOC-CRM) is used. The extracted economic indicators (GDP Growth Rate, Total Unemployment Rate, Youth Unemployment Rate, Long-term Unemployment Rate, and Income per Household) are connected with each tweet in RDF through geographical and temporal dimensions.

In this version, the Multilingual MGKB is delivered split by year. The extracted topic words are also published.

Code: https://github.com/migrationsKB/MRL
For pretrained models (sentiment analysis, hate speech detection, ETM), contact Yiyi Chen (yiyi.chen@partner.kit.edu).
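
The link between tweets and economic indicators through geographical and temporal dimensions can be pictured with a small sketch. This is not the MGKB implementation: the knowledge base expresses the link in RDF, whereas the example below uses plain Python records. The class names, field names, the `link_indicators` helper, and all sample values are hypothetical and chosen only for illustration. It also assumes that the geographical key is a country code and the temporal key is a calendar year.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical record types; the real MGKB data is RDF with its own schema.
@dataclass(frozen=True)
class Tweet:
    tweet_id: str
    country_code: str  # geographical dimension, e.g. "DE"
    created: date      # temporal dimension
    language: str


@dataclass(frozen=True)
class Indicator:
    name: str          # e.g. "Total Unemployment Rate"
    country_code: str
    year: int
    value: float


def link_indicators(tweets, indicators):
    """Map each tweet ID to the indicators sharing its country and year."""
    by_key = {}
    for ind in indicators:
        by_key.setdefault((ind.country_code, ind.year), []).append(ind)
    return {
        t.tweet_id: by_key.get((t.country_code, t.created.year), [])
        for t in tweets
    }


# Made-up sample values, not taken from Eurostat or MGKB.
tweets = [
    Tweet("t1", "DE", date(2019, 5, 4), "de"),
    Tweet("t2", "FR", date(2020, 1, 15), "fr"),
]
indicators = [
    Indicator("GDP Growth Rate", "DE", 2019, 1.0),
    Indicator("Total Unemployment Rate", "DE", 2019, 3.0),
    Indicator("GDP Growth Rate", "FR", 2019, 2.0),
]

linked = link_indicators(tweets, indicators)
for tweet_id, matches in linked.items():
    print(tweet_id, [m.name for m in matches])
```

Here tweet `t1` picks up both German indicators for 2019, while `t2` matches none because the only French indicator in the sample is for a different year. Indexing the indicators by a (country, year) key first keeps the join to a single pass over each collection.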

Created:
2022-01-30