Reddit EU language dataset

NIAID Data Ecosystem, indexed 2026-03-12

Download link:

https://zenodo.org/record/5346799

Description:

This dataset was created for a personal project on recognizing the native language of someone writing in English.

Origin

The dataset was crawled from the subreddit r/europe and contains around 1.5 million posts in its raw form.

Structure

This repo contains both the raw data and the cleaned data. The cleaned data, purged of deleted comments and of comments that could not be linked to the writer's country of origin, contains around 450k datapoints and has the following structure:

body: the text content of the comment
country_name: extended name of the country
permalink: link to the comment
author: username of the creator
created_utc: UTC creation time
datetime: date and time of creation
alpha2: ISO country alpha2 code
alpha3: ISO country alpha3 code
numeric: ISO country number
apolitical_name: apolitical country name
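The field list in the description can be sketched as a typed record. This is a minimal sketch rather than the dataset's own loader. It assumes the cleaned data is a CSV file with one column per listed field, and it reads `created_utc` and `datetime` as two separate columns; the listing confirms neither. The names `Comment` and `read_comments` and the sample row are hypothetical.

```python
import csv
import io
from dataclasses import dataclass, fields


@dataclass
class Comment:
    """One cleaned datapoint, with the fields named in the description."""
    body: str
    country_name: str
    permalink: str
    author: str
    created_utc: str      # kept as text; the on-disk format is not documented
    datetime: str
    alpha2: str
    alpha3: str
    numeric: str
    apolitical_name: str


def read_comments(handle):
    """Yield Comment records from a CSV file object (assumed layout)."""
    reader = csv.DictReader(handle)
    expected = [f.name for f in fields(Comment)]
    present = reader.fieldnames or []
    missing = [name for name in expected if name not in present]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    for row in reader:
        # Ignore any extra columns; keep only the documented fields.
        yield Comment(**{name: row[name] for name in expected})


# Hypothetical sample row, made up for illustration only.
SAMPLE = (
    "body,country_name,permalink,author,created_utc,datetime,"
    "alpha2,alpha3,numeric,apolitical_name\n"
    "I think that is a good idea,Italy,/r/europe/comments/example/,"
    "example_user,1630000000,2021-08-26 17:46:40,IT,ITA,380,Italy\n"
)

comments = list(read_comments(io.StringIO(SAMPLE)))
```

For the language-recognition task the description mentions, `body` would be the input text and one of the country columns (for example `alpha2`) the label.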

Created:

2021-08-31
