Reddit EU language dataset

Source: NIAID Data Ecosystem (indexed 2026-03-12)
Download link:
https://zenodo.org/record/5346799
Description:
This dataset was created for a personal project on recognizing the native language of someone writing in English.

Origin
The dataset was crawled from the subreddit r/europe and contains around 1.5 million posts in its raw form.

Structure
This repo contains both the raw data and the cleaned data. The cleaned data, purged of deleted comments and of comments that could not be linked to the writer's country of origin, contains around 450k datapoints and has the following structure:

body: the text content of the comment
country_name: extended name of the country
permalink: link to the comment
author: username of the creator
created_utc: UTC creation time
datetime: date and time of creation
alpha2: ISO country alpha2 code
alpha3: ISO country alpha3 code
numeric: ISO country number
apolitical_name: apolitical country name
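
As a rough illustration of how the cleaned records could be inspected, the sketch below counts comments per country using the alpha2 field. It assumes the cleaned data is available as CSV with a header row matching the field names above; the description does not state the file format, so that is an assumption, and the helper name count_comments_by_country and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def count_comments_by_country(csv_file, code_column="alpha2"):
    """Count rows per country code in an open CSV file object.

    Assumes a header row that includes `code_column` (hypothetical layout;
    the actual file format of the dataset is not stated in the description).
    """
    counts = Counter()
    for row in csv.DictReader(csv_file):
        code = (row.get(code_column) or "").strip()
        if code:  # skip rows with no country code
            counts[code] += 1
    return counts


# Made-up sample rows standing in for the real cleaned file.
sample = io.StringIO(
    "body,country_name,alpha2\n"
    "Hello from Rome,Italy,IT\n"
    "Greetings,Germany,DE\n"
    "Ciao,Italy,IT\n"
)
counts = count_comments_by_country(sample)
print(counts.most_common())
```

With a real file, the in-memory sample would be replaced by something like open(path, newline="", encoding="utf-8"), where path points at the downloaded cleaned data. Since the task is language-origin recognition, checking the per-country counts first is a cheap way to spot class imbalance.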
Created:
2021-08-31