Gender Classification Using Twitter Data
Source: Mendeley Data | Updated: 2024-03-27 | Indexed: 2024-06-26
Download link:
https://data.mendeley.com/datasets/6x9srbfp6w
Description:
This dataset is an expansion of the Twitter User Gender Classification dataset, which is freely available on Kaggle. The data is intended for research on predicting a user's gender from the text they post on Twitter. The original dataset contained 12,894 distinct male and female Twitter users with one tweet each. It was expanded to 269,108 tweets by the same 12,894 users (about 21 tweets per user on average), so that each user has multiple tweets. The expansion was done by using Tweepy to access the Twitter API. The uploaded files contain the train and test split used for the experiment, with the following fields:
- user_id: a unique ID for each user
- gender: male or female
- gender:confidence: a float representing confidence in the provided gender (1 for 100%)
- created_at: date and time when the tweet was created
- tweet_id: the unique ID of a randomly selected tweet by that user
Also attached is a simple Jupyter Notebook script using Tweepy. It retrieves a tweet's complete information from its ID, a process known as hydrating a tweet ID. Some sample tweet IDs are already included in the script for testing purposes.
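Because the files list tweet IDs rather than tweet text, and each user contributes many tweets, a natural first step is to group IDs by user and then hydrate them in batches. The following is a minimal Python sketch, assuming each split file is a CSV with a header row using the column names listed above (the actual file names and delimiter are not stated here). The function names `load_user_tweets` and `batched` and the `min_confidence` threshold are hypothetical, not part of the dataset or its notebook. The batch size of 100 reflects the Twitter API v2 tweet lookup limit of 100 IDs per request.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Column names as listed in the dataset description (assumed to be the CSV header).
FIELDS = ("user_id", "gender", "gender:confidence", "created_at", "tweet_id")


def load_user_tweets(
    lines: Iterable[str], min_confidence: float = 1.0
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Group tweet IDs by user from one split file (hypothetical helper).

    `lines` can be an open file or any iterable of CSV lines.
    Rows whose gender:confidence is below `min_confidence` are skipped.
    Returns (gender per user, list of tweet IDs per user).
    """
    reader = csv.DictReader(lines)
    missing = set(FIELDS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    genders: Dict[str, str] = {}
    tweets: Dict[str, List[str]] = defaultdict(list)
    for row in reader:
        if float(row["gender:confidence"]) < min_confidence:
            continue
        uid = row["user_id"]
        genders[uid] = row["gender"]
        # Keep IDs as strings: 64-bit tweet IDs lose precision if parsed as floats.
        tweets[uid].append(row["tweet_id"])
    return genders, dict(tweets)


def batched(ids: Iterable[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` IDs, e.g. one per lookup request."""
    batch: List[str] = []
    for tid in ids:
        batch.append(tid)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each chunk from `batched` could then be passed to a lookup call such as Tweepy's `Client.get_tweets(ids=...)`, subject to whatever API access is currently available; that network step is deliberately left out of the sketch. Filtering on `gender:confidence` is one possible design choice for reducing label noise, not something the description prescribes.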
Created: 2024-01-23
Compiled Summary
Dataset Introduction

Background and Challenges
Background Overview
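This dataset is an expanded version of the Twitter User Gender Classification dataset, containing 269,108 tweets from 12,894 users, and is intended for predicting user gender from tweet text. It provides key fields such as user ID, gender, and tweet creation time, and is accompanied by a script for retrieving tweets from their tweet IDs.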
The above content was collected and summarized by 遇见数据集.



