
qg2020252627/twitter_author_profiling_by_gender_nlp

Source: Hugging Face. Updated 2026-03-09. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/qg2020252627/twitter_author_profiling_by_gender_nlp
Resource description:
---
license: apache-2.0
task_categories:
- text-classification
language:
- en
tags:
- NLP
configs:
- config_name: default
  data_files:
  - split: train
    path: "one_tweet_dataset_train.csv"
  - split: validation
    path: "one_tweet_dataset_val.csv"
  - split: test
    path: "one_tweet_dataset_test.csv"
  features:
  - name: tweet_id
    dtype: string
  - name: gender_label
    dtype:
      class_label:
        names:
        - M
        - F
  - name: text
    dtype: string
---

This dataset was created for a student's bachelor's thesis. Its main purpose is author profiling by gender.

# Single-Tweet-Per-Author Twitter Dataset

## Overview

This dataset consists of Twitter (X) posts with a strict constraint: **each author appears exactly once**. There is a one-to-one correspondence between tweets and authors.

This design removes author-level accumulation effects and prevents models from exploiting repeated stylistic or behavioral signals from the same individual.

## Key Property

- **1 tweet = 1 unique author**
- No `author_id` is repeated
- Number of tweets equals number of authors

## Intended Use

The dataset is intended for:

- Text classification
- Sentiment analysis
- Topic classification
- Bias and fairness analysis
- Modeling tasks requiring independent textual observations

It is explicitly designed to avoid author leakage.

## Not Intended Use

The dataset should not be used for:

- Author identification or profiling
- Longitudinal analysis
- User behavior modeling
- Style consistency analysis

## Dataset Structure

Each record represents a single tweet from a single author.

### Example Record

```json
{
  "tweet_id": "1234567890",
  "gender": "M|F",
  "text": "Example tweet text"
}
```
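The one-tweet-per-author property can be sanity-checked after download. The following is a minimal sketch, not the dataset author's own tooling. It assumes that `tweet_id` can stand in for the author, since the declared features include no `author_id` column and each tweet maps to one author. It also assumes the CSV header matches the declared feature names. The helper names `duplicated_ids` and `split_overlap` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def duplicated_ids(rows, key="tweet_id"):
    """Return the sorted values of `key` that occur more than once in `rows`."""
    counts = Counter(row[key] for row in rows)
    return sorted(value for value, n in counts.items() if n > 1)


def split_overlap(splits, key="tweet_id"):
    """Return the sorted ids that appear in more than one split.

    `splits` maps a split name to its list of row dicts.
    """
    first_seen_in = {}
    overlap = set()
    for name, rows in splits.items():
        for row in rows:
            owner = first_seen_in.setdefault(row[key], name)
            if owner != name:
                overlap.add(row[key])
    return sorted(overlap)


# Hypothetical sample rows for illustration; not taken from the real CSV files.
SAMPLE_TRAIN = "tweet_id,gender_label,text\n1,M,first example\n2,F,second example\n"
SAMPLE_TEST = "tweet_id,gender_label,text\n3,F,third example\n"

train_rows = list(csv.DictReader(io.StringIO(SAMPLE_TRAIN)))
test_rows = list(csv.DictReader(io.StringIO(SAMPLE_TEST)))

print(duplicated_ids(train_rows))
print(split_overlap({"train": train_rows, "test": test_rows}))
```

To run the same checks on the real files, replace the in-memory samples with `csv.DictReader` over the opened `one_tweet_dataset_train.csv`, `one_tweet_dataset_val.csv`, and `one_tweet_dataset_test.csv`. An empty result from both helpers is consistent with the no-leakage claim, although it cannot prove that two different tweet ids never belong to the same author.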
Provider:
qg2020252627