qg2020252627/twitter_author_profiling_by_gender_nlp

Name: qg2020252627/twitter_author_profiling_by_gender_nlp
Creator: qg2020252627
Published: 2026-03-09 11:52:58
License: No description available

Hugging Face, updated 2026-03-09, indexed 2026-03-29

Download link:

https://hf-mirror.com/datasets/qg2020252627/twitter_author_profiling_by_gender_nlp


Resource description:

---
license: apache-2.0
task_categories:
- text-classification
language:
- en
tags:
- NLP
configs:
- config_name: default
  data_files:
  - split: train
    path: "one_tweet_dataset_train.csv"
  - split: validation
    path: "one_tweet_dataset_val.csv"
  - split: test
    path: "one_tweet_dataset_test.csv"
features:
- name: tweet_id
  dtype: string
- name: gender_label
  dtype:
    class_label:
      names:
      - M
      - F
- name: text
  dtype: string
---

This dataset was created for a student's bachelor's (Bc) thesis. Its main purpose is author profiling by gender.

# Single-Tweet-Per-Author Twitter Dataset

## Overview

This dataset consists of Twitter (X) posts with a strict constraint: **each author appears exactly once**, so tweets and authors are in one-to-one correspondence.

This design removes author-level accumulation effects and prevents models from exploiting repeated stylistic or behavioral signals from the same individual.

## Key Property

- **1 tweet = 1 unique author**
- No `author_id` is repeated
- The number of tweets equals the number of authors

## Intended Use

The dataset is intended for:

- Text classification
- Sentiment analysis
- Topic classification
- Bias and fairness analysis
- Modeling tasks requiring independent textual observations

It is explicitly designed to avoid author leakage.

## Not Intended Use

The dataset should not be used for:

- Author identification or profiling
- Longitudinal analysis
- User behavior modeling
- Style consistency analysis

## Dataset Structure

Each record represents a single tweet from a single author.

### Example Record

```json
{
  "tweet_id": "1234567890",
  "gender": "M|F",
  "text": "Example tweet text"
}
```
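The structural claims above can be spot-checked on a downloaded split. The following is a minimal sketch, not the author's tooling. It assumes the CSV files carry the `tweet_id`, `gender_label`, and `text` columns listed in the card's metadata, and that labels are stored as the strings `M` and `F`; if they are stored as integer class indices, `ALLOWED_LABELS` would need to change. The published columns do not include `author_id`, so author uniqueness cannot be verified directly. Checking that `tweet_id` is unique is only a weaker proxy. The function name `check_split` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Class names listed in the card's metadata (assumed to be stored as strings).
ALLOWED_LABELS = {"M", "F"}


def check_split(csv_text, label_column="gender_label"):
    """Return (duplicate_ids, bad_label_ids, label_counts) for one CSV split."""
    reader = csv.DictReader(io.StringIO(csv_text))
    seen = set()
    duplicate_ids = []
    bad_label_ids = []
    label_counts = Counter()
    for row in reader:
        tweet_id = row["tweet_id"]
        # A repeated tweet_id is reported every time it reappears.
        if tweet_id in seen:
            duplicate_ids.append(tweet_id)
        seen.add(tweet_id)
        label = row[label_column]
        if label in ALLOWED_LABELS:
            label_counts[label] += 1
        else:
            bad_label_ids.append(tweet_id)
    return duplicate_ids, bad_label_ids, label_counts


# Made-up rows for illustration only; they are not taken from the dataset.
sample = (
    "tweet_id,gender_label,text\n"
    "1,M,first example\n"
    "2,F,second example\n"
    "2,F,repeated id\n"
    "3,X,unknown label\n"
)
duplicates, bad_labels, counts = check_split(sample)
print(duplicates, bad_labels, dict(counts))
```

To run this against a real split, read the file's contents (for example `one_tweet_dataset_train.csv`) into a string and pass it to `check_split`. Two empty lists would be consistent with the properties described, though they would not prove author uniqueness.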

Provided by:

qg2020252627
