qg2020252627/twitter_author_profiling_by_gender_nlp
Source: Hugging Face. Updated 2026-03-09. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/qg2020252627/twitter_author_profiling_by_gender_nlp
Resource description:
---
license: apache-2.0
task_categories:
- text-classification
language:
- en
tags:
- NLP
configs:
- config_name: default
  data_files:
  - split: train
    path: "one_tweet_dataset_train.csv"
  - split: validation
    path: "one_tweet_dataset_val.csv"
  - split: test
    path: "one_tweet_dataset_test.csv"
  features:
  - name: tweet_id
    dtype: string
  - name: gender_label
    dtype:
      class_label:
        names:
        - M
        - F
  - name: text
    dtype: string
---
This dataset was created for a student's bachelor's (Bc.) thesis.
Its main purpose is author profiling by gender.
# Single-Tweet-Per-Author Twitter Dataset
## Overview
This dataset consists of Twitter (X) posts with a strict constraint: **each author appears exactly once**.
There is a one-to-one correspondence between tweets and authors.
This design removes author-level accumulation effects and prevents models from exploiting repeated stylistic or behavioral signals from the same individual.
## Key Property
- **1 tweet = 1 unique author**
- No `author_id` is repeated
- Number of tweets equals number of authors
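The property above can be checked mechanically wherever an author identifier is available. The following is a minimal sketch, not part of the dataset's tooling: it assumes records carry an `author_id` field, which is not among the features declared in the card metadata, and the helper name `repeated_authors` and the toy records are purely illustrative.

```python
from collections import Counter


def repeated_authors(rows, key="author_id"):
    """Return authors that occur more than once, mapped to their counts.

    `key` is an assumed column name; the published CSV files may not
    expose an author identifier at all.
    """
    counts = Counter(row[key] for row in rows)
    return {author: n for author, n in counts.items() if n > 1}


# Toy records with made-up IDs, only to show the check in action.
sample = [
    {"author_id": "a1", "text": "first tweet"},
    {"author_id": "a2", "text": "second tweet"},
    {"author_id": "a1", "text": "third tweet"},
]

duplicates = repeated_authors(sample)
```

An empty result means the one-tweet-per-author constraint holds for the rows that were checked.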
## Intended Use
The dataset is intended for:
- Text classification
- Sentiment analysis
- Topic classification
- Bias and fairness analysis
- Modeling tasks requiring independent textual observations
It is explicitly designed to avoid author leakage.
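One way to sanity-check the absence of leakage is to confirm that no `tweet_id` is shared between splits. Since each tweet is meant to correspond to exactly one author, disjoint tweet IDs imply disjoint authors as long as that constraint holds. The sketch below uses only the standard library; the helper names `read_ids` and `split_overlaps` are hypothetical, the inline CSV strings are toy data, and it assumes each file has a header row containing a `tweet_id` column.

```python
import csv
import io
from itertools import combinations


def read_ids(handle, column="tweet_id"):
    """Collect the values of one column from an open CSV file."""
    return {row[column] for row in csv.DictReader(handle)}


def split_overlaps(ids_by_split):
    """Return the IDs shared by each pair of splits, omitting clean pairs."""
    overlaps = {}
    for (name_a, ids_a), (name_b, ids_b) in combinations(ids_by_split.items(), 2):
        shared = ids_a & ids_b
        if shared:
            overlaps[(name_a, name_b)] = shared
    return overlaps


# Toy CSV content standing in for two split files; tweet "2" leaks on purpose.
train_csv = io.StringIO("tweet_id,gender_label,text\n1,M,hello\n2,F,hi\n")
test_csv = io.StringIO("tweet_id,gender_label,text\n2,F,hi\n3,M,hey\n")

overlaps = split_overlaps({
    "train": read_ids(train_csv),
    "test": read_ids(test_csv),
})
```

With the real files, the same functions could be applied to handles opened on `one_tweet_dataset_train.csv`, `one_tweet_dataset_val.csv`, and `one_tweet_dataset_test.csv`; an empty dictionary would indicate that no tweet appears in more than one split.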
## Not Intended Use
The dataset should not be used for:
- Identification or profiling of individual authors
- Longitudinal analysis
- User behavior modeling
- Style consistency analysis
## Dataset Structure
Each record represents a single tweet from a single author.
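The card metadata declares three features: `tweet_id` (string), `gender_label` (a class label with names `M` and `F`, in that order), and `text` (string). As a rough sketch of how a row could be read into that shape with the standard library alone, the snippet below maps the label to its index in the declared name list. It assumes the CSV files have a header row and store the label as the literal string `M` or `F`; the `Record` class and `parse_records` helper are illustrative names, not part of the dataset.

```python
import csv
import io
from dataclasses import dataclass

# Order taken from the class_label names in the card metadata.
LABEL_NAMES = ["M", "F"]


@dataclass
class Record:
    tweet_id: str
    gender_label: int  # index into LABEL_NAMES
    text: str


def parse_records(handle):
    """Parse CSV rows into Record objects, encoding the label as an index."""
    return [
        Record(
            tweet_id=row["tweet_id"],
            gender_label=LABEL_NAMES.index(row["gender_label"]),
            text=row["text"],
        )
        for row in csv.DictReader(handle)
    ]


# Toy CSV content with made-up rows, standing in for one of the split files.
toy_csv = io.StringIO(
    "tweet_id,gender_label,text\n"
    "1234567890,M,Example tweet text\n"
    "1234567891,F,Another example\n"
)
records = parse_records(toy_csv)
```

With the `datasets` library installed, `load_dataset("qg2020252627/twitter_author_profiling_by_gender_nlp")` should resolve the three splits from the `configs` section instead, though that path requires network access.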
### Example Record
```json
{
  "tweet_id": "1234567890",
  "gender_label": "M",
  "text": "Example tweet text"
}
```
`gender_label` takes one of two values, `M` or `F`.
Provider: qg2020252627



