
soda-lmu/tweet-annotation-sensitivity-1

Source: Hugging Face. Updated 2023-10-20. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/soda-lmu/tweet-annotation-sensitivity-1
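Resource summary:
This dataset comes from an experiment on tweet annotation sensitivity. The experiment selected 20 tweets from the study by Davidson et al. and recruited 1,000 Prolific workers to annotate them. Annotators labeled the tweets under one of six experimental conditions, and demographic information and task-related data were collected for each annotator. The dataset includes the tweet annotations, annotator background information, and timing data such as task completion time.
Provider: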
soda-lmu
Summary of original information

Dataset overview

Dataset description

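The dataset contains 20 tweets drawn from the corpus of Davidson et al. (2017) by stratified sampling on majority-vote class and level of disagreement. It is intended for sentiment classification and hate speech detection tasks.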
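The stratified draw described above can be illustrated with a minimal sketch. It is not the authors' sampling code: the pool, the two disagreement levels ("low", "high"), the per-stratum quota, and the function name `stratified_sample` are all hypothetical and chosen only to show the technique.

```python
import random
from collections import Counter, defaultdict

LABELS = ("hate speech", "offensive language", "neither")

def stratified_sample(items, stratum_of, per_stratum, seed=0):
    """Draw up to `per_stratum` items from every stratum, reproducibly."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    sample = []
    for key in sorted(strata):  # fixed stratum order keeps the draw reproducible
        members = strata[key]
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

# Hypothetical pool of (tweet_id, majority_label, disagreement_level) tuples.
pool = []
for label in LABELS:
    for level in ("low", "high"):
        for _ in range(5):
            pool.append((len(pool), label, level))

# Stratum = (majority-vote class, disagreement level); quota of 2 is an assumption.
picked = stratified_sample(pool, stratum_of=lambda t: (t[1], t[2]), per_stratum=2)
per_stratum_counts = Counter((t[1], t[2]) for t in picked)
print(len(picked), "tweets from", len(per_stratum_counts), "strata")
```

Sampling within each stratum separately guarantees that every combination of class and disagreement level is represented, which a simple random draw of 20 tweets would not.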

Data collection

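1,000 Prolific workers were recruited to annotate each tweet. Each annotator was randomly assigned to one of six experimental conditions and asked to label every tweet as hate speech, offensive language, or neither. Several demographic variables (such as age and gender) and auxiliary data (such as total task duration and time spent on each screen) were also collected.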
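As a rough illustration of random assignment to six conditions, the sketch below uses simple (unbalanced) randomization. Whether the study used simple or balanced assignment is not stated here, so that choice is an assumption, as are the function name `assign_versions` and the annotator IDs 1 to 1000. The labels A to F follow the `version` column of the data dictionary.

```python
import random
from collections import Counter

VERSIONS = ["A", "B", "C", "D", "E", "F"]  # labels taken from the `version` column

def assign_versions(annotator_ids, versions=VERSIONS, seed=0):
    """Assign each annotator to one version, independently and uniformly at random."""
    rng = random.Random(seed)
    return {aid: rng.choice(versions) for aid in annotator_ids}

# Hypothetical annotator IDs 1..1000.
assignment = assign_versions(range(1, 1001))
group_sizes = Counter(assignment.values())
for version in VERSIONS:
    print(version, group_sizes[version])
```

With simple randomization the group sizes vary around 1000/6; a design that needs equal groups would shuffle a balanced list of labels instead.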

Tweet examples

The dataset contains the following 20 tweets; usernames have been hashed:

Tweet ID Tweets Hashed
1 Jihadi John has struck again. Prayers with the victims families. What is the #bounty on His head? Time 4 him 2 meet allah
2 RT @#####: Seriously, Florida. A vote for Charlie Christ is a vote for taxes and ebola.
3 Islamic Jihadis run away to Pakistan @##### @##### #UniteBlue
4 At this rate, Id cheer for the awful New York Yankees over the St. Louis Cardinals.
5 Allergies have my eyes all red and chink eyed, I look like Im stoned
6 Not protected in the state of Utah. There is no fear of queer couples forcing churches to marry them. They do not legally have to do so.
7 RT @#####: Boss Niggas Fuk wit Boss bitches Vice versa..
8 RT @#####: ever wanna text somebody and be like bitch die I hope you dont wake up you low down dirty muthafxcka&#128520
9 @##### @##### @##### shes a right twat, Id tell you, honestly Sky! Honestly &#128520
10 So yall give farwells to Lulu yet did all but shit on Kris? Ok bitches... ok
11 RT @#####: Dis bitch wanna smoke all our weed
12 Shittin on all my old hoes
13 Getting hit off a cunt thats about 6 foot sideways and a cunt that was in the army lol, i dini care, yous are fuckin laughable
14 When it rains its good "go to the mall weather" all the coons be at home
15 Dont mind black people.....hate niggers!
16 My eyes feel so chink eyed
17 "From now on, I will call all radical MUSLIMS niggers!It is very fitting and it is racist! I AM RACIST AGAINST ANYONE WHO HATES AMERICA!"""
18 RT @#####: Fucking chinks in Clash of Clans always fuck me up. Bitch ass kamikaze faggots.
19 @##### ya I know all the slang Im racist I h8 porch monkies
20 RT @#####: Stupid fucking nigger LeBron. You flopping stupid jungle bunny monkey faggot.
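The `@#####` placeholders in the tweets above suggest that user mentions were masked. One way this could be done is sketched below; the regular expression and the function name `mask_usernames` are assumptions, not the procedure the dataset authors used, and the example string is invented.

```python
import re

# Assumption: a mention is "@" followed by letters, digits, or underscores.
MENTION = re.compile(r"@\w+")

def mask_usernames(text, placeholder="@#####"):
    """Replace every @mention with a fixed placeholder."""
    return MENTION.sub(placeholder, text)

example = "RT @some_user: hello @another_user, see you at 5"
masked = mask_usernames(example)
print(masked)
```

Because `#` is not a word character, running the function on already-masked text leaves it unchanged.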

Data dictionary

The dataset contains the following columns:

| Column Name | Description | Type |
| --- | --- | --- |
| id | Annotator ID | integer |
| age | Age | integer |
| gender | Gender. 1: Female; 2: Male; 3: Something Else; 4: Prefer not to say | factor |
| afam | African-American. 0: No; 1: Yes | binary |
| asian | Asian-American. 0: No; 1: Yes | binary |
| hispanic | Hispanic. 0: No; 1: Yes | binary |
| white | White. 0: No; 1: Yes | binary |
| race_other | Other race/ethnicity. 0: No; 1: Yes | binary |
| race_not_say | Prefer not to say race/ethnicity. 0: No; 1: Yes | binary |
| education | Highest educational attainment. 1: Less than high school; 2: High school; 3: Some college; 4: College graduate; 5: Master's degree or professional degree (Law, Medicine, MPH, etc.); 6: Doctoral degree (PhD, DPH, EdD, etc.) | factor |
| sexuality | Sexuality. 1: Gay or Lesbian; 2: Bisexual; 3: Straight; 4: Something Else | factor |
| english | English first language? 0: No; 1: Yes | binary |
| tw_use | Twitter use. 1: Most days; 2: Most weeks, but not every day; 3: A few times a month; 4: A few times a year; 5: Less often; 6: Never | factor |
| social_media_use | Social media use. 1: Most days; 2: Most weeks, but not every day; 3: A few times a month; 4: A few times a year; 5: Less often; 0: Never | factor |
| prolific_hours | Prolific hours worked last month | integer |
| task_fun | Coding work was: fun. 0: No; 1: Yes | binary |
| task_interesting | Coding work was: interesting. 0: No; 1: Yes | binary |
| task_boring | Coding work was: boring. 0: No; 1: Yes | binary |
| task_repetitive | Coding work was: repetitive. 0: No; 1: Yes | binary |
| task_important | Coding work was: important. 0: No; 1: Yes | binary |
| task_depressing | Coding work was: depressing. 0: No; 1: Yes | binary |
| task_offensive | Coding work was: offensive. 0: No; 1: Yes | binary |
| another_tweettask | Likelihood to do another Tweet related task. not at all: Not at all likely; somewhat: Somewhat likely; very: Very likely | factor |
| another_hatetask | Likelihood to do another Hate Speech related task. not at all: Not at all likely; somewhat: Somewhat likely; very: Very likely | factor |
| page_history | Order in which annotator saw pages | character |
| date_of_first_access | Datetime of first access | datetime |
| date_of_last_access | Datetime of last access | datetime |
| duration_sec | Task duration in seconds | integer |
| version | Version of annotation task. A: Version A; B: Version B; C: Version C; D: Version D; E: Version E; F: Version F | factor |
| tw1-20 | Label assigned to Tweet 1-20. hate speech: Hate Speech; offensive language: Offensive Language; neither: Neither HS nor OL; NA: Missing or "don't know" | factor |
| tw_duration_1-20 | Annotation duration in milliseconds, Tweet 1-20 | numerical |
| num_approvals | Prolific data: number of previous task approvals of annotator | integer |
| num_rejections | Prolific data: number of previous task rejections of annotator | integer |
| prolific_score | Annotator quality score by Prolific | numerical |
| countryofbirth | Prolific data: Annotator country of birth | character |
| currentcountryofresidence | Prolific data: Annotator country of residence | character |
| employmentstatus | Prolific data: Annotator employment status. Full-time; Part-time; Unemployed (and job-seeking); Due to start a new job within the next month; Not in paid work (e.g. homemaker, retired or disabled); Other; DATA EXPIRED | factor |
| firstlanguage | Prolific data: Annotator first language | character |
| nationality | Prolific data: Nationality | character |
| studentstatus | Prolific data: Student status. Yes; No; DATA EXPIRED | factor |
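The data dictionary describes one row per annotator with one label column per tweet. A common first step with such a wide layout is to reshape it to one record per (annotator, tweet) and compare label shares across task versions. The sketch below does this on three invented rows. It assumes the per-tweet columns are named `tw1`, `tw2`, and so on, and that missing labels arrive as `None`; both are assumptions about the file, and the helper names are hypothetical.

```python
from collections import Counter, defaultdict

# Code-to-label mapping copied from the `gender` row of the data dictionary.
GENDER = {1: "Female", 2: "Male", 3: "Something Else", 4: "Prefer not to say"}

# Hypothetical rows; only three of the twenty tweet columns are shown.
rows = [
    {"id": 1, "gender": 1, "version": "A",
     "tw1": "hate speech", "tw2": "neither", "tw3": None},
    {"id": 2, "gender": 2, "version": "A",
     "tw1": "offensive language", "tw2": "neither", "tw3": "neither"},
    {"id": 3, "gender": 4, "version": "B",
     "tw1": "hate speech", "tw2": "offensive language", "tw3": "neither"},
]

def to_long(rows, n_tweets):
    """Yield one record per (annotator, tweet), skipping missing labels."""
    for row in rows:
        for k in range(1, n_tweets + 1):
            label = row.get(f"tw{k}")
            if label is None:  # NA: missing or "don't know"
                continue
            yield {"id": row["id"], "version": row["version"],
                   "gender": GENDER[row["gender"]], "tweet": k, "label": label}

def label_shares_by_version(long_rows):
    """Return, per version, the share of each label among non-missing annotations."""
    counts = defaultdict(Counter)
    for rec in long_rows:
        counts[rec["version"]][rec["label"]] += 1
    return {version: {label: n / sum(c.values()) for label, n in c.items()}
            for version, c in counts.items()}

long_rows = list(to_long(rows, n_tweets=3))
shares = label_shares_by_version(long_rows)
print(shares)
```

For the real data, `n_tweets` would be 20. Missing labels are dropped before computing shares here; treating them as a separate category is an equally valid choice depending on the analysis.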