soda-lmu/tweet-annotation-sensitivity-1
Source: Hugging Face | Updated: 2023-10-20 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/soda-lmu/tweet-annotation-sensitivity-1
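Summary:
This dataset comes from an experiment on annotation sensitivity in tweet labeling. The experiment selected 20 tweets from the study by Davidson et al. and recruited 1,000 Prolific workers to annotate them. Each annotator worked under one of six experimental conditions, and annotator demographics and task-related data were also collected. The dataset contains the tweet annotations, annotator background information, and detailed data such as task completion times.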
Provider:
soda-lmu
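Original Information Summary
Dataset Overview
Dataset Description
The dataset contains 20 tweets drawn from Davidson et al. (2017) by stratified sampling on majority-voted class and level of disagreement. It is intended for sentiment classification and hate speech detection tasks.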
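The two stratification variables mentioned above, majority-voted class and level of disagreement, can be illustrated with a minimal sketch. The disagreement measure used here (1 minus the share of the majority label) and the function name `majority_and_disagreement` are assumptions made for illustration; the source does not say how disagreement was actually quantified.

```python
from collections import Counter

def majority_and_disagreement(labels):
    """Return (majority label, disagreement) for one tweet's labels.

    Disagreement is 1 minus the majority label's share. This measure is an
    illustrative assumption, not necessarily the one used for the dataset.
    Ties are broken by whichever label appears first in the input.
    """
    counts = Counter(labels)
    majority_label, majority_count = counts.most_common(1)[0]
    return majority_label, 1 - majority_count / len(labels)

# Hypothetical labels from five annotators for a single tweet.
example = ["offensive language", "offensive language", "hate speech",
           "offensive language", "neither"]
label, disagreement = majority_and_disagreement(example)
print(label, round(disagreement, 2))
```

Tweets could then be grouped by majority label and by a binned disagreement score before sampling from each group.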
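Data Collection
1,000 Prolific workers were recruited to annotate each tweet. Annotators were randomly assigned to one of six experimental conditions and asked to label each tweet as hate speech, offensive language, or neither. Several demographic variables (such as age and gender) and some auxiliary data (such as total task duration and time spent on each screen) were also collected.
Tweets
The dataset contains the following 20 tweets, with usernames hashed: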
| Tweet ID | Tweets Hashed |
|---|---|
| 1 | Jihadi John has struck again. Prayers with the victims families. What is the #bounty on His head? Time 4 him 2 meet allah |
| 2 | RT @#####: Seriously, Florida. A vote for Charlie Christ is a vote for taxes and ebola. |
| 3 | Islamic Jihadis run away to Pakistan @##### @##### #UniteBlue |
| 4 | At this rate, Id cheer for the awful New York Yankees over the St. Louis Cardinals. |
| 5 | Allergies have my eyes all red and chink eyed, I look like Im stoned |
| 6 | Not protected in the state of Utah. There is no fear of queer couples forcing churches to marry them. They do not legally have to do so. |
| 7 | RT @#####: Boss Niggas Fuk wit Boss bitches Vice versa.. |
| 8 | RT @#####: ever wanna text somebody and be like bitch die I hope you dont wake up you low down dirty muthafxcka😈 |
| 9 | @##### @##### @##### shes a right twat, Id tell you, honestly Sky! Honestly 😈 |
| 10 | So yall give farwells to Lulu yet did all but shit on Kris? Ok bitches... ok |
| 11 | RT @#####: Dis bitch wanna smoke all our weed |
| 12 | Shittin on all my old hoes |
| 13 | Getting hit off a cunt thats about 6 foot sideways and a cunt that was in the army lol, i dini care, yous are fuckin laughable |
| 14 | When it rains its good "go to the mall weather" all the coons be at home |
| 15 | Dont mind black people.....hate niggers! |
| 16 | My eyes feel so chink eyed |
| 17 | "From now on, I will call all radical MUSLIMS niggers!It is very fitting and it is racist! I AM RACIST AGAINST ANYONE WHO HATES AMERICA!""" |
| 18 | RT @#####: Fucking chinks in Clash of Clans always fuck me up. Bitch ass kamikaze faggots. |
| 19 | @##### ya I know all the slang Im racist I h8 porch monkies |
| 20 | RT @#####: Stupid fucking nigger LeBron. You flopping stupid jungle bunny monkey faggot. |
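In the table above, every username appears as the fixed placeholder `@#####`. A minimal sketch of how such masking could be done with a regular expression follows. The pattern, the function name `mask_usernames`, and the sample text are assumptions for illustration; the source does not describe the procedure actually used.

```python
import re

# Assumed handle pattern: "@" followed by letters, digits, or underscores.
HANDLE = re.compile(r"@\w+")

def mask_usernames(text, placeholder="@#####"):
    """Replace every @handle in text with a fixed placeholder."""
    return HANDLE.sub(placeholder, text)

# Hypothetical input, not taken from the dataset.
masked = mask_usernames("RT @some_user: hello @another1")
print(masked)
```

Note that this simple pattern would also alter the domain part of an e-mail address, so a production version would need a stricter rule.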
Data Dictionary
The dataset contains the following columns:
| Column Name | Description | Type |
|---|---|---|
| id | annotator ID | integer |
| age | Age | integer |
| gender | Gender<br> 1: Female<br>2: Male<br> 3: Something Else<br> 4: Prefer not to say<br> | factor |
| afam | African-American<br> 0: No<br> 1: Yes | binary |
| asian | Asian-American<br> 0: No<br> 1: Yes | binary |
| hispanic | Hispanic<br> 0: No<br> 1: Yes | binary |
| white | White<br> 0: No<br> 1: Yes | binary |
| race_other | Other race/ethnicity<br> 0: No<br> 1: Yes | binary |
| race_not_say | Prefer not to say race/ethnicity<br> 0: No<br> 1: Yes | binary |
| education | Highest educational attainment<br> 1: Less than high school<br> 2: High school<br> 3: Some college<br> 4: College graduate<br> 5: Master's degree or professional degree (Law, Medicine, MPH, etc.)<br> 6: Doctoral degree (PhD, DPH, EdD, etc.) | factor |
| sexuality | Sexuality<br> 1: Gay or Lesbian<br>2: Bisexual<br> 3: Straight<br> 4: Something Else<br> | factor |
| english | English first language? <br> 0: No<br> 1: Yes | binary |
| tw_use | Twitter Use <br> 1: Most days<br>2: Most weeks, but not every day<br> 3: A few times a month<br> 4: A few times a year<br> 5: Less often <br> 6: Never | factor |
| social_media_use | Social Media Use<br> 1: Most days<br>2: Most weeks, but not every day<br> 3: A few times a month<br> 4: A few times a year<br> 5: Less often <br> 0: Never | factor |
| prolific_hours | Prolific hours worked last month | integer |
| task_fun | Coding work was: fun<br> 0: No<br> 1: Yes | binary |
| task_interesting | Coding work was: interesting<br> 0: No<br> 1: Yes | binary |
| task_boring | Coding work was: boring<br> 0: No<br> 1: Yes | binary |
| task_repetitive | Coding work was: repetitive<br> 0: No<br> 1: Yes | binary |
| task_important | Coding work was: important<br> 0: No<br> 1: Yes | binary |
| task_depressing | Coding work was: depressing<br> 0: No<br> 1: Yes | binary |
| task_offensive | Coding work was: offensive<br> 0: No<br> 1: Yes | binary |
| another_tweettask | Likelihood to do another Tweet related task<br> not at all: Not at all likely<br> somewhat: Somewhat likely<br> very: Very likely | factor |
| another_hatetask | Likelihood to do another Hate Speech related task<br> not at all: Not at all likely<br> somewhat: Somewhat likely<br> very: Very likely | factor |
| page_history | Order in which annotator saw pages | character |
| date_of_first_access | Datetime of first access | datetime |
| date_of_last_access | Datetime of last access | datetime |
| duration_sec | Task duration in seconds | integer |
| version | Version of annotation task <br> A: Version A<br>B: Version B<br> C: Version C<br> D: Version D<br> E: Version E<br> F: Version F | factor |
| tw1-20 | Label assigned to Tweet 1-20<br> hate speech: Hate Speech<br> offensive language: Offensive Language<br> neither: Neither HS nor OL<br> NA: Missing or "don't know" | factor |
| tw_duration_1-20 | Annotation duration in milliseconds Tweet 1-20 | numerical |
| num_approvals | Prolific data: number of previous task approvals of annotator | integer |
| num_rejections | Prolific data: number of previous task rejections of annotator | integer |
| prolific_score | Annotator quality score by Prolific | numerical |
| countryofbirth | Prolific data: Annotator country of birth | character |
| currentcountryofresidence | Prolific data: Annotator country of residence | character |
| employmentstatus | Prolific data: Annotator Employment Status<br> Full-time<br> Part-time<br> Unemployed (and job-seeking)<br> Due to start a new job within the next month<br> Not in paid work (e.g. homemaker, retired or disabled)<br> Other<br> DATA EXPIRED | factor |
| firstlanguage | Prolific data: Annotator first language | character |
| nationality | Prolific data: Nationality | character |
| studentstatus | Prolific data: Student status<br> Yes<br> No <br> DATA EXPIRED | factor |
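Because each annotator occupies one row with one label column per tweet, many analyses start by reshaping the data to one row per annotation. The sketch below does this on a few hypothetical rows and then counts labels within each task version. It assumes that the `tw1-20` entry in the data dictionary stands for columns named `tw1` through `tw20` and that missing labels are stored as the string `NA`; neither detail is confirmed by the source, and the rows are invented for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical annotator rows in the wide layout of the data dictionary
# (only tw1 and tw2 are shown; the real data is assumed to have tw1..tw20).
rows = [
    {"id": 1, "version": "A", "tw1": "hate speech", "tw2": "neither"},
    {"id": 2, "version": "A", "tw1": "offensive language", "tw2": "NA"},
    {"id": 3, "version": "B", "tw1": "hate speech", "tw2": "neither"},
]

def to_long(rows, n_tweets):
    """Yield one (annotator id, version, tweet number, label) per annotation."""
    for row in rows:
        for k in range(1, n_tweets + 1):
            label = row.get(f"tw{k}")
            if label in (None, "", "NA"):  # skip missing / "don't know"
                continue
            yield row["id"], row["version"], k, label

long_rows = list(to_long(rows, n_tweets=2))

# Count labels within each task version.
counts_by_version = defaultdict(Counter)
for _annotator, version, _tweet, label in long_rows:
    counts_by_version[version][label] += 1

for version in sorted(counts_by_version):
    print(version, dict(counts_by_version[version]))
```

With the real data, `rows` could come from `csv.DictReader` and `n_tweets` would be 20; the same long layout also suits the per-tweet duration columns.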



