abercowsky/autotrain-data-sexual-content-classification
AutoTrain Dataset for project: sexual-content-classification
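Dataset Description
This dataset has been automatically processed by AutoTrain for the project sexual-content-classification.
Languages
The BCP-47 code for the dataset's language is en.
Dataset Structure
Data Instances
A sample from this dataset looks as follows: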
json [ { "text": "Youre No Good: was covered in a College babes fucks with the neighbor charting version by which soul singer and pianist?", "target": 1 }, { "text": "AT first \u201cthe button\u201d seemed like an April Fools\u2019 joke.
Now, 13 days later, Reddit\u2019s social experiment is still holding momentum.
The button feature was added to Reddit on April 1 and contains a timer which counts down from 60 seconds to zero.
However, every time the button is pushed, the timer is reset.
Although Reddit users have been speculating the reason for the experiment, no one knows its specific purpose.
Additionally, no one is aware what will happen when the countdown reaches zero because the timer is yet to fall below 29 seconds.
Redditors can only use the feature if they were a member of the website before April 1 and they can only push the button once.
As of this afternoon, over 711,000 members have pushed the button.
Since its inception, members have received coloured circles next to their username which indicate how long they waited to push the button.
Those who don\u2019t push the button receive a grey circle, while those who give in to temptation receive circles ranging from purple all the way down to red.
To date, no one has waited past the time restrictions of yellow meaning there are no orange or red circles floating around Reddit.
However, one can only assume interest will eventually disappear and the true purpose of the button will be revealed.
Think about this: for the past 12 days someone on Earth has pressed a button every 30 sec or so. http://t.co/7ALLYeWB7i #TheButton @reddit \u2014 Zachs Mind (@ZachsMind) April 13, 2015
The fact that Ive been watching #thebutton for 11 days is starting to concern me. \u2014 ConvertToChris (@converttochris) April 12, 2015
I only date people who have not pressed #TheButton \u2014 Pyro (@Pyrao) April 11, 2015", "target": 0 } ]
Dataset Fields
The dataset has the following fields (also called "features"):
json { "text": "Value(dtype=string, id=None)", "target": "ClassLabel(names=[0, 1], id=None)" }
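The field declaration above can be turned into a simple record check. The following is a minimal sketch using only the Python standard library, assuming each record is a JSON object with exactly the two declared fields and that `target` is stored as the integer 0 or 1. The helper name `validate_record` and the two example records are hypothetical and are not part of the dataset or of AutoTrain.

```python
import json

# Label set taken from the ClassLabel declaration in the card.
ALLOWED_TARGETS = {0, 1}


def validate_record(record: dict) -> bool:
    """Return True if `record` has exactly the fields `text` (str) and `target` (0 or 1).

    Hypothetical helper for illustration; not part of any library.
    """
    if set(record) != {"text", "target"}:
        return False
    text, target = record["text"], record["target"]
    # bool is a subclass of int in Python, so reject True/False explicitly.
    if isinstance(target, bool) or not isinstance(target, int):
        return False
    return isinstance(text, str) and target in ALLOWED_TARGETS


# Hypothetical records for illustration, not taken from the dataset.
raw = '[{"text": "an example sentence", "target": 0}, {"text": "another one", "target": 2}]'
results = [validate_record(r) for r in json.loads(raw)]
print(results)
```

The second record is rejected because its `target` falls outside the declared label set. The card does not state what the labels 0 and 1 mean, so the check deliberately makes no assumption about that.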
Dataset Splits
This dataset is split into a train and a validation split. The split sizes are as follows:
| Split name | Num samples |
|---|---|
| train | 568324 |
| valid | 142082 |
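As a quick check on the table, the share of each split can be computed from the listed sizes. This is a small arithmetic sketch; the dictionary name `split_sizes` is only illustrative, and the two counts are the ones given in the table.

```python
# Split sizes as listed in the table.
split_sizes = {"train": 568324, "valid": 142082}

total = sum(split_sizes.values())
fractions = {name: size / total for name, size in split_sizes.items()}

print(total)
for name, frac in fractions.items():
    print(f"{name}: {frac:.4f}")
```

The sizes sum to 710406 samples, and the fractions come out at roughly 0.80 for train and 0.20 for valid, which is consistent with an 80/20 split.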



