
takiholadi/kill-me-please-dataset

Hugging Face · Updated 2022-10-19 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/takiholadi/kill-me-please-dataset
Resource overview:
---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- ru
multilinguality:
- monolingual
pretty_name: Kill-Me-Please Dataset
size_categories:
- 10K<n<100K
source_datasets:
- original
tags:
- stories
- website
task_categories:
- text-generation
- text-classification
---

# Dataset Card for Kill-Me-Please Dataset

## Table of Contents
- [Dataset Description](#dataset-description)
- [Dataset Summary](#dataset-summary)
- [Languages](#languages)
- [Dataset Structure](#dataset-structure)
- [Data Instances](#data-instances)
- [Data Fields](#data-fields)
- [Data Splits](#data-splits)

## Dataset Description

- **Repository:** [github pet project repo](https://github.com/takiholadi/generative-kill-me-please)

### Dataset Summary

This is a Russian-language dataset containing just over 30k unique stories written by users of https://killpls.me between March 2009 and October 2022. The website was blocked by Roskomnadzor, so if you want more stories, consider the text-generation task.

### Languages

ru-RU

## Dataset Structure

### Data Instances

Here is an example instance:

```
{'text': 'По глупости удалил всю 10 летнюю базу. Восстановлению не подлежит. Мне конец. КМП!',
 'tags': 'техника',
 'votes': 2914,
 'url': 'https://killpls.me/story/616',
 'datetime': '4 июля 2009, 23:20'}
```

### Data Fields

- `text`: a string containing the body of the story
- `tags`: a string of comma-separated tags in a multi-label setup; the full set of tags (apart from one record with an empty tag) is `внешность`, `деньги`, `друзья`, `здоровье`, `отношения`, `работа`, `разное`, `родители`, `секс`, `семья`, `техника`, `учеба`
- `votes`: an integer sum of upvotes/downvotes
- `url`: a string containing the URL the story was web-scraped from
- `datetime`: a string containing the date and time the story was written

### Data Splits

The dataset has 2 multi-label stratified splits: train and test.

| Dataset Split | Number of Instances in Split |
| ------------- | ---------------------------- |
| Train         | 27,321                       |
| Test          | 2,772                        |
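For the text-classification task, the comma-separated `tags` string usually has to become a multi-hot target vector. The sketch below assumes that tags are separated by commas, optionally surrounded by whitespace, and that the 12 tags listed in the card are the complete label set. `TAGS` and `tags_to_multi_hot` are hypothetical names, not part of the dataset or any library. The empty-tagged record mentioned in the card maps to an all-zero vector.

```python
# Full tag set as listed in the dataset card (12 tags); order defines vector positions.
TAGS = [
    "внешность", "деньги", "друзья", "здоровье", "отношения", "работа",
    "разное", "родители", "секс", "семья", "техника", "учеба",
]
TAG_INDEX = {tag: i for i, tag in enumerate(TAGS)}


def tags_to_multi_hot(tags: str) -> list[int]:
    """Turn a comma-separated `tags` string into a 0/1 vector over TAGS.

    Assumption: pieces may carry surrounding whitespace, which is stripped.
    An empty string (the one empty-tagged record) yields all zeros.
    An unknown tag raises KeyError so that schema drift is noticed early.
    """
    vector = [0] * len(TAGS)
    for raw in tags.split(","):
        tag = raw.strip()
        if tag:  # skip empty pieces, e.g. from "" or a trailing comma
            vector[TAG_INDEX[tag]] = 1
    return vector


print(tags_to_multi_hot("техника"))
print(tags_to_multi_hot("работа, деньги"))
```

Raising on unknown tags rather than silently ignoring them is a deliberate choice: if a record ever carries a tag outside the documented set, the mismatch surfaces immediately instead of producing a silently wrong label.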
Provider:
takiholadi
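Summary of Original Information

Kill-Me-Please Dataset Overview

Dataset Description

Dataset Summary

  • Language: Russian (ru-RU)
  • Content: more than 30,000 unique Russian-language stories written by users of the killpls.me website between March 2009 and October 2022.
  • Use: suitable for text-generation and text-classification tasks.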

Languages

  • Language: Russian (ru-RU)

Dataset Structure

Data Instances

  • Example:

    {'text': 'По глупости удалил всю 10 летнюю базу. Восстановлению не подлежит. Мне конец. КМП!', 'tags': 'техника', 'votes': 2914, 'url': 'https://killpls.me/story/616', 'datetime': '4 июля 2009, 23:20'}

Data Fields

  • text: the body of the story, string.
  • tags: one or more tags, string; drawn from внешность, деньги, друзья, здоровье, отношения, работа, разное, родители, секс, семья, техника, учеба.
  • votes: vote total, integer.
  • url: the URL the story was taken from, string.
  • datetime: the time the story was written, string.
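The `datetime` field is a Russian-language string such as "4 июля 2009, 23:20" rather than a machine-readable timestamp. The following is a minimal parsing sketch that assumes every value follows the pattern "<day> <month name in the genitive case> <year>, <HH>:<MM>"; this pattern is inferred from the single example instance, not guaranteed by the card. The card also states no timezone, so the result is a naive `datetime`. `RU_MONTHS` and `parse_story_datetime` are hypothetical names.

```python
from datetime import datetime

# Russian month names in the genitive case, as they appear after a day number
# (e.g. "4 июля 2009"). Assumption: every `datetime` value uses this form.
RU_MONTHS = {
    "января": 1, "февраля": 2, "марта": 3, "апреля": 4,
    "мая": 5, "июня": 6, "июля": 7, "августа": 8,
    "сентября": 9, "октября": 10, "ноября": 11, "декабря": 12,
}


def parse_story_datetime(value: str) -> datetime:
    """Parse strings like '4 июля 2009, 23:20' into a naive datetime (no timezone)."""
    # Split "4 июля 2009" from "23:20" at the first comma.
    date_part, time_part = (part.strip() for part in value.split(",", 1))
    day, month_name, year = date_part.split()
    hour, minute = time_part.split(":")
    return datetime(int(year), RU_MONTHS[month_name.lower()], int(day),
                    int(hour), int(minute))


print(parse_story_datetime("4 июля 2009, 23:20").isoformat())
```

Any value that deviates from the assumed pattern raises `ValueError` or `KeyError`, which makes it easy to find such records before relying on the parsed timestamps.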

Data Splits

  • Train: 27,321 instances
  • Test: 2,772 instances
  • Split method: multi-label stratified split
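The splits are described as multi-label stratified, which means each tag should appear in roughly the same proportion in train and test. One hedged way to sanity-check this is to compare per-tag frequencies across the two splits. The sketch below does only that; it does not reproduce how the author created the splits. `tag_frequencies` and `max_frequency_gap` are hypothetical helpers, and the toy lists at the end are made-up inputs, not real dataset records.

```python
from collections import Counter


def tag_frequencies(tag_strings: list[str]) -> dict[str, float]:
    """Fraction of records carrying each tag, given raw comma-separated `tags` values."""
    counts: Counter = Counter()
    for tags in tag_strings:
        # A set, so a tag repeated within one record is counted once.
        counts.update({t.strip() for t in tags.split(",") if t.strip()})
    n = len(tag_strings)
    return {tag: c / n for tag, c in counts.items()} if n else {}


def max_frequency_gap(train_tags: list[str], test_tags: list[str]) -> float:
    """Largest absolute difference in per-tag frequency between two splits."""
    train_freq = tag_frequencies(train_tags)
    test_freq = tag_frequencies(test_tags)
    all_tags = set(train_freq) | set(test_freq)
    return max(
        (abs(train_freq.get(t, 0.0) - test_freq.get(t, 0.0)) for t in all_tags),
        default=0.0,
    )


# Toy, made-up inputs: "деньги" appears only in train, so it drives the gap.
toy_train = ["работа", "работа, деньги", "семья", "семья"]
toy_test = ["работа", "семья"]
print(max_frequency_gap(toy_train, toy_test))
```

On the real splits, a small maximum gap is consistent with stratification. Given the size of the test split, some noise for rare tags is to be expected.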