IlyaGusev/ru_stackoverflow
Russian StackOverflow dataset
Dataset overview
Basic information
- License: other
- Task categories:
  - text-generation
  - question-answering
- Language: Russian
- Dataset size: 100K<n<1M
Dataset features
- question_id: uint32
- url: string
- answer_count: uint32
- text_html: string
- text_markdown: string
- score: int32
- title: string
- tags: sequence of string
- views: uint64
- author: string
- timestamp: uint64
- comments: sequence
  - text: string
  - author: string
  - comment_id: uint32
  - score: int32
  - timestamp: uint64
- answers: sequence
  - answer_id: uint32
  - is_accepted: uint8
  - text_html: string
  - text_markdown: string
  - score: int32
  - author: string
  - timestamp: uint64
  - comments: sequence
    - text: string
    - author: string
    - comment_id: uint32
    - score: int32
    - timestamp: uint64
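For readers who want type hints while working with records, the feature list can be mirrored as Python `TypedDict` classes. This is only a sketch: the class names `CommentColumns`, `AnswerColumns`, and `Question` are hypothetical and not part of the dataset. It also assumes the column-wise layout seen in the data instance, where each `sequence` field is a dictionary of parallel lists. The layout of comments nested under answers is not fully clear from the card, so that field is left untyped.

```python
from typing import TypedDict


class CommentColumns(TypedDict):
    # Parallel lists: index i in every list describes the same comment.
    text: list[str]
    author: list[str]
    comment_id: list[int]   # uint32 in the dataset
    score: list[int]        # int32
    timestamp: list[int]    # uint64


class AnswerColumns(TypedDict):
    # Parallel lists: index i in every list describes the same answer.
    answer_id: list[int]    # uint32
    is_accepted: list[int]  # uint8, 1 for the accepted answer in the example
    text_html: list[str]
    text_markdown: list[str]
    score: list[int]        # int32
    author: list[str]
    timestamp: list[int]    # uint64
    comments: object        # assumption: nested layout left untyped


class Question(TypedDict):
    question_id: int        # uint32
    url: str
    answer_count: int       # uint32
    text_html: str
    text_markdown: str
    score: int              # int32
    title: str
    tags: list[str]
    views: int              # uint64
    author: str
    timestamp: int          # uint64
    comments: CommentColumns
    answers: AnswerColumns
```

Python integers are unbounded, so the fixed-width types from the feature list survive only as comments here.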
Dataset splits
- train
  - num_bytes: 3013377174
  - num_examples: 437604
- Download size: 670468664 bytes
- Dataset size: 3013377174 bytes
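The split figures above allow a quick back-of-the-envelope check of how large a typical record is and how the prepared dataset compares with the download. The sketch below uses only the three numbers listed for the train split; the variable names are illustrative, and reading the download size as a compressed archive is an assumption.

```python
# Figures copied from the split metadata above (all sizes in bytes).
NUM_BYTES = 3_013_377_174
NUM_EXAMPLES = 437_604
DOWNLOAD_BYTES = 670_468_664

# Average size of one question record, including its answers and comments.
avg_bytes_per_example = NUM_BYTES / NUM_EXAMPLES

# How many times larger the prepared dataset is than the download.
size_ratio = NUM_BYTES / DOWNLOAD_BYTES

print(f"average size per example: {avg_bytes_per_example:.0f} bytes")
print(f"dataset size / download size: {size_ratio:.2f}")
```

This works out to roughly 6.9 KB per question and a size ratio of about 4.5.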
Data instance
{
  "question_id": 11235,
  "answer_count": 1,
  "url": "https://ru.stackoverflow.com/questions/11235",
  "score": 2,
  "tags": ["c++", "сериализация"],
  "title": "Извлечение из файла, запись в файл",
  "views": 1309,
  "author": "...",
  "timestamp": 1303205289,
  "text_html": "...",
  "text_markdown": "...",
  "comments": {
    "text": ["...", "..."],
    "author": ["...", "..."],
    "comment_id": [11236, 11237],
    "score": [0, 0],
    "timestamp": [1303205411, 1303205678]
  },
  "answers": {
    "answer_id": [11243, 11245],
    "timestamp": [1303207791, 1303207792],
    "is_accepted": [1, 0],
    "text_html": ["...", "..."],
    "text_markdown": ["...", "..."],
    "score": [3, 0],
    "author": ["...", "..."],
    "comments": {
      "text": ["...", "..."],
      "author": ["...", "..."],
      "comment_id": [11246, 11249],
      "score": [0, 0],
      "timestamp": [1303207961, 1303207800]
    }
  }
}
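In the instance above, comments and answers are stored column-wise: one list per field, with position i in every list describing the same item. A minimal sketch of turning such columns into per-item records and picking the accepted answer might look like this. The helper names `columns_to_rows` and `accepted_answer` are hypothetical, the record is a trimmed copy of the example, and the comments nested under answers are skipped because their per-answer layout is not clear from the example.

```python
from typing import Any, Optional


def columns_to_rows(columns: dict[str, Any]) -> list[dict[str, Any]]:
    """Transpose {"a": [1, 2], "b": [3, 4]} into [{"a": 1, "b": 3}, {"a": 2, "b": 4}].

    Only list-valued fields are transposed; nested dicts are ignored.
    """
    list_fields = {k: v for k, v in columns.items() if isinstance(v, list)}
    if not list_fields:
        return []
    lengths = {len(v) for v in list_fields.values()}
    if len(lengths) != 1:
        raise ValueError(f"columns have unequal lengths: {sorted(lengths)}")
    n = lengths.pop()
    return [{k: v[i] for k, v in list_fields.items()} for i in range(n)]


def accepted_answer(record: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the first answer whose is_accepted flag is non-zero, if any."""
    for answer in columns_to_rows(record["answers"]):
        if answer["is_accepted"]:
            return answer
    return None


# Trimmed copy of the example record (text fields elided as in the card).
record = {
    "question_id": 11235,
    "comments": {
        "text": ["...", "..."],
        "author": ["...", "..."],
        "comment_id": [11236, 11237],
        "score": [0, 0],
        "timestamp": [1303205411, 1303205678],
    },
    "answers": {
        "answer_id": [11243, 11245],
        "timestamp": [1303207791, 1303207792],
        "is_accepted": [1, 0],
        "score": [3, 0],
        "comments": {"comment_id": [11246, 11249]},  # nested dict, skipped
    },
}

comments = columns_to_rows(record["comments"])
best = accepted_answer(record)

print(comments[0]["comment_id"])  # 11236
print(best["answer_id"])          # 11243
```

Raising on unequal list lengths is a deliberate choice here: silently truncating to the shortest column would hide malformed records.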
License information
- License: CC BY-SA 2.5



