zeio/baneks-speech
Source: Hugging Face. Updated 2023-10-12; indexed 2024-03-04.

Download link: https://hf-mirror.com/datasets/zeio/baneks-speech

---
language:
- ru
- en
license: apache-2.0
tags:
- not-for-all-audiences
- art
- humour
- jokes
annotation_creators:
- crowdsourced
- original
- machine-generated
language_creators:
- crowdsourced
- original
- machine-generated
pretty_name: baneks-speech
size_categories:
- 10K<n<100K
task_categories:
- text-to-speech
- automatic-speech-recognition
---
# Dataset card for baneks-speech
## Table of contents
- [Dataset description](#dataset-description)
    - [Dataset summary](#dataset-summary)
- [Dataset structure](#dataset-structure)
    - [Data instance](#data-instance)
    - [Data fields](#data-fields)
## Dataset description
- **Homepage:** [baneks-speech homepage]()
- **Repository:** [baneks-speech repository](https://huggingface.co/datasets/zeio/baneks-speech)
- **Point of contact:** [Zeio Nara](mailto:zeionara@gmail.com)
- **Dataset version:** `10.10.2023`
### Dataset summary
This dataset contains speech generated for anecdotes parsed from a few VK social network communities. The speech was synthesized with the [STC model](https://cloud.speechpro.com/home).
The dataset corresponds to the **default** configuration from the [baneks](https://huggingface.co/datasets/zeio/baneks) dataset.
Since the dataset is regularly updated, there is no fixed number of entries, so stay tuned.
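A minimal sketch of how the dataset might be loaded with the Hugging Face `datasets` library. The repository id comes from this card. The split name `"train"` and the choice of streaming mode are assumptions, and the helper name `load_baneks_speech` is hypothetical.

```python
REPO_ID = "zeio/baneks-speech"  # repository id taken from this card


def load_baneks_speech(split="train", streaming=True):
    """Load the dataset from the Hugging Face Hub.

    Assumptions: the split name "train" is a guess, and streaming is used
    only to avoid downloading every audio archive up front.
    """
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, split=split, streaming=streaming)
```

With streaming enabled, `next(iter(load_baneks_speech()))` should yield a single entry as a dictionary. Decoding the `audio` column may require additional audio dependencies, depending on the installed `datasets` version.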
## Dataset structure
### Data instance
An example of an entry from the dataset is given below:
```python
{
    'text': 'Сидят русский и казах Русский спрашивает: - Слушай, а у тебя жена есть? - Жок - Молодец! Я бы свою тоже сжег нахуй!',
    'audio': {
        'path': '/root/.cache/huggingface/datasets/downloads/extracted/e8abea39e83c61e4a60c5a4b0661dc044ae82cfcd84b26f966b87999f73ae92e/00476018.anekdotikategoriib.mp3',
        'array': array([ 8.09500818e-07, 1.27653129e-06, 4.17583010e-07, ..., -8.52341486e-07, 9.19626189e-07, 1.67368569e-06]),
        'sampling_rate': 22050
    },
    'artist': 'Vladimir_n',
    'id': 476018,
    'source': 'anekdotikategoriib'
}
```
### Data fields
Each dataset entry therefore consists of the following fields:
- `text` - text representation of the anecdote;
- `id` - id of the corresponding post;
- `source` - community name in which the corresponding post has been published;
- `artist` - identifier of the voice which was used for speech generation;
- `audio` - audio data read from an `mp3` file.
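The fields listed above can be combined for text-to-speech or speech recognition work. The sketch below assumes an entry shaped like the example instance. The helper names `duration_seconds` and `to_tts_pair` are hypothetical, and the `example` entry is a synthetic stand-in, not real data from the dataset.

```python
def duration_seconds(entry):
    """Return the clip length in seconds for one dataset entry."""
    audio = entry["audio"]
    # Number of samples divided by samples per second.
    return len(audio["array"]) / audio["sampling_rate"]


def to_tts_pair(entry):
    """Map an entry to a (text, voice, duration) tuple."""
    return entry["text"], entry["artist"], duration_seconds(entry)


# Hypothetical stand-in entry: one second of silence at 22050 Hz.
example = {
    "text": "placeholder text",
    "audio": {"path": "example.mp3", "array": [0.0] * 22050, "sampling_rate": 22050},
    "artist": "Vladimir_n",
    "id": 0,
    "source": "anekdotikategoriib",
}

print(to_tts_pair(example))  # ('placeholder text', 'Vladimir_n', 1.0)
```

Because `len()` works on both Python lists and NumPy arrays, the same helpers should apply to decoded entries whose `array` is a NumPy array.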
Provider: zeio