zeio/baneks-speech

Hugging Face, updated 2023-10-12, indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/zeio/baneks-speech
Resource description:
---
language:
- ru
- en
license: apache-2.0
tags:
- not-for-all-audiences
- art
- humour
- jokes
annotation_creators:
- crowdsourced
- original
- machine-generated
language_creators:
- crowdsourced
- original
- machine-generated
pretty_name: baneks-speech
size_categories:
- 10K<n<100K
task_categories:
- text-to-speech
- automatic-speech-recognition
---

# Dataset card for baneks-speech

## Table of contents

- [Dataset description](#dataset-description)
  - [Dataset summary](#dataset-summary)
- [Dataset structure](#dataset-structure)
  - [Data instance](#data-instance)
  - [Data fields](#data-fields)

## Dataset description

- **Homepage:** [baneks-speech homepage]()
- **Repository:** [baneks-speech repository](https://huggingface.co/datasets/zeio/baneks-speech)
- **Point of contact:** [Zeio Nara](mailto:zeionara@gmail.com)
- **Dataset version:** `10.10.2023`

### Dataset summary

This dataset contains speech generated with the [stc model](https://cloud.speechpro.com/home) for anecdotes parsed from several VK social network communities. The dataset corresponds to the **default** configuration of the [baneks](https://huggingface.co/datasets/zeio/baneks) dataset. Because the dataset is updated regularly, the number of entries is not fixed.

## Dataset structure

### Data instance

An example of an entry from the dataset is given below:

```python
{
    'text': 'Сидят русский и казах Русский спрашивает: - Слушай, а у тебя жена есть? - Жок - Молодец! Я бы свою тоже сжег нахуй!',
    'audio': {
        'path': '/root/.cache/huggingface/datasets/downloads/extracted/e8abea39e83c61e4a60c5a4b0661dc044ae82cfcd84b26f966b87999f73ae92e/00476018.anekdotikategoriib.mp3',
        'array': array([ 8.09500818e-07, 1.27653129e-06, 4.17583010e-07, ..., -8.52341486e-07, 9.19626189e-07, 1.67368569e-06]),
        'sampling_rate': 22050
    },
    'artist': 'Vladimir_n',
    'id': 476018,
    'source': 'anekdotikategoriib'
}
```

### Data fields

Each dataset entry consists of the following fields:

- `text` - text of the anecdote;
- `id` - id of the corresponding post;
- `source` - name of the community in which the corresponding post was published;
- `artist` - identifier of the voice used for speech generation;
- `audio` - audio data read from an `mp3` file.
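
The entry layout described above can be consumed roughly as in the following minimal sketch. It assumes that the Hugging Face `datasets` library is installed, that the repository exposes a split named `train` (the card does not name its splits), and that audio is decoded into a mapping with `array` and `sampling_rate` keys as in the example entry. `load_entries` and `clip_duration_seconds` are illustrative helper names, not part of the dataset.

```python
from typing import Any, Iterator, Mapping


def load_entries(split: str = "train") -> Iterator[Mapping[str, Any]]:
    """Stream entries from the Hub (assumption: a split named `split` exists)."""
    # Imported lazily so the helper below also works without `datasets` installed.
    from datasets import load_dataset

    # streaming=True iterates over entries without downloading everything first.
    yield from load_dataset("zeio/baneks-speech", split=split, streaming=True)


def clip_duration_seconds(entry: Mapping[str, Any]) -> float:
    """Duration of a decoded clip: number of samples / samples per second."""
    audio = entry["audio"]
    return len(audio["array"]) / audio["sampling_rate"]


# Offline demonstration with a synthetic entry shaped like the example entry.
# The samples are placeholder silence, not real data from the dataset.
sample_entry = {
    "text": "...",
    "audio": {"path": "clip.mp3", "array": [0.0] * 44100, "sampling_rate": 22050},
    "artist": "Vladimir_n",
    "id": 476018,
    "source": "anekdotikategoriib",
}
print(clip_duration_seconds(sample_entry))  # 2.0
```

With network access, `next(load_entries())` would fetch a single real entry, provided the split assumption holds.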
Provider:
zeio
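Summary of original information

Dataset card for baneks-speech

Dataset description

Dataset summary

This dataset contains speech generated with the stc model for anecdotes parsed from several VK social network communities. It corresponds to the default configuration of the baneks dataset. Because the dataset is updated regularly, the number of entries is not fixed.

Dataset structure

Data instance

An example of a dataset entry is given below: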

    {
        'text': 'Сидят русский и казах Русский спрашивает: - Слушай, а у тебя жена есть? - Жок - Молодец! Я бы свою тоже сжег нахуй!',
        'audio': {
            'path': '/root/.cache/huggingface/datasets/downloads/extracted/e8abea39e83c61e4a60c5a4b0661dc044ae82cfcd84b26f966b87999f73ae92e/00476018.anekdotikategoriib.mp3',
            'array': array([ 8.09500818e-07, 1.27653129e-06, 4.17583010e-07, ..., -8.52341486e-07, 9.19626189e-07, 1.67368569e-06]),
            'sampling_rate': 22050
        },
        'artist': 'Vladimir_n',
        'id': 476018,
        'source': 'anekdotikategoriib'
    }

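Data fields

Each dataset entry consists of the following fields:

  • text - text of the anecdote;
  • id - ID of the corresponding post;
  • source - name of the community in which the corresponding post was published;
  • artist - identifier of the voice used for speech generation;
  • audio - audio data read from an mp3 file.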
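
In the example entry, the audio file name `00476018.anekdotikategoriib.mp3` appears to combine the zero-padded post id with the source community name. The sketch below parses that pattern. The naming scheme is inferred from this single example and is not documented in the card, and `parse_audio_filename` and `AudioName` are hypothetical names introduced only for illustration.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class AudioName(NamedTuple):
    post_id: int
    source: str


def parse_audio_filename(path: str) -> AudioName:
    """Split '<zero-padded id>.<source>.mp3' into its parts (assumed scheme)."""
    name = PurePosixPath(path).name
    # Peel off the extension first, then split the id from the community name.
    stem, extension = name.rsplit(".", 1)
    if extension != "mp3":
        raise ValueError(f"unexpected extension in {name!r}")
    post_id, source = stem.split(".", 1)
    return AudioName(int(post_id), source)


parsed = parse_audio_filename("/some/cache/dir/00476018.anekdotikategoriib.mp3")
print(parsed)  # AudioName(post_id=476018, source='anekdotikategoriib')
```

If the scheme holds across the dataset, the parsed values should match the `id` and `source` fields of the same entry, which makes this a cheap consistency check.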