
gijs/emolia-balanced-5M-subset

Hugging Face | Updated 2026-04-26 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/gijs/emolia-balanced-5M-subset
Description:
This is the emolia-balanced-5M-subset corpus, repackaged as WebDataset tars with per-clip voice-dimension annotations generated by MOSS-Audio-8B-Instruct. For every audio clip, the JSON sidecar is augmented with 18 top-level group keys, each containing 3–4 short-code fields, for 59 voice dimensions in total.

The layout is plain WebDataset: each tar contains paired <key>.mp3 and <key>.json samples. The audio is byte-identical to the source; only the sidecar JSON is enriched with the 18 annotation keys.

The annotations were generated with the OpenMOSS-Team/MOSS-Audio-8B-Instruct model. Each clip was prompted 18 times, once per group, and for each prompt the model returned a JSON object with that group's short-code keys.

Caveats: a very small fraction of clips carry an _error / _raw tag inside a group instead of parsed fields. The annotations are produced by a neural model, so spot-checking against human-labelled references is recommended for high-stakes downstream use.

Source: derived from the emolia-balanced-5M-subset corpus.
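The layout and the _error / _raw caveat described above can be sketched with the Python standard library alone. This is a minimal illustration, not the dataset author's loader. The function names `iter_samples` and `split_groups` are hypothetical. It also makes three assumptions: the key is everything before the last dot in the member name, annotation groups are dict-valued entries in the sidecar, and a failed group carries `_error` or `_raw` as a key inside that dict. The description does not spell out the exact shape, so check it against a real shard before relying on this.

```python
import json
import tarfile
from typing import Dict, Iterator, List, Tuple


def iter_samples(tar_file: tarfile.TarFile) -> Iterator[Tuple[str, bytes, dict]]:
    """Yield (key, mp3_bytes, sidecar) for each complete <key>.mp3 + <key>.json pair."""
    pending: Dict[str, Dict[str, bytes]] = {}
    for member in tar_file:
        if not member.isfile():
            continue
        # Assumption: the sample key is the member name up to the last dot.
        key, dot, ext = member.name.rpartition(".")
        if not dot or ext not in ("mp3", "json"):
            continue
        parts = pending.setdefault(key, {})
        parts[ext] = tar_file.extractfile(member).read()
        if "mp3" in parts and "json" in parts:
            del pending[key]
            yield key, parts["mp3"], json.loads(parts["json"])


def split_groups(sidecar: dict) -> Tuple[dict, List[str]]:
    """Separate cleanly parsed dict-valued entries from those tagged _error or _raw."""
    parsed: dict = {}
    failed: List[str] = []
    for name, value in sidecar.items():
        if not isinstance(value, dict):
            continue  # scalar metadata, not treated as an annotation group here
        # Assumption: a failed group holds an _error and/or _raw key.
        if "_error" in value or "_raw" in value:
            failed.append(name)
        else:
            parsed[name] = value
    return parsed, failed
```

A caller would open a downloaded shard with `tarfile.open(path)`, pass it to `iter_samples`, and run `split_groups` on each sidecar to decide whether to drop or re-annotate clips with failed groups. If the source sidecar already contains dict-valued metadata, those entries would also land in `parsed`, so filtering by the known group names is safer once they are confirmed.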
Provider:
gijs