
MemeEffect-382K

ModelScope Community, updated 2025-12-05, indexed 2025-08-23
Download link:
https://modelscope.cn/datasets/sleeping-ai/MemeEffect-382K
Description:
We are excited to release **Meme Effect 382K**, the largest known collection of meme voice effects. It is intended for training foundational text-to-voice models that capture not only human emotion but also factors such as **sarcasm** and **popular meme culture**, so that generated speech sounds more human. We hope researchers building human-centric **TTS** models will include this dataset in their training corpora to make **text-to-speech/voice** models more human.

### Data fields

- `id`: Unique identifier for the sound.
- `owner`: Information about the uploader.
  - `id`: Uploader's unique ID.
  - `name`: Display name of the uploader.
  - `slug`: URL-friendly version of the name.
  - `verified`: Boolean flag for account verification.
  - `voicemodAccountId`: Uploader's Voicemod account ID.
- `title`: Title of the sound.
- `category`: Category tag (e.g., memes, music).
- `dmca`: Boolean flag indicating DMCA status.
- `tags`: List of associated tags.
- `imagePath`: URL to the image representing the sound.
- `thumbnailS`: Small thumbnail image URL.
- `thumbnailM`: Medium thumbnail image URL.
- `path`: URL to the `.mp3` file of the sound.
- `oggPath`: URL to the `.ogg` version of the file.
- `description`: Description provided by the user.
- `createdAt`: ISO timestamp of when the sound was created.
- `anonymous`: Boolean indicating an anonymous upload.
- `social`: Engagement metrics.
  - `likeCount`: Number of likes.
  - `sharedCount`: Number of shares.
  - `bookmarkCount`: Number of bookmarks.
- `properties`: Technical sound properties.
  - `duration`: Duration in milliseconds.
  - `loudness`: Average loudness (dB).
  - `maximumAmplitude`: Maximum amplitude level.
- `stats`: Usage statistics.
  - `downloadCount`: Total downloads.
  - `playedCount`: Total plays.
  - `sendVoicemodCount`: Number of times sent via Voicemod.
  - `visitCount`: Page visit count.
- `permalink`: Permanent URL to the sound's page.
- `explicit`: Boolean indicating explicit content.
- `updatedAt`: ISO timestamp of the last update.
- `moderationStatus`: Moderation flag (e.g., `explicit`, `safe`).
- `uploadSource`: Platform or method used to upload.
- `uploadStatus`: Current processing status of the file.
- `textModeration`: NLP-based moderation results.
  - `title`: Classification of the title content.
  - `description`: Classification of the description.
  - `tags`: Classification of the tags.
  - `transcription`: Classification of the transcription.

### Ethical Statement

We are releasing this dataset under the research exemption provided by the EU, and we follow the ethical practices and local laws of the jurisdictions where the data was downloaded. We also restrict access to the data to non-commercial, research-only purposes.

### LICENCE

We release this dataset under CC BY-NC-ND 4.0, which means:

1. You may not create derivatives of this dataset and re-upload them anywhere on the internet without our permission.
2. No commercial activities are allowed. If you break our licence terms, you will be subject to the applicable law, according to the action taken by the parties involved in making this dataset.
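As a rough illustration of how the data fields listed above might be used, the sketch below filters records by their moderation flags and converts `duration` from milliseconds to seconds. It assumes each record is a nested dictionary in which `duration` sits under `properties`, as the field list suggests. The sample record, its values, and the helper names (`duration_seconds`, `is_clean`, `select_clips`) are hypothetical and are not part of the dataset or of any official tooling.

```python
from typing import Any

# Hypothetical record: every value here is invented for illustration.
# The nesting (e.g. "duration" under "properties") is an assumption
# based on the field list, not a verified schema.
sample: dict[str, Any] = {
    "id": "example-id",
    "title": "Example sound",
    "category": "memes",
    "dmca": False,
    "explicit": False,
    "moderationStatus": "safe",
    "path": "https://example.invalid/sound.mp3",
    "properties": {"duration": 2500, "loudness": -14.0, "maximumAmplitude": 0.9},
}


def duration_seconds(record: dict[str, Any]) -> float:
    """Convert the millisecond `duration` field to seconds."""
    return record["properties"]["duration"] / 1000.0


def is_clean(record: dict[str, Any]) -> bool:
    """True when the record has no DMCA flag, is not explicit,
    and carries the `safe` moderation status."""
    return (
        not record.get("dmca", False)
        and not record.get("explicit", False)
        and record.get("moderationStatus") == "safe"
    )


def select_clips(records: list[dict[str, Any]], max_seconds: float) -> list[dict[str, Any]]:
    """Keep clean records whose duration does not exceed max_seconds."""
    return [r for r in records if is_clean(r) and duration_seconds(r) <= max_seconds]


if __name__ == "__main__":
    flagged = dict(sample, explicit=True)  # same record, marked explicit
    kept = select_clips([sample, flagged], max_seconds=5.0)
    print([r["id"] for r in kept], duration_seconds(sample))
```

Whether to exclude explicit or DMCA-flagged clips is a choice for each training setup; the filter here only shows how the Boolean and status fields could be combined.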

Provider:
maas
Created:
2025-07-31