
noxwano/ASMR-Archive-Processed-mini

Source: Hugging Face | Updated: 2026-04-13 | Indexed: 2026-04-26
Download link:
https://hf-mirror.com/datasets/noxwano/ASMR-Archive-Processed-mini
Description:
---
license: agpl-3.0
task_categories:
- automatic-speech-recognition
- text-to-speech
language:
- ja
tags:
- speech
- audio
- japanese
- asmr
- anime
- not-for-all-audiences
pretty_name: ASMR-Archive-Processed-mini
size_categories:
- 1M<n<10M
---

# ASMR-Archive-Processed-mini

## Overview

This dataset is a small subset of the original [OmniAICreator/ASMR-Archive-Processed](https://huggingface.co/datasets/OmniAICreator/ASMR-Archive-Processed) dataset. It was created so that, when you want to use only a portion of the original dataset drawn from every subdirectory, you can simply pass this dataset name instead. We randomly sampled about 10% of the data from each subdirectory of the original dataset.

## Dataset Contents & Preprocessing

For detailed information on the contents of the data and the original preprocessing pipelines, please refer to the original [OmniAICreator/ASMR-Archive-Processed](https://huggingface.co/datasets/OmniAICreator/ASMR-Archive-Processed) dataset.

## Biases and Limitations

Users should be aware of the following limitations inherited from the original dataset:

* **NSFW Content**: This dataset contains a significant amount of data derived from content originally marked as NSFW.
* **Gender Bias**: Due to the nature of the source material, the dataset is heavily skewed towards female voices.
* **Overlapping Speakers**: Some audio segments may contain multiple speakers talking simultaneously.
* **Inclusion of Sound Effects**: Although the preprocessing pipeline is designed to isolate vocals, some segments may still contain residual sound effects commonly found in ASMR content.
* **Potential Transcription Errors**: Transcriptions were generated automatically by AI models and have not been manually verified. They are likely to contain errors and inaccuracies.

## License & Usage

This dataset inherits the **AGPL-3.0 license** from the source datasets.

**Intended Use**: This dataset is intended strictly for educational and academic research purposes.

**Disclaimer**: Use is at your own risk. You must ensure compliance with applicable laws. The dataset is provided "as is" with absolutely no express or implied warranty.
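The card states only that about 10% of the data in each subdirectory of the original dataset was randomly sampled. It does not document the exact procedure. The sketch below is a hypothetical illustration of that kind of per-subdirectory (stratified) sampling in plain Python. It assumes that files are grouped by their immediate parent directory, that the per-group count is rounded with at least one file kept, and that a fixed seed is used for reproducibility. The function name `sample_per_subdirectory` and its parameters are illustrative and are not part of the dataset or any library.

```python
import random
from collections import defaultdict
from pathlib import PurePosixPath


def sample_per_subdirectory(paths, fraction=0.1, seed=0):
    """Hypothetical sketch: draw roughly `fraction` of the files in each subdirectory.

    Files are grouped by their immediate parent directory, and each group is
    sampled independently so that every subdirectory stays represented.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    groups = defaultdict(list)
    for p in paths:
        groups[str(PurePosixPath(p).parent)].append(p)

    subset = []
    for subdir in sorted(groups):  # sorted for a deterministic iteration order
        files = sorted(groups[subdir])
        # Assumption: keep at least one file so small subdirectories are not dropped.
        k = max(1, round(len(files) * fraction))
        subset.extend(rng.sample(files, k))
    return subset


if __name__ == "__main__":
    # Hypothetical file layout: 30 clips in "a/" and 5 clips in "b/".
    demo = [f"a/{i:03d}.flac" for i in range(30)] + [f"b/{i:03d}.flac" for i in range(5)]
    picked = sample_per_subdirectory(demo, fraction=0.1, seed=42)
    print(len(picked))
```

Sampling each group separately, rather than drawing 10% from the whole pool at once, guarantees that every subdirectory appears in the subset. If you use the published subset instead of re-sampling, the card's suggestion amounts to passing `noxwano/ASMR-Archive-Processed-mini` wherever you would otherwise pass `OmniAICreator/ASMR-Archive-Processed`, for example to `load_dataset` in the Hugging Face `datasets` library.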
Provider:
noxwano