noxwano/ASMR-Archive-Processed-mini
Source: Hugging Face. Updated 2026-04-13; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/noxwano/ASMR-Archive-Processed-mini
Resource description:
---
license: agpl-3.0
task_categories:
- automatic-speech-recognition
- text-to-speech
language:
- ja
tags:
- speech
- audio
- japanese
- asmr
- anime
- not-for-all-audiences
pretty_name: ASMR-Archive-Processed-mini
size_categories:
- 1M<n<10M
---
# ASMR-Archive-Processed-mini
## Overview
This dataset is a small subset of the original [OmniAICreator/ASMR-Archive-Processed](https://huggingface.co/datasets/OmniAICreator/ASMR-Archive-Processed) dataset. It is intended for cases where you want to use only a portion of the original data from every subdirectory: instead of subsampling the original yourself, you can simply pass this dataset's name.
We randomly sampled about 10% of the data from each subdirectory of the original dataset.
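The per-subdirectory sampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the script actually used to build this dataset: the helper name `sample_per_subdirectory`, the fixed seed, the rule of keeping at least one file per subdirectory, and the example file names are all hypothetical.

```python
import random
from collections import defaultdict
from pathlib import PurePosixPath


def sample_per_subdirectory(paths, fraction=0.1, seed=0):
    """Randomly keep about `fraction` of the files in each subdirectory.

    Hypothetical helper, not the authors' actual pipeline. `paths` is a list
    of POSIX-style relative paths; each path's parent directory is treated
    as its subdirectory.
    """
    groups = defaultdict(list)
    for p in paths:
        groups[str(PurePosixPath(p).parent)].append(p)

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    sampled = []
    for subdir in sorted(groups):
        files = sorted(groups[subdir])
        # Assumption: keep at least one file so tiny subdirectories survive.
        k = max(1, round(len(files) * fraction))
        sampled.extend(rng.sample(files, k))
    return sampled


if __name__ == "__main__":
    # Made-up file names: 100 files in dir_a, 30 files in dir_b.
    files = [f"dir_a/{i:04d}.flac" for i in range(100)]
    files += [f"dir_b/{i:04d}.flac" for i in range(30)]
    subset = sample_per_subdirectory(files)
    print(len(subset))  # 13 (10 from dir_a, 3 from dir_b)
```

Sampling each subdirectory separately, rather than drawing 10% from the pooled files, keeps every subdirectory represented in roughly its original proportion.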
## Dataset Contents & Preprocessing
For detailed information regarding the specific contents of the data and the original preprocessing pipelines, please refer to the original [OmniAICreator/ASMR-Archive-Processed](https://huggingface.co/datasets/OmniAICreator/ASMR-Archive-Processed) dataset.
## Biases and Limitations
Users should be aware of the following limitations inherited from the original dataset:
* **NSFW Content**: This dataset contains a significant amount of data derived from content originally marked as NSFW.
* **Gender Bias**: Due to the nature of the source material, the dataset is heavily skewed towards female voices.
* **Overlapping Speakers**: Some audio segments may contain instances where multiple speakers are talking simultaneously.
* **Inclusion of Sound Effects**: While the preprocessing pipeline is designed to isolate vocals, some segments may still contain residual sound effects commonly found in ASMR content.
* **Potential Transcription Errors**: Transcriptions are generated automatically by AI models and have not been manually verified. They are likely to contain errors and inaccuracies.
## License & Usage
This dataset inherits the **AGPL-3.0 license** from the source datasets.
**Intended Use**: This dataset is intended strictly for educational and academic research purposes.
**Disclaimer**: Use is at your own risk. You must ensure compliance with applicable laws. The dataset is provided "as is" with absolutely no express or implied warranty.
Provided by:
noxwano



