
laion/majestrino-1.00-16xk5-sae-features

Source: Hugging Face. Updated 2026-03-16; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/laion/majestrino-1.00-16xk5-sae-features
Resource description:
---
license: apache-2.0
task_categories:
- audio-classification
language:
- multilingual
tags:
- sparse-autoencoder
- voice
- audio
- interpretability
- majestrino
pretty_name: "Majestrino 1.00 SAE Feature Audio Samples"
size_categories:
- 1M<n<10M
---

# Majestrino 1.00 SAE: Feature Audio Samples (16x, k=5)

Top-2000 activating audio samples for each feature in the [Majestrino 1.00 SAE](https://huggingface.co/laion/majestrino-1.00-16xk5-sae).

## Overview

| Metric | Value |
|--------|-------|
| SAE Architecture | 16x expansion, k=5, d_model=768 |
| Total Features | 12,288 |
| Alive Features | 10,684 |
| Audio per Feature | Up to 2,000 highest-activating |
| Audio Format | Opus (24 kbps, OGG container) |
| Total TAR Files | 1,069 |
| Source Dataset | [laion/majestrino-data](https://huggingface.co/datasets/laion/majestrino-data) |

## File Structure

Each TAR file contains 10 features and is named `features_XXXXX_YYYYY.tar`.

Inside each TAR:

```
feature_00042/
  metadata.json    # Feature info, annotation, activation scores
  00001991.opus    # Audio file (highest activation)
  00002299.opus    # Audio file (2nd highest)
  ...
feature_00043/
  metadata.json
  ...
```

### metadata.json

```json
{
  "feature_id": 42,
  "title": "British Male Narrator",
  "description": "This feature activates on...",
  "bin": 12,
  "bin_name": "Broadcast & Formal Style",
  "consistency": 3,
  "activation_count": 15234,
  "n_audio_files": 2000,
  "activations": [
    {"file": "00001991.opus", "activation": 0.8234, "original_path": "...", "tar_source": "00042"},
    ...
  ]
}
```

## Usage

```python
import json
import tarfile
from pathlib import Path

# Extract a feature TAR into ./extracted
with tarfile.open("features_00000_00009.tar") as tf:
    tf.extractall("./extracted")

# Read the metadata of every feature directory found in the archive
for meta_path in sorted(Path("./extracted").glob("feature_*/metadata.json")):
    with open(meta_path) as f:
        meta = json.load(f)
    print(f"Feature {meta['feature_id']}: {meta['title']}")
    print(f"Top activation: {meta['activations'][0]['activation']:.4f}")
```

## Related

- **SAE Model**: [laion/majestrino-1.00-16xk5-sae](https://huggingface.co/laion/majestrino-1.00-16xk5-sae)
- **Source Data**: [laion/majestrino-data](https://huggingface.co/datasets/laion/majestrino-data)
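
The `metadata.json` layout described under File Structure can also be read without unpacking the audio files. The sketch below is a minimal illustration, not part of the dataset's own tooling. The helper names `read_feature_metadata` and `top_activations` are hypothetical, and it assumes archive members are stored as `feature_XXXXX/metadata.json` with a five-digit, zero-padded ID and no leading `./` prefix.

```python
import json
import tarfile


def read_feature_metadata(tar_path, feature_id):
    """Load one feature's metadata.json straight from a TAR, without extracting.

    Assumption: members are named feature_XXXXX/metadata.json (five digits).
    """
    member_name = f"feature_{feature_id:05d}/metadata.json"
    with tarfile.open(tar_path) as tf:
        # extractfile raises KeyError if the feature is not in this archive
        member = tf.extractfile(member_name)
        return json.load(member)


def top_activations(meta, n=5):
    """Return (file, activation) pairs for the n strongest samples.

    Sorts explicitly instead of relying on the order stored in the file.
    """
    ranked = sorted(meta["activations"], key=lambda a: a["activation"], reverse=True)
    return [(a["file"], a["activation"]) for a in ranked[:n]]
```

Calling `top_activations(read_feature_metadata(path, 42), n=3)` on an archive that contains feature 42 would return its three highest-activating file names with their scores. Which TAR holds a given feature ID depends on the `features_XXXXX_YYYYY.tar` naming, so `path` has to be chosen accordingly.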
Provider:
laion