freedumb2000/anima-tagger-artifacts
收藏Hugging Face2026-04-21 更新2026-04-26 收录
下载链接:
https://hf-mirror.com/datasets/freedumb2000/anima-tagger-artifacts
下载链接
链接失效反馈官方服务:
资源简介:
---
license: cc0-1.0
task_categories:
- text-retrieval
- feature-extraction
language:
- en
tags:
- danbooru
- anime
- rag
- tag-retrieval
- anima
size_categories:
- 100K<n<1M
---
# anima-tagger-artifacts
Pre-built retrieval artefacts for the **Anima** tag format of the
[`sd-webui-prompt-enhancer`](https://github.com/Gunther-Schulz/sd-webui-prompt-enhancer)
Stable Diffusion WebUI extension. Lets the extension's Anima pipeline
do real-time embedding-based tag validation and shortlist retrieval
without users needing to rebuild a 270k+ entry FAISS index locally.
## Contents
| File | Size | Description |
|---|---|---|
| `tags.sqlite` | ~30 MB | 273,025 Danbooru tags (name, category, post count, aliases, wiki). Post-count floor 10. |
| `tags.faiss` | ~1.1 GB | FAISS FlatIP index of bge-m3 embeddings (1024-dim) for every tag. Artist and character embeddings include co-occurrence signatures (top-12 general tags from their actual Danbooru posts). |
| `cooccurrence.sqlite` | ~10 MB | Pointwise-mutual-information table for character↔series, character↔artist, series↔character pairs. Enables automatic series-pairing (e.g. `hatsune_miku` → `vocaloid`) at query time. |
| `VERSION` | <1 kB | JSON manifest with per-file sha256 + size + build date. |
## Usage
Automatic — the extension's `install.py` downloads these on Forge
startup, verifies sha256 against `VERSION`, and re-downloads when the
upstream hash changes.
Manual (e.g. for other projects):
```python
from huggingface_hub import hf_hub_download
for fname in ("tags.sqlite", "tags.faiss", "cooccurrence.sqlite", "VERSION"):
hf_hub_download(
repo_id="freedumb2000/anima-tagger-artifacts",
filename=fname, repo_type="dataset",
local_dir="./data",
)
```
## How it was built
1. Tag metadata + wiki from
[`NSFW-API/DanBooruTagsAndWikiDumpSept2025`](https://huggingface.co/datasets/NSFW-API/DanBooruTagsAndWikiDumpSept2025)
(1.59M tags; filtered to post_count ≥ 10 → 273,025 tags).
2. Post-level tag sets from
[`isek-ai/danbooru-tags-2024`](https://huggingface.co/datasets/isek-ai/danbooru-tags-2024)
(streamed first 500k posts).
3. For each artist/character, compute top-12 co-occurring general tags
across their posts → "style signature".
4. Format each tag as `"<name> (<category>) | aliases: ... | <wiki excerpt>
| associated with: <top co-occurring tags>"` and embed with
[`BAAI/bge-m3`](https://huggingface.co/BAAI/bge-m3) (fp16 on GPU,
normalized, 1024-dim).
5. FAISS FlatIP index over all 273k vectors.
6. PMI table over the same post sample for character↔series pairing.
Full rebuild: ~10 minutes on a modern GPU.
## License
The artefacts themselves are released under CC0-1.0. The upstream
Danbooru tag data is public-domain by convention. Refer to the
linked source datasets for their own attribution requirements.
The underlying embedding model ([`BAAI/bge-m3`](https://huggingface.co/BAAI/bge-m3))
is MIT-licensed.
提供机构:
freedumb2000



