Hakureirm/bird-species-10k
Hugging Face · Updated 2026-03-25 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/Hakureirm/bird-species-10k
Resource description:
# Bird-10K SigLIP Training Dataset
Taxonomy-aware image-text dataset for fine-tuning SigLIP on 10,753 bird species.
## Dataset Structure
```
taxonomy.json # eBird v2025 taxonomy (species/genus/family/order hierarchy)
hard_negatives.json # Same-genus & same-family negative sampling index
siglip_train.jsonl # 692,236 training image-text pairs
siglip_val.jsonl # 32,125 validation pairs
images_224_*.tar.gz # 224x224 bird images organized by species (from bird-species-10k)
DATA_NOTES.md # Data provenance and quality notes
```
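The `.jsonl` extension suggests one JSON object per line. A minimal reader sketch, assuming the files have been downloaded locally and are UTF-8 encoded (`iter_records` is a hypothetical helper name, not part of the dataset):

```python
import json
from typing import Iterator


def iter_records(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a JSON Lines file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines between records
                yield json.loads(line)
```

For example, `next(iter_records("siglip_val.jsonl"))` would return the first validation record without loading the whole file into memory.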
## Record Format (jsonl)
```json
{
"image": "path/to/image.jpg",
"taxonomy_version": "ebird_v2025",
"species": "Abbott's Babbler",
"scientific_name": "Malacocincla abbotti",
"cn_name": "阿氏雅鹛",
"genus": "Malacocincla",
"family": "雀眉科",
"order": "雀形目",
"description": "Plain, sandy-brown babbler with...",
"desc_source": "ebird_original",
"texts": {
"species": ["A photo of a Abbott's Babbler.", ...],
"species_taxonomy": ["Abbott's Babbler (Malacocincla abbotti), a bird of the family 雀眉科.", ...],
"genus": ["A bird of the genus Malacocincla.", ...],
"family": ["A bird in the family 雀眉科.", ...],
"bilingual": ["Abbott's Babbler / 阿氏雅鹛(Malacocincla abbotti)", ...],
"chinese": ["阿氏雅鹛(Malacocincla abbotti),属于雀形目雀眉科。", ...]
},
"hard_negatives": {
"same_genus": ["Black-browed Babbler", "Horsfield's Babbler"],
"same_family_count": 63
}
}
```
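One way the `hard_negatives` field could be consumed is sketched below. The record lists same-genus names directly but gives only a count for the family level, so the sketch takes a `family_pool` argument: a hypothetical list of same-family species names that would have to come from `hard_negatives.json`, whose schema is not documented here. `pick_negatives` is likewise an illustrative name.

```python
import random


def pick_negatives(record, family_pool, k, rng=None):
    """Return up to k negative species names, same-genus names first.

    family_pool is an assumed list of same-family species names; the
    record itself only stores same_family_count.
    """
    rng = rng or random.Random()
    same_genus = list(record["hard_negatives"]["same_genus"])
    rng.shuffle(same_genus)
    chosen = same_genus[:k]
    # Fill any remaining slots from the family pool, skipping names that
    # are already chosen and the record's own species.
    exclude = set(chosen) | {record["species"]}
    rest = [s for s in family_pool if s not in exclude]
    rng.shuffle(rest)
    chosen += rest[: k - len(chosen)]
    return chosen
```

Same-genus names are exhausted before any same-family name is used, so the closest-looking species are always preferred when `k` is small.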
## Statistics
- **Species**: 10,753 (with images) / 10,805 (total taxonomy)
- **Images**: 724,342
- **Taxonomy**: 42 orders / 286 families / 2,345 genera
- **Text templates**: 6 levels (species / species_taxonomy / genus / family / bilingual / chinese)
- **Descriptions**: 99.8% from eBird (0% LLM-generated)
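
The six template levels listed above map to the keys of each record's `texts` object. A minimal sketch of drawing one caption per training step, assuming a uniform choice over levels and then over the templates within that level (the actual sampling scheme is not stated; `sample_caption` is a hypothetical helper):

```python
import random

LEVELS = ("species", "species_taxonomy", "genus", "family", "bilingual", "chinese")


def sample_caption(record, rng=None, levels=LEVELS):
    """Pick a template level, then one caption from that level."""
    rng = rng or random.Random()
    # Skip levels that are missing or empty for this record.
    available = [lv for lv in levels if record["texts"].get(lv)]
    level = rng.choice(available)
    return level, rng.choice(record["texts"][level])
```

Returning the level alongside the caption makes it possible to log which granularity each training pair used.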
## Training Recipe
Joint training with:
- **SigLIP sigmoid contrastive loss** (image-text alignment)
- **Species classification head** (λ=1.0)
- **Genus classification head** (λ=0.3)
- **Family classification head** (λ=0.1)
- **Taxonomy-aware hard negative sampling** (same-genus priority)
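
The loss terms above could be combined as sketched below. This is a dependency-free illustration, not the author's training code: it assumes a plain weighted sum with the contrastive term at weight 1, takes a precomputed N x N image-text similarity matrix, and fixes the SigLIP temperature `t` and bias `b` as constants although they are normally learnable. All function names are hypothetical.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def siglip_loss(sim, t=10.0, b=-10.0):
    """Pairwise sigmoid loss over an N x N image-text similarity matrix."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        for j in range(n):
            z = 1.0 if i == j else -1.0  # +1 for matched pairs, -1 otherwise
            total += log_sigmoid(z * (t * sim[i][j] + b))
    return -total / n


def cross_entropy(logits, targets):
    """Mean softmax cross-entropy over a batch of logit rows."""
    loss = 0.0
    for row, y in zip(logits, targets):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        loss += log_sum_exp - row[y]
    return loss / len(logits)


def joint_loss(sim, species, genus, family,
               w_species=1.0, w_genus=0.3, w_family=0.1):
    """species, genus and family are (logits, targets) pairs."""
    return (siglip_loss(sim)
            + w_species * cross_entropy(*species)
            + w_genus * cross_entropy(*genus)
            + w_family * cross_entropy(*family))
```

The default weights mirror the λ values in the recipe, so the coarser genus and family heads act as auxiliary signals rather than competing with the species objective.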
## Taxonomy
Based on eBird/Clements v2025. Order and family names are standardized to Simplified Chinese with the conventional suffixes 目 (order) and 科 (family).
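
The suffix convention lends itself to a quick sanity check when loading records. A minimal sketch, assuming the `order` and `family` fields shown in the record format (`check_suffixes` is a hypothetical helper):

```python
def check_suffixes(record):
    """Return a list of problems with the order/family name suffixes."""
    problems = []
    if not record.get("order", "").endswith("目"):
        problems.append("order does not end with 目")
    if not record.get("family", "").endswith("科"):
        problems.append("family does not end with 科")
    return problems
```

An empty list means the record follows the convention; anything else can be logged or used to filter the record out.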
## License
Research and educational use.
Provider:
Hakureirm



