
Hakureirm/bird-species-10k

Source: Hugging Face. Updated 2026-03-25; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Hakureirm/bird-species-10k
Description:
# Bird-10K SigLIP Training Dataset

Taxonomy-aware image-text dataset for fine-tuning SigLIP on 10,753 bird species.

## Dataset Structure

```
taxonomy.json          # eBird v2025 taxonomy (species/genus/family/order hierarchy)
hard_negatives.json    # Same-genus & same-family negative sampling index
siglip_train.jsonl     # 692,236 training image-text pairs
siglip_val.jsonl       # 32,125 validation pairs
images_224_*.tar.gz    # 224x224 bird images organized by species (from bird-species-10k)
DATA_NOTES.md          # Data provenance and quality notes
```

## Record Format (jsonl)

```json
{
  "image": "path/to/image.jpg",
  "taxonomy_version": "ebird_v2025",
  "species": "Abbott's Babbler",
  "scientific_name": "Malacocincla abbotti",
  "cn_name": "阿氏雅鹛",
  "genus": "Malacocincla",
  "family": "雀眉科",
  "order": "雀形目",
  "description": "Plain, sandy-brown babbler with...",
  "desc_source": "ebird_original",
  "texts": {
    "species": ["A photo of a Abbott's Babbler.", ...],
    "species_taxonomy": ["Abbott's Babbler (Malacocincla abbotti), a bird of the family 雀眉科.", ...],
    "genus": ["A bird of the genus Malacocincla.", ...],
    "family": ["A bird in the family 雀眉科.", ...],
    "bilingual": ["Abbott's Babbler / 阿氏雅鹛(Malacocincla abbotti)", ...],
    "chinese": ["阿氏雅鹛(Malacocincla abbotti),属于雀形目雀眉科。", ...]
  },
  "hard_negatives": {
    "same_genus": ["Black-browed Babbler", "Horsfield's Babbler"],
    "same_family_count": 63
  }
}
```

## Statistics

- **Species**: 10,753 (with images) / 10,805 (total taxonomy)
- **Images**: 724,342
- **Taxonomy**: 42 orders / 286 families / 2,345 genera
- **Text templates**: 6 levels (species / species_taxonomy / genus / family / bilingual / chinese)
- **Descriptions**: 99.8% from eBird (0% LLM-generated)

## Training Recipe

Joint training with:

- **SigLIP sigmoid contrastive loss** (image-text alignment)
- **Species classification head** (λ=1.0)
- **Genus classification head** (λ=0.3)
- **Family classification head** (λ=0.1)
- **Taxonomy-aware hard negative sampling** (same-genus priority)

## Taxonomy

Based on eBird/Clements v2025. All order/family names standardized to simplified Chinese with proper suffixes (目/科).

## License

Research and educational use.
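As a rough illustration of how a record in the jsonl format described above might be consumed, the sketch below parses one line, picks a caption from a chosen template level, and reads the same-genus hard negatives. `SAMPLE_LINE` is a trimmed copy of the documented example, and the helper names (`parse_record`, `pick_caption`, `same_genus_negatives`) are hypothetical and not part of the dataset.

```python
import json
import random

# Hypothetical one-line sample mirroring the documented record format (fields trimmed).
SAMPLE_LINE = json.dumps({
    "image": "path/to/image.jpg",
    "species": "Abbott's Babbler",
    "genus": "Malacocincla",
    "texts": {
        "species": ["A photo of a Abbott's Babbler."],
        "genus": ["A bird of the genus Malacocincla."],
    },
    "hard_negatives": {
        "same_genus": ["Black-browed Babbler", "Horsfield's Babbler"],
        "same_family_count": 63,
    },
}, ensure_ascii=False)


def parse_record(line):
    """Parse one jsonl line into a dict."""
    return json.loads(line)


def pick_caption(record, level, rng):
    """Pick one caption from a template level such as 'species' or 'genus'."""
    return rng.choice(record["texts"][level])


def same_genus_negatives(record):
    """Return the same-genus species names, or an empty list if absent."""
    return record.get("hard_negatives", {}).get("same_genus", [])


record = parse_record(SAMPLE_LINE)
caption = pick_caption(record, "species", random.Random(0))
negatives = same_genus_negatives(record)
```

For the real files, the same helpers could be applied line by line while iterating over `siglip_train.jsonl`. Sampling a different template level per step is one plausible way to use the six text levels, though the source does not say how they are mixed.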
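The joint objective in the training recipe can be sketched in plain Python as follows. This is a minimal illustration, not the author's training code. It assumes the contrastive term has weight 1, that the listed λ values scale per-head cross-entropy losses, and that embeddings are already L2-normalized. The function names and the `temperature`/`bias` defaults are illustrative assumptions.

```python
import math


def log_sigmoid(x):
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def siglip_loss(img_emb, txt_emb, temperature=10.0, bias=-10.0):
    """Pairwise sigmoid loss over a batch of L2-normalized embeddings.

    Matching pairs (i == j) get label +1; every other pair gets label -1.
    """
    n = len(img_emb)
    total = 0.0
    for i in range(n):
        for j in range(n):
            sim = sum(a * b for a, b in zip(img_emb[i], txt_emb[j]))
            label = 1.0 if i == j else -1.0
            total -= log_sigmoid(label * (temperature * sim + bias))
    return total / n


def cross_entropy(logits, target):
    """Softmax cross-entropy for a single example."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def joint_loss(contrastive, ce_species, ce_genus, ce_family,
               w_species=1.0, w_genus=0.3, w_family=0.1):
    """Weighted sum; the weights mirror the λ values listed in the recipe."""
    return (contrastive + w_species * ce_species
            + w_genus * ce_genus + w_family * ce_family)
```

Taxonomy-aware hard negative sampling would act on how batches are assembled (for example, placing same-genus species in the same batch) rather than on the loss formula itself, so it is not shown here.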
Provider:
Hakureirm