zhangdw/astra-skills
Source: Hugging Face. Updated 2026-04-17; indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/zhangdw/astra-skills
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- agent-skills
- ai-agents
- tool-use
- claude-code
- skill-discovery
- github-crawling
pretty_name: Astra Skills Collection
size_categories:
- 100K<n<1M
---
# Astra Skills Collection
## Skill Count: **148,134**
Snapshot of deduplicated AI agent skills crawled via Astra on **2026-04-15**.
## Snapshot Summary
- **Total deduplicated skills (`skills.db`)**: **148,134**
- **Source labels in DB**: **github only**
- **Crawl time range (UTC, from DB)**: `2026-04-15 09:10:28` → `2026-04-15 11:59:38`
## What This Dataset Contains
Each saved skill directory is copied from a GitHub repository path that contains `SKILL.md`.
Typical structure:
```
{source}_{owner}_{skill_name}/
├── SKILL.md
├── _meta.json
└── [optional files, e.g. scripts/]
```
`_meta.json` stores crawl metadata such as source site, repo URL, owner, repo, skill name, and relative path.
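For working with the layout above, here is a minimal Python sketch that reads one skill directory. It assumes only what the structure shows: a `SKILL.md` file and an optional `_meta.json` beside it. The helper name `load_skill` is hypothetical, and the sketch deliberately does not assume any key names inside `_meta.json`, since the exact schema is not listed here.

```python
import json
from pathlib import Path


def load_skill(skill_dir):
    """Return (SKILL.md text, parsed _meta.json or None) for one skill directory.

    Hypothetical helper: it relies only on the documented file names and makes
    no assumption about which keys _meta.json contains.
    """
    skill_dir = Path(skill_dir)
    # Crawled files may not all be valid UTF-8, so replace undecodable bytes.
    skill_text = (skill_dir / "SKILL.md").read_text(encoding="utf-8", errors="replace")
    meta_path = skill_dir / "_meta.json"
    meta = None
    if meta_path.is_file():
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
    return skill_text, meta
```

Inspecting `sorted(meta)` on a few real directories is a reasonable way to discover the actual metadata keys before writing code that depends on them.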
## Crawl Pipeline (Current)
1. Discover GitHub repositories from skill index websites (`skills.sh`, `skillstore.io`, `agent-skills.md`) plus configured seed repos.
2. Clone discovered repositories.
3. Recursively detect directories containing `SKILL.md`.
4. Copy skill directories into this dataset layout.
5. Deduplicate by the MD5 hash of each `SKILL.md` file's content, tracked in `skills.db`.
> Note: website index coverage does not guarantee 100% successful retrieval of every listed skill, because repository accessibility and content availability can vary.
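Steps 3 and 5 of the pipeline can be illustrated with a short Python sketch. This is not the Astra implementation: the function names are hypothetical, and an in-memory dictionary stands in for `skills.db`, whose schema is not described here. It only shows the general technique of finding directories that contain `SKILL.md` and keeping one directory per distinct MD5 digest of that file.

```python
import hashlib
from pathlib import Path


def find_skill_dirs(root):
    """Yield every directory under root that directly contains a SKILL.md file."""
    for skill_file in sorted(Path(root).rglob("SKILL.md")):
        if skill_file.is_file():
            yield skill_file.parent


def dedupe_by_md5(skill_dirs):
    """Map each distinct SKILL.md content hash to the first directory seen.

    Assumption: a plain dict replaces the real skills.db storage.
    """
    seen = {}
    for skill_dir in skill_dirs:
        digest = hashlib.md5((skill_dir / "SKILL.md").read_bytes()).hexdigest()
        # setdefault keeps the first directory and ignores later duplicates.
        seen.setdefault(digest, skill_dir)
    return seen
```

Calling `dedupe_by_md5(find_skill_dirs("cloned_repos"))` on a hypothetical folder of cloned repositories returns one entry per unique `SKILL.md` content. Hashing the raw bytes means two files that differ only in whitespace or line endings count as different skills.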
## Intended Use
- Research on tool-use and instruction-following behavior in coding agents
- Skill retrieval, ranking, and composition experiments
- Analysis of real-world agent skill ecosystems
## Download & Extract
The archive is split into multiple parts for reliable transfer. After downloading, merge and extract with:
```bash
# Download the dataset
huggingface-cli download zhangdw/astra-skills --repo-type dataset --local-dir astra-skills
# Merge split parts and extract
cat astra-skills/astra-skills-part-*.tar.gz > skills_github.tar.gz
tar xzf skills_github.tar.gz
```
This produces a `github/` directory containing all skill directories.
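On systems without `cat` (for example a plain Windows shell), the merge step can be reproduced in Python. The sketch below is an assumption-laden equivalent of the `cat` command above: the helper name `merge_parts` is hypothetical, and it relies on the part files sorting correctly by name, which is also what the shell glob does.

```python
import shutil
from pathlib import Path


def merge_parts(parts_dir, output_path, pattern="astra-skills-part-*.tar.gz"):
    """Concatenate all files matching pattern, in sorted name order, into output_path."""
    parts = sorted(Path(parts_dir).glob(pattern))
    if not parts:
        raise FileNotFoundError(f"no files matching {pattern!r} in {parts_dir}")
    with open(output_path, "wb") as merged:
        for part in parts:
            # Stream each part so large archives are never fully held in memory.
            with open(part, "rb") as chunk:
                shutil.copyfileobj(chunk, merged)
    return parts
```

For example, `merge_parts("astra-skills", "skills_github.tar.gz")` writes the merged archive, which can then be extracted with `tar xzf skills_github.tar.gz` as before. Keep the output file outside the parts directory, or give it a name that does not match the pattern, so a second run does not pick it up as a part.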
## Authors
- Dawei Zhang (GitHub: zhangdw156)
## Citation
If you use this dataset in research, please cite:
```bibtex
@misc{dawei_zhang_2026,
author = { Dawei Zhang },
title = { astra-skills (Revision 146eb8c) },
year = 2026,
url = { https://huggingface.co/datasets/zhangdw/astra-skills },
doi = { 10.57967/hf/8399 },
publisher = { Hugging Face }
}
```
## License
Dataset metadata and packaging are under Apache-2.0. Individual skill contents remain subject to their original repository licenses.
Provider:
zhangdw



