zhangdw/astra-skills

Hugging Face, updated 2026-04-17, indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/zhangdw/astra-skills
Resource description:
---
license: apache-2.0
task_categories:
- text-generation
language:
- en
tags:
- agent-skills
- ai-agents
- tool-use
- claude-code
- skill-discovery
- github-crawling
pretty_name: Astra Skills Collection
size_categories:
- 100K<n<1M
---

# Astra Skills Collection

## Skill Count: **148,134**

Snapshot of deduplicated AI agent skills crawled via Astra on **2026-04-15**.

## Snapshot Summary

- **Total deduplicated skills (`skills.db`)**: **148,134**
- **Source labels in DB**: **github only**
- **Crawl time range (UTC, from DB)**: `2026-04-15 09:10:28` → `2026-04-15 11:59:38`

## What This Dataset Contains

Each saved skill directory is copied from a GitHub repository path that contains `SKILL.md`.

Typical structure:

```
{source}_{owner}_{skill_name}/
├── SKILL.md
├── _meta.json
└── [optional files, e.g. scripts/]
```

`_meta.json` stores crawl metadata such as source site, repo URL, owner, repo, skill name, and relative path.

## Crawl Pipeline (Current)

1. Discover GitHub repositories from skill index websites (`skills.sh`, `skillstore.io`, `agent-skills.md`) plus configured seed repos.
2. Clone discovered repositories.
3. Recursively detect directories containing `SKILL.md`.
4. Copy skill directories into this dataset layout.
5. Deduplicate by MD5 hash of `SKILL.md` content in `skills.db`.

> Note: website index coverage does not guarantee 100% successful retrieval of every listed skill, because repository accessibility and content availability can vary.

## Intended Use

- Research on tool-use and instruction-following behavior in coding agents
- Skill retrieval, ranking, and composition experiments
- Analysis of real-world agent skill ecosystems

## Download & Extract

The archive is split into multiple parts for reliable transfer. After downloading, merge and extract with:

```bash
# Download the dataset
huggingface-cli download zhangdw/astra-skills --repo-type dataset --local-dir astra-skills

# Merge split parts and extract
cat astra-skills/astra-skills-part-*.tar.gz > skills_github.tar.gz
tar xzf skills_github.tar.gz
```

This produces a `github/` directory containing all skill directories.

## Authors

- Dawei Zhang (GitHub: zhangdw156)

## Citation

If you use this dataset in research, please cite:

```bibtex
@misc{dawei_zhang_2026,
  author    = { Dawei Zhang },
  title     = { astra-skills (Revision 146eb8c) },
  year      = 2026,
  url       = { https://huggingface.co/datasets/zhangdw/astra-skills },
  doi       = { 10.57967/hf/8399 },
  publisher = { Hugging Face }
}
```

## License

Dataset metadata and packaging are under Apache-2.0. Individual skill contents remain subject to their original repository licenses.
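
## Example: Re-checking Skill Detection and Deduplication

Steps 3 and 5 of the crawl pipeline (detecting directories that contain `SKILL.md` and deduplicating by MD5) can be approximated on the extracted `github/` directory. The following is a minimal sketch, not the crawler's actual code: the function names `find_skill_dirs` and `dedupe_by_md5` are hypothetical, and it assumes that "content" means the raw bytes of `SKILL.md` and that the first directory seen for each hash is the one kept.

```python
import hashlib
from pathlib import Path


def find_skill_dirs(root: Path) -> list[Path]:
    """Return every directory under root that directly contains a SKILL.md file."""
    return sorted(p.parent for p in root.rglob("SKILL.md") if p.is_file())


def dedupe_by_md5(skill_dirs: list[Path]) -> dict[str, Path]:
    """Map each distinct MD5 of SKILL.md bytes to the first directory that has it."""
    unique: dict[str, Path] = {}
    for skill_dir in skill_dirs:
        # Assumption: hash the raw file bytes, with no normalization of
        # whitespace or line endings.
        digest = hashlib.md5((skill_dir / "SKILL.md").read_bytes()).hexdigest()
        unique.setdefault(digest, skill_dir)
    return unique


if __name__ == "__main__":
    # "github" is the directory produced by the extract step above.
    dirs = find_skill_dirs(Path("github"))
    unique = dedupe_by_md5(dirs)
    print(f"{len(dirs)} skill directories, {len(unique)} unique SKILL.md files")
```

Because the published archive is already deduplicated, the two counts would be expected to match on a clean extract. A mismatch would suggest that the hashing assumptions above differ from those used to build `skills.db`.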
Provider:
zhangdw