
twangodev/devpost-hacks

Source: Hugging Face. Updated 2026-04-28; indexed 2026-05-03.
Download link:
https://hf-mirror.com/datasets/twangodev/devpost-hacks
Resource overview:
---
pretty_name: Devpost Hackathon Projects
license: other
language:
- en
size_categories:
- 1K<n<10K
task_categories:
- text-classification
- text-generation
- summarization
tags:
- hackathon
- devpost
- github
- projects
configs:
- config_name: all
  default: true
  data_files:
  - split: train
    path: data/all/train.parquet
- config_name: cal-hacks-12-0
  data_files:
  - split: train
    path: data/cal-hacks-12-0/train.parquet
- config_name: hackgt-12
  data_files:
  - split: train
    path: data/hackgt-12/train.parquet
- config_name: hacktech-by-caltech-2026
  data_files:
  - split: train
    path: data/hacktech-by-caltech-2026/train.parquet
- config_name: madhacks
  data_files:
  - split: train
    path: data/madhacks/train.parquet
- config_name: madhacks-fall-2025
  data_files:
  - split: train
    path: data/madhacks-fall-2025/train.parquet
- config_name: pennapps-xxv
  data_files:
  - split: train
    path: data/pennapps-xxv/train.parquet
- config_name: treehacks-2024
  data_files:
  - split: train
    path: data/treehacks-2024/train.parquet
- config_name: treehacks-2025
  data_files:
  - split: train
    path: data/treehacks-2025/train.parquet
- config_name: treehacks-2026
  data_files:
  - split: train
    path: data/treehacks-2026/train.parquet
---

# devpost-hacks

A collection of hackathon project submissions scraped from [Devpost](https://devpost.com), enriched with the README files of any GitHub repos linked from each project. Intended for **research use only** (e.g. evaluating LLM judges, training project-summarization models, studying what wins hackathons).

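Every entry in the `configs` block of the front matter follows one pattern: the config name maps to `data/<config_name>/train.parquet`. The sketch below spells that mapping out, which may help when reading a locally downloaded copy of the Parquet files. The helper name `parquet_path` is hypothetical; it is not part of the dataset or of the `datasets` library.

```python
# Config names copied from the `configs` list in the dataset card front matter.
CONFIG_NAMES = (
    "all",
    "cal-hacks-12-0",
    "hackgt-12",
    "hacktech-by-caltech-2026",
    "madhacks",
    "madhacks-fall-2025",
    "pennapps-xxv",
    "treehacks-2024",
    "treehacks-2025",
    "treehacks-2026",
)


def parquet_path(config_name: str) -> str:
    """Return the repo-relative path of a config's train split (hypothetical helper)."""
    if config_name not in CONFIG_NAMES:
        raise ValueError(f"unknown config: {config_name!r}")
    return f"data/{config_name}/train.parquet"


if __name__ == "__main__":
    # Print each config next to the Parquet file that backs it.
    for name in CONFIG_NAMES:
        print(f"{name:26} {parquet_path(name)}")
```
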
## Configurations

| Config | Rows | Winners | With READMEs |
|---|---:|---:|---:|
| `all` (default) | 2222 | 363 | 1358 |
| `cal-hacks-12-0` | 694 | 100 | 352 |
| `hackgt-12` | 272 | 25 | 171 |
| `hacktech-by-caltech-2026` | 61 | 0 | 46 |
| `madhacks` | 55 | 10 | 32 |
| `madhacks-fall-2025` | 111 | 13 | 77 |
| `pennapps-xxv` | 98 | 25 | 63 |
| `treehacks-2024` | 319 | 58 | 192 |
| `treehacks-2025` | 248 | 70 | 185 |
| `treehacks-2026` | 364 | 62 | 240 |

`hacktech-by-caltech-2026` has zero winners because results were not yet announced on Devpost at scrape time.

## Schema

| Field | Type | Notes |
|---|---|---|
| `project_id` | string | First 12 hex chars of `sha1(url)` |
| `hackathon` | string | Hackathon slug (matches the config name) |
| `url` | string | Devpost project URL |
| `title` | string | |
| `tagline` | string | One-line summary on Devpost |
| `description` | string | Full Devpost project write-up |
| `built_with` | `list<string>` | Tags the team used (`react`, `python`, ...) |
| `video_link` | string | Demo video URL if any |
| `other_links` | `list<string>` | All non-video links from the project page |
| `results` | string | Award label (`Winner`, `Did Not Place`, etc.) or `null` |
| `is_winner` | bool | True iff Devpost flagged any award as a "winner" |
| `readmes` | `list<struct>` | GitHub READMEs for repos linked under `other_links` |

The `readmes` struct fields:

- `repo`: `owner/repo`
- `content`: README markdown, truncated to 6000 chars (with `[... truncated]` marker)
- `truncated`: bool

## Loading

```python
from datasets import load_dataset

ds = load_dataset("twangodev/devpost-hacks")                    # all hackathons
ds = load_dataset("twangodev/devpost-hacks", "treehacks-2026")  # one hackathon
```

## Sources & licensing

This dataset aggregates content from two upstream sources, neither of which has a single uniform license:

1. Devpost project pages: Project text (title, tagline, description, links, results) is authored by the participating teams.
   We include this content here under a fair-use / research-use rationale.
2. README files: Each README inherits the license of its repository.

Because of (2) in particular, we cannot apply a uniform open license to this dataset.

## Takedown & Removal Requests

If you are the author of a project (or a maintainer of a linked repo) and would like your content removed from this dataset, email **contact@twango.dev** with the project URL or repo slug. We will remove it from the next release.

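As a worked illustration of two conventions from the Schema section, here is a minimal sketch of how a `project_id` and one `readmes` entry could be derived. The dataset's actual scraping code is not shown in this card, so several details are assumptions: the URL is hashed as UTF-8 bytes, the `[... truncated]` marker is appended on a new line after the first 6000 characters, and the function names, URL, and repo slug are hypothetical.

```python
import hashlib

README_LIMIT = 6000                     # character cap stated in the schema notes
TRUNCATION_MARKER = "[... truncated]"   # marker text stated in the schema notes


def make_project_id(url: str) -> str:
    """First 12 hex chars of sha1(url). UTF-8 encoding is an assumption."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()[:12]


def make_readme_entry(repo: str, content: str) -> dict:
    """Build a dict shaped like one item of the `readmes` list.

    Assumption: the marker follows the first 6000 characters on a new line;
    the real pipeline may place it differently.
    """
    truncated = len(content) > README_LIMIT
    if truncated:
        content = content[:README_LIMIT] + "\n" + TRUNCATION_MARKER
    return {"repo": repo, "content": content, "truncated": truncated}


if __name__ == "__main__":
    # Hypothetical URL and repo slug, for illustration only.
    pid = make_project_id("https://devpost.com/software/example-project")
    entry = make_readme_entry("example-owner/example-repo", "x" * 7000)
    print(pid, entry["truncated"], len(entry["content"]))
```

Because the ID is a pure function of the URL under these assumptions, the same project would hash to the same `project_id` in the `all` config and in its per-hackathon config.
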
Provider:
twangodev