
Azka20/Projects

Hugging Face | Updated 2025-12-10 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/Azka20/Projects
Resource description:
---
license: mit
task_categories:
- text-classification
- time-series-forecasting
language:
- en
tags:
- github
- trending
- repositories
- software-engineering
- popularity
- time-series
size_categories:
- 100K<n<1M
configs:
- config_name: monthly
  data_files: "monthly/data.csv"
  default: true
- config_name: full
  data_files: "full/data.csv"
---

# GitHub Trending Projects (2013-2025)

A comprehensive dataset of **423,098 GitHub trending repository entries** spanning **12+ years** (August 2013 - November 2025), scraped from Wayback Machine snapshots of GitHub's trending page.

## 🎯 Dataset Overview

This dataset captures the evolution of GitHub's trending repositories over time, providing insights into:

- **Software development trends** across programming languages and domains
- **Popular open-source projects** and their trending patterns
- **Community interests** and shifts in developer focus over 12 years
- **Viral repository dynamics** and sustained popularity patterns

**Key Statistics:**

- 📊 **423,098** trending repository entries
- 🗂️ **14,500** unique repositories
- 📅 **128 months** of coverage (2013-08 to 2025-11)
- ⭐ **89.8%** scraping success rate from Wayback Machine
- 🏆 **Pre-processed monthly rankings** with weighted scoring

## 🔧 Dataset Configurations

This dataset has **two configurations** defined in the YAML header:

### Configuration: `monthly` (Default)

Top 25 repositories per month with 3,200 entries

```python
from datasets import load_dataset

ds = load_dataset('ronantakizawa/github-top-projects', 'monthly')
```

**Columns:**

- `month` (string): Month (YYYY-MM)
- `rank` (int): Monthly rank (1-25)
- `repository` (string): Full repository name (owner/name)
- `repo_owner` (string): Repository owner
- `repo_name` (string): Repository name
- `star_count` (int): Maximum recorded stars
- `fork_count` (int): Maximum recorded forks
- `ranking_appearances` (int): Times appeared in trending that month

### Configuration: `full`

Complete daily trending data with 423,098 entries

```python
from datasets import load_dataset

ds = load_dataset('ronantakizawa/github-top-projects', 'full')
```

**Columns:**

- `name` (string): Repository name
- `star_count` (int): Star count (may be empty for pre-2020)
- `fork_count` (int): Fork count (may be empty for pre-2020)
- `repo_owner` (string): Repository owner/organization
- `rank` (int): Position in trending (1-25)
- `date` (string): Snapshot date (YYYY-MM-DD)

## 🏆 Scoring Methodology

Monthly rankings use a **weighted frequency and position-based scoring system**:

```
Score = Σ (25 - rank + 1) for each trending appearance

Where:
- Rank 1 → 25 points
- Rank 2 → 24 points
- ...
- Rank 25 → 1 point
```

**Example:**

- Project appears 10 times at rank #1 → 250 points
- Project appears 20 times at rank #10 → 320 points (higher ranked!)

This rewards both **consistency** (frequent appearances) and **position** (higher ranks).

## 📊 Data Collection

- **Source:** GitHub Trending page via Wayback Machine (web.archive.org)
- **Period:** August 21, 2013 - November 30, 2025
- **Method:** Python web scraping with BeautifulSoup
- **Snapshots:** 17,127 successfully scraped from 19,064 available
- **Retry Logic:** Up to 15 retries with exponential backoff

## 🌟 Key Insights

### 1. All-Time Top 10 Projects (2013-2025)

| Rank | Repository | Total Score | Months in Top 25 | Total Trending Days | Best Rank |
|------|------------|-------------|------------------|---------------------|-----------|
| 1 | **TheAlgorithms/Python** | 379 | 24 (2.0 years) | 1,383 | #1 |
| 2 | **tensorflow/tensorflow** | 322 | 20 (1.7 years) | 88 | #1 |
| 3 | **jwasham/coding-interview-university** | 295 | 21 (1.8 years) | 1,254 | #1 |
| 4 | **public-apis/public-apis** | 279 | 18 (1.5 years) | 937 | #1 |
| 5 | **donnemartin/system-design-primer** | 249 | 18 (1.5 years) | 727 | #1 |
| 6 | **EbookFoundation/free-programming-books** | 237 | 17 (1.4 years) | 772 | #1 |
| 7 | **FreeCodeCamp/FreeCodeCamp** | 229 | 10 (0.8 years) | 41 | #1 |
| 8 | **freeCodeCamp/freeCodeCamp** | 228 | 12 (1.0 years) | 408 | #1 |
| 9 | **trekhleb/javascript-algorithms** | 228 | 15 (1.2 years) | 692 | #1 |
| 10 | **kamranahmedse/developer-roadmap** | 189 | 15 (1.2 years) | 495 | #1 |

**Notable Pattern:** Educational resources dominate all-time rankings. 8 of the top 10 are learning resources (algorithms, interview prep, system design, free books).

### 2. Recent Champions (2024-2025)

#### Monthly Winners

| Month | Winner | Days Trending | Current Stars | Theme |
|-------|--------|---------------|---------------|-------|
| **2025-11** | google/adk-go | 55 | 5,494 | AI Development Kit |
| **2025-10** | Stremio/stremio-web | 45 | 7,288 | Streaming Platform |
| **2025-09** | microsoft/markitdown | 63 | 79,395 | Markdown Converter |
| **2025-08** | simstudioai/sim | 36 | 17,812 | AI Simulation |
| **2025-07** | NanmiCoder/MediaCrawler | 32 | 28,058 | Media Scraping |
| **2024-12** | lobehub/lobe-chat | 36 | 66,763 | AI Chat Interface |
| **2024-11** | abi/screenshot-to-code | 40 | 67,774 | AI Code Generation |
| **2024-10** | TheAlgorithms/Python | 34 | 212,762 | Algorithm Learning |

#### Top New Projects (First Appeared 2024+)

361 new projects entered the top 25 in 2024-2025:

| Rank | Repository | Score | Months | Theme |
|------|------------|-------|--------|-------|
| 1 | **virattt/ai-hedge-fund** | 96 | 5 | AI Finance Tools |
| 2 | **microsoft/markitdown** | 72 | 3 | Document Conversion |
| 3 | **hacksider/Deep-Live-Cam** | 68 | 3 | AI Video Processing |
| 4 | **harry0703/MoneyPrinterTurbo** | 67 | 4 | AI Content Generation |
| 5 | **Shubhamsaboo/awesome-llm-apps** | 66 | 4 | LLM Applications |

### 3. Era Analysis: Technology Trend Shifts

#### 2013-2014: Web Framework Era

- **Dominant:** Bootstrap, Angular.js, jQuery
- **Top 3:** `twbs/bootstrap` (84), `atom/atom` (75), `angular/angular.js` (63)
- **Trend:** Frontend frameworks and UI libraries ruled

#### 2015-2017: Framework Wars

- **Dominant:** FreeCodeCamp, TensorFlow, Vue.js
- **2016 Champion:** `FreeCodeCamp/FreeCodeCamp` (220 score, 9 months at #1)
- **2017 Champion:** `tensorflow/tensorflow` (213 score, 12 months presence)
- **Trend:** Education platforms + rise of ML frameworks

#### 2018-2019: Algorithm Renaissance

- **Dominant:** Educational algorithm repositories
- **Top:** `trekhleb/javascript-algorithms`, `Snailclimb/JavaGuide`
- **Viral Hit:** `996icu/996.ICU` (148 days trending in April 2019)
- **Trend:** Shift from tools to learning resources

#### 2020-2021: Learning Platform Dominance

- **Dominant:** Interview prep and public APIs
- **COVID Impact:** `CSSEGISandData/COVID-19` (356 days trending in March 2020!)
- **Top:** `public-apis/public-apis` (114 score in 2021)
- **Trend:** Remote work drove demand for learning materials

#### 2022-2023: AI/ML Explosion

- **Q3 2022:** Stable Diffusion era (`AUTOMATIC1111/stable-diffusion-webui`)
- **Q4 2022 - 2023:** ChatGPT impact (`f/awesome-chatgpt-prompts`: 113 days in Dec 2022)
- **Top AI Projects:** `AntonOsika/gpt-engineer`, `imartinez/privateGPT`, `xtekky/gpt4free`
- **Trend:** Generative AI democratization

#### 2024-2025: Specialized AI Tools Era

- **Dominant:** Practical AI applications
- **Top:** `codecrafters-io/build-your-own-x` (82 score), `lobehub/lobe-chat` (72)
- **Microsoft Surge:** 87 Microsoft repos appeared (46 unique projects)
- **Trend:** From AI experimentation to production tools

### 4. Viral Phenomenon: Record-Breaking Trending Periods

**Most Trending Days in a Single Month:**

1. **CSSEGISandData/COVID-19** - 356 days (March 2020) - *COVID data tracker*
2. **denoland/deno** - 205 days (May 2020) - *Node.js alternative*
3. **TheAlgorithms/Python** - 196 days (May 2020) - *Algorithm implementations*
4. **TheAlgorithms/Python** - 186 days (May 2019)
5. **jackfrued/Python-100-Days** - 179 days (May 2019) - *Python tutorial*

**Insight:** March-May 2020 saw unprecedented trending activity due to COVID-19 lockdowns and remote work transition.

### 5. Top Organizations & Developers

#### Most Prolific Organizations (Unique Repos in Top 25)

| Organization | Appearances | Unique Repos | Notable Projects |
|--------------|-------------|--------------|------------------|
| **microsoft** | 87 | 46 | generative-ai-for-beginners, Web-Dev-For-Beginners |
| **google** | 43 | 30 | googletest, adk-go, tensorflow |
| **TheAlgorithms** | 35 | 5 | Python, Java, JavaScript, C++, Go |
| **tensorflow** | 22 | 3 | tensorflow, models, examples |
| **facebook** | 21 | 13 | react, react-native, nuclide |

#### Consistent Individual Developers

| Developer | Projects | Months | Key Work |
|-----------|----------|--------|----------|
| **jwasham** | 2 | 24 | coding-interview-university |
| **trekhleb** | 3 | 18 | javascript-algorithms, homemade-machine-learning |
| **donnemartin** | 1 | 18 | system-design-primer |
| **kamranahmedse** | 2 | 16 | developer-roadmap |
| **sindresorhus** | 3 | 15 | awesome, quick-look-plugins |

### 6. Project Categories: What Trends on GitHub?

**Educational Resources** (35% of top 25)

- Algorithm learning: TheAlgorithms/*, trekhleb/javascript-algorithms
- Interview prep: jwasham/coding-interview-university, yangshun/tech-interview-handbook
- Learning paths: kamranahmedse/developer-roadmap, EbookFoundation/free-programming-books
- Courses: microsoft/generative-ai-for-beginners, microsoft/Web-Dev-For-Beginners

**Development Tools** (25% of top 25)

- Code editors: atom/atom, microsoft/vscode
- Build tools: codecrafters-io/build-your-own-x
- APIs: public-apis/public-apis

**AI/ML Projects** (20% of top 25, surging in 2024-2025)

- Chat interfaces: lobehub/lobe-chat, abi/screenshot-to-code
- Generation tools: AUTOMATIC1111/stable-diffusion-webui, hacksider/Deep-Live-Cam
- LLM applications: Shubhamsaboo/awesome-llm-apps, virattt/ai-hedge-fund

**Frameworks** (15% of top 25)

- Frontend: vuejs/vue, facebook/react, twbs/bootstrap
- Backend: tensorflow/tensorflow, flutter/flutter

**Utilities & Curations** (5% of top 25)

- Awesome lists: sindresorhus/awesome
- Tool collections: Z4nzu/hackingtool

### 7. Longevity vs. Virality

**Longevity Leaders** (Most Months in Top 25):

- `TheAlgorithms/Python`: 24 months (2.0 years)
- `jwasham/coding-interview-university`: 21 months (1.8 years)
- `tensorflow/tensorflow`: 20 months (1.7 years)

**Viral One-Hit Wonders** (High trending days, short duration):

- `CSSEGISandData/COVID-19`: 356 days in 1 month, then disappeared
- `996icu/996.ICU`: 148 days in 1 month (April 2019 protest)
- `kelseyhightower/nocode`: 75 score across 3 months (2018), then gone

**Pattern:** Educational resources sustain; news/events spike and fade.
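
### Appendix: Scoring Sketch

The rule from the Scoring Methodology section (`25 - rank + 1` points per trending appearance, summed per repository per month) can be sketched in plain Python. This is an illustration only, not the dataset author's pipeline. The function names, the alphabetical tie-break, and the sample rows are assumptions made for the example. The row keys follow the `full` configuration's columns (`date`, `repo_owner`, `name`, `rank`).

```python
from collections import defaultdict

MAX_RANK = 25  # trending positions run from 1 to 25 per the dataset card


def appearance_points(rank: int) -> int:
    """Points for one trending appearance: rank 1 -> 25, rank 25 -> 1."""
    if not 1 <= rank <= MAX_RANK:
        raise ValueError(f"rank must be in 1..{MAX_RANK}, got {rank}")
    return MAX_RANK - rank + 1


def monthly_scores(rows):
    """Sum points per (month, repository) from rows shaped like the `full` config."""
    scores = defaultdict(int)
    for row in rows:
        month = row["date"][:7]  # "YYYY-MM-DD" -> "YYYY-MM"
        repo = f'{row["repo_owner"]}/{row["name"]}'
        scores[(month, repo)] += appearance_points(row["rank"])
    return dict(scores)


def top_repositories(scores, month, limit=MAX_RANK):
    """Return [(repository, score), ...] for one month, highest score first."""
    in_month = [(repo, s) for (m, repo), s in scores.items() if m == month]
    # Assumption: ties are broken alphabetically; the card does not specify a rule.
    return sorted(in_month, key=lambda item: (-item[1], item[0]))[:limit]


# Hypothetical rows mirroring the card's worked example:
# 10 appearances at rank 1 versus 20 appearances at rank 10.
rows = (
    [{"date": f"2020-03-{d:02d}", "repo_owner": "alpha", "name": "one", "rank": 1}
     for d in range(1, 11)]
    + [{"date": f"2020-03-{d:02d}", "repo_owner": "beta", "name": "two", "rank": 10}
       for d in range(1, 21)]
)
scores = monthly_scores(rows)
print(top_repositories(scores, "2020-03"))
# → [('beta/two', 320), ('alpha/one', 250)]
```

The made-up repositories reproduce the card's own numbers: ten rank-1 appearances give 250 points, and twenty rank-10 appearances give 320 points, so the more frequent but lower-placed project ranks first.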
Provider:
Azka20