
Stack Overflow Dataset

Databricks, indexed 2025-06-07
Download link:
https://marketplace.databricks.com/details/84a5b666-a4ce-434a-bbeb-02d6cba62121/Stack-Overflow-Knowledge-Solutions_Stack-Overflow-Dataset
Resource description:
**Overview**

Millions of the world's developers and technologists visit Stack Overflow to ask questions, learn, and share technical knowledge, making it the most complete, accurate source of human-verified technical knowledge on the internet:

- 16+ years of accurate, high-quality, and trusted technical knowledge.
- 60 million+ questions and answers to date.
- 69,000 technology tags used to organize content.
- 92% of developers visit Stack Overflow regularly.

Improve the performance of your chatbots, agents, and other AI solutions with Stack Overflow’s community-validated data:

- Strict moderation policies and rich feedback signals from Stack Overflow’s users and moderators provide a reliable source of truth.
- Top-class technical expertise and experience, expressed in natural language, is ideal for LLM training, improving RAG performance, and more.
- 145+ Stack Exchange sites across a range of topics (including software engineering, math, DIY, and more) support fine-tuning.

**Use cases**

**LLM Fine-Tuning and Pre-Training**

- Use high-quality, expert-vetted question-and-answer pairs from any Stack Exchange site to fine-tune your model on technical domains.

**Retrieval-Augmented Generation (RAG)**

- Ingest Stack Overflow data as a knowledge base for grounded technical answers in real-time.
- Reduce hallucinations in developer-facing copilots or support bots.

**Agentic Systems**

- Power decision trees and logic flows with Stack Overflow’s reasoning data.

**Search and Knowledge Graph Enrichment**

- Improve relevance in technical search engines by embedding Stack Overflow data with your internal datasets.
**Product details**

Tables

- **Comments:** Comments on Questions and Answers on a given Stack Exchange Site
- **PostLinks:** Link to Posts from Stack Exchange sites to facilitate attribution of content
- **Posts:** Questions and Answers to Questions on a Stack Exchange Site
- **PostHistory:** Content changes to Questions and Answers on a Stack Exchange Site
- **Tags:** Tags on a Stack Exchange Site
- **Votes:** Votes on Posts on a Stack Exchange Site

**Contact us for more information**

The entire Stack Overflow corpus or a tailored subset is available with options that suit your specific needs. Reach out to us for more information.
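
As a rough illustration of the fine-tuning use case, the sketch below pairs each question in a Posts-like table with its highest-scoring answer. The listing does not publish column names, so the `Post` fields here (`post_type_id`, `parent_id`, `score`, and so on) are assumptions modeled on the public Stack Exchange data dump, and `build_qa_pairs` is a hypothetical helper, not part of the product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Post:
    # Hypothetical column names; the actual Posts table schema may differ.
    id: int
    post_type_id: int          # assumed: 1 = question, 2 = answer
    parent_id: Optional[int]   # assumed: id of the question, set on answers only
    score: int
    title: Optional[str]       # assumed: set on questions only
    body: str


def build_qa_pairs(posts: list[Post], min_score: int = 1) -> list[dict]:
    """Pair each question with its highest-scoring answer at or above min_score."""
    questions = {p.id: p for p in posts if p.post_type_id == 1}
    best: dict[int, Post] = {}
    for p in posts:
        # Skip non-answers, orphaned answers, and answers below the threshold.
        if p.post_type_id != 2 or p.parent_id not in questions:
            continue
        if p.score < min_score:
            continue
        current = best.get(p.parent_id)
        if current is None or p.score > current.score:
            best[p.parent_id] = p
    return [
        {
            "question": f"{questions[qid].title}\n\n{questions[qid].body}",
            "answer": answer.body,
        }
        for qid, answer in sorted(best.items())
    ]


# Made-up rows for illustration only.
sample = [
    Post(1, 1, None, 5, "How do I reverse a list?", "I have a Python list."),
    Post(2, 2, 1, 3, None, "Use reversed(xs)."),
    Post(3, 2, 1, 7, None, "Use xs[::-1]."),
    Post(4, 1, None, 0, "Unanswered question", "No replies yet."),
    Post(5, 2, 1, -2, None, "Downvoted answer."),
]
pairs = build_qa_pairs(sample)
```

Filtering on score is only one possible quality signal; the Votes and Comments tables could feed a stricter filter, depending on what the delivered schema actually exposes.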
Provider:
Stack Overflow Knowledge Solutions
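Dataset introduction

Background overview

This dataset contains 60 million+ technical questions and answers accumulated on the Stack Overflow platform over 16 years, covering 69,000 technology tags and comprising six structured tables, including Posts, Tags, and Votes. It is mainly suited to technical scenarios such as LLM fine-tuning, RAG enhancement, and knowledge graph construction. The data is updated every 30 days.

The summary above was collected and generated by 遇见数据集.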