kaggle-notebooks-edu-v0

Name: kaggle-notebooks-edu-v0
Creator: maas
Published: 2025-12-05 16:49:20
License: No description available

ModelScope community, updated 2025-12-05, indexed 2025-09-06

Download link:

https://modelscope.cn/datasets/data-agents/kaggle-notebooks-edu-v0


Resource description:

# Kaggle Notebooks LLM Filtered

- Model: `meta-llama/Meta-Llama-3.1-70B-Instruct`
- Sample: `12,400`
- Source dataset: `data-agents/kaggle-notebooks`
- Prompt:

```
Below is an extract from a Jupyter notebook. Evaluate whether it has a high analysis value and could help a data scientist.

The notebooks are formatted with the following tokens:

START

<text block>
# Here comes markdown content

<input block>
<lang python>
# Here comes python code

<output block>
# Here comes code output

# More blocks
END

Use the additive 5-point scoring system described below. Points are accumulated based on the satisfaction of each criterion, so stop counting if any of the criteria is not fulfilled:

- Add 1 point if the notebook contains valid code, even if it's not educational, like boilerplate code, configs, and niche concepts.
- Add another point if the notebook successfully loads a dataset e.g. a CSV or JSON file, even if it lacks further analysis and contains the code outputs.
- Award a third point if the notebook runs some analysis on the dataset by running statistics or plotting useful properties, even if they are mostly uncommented.
- Give a fourth point if the majority of the notebook contains text between the code cells explaining insights and performing reasoning.
- Give a fifth point if the notebook is clean and outstanding in it's analysis, creates insightful, explained plots and contains consistent, multi-step reasoning connected across the whole notebook and gains useful insights from the data.

The extract:
START
{}
END

After examining the extract:
- Briefly justify your total score, up to 100 words.
- Conclude with the score using the format: "Educational score: <total points>" where <total points> is just a one digit number.
```
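The prompt above implies two small pieces of glue code: serializing notebook cells with the block tokens, and reading the final score back out of the model's reply. The following is a minimal sketch of both, not the dataset authors' actual pipeline. The function names, the `(kind, content)` cell representation, the whitespace between tokens, and the abbreviated `PROMPT_TEMPLATE` are all assumptions made for illustration.

```python
import re

# Abbreviated stand-in for the full prompt; only the "{}" slot matters here.
PROMPT_TEMPLATE = "The extract:\nSTART\n{}\nEND\nAfter examining the extract: ..."


def format_cells(cells):
    """Serialize (kind, content) pairs using the block tokens named in the prompt.

    The exact whitespace between tokens is an assumption.
    """
    parts = []
    for kind, content in cells:
        if kind == "markdown":
            parts.append("<text block>\n" + content)
        elif kind == "code":
            parts.append("<input block>\n<lang python>\n" + content)
        elif kind == "output":
            parts.append("<output block>\n" + content)
        else:
            raise ValueError(f"unknown cell kind: {kind}")
    return "\n".join(parts)


def build_prompt(cells):
    """Fill the template's single "{}" slot with the serialized notebook."""
    return PROMPT_TEMPLATE.format(format_cells(cells))


# The prompt asks for a one-digit score; the additive scale allows 0 to 5.
SCORE_RE = re.compile(r"Educational score:\s*([0-5])\b")


def parse_score(response):
    """Return the last well-formed score in the reply, or None if absent."""
    matches = SCORE_RE.findall(response)
    return int(matches[-1]) if matches else None
```

Taking the last match rather than the first is a design choice: the prompt asks the model to conclude with the score, so an earlier mention inside the justification is less likely to be the final answer.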

Provider:

maas

Created:

2025-09-04

Collected summary

Dataset introduction

Background and challenges

Background overview

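This dataset is built from Jupyter notebooks on the Kaggle platform. The Meta-Llama-3.1-70B-Instruct model was used to select 12,400 samples, scoring each notebook's educational value on a 5-point scale that emphasizes code analysis, data loading, and explanation of insights. The dataset is released under the Apache License 2.0 and aims to help data scientists identify high-quality educational resources.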

The content above was collected and summarized by 遇见数据集.