
kaggle-notebooks-edu-v0

ModelScope community · Updated 2025-12-05 · Indexed 2025-09-06
Download link:
https://modelscope.cn/datasets/data-agents/kaggle-notebooks-edu-v0
Description:
# Kaggle Notebooks LLM Filtered

- Model: `meta-llama/Meta-Llama-3.1-70B-Instruct`
- Sample: `12,400`
- Source dataset: `data-agents/kaggle-notebooks`
- Prompt:

```
Below is an extract from a Jupyter notebook. Evaluate whether it has a high analysis value and could help a data scientist.

The notebooks are formatted with the following tokens:

START

<text block>
# Here comes markdown content

<input block>
<lang python>
# Here comes python code

<output block>
# Here comes code output

# More blocks

END

Use the additive 5-point scoring system described below. Points are accumulated based on the satisfaction of each criterion, so stop counting if any of the criteria is not fulfilled:
- Add 1 point if the notebook contains valid code, even if it's not educational, like boilerplate code, configs, and niche concepts.
- Add another point if the notebook successfully loads a dataset e.g. a CSV or JSON file, even if it lacks further analysis and contains the code outputs.
- Award a third point if the notebook runs some analysis on the dataset by running statistics or plotting useful properties, even if they are mostly uncommented.
- Give a fourth point if the majority of the notebook contains text between the code cells explaining insights and performing reasoning.
- Give a fifth point if the notebook is clean and outstanding in it's analysis, creates insightful, explained plots and contains consistent, multi-step reasoning connected across the whole notebook and gains useful insights from the data.

The extract:
START
{}
END

After examining the extract:
- Briefly justify your total score, up to 100 words.
- Conclude with the score using the format: "Educational score: <total points>" where <total points> is just a one digit number.
```
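The prompt leaves a `{}` slot for the notebook extract and asks the model to end with a line of the form `Educational score: <total points>`. As a minimal sketch of how such a template might be filled and the reply parsed, the snippet below uses an abbreviated stand-in for the prompt; the names `PROMPT_TEMPLATE`, `build_prompt`, and `parse_score` are hypothetical and not part of the dataset's tooling.

```python
import re
from typing import Optional

# Abbreviated stand-in for the full prompt; "{}" marks the extract slot.
PROMPT_TEMPLATE = "The extract:\nSTART\n{}\nEND\n"

# One digit after the fixed label, not followed by another digit.
SCORE_RE = re.compile(r"Educational score:\s*(\d)(?!\d)")


def build_prompt(extract: str) -> str:
    """Insert the notebook extract into the template's single slot."""
    # str.replace is used instead of str.format so that braces inside
    # the notebook text cannot be misread as format fields.
    return PROMPT_TEMPLATE.replace("{}", extract, 1)


def parse_score(response: str) -> Optional[int]:
    """Return the final score found in a model reply, or None if absent or out of range."""
    matches = SCORE_RE.findall(response)
    if not matches:
        return None
    score = int(matches[-1])  # the prompt asks the model to conclude with the score
    return score if 0 <= score <= 5 else None
```

Taking the last match rather than the first is a design choice made here because the prompt places the score at the end of the reply; replies without a well-formed score line yield `None` so they can be filtered or retried.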

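The rubric is additive with an early stop: points accumulate in order, and counting ends at the first criterion that is not met. A small sketch of that rule, under one reading of the prompt, is shown below; `CRITERIA` and `additive_score` are illustrative names, and the dataset's actual scores come from the LLM's judgment rather than from a function like this.

```python
from itertools import takewhile
from typing import Iterable

# Short labels for the five rubric criteria, in the order the prompt lists them.
CRITERIA = (
    "contains valid code",
    "loads a dataset",
    "runs analysis on the dataset",
    "mostly explanatory text between cells",
    "clean, outstanding, connected analysis",
)


def additive_score(satisfied: Iterable[bool]) -> int:
    """Count criteria met in order, stopping at the first one that fails."""
    return sum(1 for _ in takewhile(bool, satisfied))
```

For example, a notebook that has valid code and loads a dataset but runs no analysis scores 2 under this reading, even if it contains a lot of explanatory text.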
Provider:
maas
Created:
2025-09-04