
lilacai/lilac-textbook_quality_programming

Hugging Face | Updated 2023-12-07 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/lilacai/lilac-textbook_quality_programming
Description:
This dataset was generated by [Lilac](http://lilacml.com) for a Hugging Face Space: [huggingface.co/spaces/lilacai/lilac](https://huggingface.co/spaces/lilacai/lilac).

Original dataset: [https://huggingface.co/datasets/vikp/textbook_quality_programming](https://huggingface.co/datasets/vikp/textbook_quality_programming)

Lilac dataset config:

```yaml
namespace: lilac
name: textbook_quality_programming
source:
  dataset_name: vikp/textbook_quality_programming
  source_name: huggingface
embeddings:
- path:
  - outline
  - '*'
  embedding: gte-small
- path:
  - concepts
  - '*'
  embedding: gte-small
- path: markdown
  embedding: gte-small
signals:
- path:
  - outline
  - '*'
  signal:
    signal_name: pii
- path:
  - outline
  - '*'
  signal:
    signal_name: text_statistics
- path:
  - outline
  - '*'
  signal:
    signal_name: near_dup
- path:
  - outline
  - '*'
  signal:
    signal_name: lang_detection
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: legal-termination
    signal_name: concept_score
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: negative-sentiment
    signal_name: concept_score
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: non-english
    signal_name: concept_score
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: positive-sentiment
    signal_name: concept_score
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: profanity
    signal_name: concept_score
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: question
    signal_name: concept_score
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: source-code
    signal_name: concept_score
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: toxicity
    signal_name: concept_score
- path:
  - concepts
  - '*'
  signal:
    signal_name: pii
- path:
  - concepts
  - '*'
  signal:
    signal_name: text_statistics
- path:
  - concepts
  - '*'
  signal:
    signal_name: near_dup
- path:
  - concepts
  - '*'
  signal:
    signal_name: lang_detection
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: legal-termination
    signal_name: concept_score
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: negative-sentiment
    signal_name: concept_score
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: non-english
    signal_name: concept_score
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: positive-sentiment
    signal_name: concept_score
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: profanity
    signal_name: concept_score
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: question
    signal_name: concept_score
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: source-code
    signal_name: concept_score
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: toxicity
    signal_name: concept_score
- path: markdown
  signal:
    signal_name: pii
- path: markdown
  signal:
    signal_name: text_statistics
- path: markdown
  signal:
    signal_name: near_dup
- path: markdown
  signal:
    signal_name: lang_detection
- path: markdown
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: legal-termination
    signal_name: concept_score
- path: markdown
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: negative-sentiment
    signal_name: concept_score
- path: markdown
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: non-english
    signal_name: concept_score
- path: markdown
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: positive-sentiment
    signal_name: concept_score
- path: markdown
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: profanity
    signal_name: concept_score
- path: markdown
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: question
    signal_name: concept_score
- path: markdown
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: source-code
    signal_name: concept_score
- path: markdown
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: toxicity
    signal_name: concept_score
- path:
  - outline
  - '*'
  signal:
    signal_name: cluster_dbscan
- path:
  - concepts
  - '*'
  signal:
    signal_name: cluster_dbscan
- path: markdown
  signal:
    signal_name: cluster_dbscan
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    signal_name: cluster_hdbscan
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    signal_name: cluster_hdbscan
- path: markdown
  signal:
    embedding: gte-small
    signal_name: cluster_hdbscan
settings:
  ui:
    media_paths:
    - - outline
      - '*'
    - - concepts
      - '*'
    - markdown
    markdown_paths:
    - markdown
  tags:
  - machine-learning
```
Provided by:
lilacai
Summary of original information

Dataset overview

Basic information

  • Namespace: lilac
  • Name: textbook_quality_programming
  • Source:
    • Dataset name: vikp/textbook_quality_programming
    • Source name: huggingface

Embeddings

  • Paths:
    • outline/*
    • concepts/*
    • markdown
  • Embedding model: gte-small
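The `outline/*` and `concepts/*` paths use Lilac's `'*'` wildcard, which, as we understand it, fans out over every item of a list-valued field, while `markdown` addresses a single string. The Python sketch below illustrates that path resolution on a hypothetical example row. `resolve_path` and `EXAMPLE_ROW` are illustrative names, not part of Lilac's API, and the field types are an assumption inferred from the wildcard usage rather than taken from the dataset schema.

```python
from typing import Any, Iterator

# Hypothetical example row: we ASSUME `outline` and `concepts` are lists of
# strings and `markdown` is one string (the '*' wildcard implies list fields).
EXAMPLE_ROW = {
    "outline": ["1. Introduction", "2. Variables"],
    "concepts": ["Variables", "Loops"],
    "markdown": "# Chapter 1\nSome text.",
}


def resolve_path(value: Any, path: list[str]) -> Iterator[Any]:
    """Yield every leaf addressed by `path`; '*' fans out over list items."""
    if not path:
        yield value
        return
    head, rest = path[0], path[1:]
    if head == "*":
        # Wildcard: descend into each element of the list.
        for item in value:
            yield from resolve_path(item, rest)
    else:
        # Named key: descend into that field only.
        yield from resolve_path(value[head], rest)


# The three embedded paths from the config; the bare string path `markdown`
# is written here as a one-element path.
EMBEDDING_PATHS = [["outline", "*"], ["concepts", "*"], ["markdown"]]

for p in EMBEDDING_PATHS:
    texts = list(resolve_path(EXAMPLE_ROW, p))
    print("/".join(p), "->", len(texts), "text(s) to embed")
```

For the example row this reports two texts each for `outline/*` and `concepts/*` and one for `markdown`, i.e. one embedding per list item rather than one per row.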

Signals

  • Paths:
    • outline/*
    • concepts/*
    • markdown
  • Signal types (each applied to all three paths; concept_score and cluster_hdbscan use the gte-small embedding):
    • pii
    • text_statistics
    • near_dup
    • lang_detection
    • concept_score (for eight concepts: legal-termination, negative-sentiment, non-english, positive-sentiment, profanity, question, source-code, toxicity)
    • cluster_dbscan
    • cluster_hdbscan
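Because every signal type is applied to every path, the 42-entry `signals` list in the config is effectively a cross product. The stdlib-only Python sketch below regenerates it as plain dicts in the same order as the config, which may help when checking a hand-edited copy. The constant and function names are illustrative, not Lilac identifiers, and paths are written as tuples for convenience (the config writes `markdown` as a bare string).

```python
from collections import Counter

# Illustrative constants mirroring the config; these names are not Lilac's.
PATHS = [("outline", "*"), ("concepts", "*"), ("markdown",)]
PLAIN_SIGNALS = ["pii", "text_statistics", "near_dup", "lang_detection"]
CONCEPTS = [
    "legal-termination", "negative-sentiment", "non-english",
    "positive-sentiment", "profanity", "question", "source-code", "toxicity",
]
EMBEDDING = "gte-small"


def expand_signals() -> list[dict]:
    """Rebuild the config's `signals` list as plain dicts, in config order."""
    signals = []
    for path in PATHS:
        for name in PLAIN_SIGNALS:
            signals.append({"path": path, "signal": {"signal_name": name}})
        for concept in CONCEPTS:
            signals.append({"path": path, "signal": {
                "embedding": EMBEDDING, "namespace": "lilac",
                "concept_name": concept, "signal_name": "concept_score"}})
    # Clustering comes last: DBSCAN entries carry no embedding, HDBSCAN ones do.
    for path in PATHS:
        signals.append({"path": path, "signal": {"signal_name": "cluster_dbscan"}})
    for path in PATHS:
        signals.append({"path": path, "signal": {
            "embedding": EMBEDDING, "signal_name": "cluster_hdbscan"}})
    return signals


signals = expand_signals()
# 3 paths x (4 plain + 8 concept) = 36, plus 3 DBSCAN + 3 HDBSCAN = 42
print(len(signals))  # 42
print(Counter(s["signal"]["signal_name"] for s in signals)["concept_score"])  # 24
```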

Settings

  • UI media paths:
    • outline/*
    • concepts/*
    • markdown
  • Markdown paths:
    • markdown
  • Tags:
    • machine-learning