lilacai/lilac-textbook_quality_programming
Hugging Face · Updated 2023-12-07 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/lilacai/lilac-textbook_quality_programming
Resource summary:
This dataset was generated by [Lilac](http://lilacml.com) for a Hugging Face Space: [huggingface.co/spaces/lilacai/lilac](https://huggingface.co/spaces/lilacai/lilac).
Original dataset: [https://huggingface.co/datasets/vikp/textbook_quality_programming](https://huggingface.co/datasets/vikp/textbook_quality_programming)
Lilac dataset config:
```yaml
namespace: lilac
name: textbook_quality_programming
source:
  dataset_name: vikp/textbook_quality_programming
  source_name: huggingface
embeddings:
- path:
  - outline
  - '*'
  embedding: gte-small
- path:
  - concepts
  - '*'
  embedding: gte-small
- path: markdown
  embedding: gte-small
signals:
- path:
  - outline
  - '*'
  signal:
    signal_name: pii
- path:
  - outline
  - '*'
  signal:
    signal_name: text_statistics
- path:
  - outline
  - '*'
  signal:
    signal_name: near_dup
- path:
  - outline
  - '*'
  signal:
    signal_name: lang_detection
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: legal-termination
    signal_name: concept_score
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: negative-sentiment
    signal_name: concept_score
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: non-english
    signal_name: concept_score
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: positive-sentiment
    signal_name: concept_score
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: profanity
    signal_name: concept_score
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: question
    signal_name: concept_score
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: source-code
    signal_name: concept_score
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: toxicity
    signal_name: concept_score
- path:
  - concepts
  - '*'
  signal:
    signal_name: pii
- path:
  - concepts
  - '*'
  signal:
    signal_name: text_statistics
- path:
  - concepts
  - '*'
  signal:
    signal_name: near_dup
- path:
  - concepts
  - '*'
  signal:
    signal_name: lang_detection
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: legal-termination
    signal_name: concept_score
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: negative-sentiment
    signal_name: concept_score
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: non-english
    signal_name: concept_score
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: positive-sentiment
    signal_name: concept_score
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: profanity
    signal_name: concept_score
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: question
    signal_name: concept_score
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: source-code
    signal_name: concept_score
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: toxicity
    signal_name: concept_score
- path: markdown
  signal:
    signal_name: pii
- path: markdown
  signal:
    signal_name: text_statistics
- path: markdown
  signal:
    signal_name: near_dup
- path: markdown
  signal:
    signal_name: lang_detection
- path: markdown
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: legal-termination
    signal_name: concept_score
- path: markdown
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: negative-sentiment
    signal_name: concept_score
- path: markdown
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: non-english
    signal_name: concept_score
- path: markdown
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: positive-sentiment
    signal_name: concept_score
- path: markdown
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: profanity
    signal_name: concept_score
- path: markdown
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: question
    signal_name: concept_score
- path: markdown
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: source-code
    signal_name: concept_score
- path: markdown
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: toxicity
    signal_name: concept_score
- path:
  - outline
  - '*'
  signal:
    signal_name: cluster_dbscan
- path:
  - concepts
  - '*'
  signal:
    signal_name: cluster_dbscan
- path: markdown
  signal:
    signal_name: cluster_dbscan
- path:
  - outline
  - '*'
  signal:
    embedding: gte-small
    signal_name: cluster_hdbscan
- path:
  - concepts
  - '*'
  signal:
    embedding: gte-small
    signal_name: cluster_hdbscan
- path: markdown
  signal:
    embedding: gte-small
    signal_name: cluster_hdbscan
settings:
  ui:
    media_paths:
    - - outline
      - '*'
    - - concepts
      - '*'
    - markdown
    markdown_paths:
    - markdown
tags:
- machine-learning
```
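The `signals` list in this config is highly regular: each of the three text paths gets the same four built-in signals plus one `concept_score` signal per concept, and each path also gets a `cluster_dbscan` and a `cluster_hdbscan` entry. A minimal Python sketch, assuming you want to regenerate that list programmatically instead of maintaining 42 entries by hand. `build_signals`, `PATHS`, `BASIC_SIGNALS`, `CONCEPTS`, and `EMBEDDING` are hypothetical names, not part of Lilac's API; the output is plain dicts shaped like the YAML entries.

```python
# Hypothetical constants mirroring the config (not imported from Lilac).
PATHS = [["outline", "*"], ["concepts", "*"], "markdown"]
BASIC_SIGNALS = ["pii", "text_statistics", "near_dup", "lang_detection"]
CONCEPTS = [
    "legal-termination", "negative-sentiment", "non-english", "positive-sentiment",
    "profanity", "question", "source-code", "toxicity",
]
EMBEDDING = "gte-small"


def build_signals() -> list[dict]:
    """Rebuild the `signals` entries in the same order as the YAML config."""
    signals = []
    for path in PATHS:
        # Built-in signals that take no embedding.
        for name in BASIC_SIGNALS:
            signals.append({"path": path, "signal": {"signal_name": name}})
        # One concept_score signal per concept, all on gte-small embeddings.
        for concept in CONCEPTS:
            signals.append({
                "path": path,
                "signal": {
                    "embedding": EMBEDDING,
                    "namespace": "lilac",
                    "concept_name": concept,
                    "signal_name": "concept_score",
                },
            })
    # Clustering signals follow all per-path signals: DBSCAN first, then HDBSCAN.
    for path in PATHS:
        signals.append({"path": path, "signal": {"signal_name": "cluster_dbscan"}})
    for path in PATHS:
        signals.append({
            "path": path,
            "signal": {"embedding": EMBEDDING, "signal_name": "cluster_hdbscan"},
        })
    return signals


signals = build_signals()
print(len(signals))  # → 42, i.e. 3 paths x (4 + 8) + 3 + 3
```

Generating the list from a few constants makes it harder for one path to silently miss a concept when concepts are added or removed.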
Provided by:
lilacai
Summary of original information
Dataset overview
Basic information
- Namespace: lilac
- Name: textbook_quality_programming
- Source:
  - Dataset name: vikp/textbook_quality_programming
  - Source name: huggingface
Embeddings
- Paths: `outline/*`, `concepts/*`, `markdown`
- Embedding model: gte-small
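In Lilac's path notation, a `'*'` segment (as in `outline/*` and `concepts/*`) stands for every element of a list field, while a bare name such as `markdown` selects a single string field. A minimal sketch of that wildcard expansion, assuming rows are plain Python dicts; `expand_path` and the sample row are hypothetical and only illustrate which strings each configured path would select for embedding.

```python
from typing import Any, Union

# A path is either a single field name or a list of segments.
Path = Union[str, list[str]]


def expand_path(row: dict, path: Path) -> list[Any]:
    """Return every value in `row` selected by `path`; '*' fans out over a list."""
    segments = [path] if isinstance(path, str) else path
    values: list[Any] = [row]
    for segment in segments:
        next_values: list[Any] = []
        for value in values:
            if segment == "*":
                # Wildcard: select each element of a list field.
                next_values.extend(value)
            else:
                # Named segment: descend into a dict key.
                next_values.append(value[segment])
        values = next_values
    return values


# Hypothetical row shaped like the fields named in the config.
row = {
    "outline": ["1. Introduction", "2. Variables"],
    "concepts": ["variables", "loops"],
    "markdown": "# Python basics",
}
print(expand_path(row, ["outline", "*"]))  # ['1. Introduction', '2. Variables']
print(expand_path(row, "markdown"))        # ['# Python basics']
```

Under this reading, each outline entry and each concept gets its own gte-small embedding, while each document's `markdown` gets one.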
Signals
- Paths: `outline/*`, `concepts/*`, `markdown`
- Signal types:
  - pii
  - text_statistics
  - near_dup
  - lang_detection
  - concept_score (one signal per concept, computed on gte-small embeddings: legal-termination, negative-sentiment, non-english, positive-sentiment, profanity, question, source-code, toxicity)
  - cluster_dbscan
  - cluster_hdbscan (computed on gte-small embeddings)
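The `near_dup` signal flags texts that are almost, but not exactly, identical. This card does not say how Lilac computes it, so the sketch below only illustrates the general idea using character shingles and exact Jaccard similarity; that technique is swapped in for illustration and is not claimed to be Lilac's implementation. `shingles`, `jaccard`, `find_near_duplicates`, the shingle size of 5, and the 0.8 threshold are all hypothetical choices. At scale, MinHash with locality-sensitive hashing is a common way to approximate this without comparing every pair.

```python
from itertools import combinations


def shingles(text: str, k: int = 5) -> set[str]:
    """Character k-grams of a lowercased, whitespace-normalized string."""
    normalized = " ".join(text.lower().split())
    if len(normalized) <= k:
        return {normalized}
    return {normalized[i:i + k] for i in range(len(normalized) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """|A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def find_near_duplicates(texts: list[str], threshold: float = 0.8) -> list[tuple[int, int]]:
    """Return index pairs whose shingle sets have Jaccard similarity >= threshold.

    Exhaustive O(n^2) comparison; only suitable for a small illustration.
    """
    sets = [shingles(t) for t in texts]
    return [
        (i, j)
        for i, j in combinations(range(len(texts)), 2)
        if jaccard(sets[i], sets[j]) >= threshold
    ]


texts = [
    "Variables store values that a program can reuse later.",
    "Variables store values that a program can reuse  later!",
    "Loops repeat a block of code until a condition is false.",
]
print(find_near_duplicates(texts))  # [(0, 1)]
```

The first two texts differ only in spacing and final punctuation, so they share almost all shingles; the third shares very few with either.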
Settings
- UI media paths: `outline/*`, `concepts/*`, `markdown`
- Markdown paths: `markdown`
- Tags:
  - machine-learning



