lilacai/lilac-stanford-alpaca
Source: Hugging Face | Updated: 2023-12-07 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/lilacai/lilac-stanford-alpaca
Description:
This dataset is generated by [Lilac](http://lilacml.com) for a HuggingFace Space: [huggingface.co/spaces/lilacai/lilac](https://huggingface.co/spaces/lilacai/lilac).
Lilac dataset config:
```yaml
namespace: lilac
name: stanford-alpaca
source:
  filepaths:
  - https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json
  source_name: json
embeddings:
- path: output
  embedding: gte-small
- path: instruction
  embedding: gte-small
- path: input
  embedding: gte-small
signals:
- path: output
  signal:
    signal_name: pii
- path: output
  signal:
    signal_name: text_statistics
- path: output
  signal:
    signal_name: near_dup
- path: output
  signal:
    signal_name: lang_detection
- path: output
  signal:
    embedding: gte-small
    signal_name: cluster_hdbscan
- path: output
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: legal-termination
    signal_name: concept_score
- path: output
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: negative-sentiment
    signal_name: concept_score
- path: output
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: non-english
    signal_name: concept_score
- path: output
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: positive-sentiment
    signal_name: concept_score
- path: output
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: profanity
    signal_name: concept_score
- path: output
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: question
    signal_name: concept_score
- path: output
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: source-code
    signal_name: concept_score
- path: output
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: toxicity
    signal_name: concept_score
- path: instruction
  signal:
    signal_name: pii
- path: instruction
  signal:
    signal_name: text_statistics
- path: instruction
  signal:
    signal_name: near_dup
- path: instruction
  signal:
    signal_name: lang_detection
- path: instruction
  signal:
    embedding: gte-small
    signal_name: cluster_hdbscan
- path: instruction
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: legal-termination
    signal_name: concept_score
- path: instruction
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: negative-sentiment
    signal_name: concept_score
- path: instruction
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: non-english
    signal_name: concept_score
- path: instruction
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: positive-sentiment
    signal_name: concept_score
- path: instruction
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: profanity
    signal_name: concept_score
- path: instruction
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: question
    signal_name: concept_score
- path: instruction
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: source-code
    signal_name: concept_score
- path: instruction
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: toxicity
    signal_name: concept_score
- path: input
  signal:
    signal_name: pii
- path: input
  signal:
    signal_name: text_statistics
- path: input
  signal:
    signal_name: near_dup
- path: input
  signal:
    signal_name: lang_detection
- path: input
  signal:
    embedding: gte-small
    signal_name: cluster_hdbscan
- path: input
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: legal-termination
    signal_name: concept_score
- path: input
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: negative-sentiment
    signal_name: concept_score
- path: input
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: non-english
    signal_name: concept_score
- path: input
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: positive-sentiment
    signal_name: concept_score
- path: input
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: profanity
    signal_name: concept_score
- path: input
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: question
    signal_name: concept_score
- path: input
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: source-code
    signal_name: concept_score
- path: input
  signal:
    embedding: gte-small
    namespace: lilac
    concept_name: toxicity
    signal_name: concept_score
settings:
  ui:
    media_paths:
    - output
    - instruction
    - input
    markdown_paths: []
```
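The `signals` section of the config is highly regular: each of the three paths receives the same four standalone signals, one `cluster_hdbscan` entry, and eight `concept_score` entries. The following is a minimal sketch, in plain Python with no Lilac dependency, of how that list could be regenerated as dictionaries. The helper name `build_signals` and the constant names are hypothetical and are not part of Lilac.

```python
# Hypothetical helper (not a Lilac API): rebuilds the `signals` list as plain dicts.
PATHS = ["output", "instruction", "input"]
BASIC_SIGNALS = ["pii", "text_statistics", "near_dup", "lang_detection"]
CONCEPTS = [
    "legal-termination", "negative-sentiment", "non-english", "positive-sentiment",
    "profanity", "question", "source-code", "toxicity",
]
EMBEDDING = "gte-small"


def build_signals(paths=PATHS):
    """Return one entry per (path, signal) pair, in the order used by the config."""
    entries = []
    for path in paths:
        # Signals that take no embedding.
        for name in BASIC_SIGNALS:
            entries.append({"path": path, "signal": {"signal_name": name}})
        # Clustering runs on top of the gte-small embedding.
        entries.append({
            "path": path,
            "signal": {"embedding": EMBEDDING, "signal_name": "cluster_hdbscan"},
        })
        # One concept_score entry per concept in the `lilac` namespace.
        for concept in CONCEPTS:
            entries.append({
                "path": path,
                "signal": {
                    "embedding": EMBEDDING,
                    "namespace": "lilac",
                    "concept_name": concept,
                    "signal_name": "concept_score",
                },
            })
    return entries


signals = build_signals()
print(len(signals))
```

With the three paths listed in the config, this yields 39 entries, 13 per path, which matches the number of entries in the `signals` section.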
Provider:
lilacai
Summary of original information
Dataset overview
Namespace and name
- Namespace: lilac
- Name: stanford-alpaca
Data source
- File paths:
  - https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json
- Source type: json
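The source is a single JSON file whose records carry the three fields that the config later enriches: `instruction`, `input`, and `output`. The following is a minimal sketch of reading such a file with the Python standard library. The function name `load_records` and the inline sample record are illustrative assumptions; in practice the text would come from the file at the URL above.

```python
import json

FIELDS = ("instruction", "input", "output")


def load_records(text):
    """Parse a JSON array of records and keep only the three expected fields.

    Missing fields default to an empty string (an assumption of this sketch).
    """
    rows = json.loads(text)
    if not isinstance(rows, list):
        raise ValueError("expected a top-level JSON array")
    return [{field: str(row.get(field, "")) for field in FIELDS} for row in rows]


# Made-up sample in the same shape as the source file, not taken from the dataset.
sample = '[{"instruction": "Add the numbers.", "input": "2 and 3", "output": "5"}]'
records = load_records(sample)
print(records[0]["output"])
```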
Embeddings
- Path: output
  - Embedding: gte-small
- Path: instruction
  - Embedding: gte-small
- Path: input
  - Embedding: gte-small
Signals
- Path: output
  - Signal name: pii
  - Signal name: text_statistics
  - Signal name: near_dup
  - Signal name: lang_detection
  - Signal name: cluster_hdbscan (embedding: gte-small)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: legal-termination)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: negative-sentiment)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: non-english)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: positive-sentiment)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: profanity)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: question)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: source-code)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: toxicity)
- Path: instruction
  - Signal name: pii
  - Signal name: text_statistics
  - Signal name: near_dup
  - Signal name: lang_detection
  - Signal name: cluster_hdbscan (embedding: gte-small)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: legal-termination)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: negative-sentiment)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: non-english)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: positive-sentiment)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: profanity)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: question)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: source-code)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: toxicity)
- Path: input
  - Signal name: pii
  - Signal name: text_statistics
  - Signal name: near_dup
  - Signal name: lang_detection
  - Signal name: cluster_hdbscan (embedding: gte-small)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: legal-termination)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: negative-sentiment)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: non-english)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: positive-sentiment)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: profanity)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: question)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: source-code)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: toxicity)
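A per-path listing like the one above can be derived mechanically from the `signals` entries of the config. The following is a minimal sketch, assuming the config has already been parsed into Python dictionaries (for example by a YAML parser). `summarize_signals` is a hypothetical helper, and the short `signals` list is an excerpt used only for illustration.

```python
from collections import defaultdict


def summarize_signals(signals):
    """Group signal labels by path; concept scores are labeled with their concept name."""
    by_path = defaultdict(list)
    for entry in signals:
        signal = entry["signal"]
        label = signal["signal_name"]
        if "concept_name" in signal:
            label = f"{label}:{signal['concept_name']}"
        by_path[entry["path"]].append(label)
    return dict(by_path)


# Excerpt of the config's `signals` list, in the same shape as the full config.
signals = [
    {"path": "output", "signal": {"signal_name": "pii"}},
    {"path": "output", "signal": {"embedding": "gte-small", "namespace": "lilac",
                                  "concept_name": "toxicity", "signal_name": "concept_score"}},
    {"path": "input", "signal": {"signal_name": "near_dup"}},
]
summary = summarize_signals(signals)
for path, labels in summary.items():
    print(path, labels)
```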
Settings
- UI media paths:
  - output
  - instruction
  - input
- Markdown paths: []



