lilacai/mosaicml-local-mosaic-instruct-v3
Hugging Face | Updated 2023-08-30 | Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/lilacai/mosaicml-local-mosaic-instruct-v3
Description:
This dataset is generated by [Lilac](http://lilacml.com) for a HuggingFace Space: [huggingface.co/spaces/lilacai/mosaicml](https://huggingface.co/spaces/lilacai/mosaicml).
Original dataset: [https://huggingface.co/datasets/mosaicml/instruct-v3](https://huggingface.co/datasets/mosaicml/instruct-v3)
Lilac dataset config:
```yaml
embeddings:
- {embedding: gte-small, path: response}
- {embedding: gte-small, path: prompt}
name: mosaic-instruct-v3
namespace: local
settings:
  preferred_embedding: gte-small
  ui:
    media_paths: [prompt, response]
signals:
- path: prompt
  signal: {signal_name: pii}
- path: prompt
  signal: {signal_name: text_statistics}
- path: prompt
  signal: {signal_name: near_dup}
- path: prompt
  signal: {signal_name: lang_detection}
- path: prompt
  signal: {concept_name: negative-sentiment, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: prompt
  signal: {concept_name: non-english, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: prompt
  signal: {concept_name: profanity, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: prompt
  signal: {concept_name: source-code, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: prompt
  signal: {concept_name: toxicity, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: response
  signal: {signal_name: pii}
- path: response
  signal: {signal_name: text_statistics}
- path: response
  signal: {signal_name: near_dup}
- path: response
  signal: {signal_name: lang_detection}
- path: response
  signal: {concept_name: negative-sentiment, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: response
  signal: {concept_name: non-english, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: response
  signal: {concept_name: profanity, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: response
  signal: {concept_name: source-code, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: response
  signal: {concept_name: toxicity, embedding: gte-small, namespace: lilac, signal_name: concept_score}
source: {dataset_name: mosaicml/instruct-v3, source_name: huggingface}
```
Provided by:
lilacai
Original information summary
Dataset overview
Dataset configuration
- Embeddings:
  - Embedding type: gte-small
  - Paths: response and prompt
- Name: mosaic-instruct-v3
- Namespace: local
- Settings:
  - Preferred embedding: gte-small
  - UI media paths: [prompt, response]
Signal configuration
- Path: prompt
  - Signals: pii, text_statistics, near_dup, lang_detection
  - Concept signals: negative-sentiment, non-english, profanity, source-code, toxicity
- Path: response
  - Signals: pii, text_statistics, near_dup, lang_detection
  - Concept signals: negative-sentiment, non-english, profanity, source-code, toxicity
Data source
- Dataset name: mosaicml/instruct-v3
- Source name: huggingface



