
lilacai/mosaicml-local-mosaic-instruct-v3

Hugging Face · Updated 2023-08-30 · Indexed 2024-03-04
Download link:
https://hf-mirror.com/datasets/lilacai/mosaicml-local-mosaic-instruct-v3
Description:
This dataset is generated by [Lilac](http://lilacml.com) for a HuggingFace Space: [huggingface.co/spaces/lilacai/mosaicml](https://huggingface.co/spaces/lilacai/mosaicml).

Original dataset: [https://huggingface.co/datasets/mosaicml/instruct-v3](https://huggingface.co/datasets/mosaicml/instruct-v3)

Lilac dataset config:

```yaml
embeddings:
- {embedding: gte-small, path: response}
- {embedding: gte-small, path: prompt}
name: mosaic-instruct-v3
namespace: local
settings:
  preferred_embedding: gte-small
  ui:
    media_paths: [prompt, response]
signals:
- path: prompt
  signal: {signal_name: pii}
- path: prompt
  signal: {signal_name: text_statistics}
- path: prompt
  signal: {signal_name: near_dup}
- path: prompt
  signal: {signal_name: lang_detection}
- path: prompt
  signal: {concept_name: negative-sentiment, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: prompt
  signal: {concept_name: non-english, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: prompt
  signal: {concept_name: profanity, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: prompt
  signal: {concept_name: source-code, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: prompt
  signal: {concept_name: toxicity, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: response
  signal: {signal_name: pii}
- path: response
  signal: {signal_name: text_statistics}
- path: response
  signal: {signal_name: near_dup}
- path: response
  signal: {signal_name: lang_detection}
- path: response
  signal: {concept_name: negative-sentiment, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: response
  signal: {concept_name: non-english, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: response
  signal: {concept_name: profanity, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: response
  signal: {concept_name: source-code, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: response
  signal: {concept_name: toxicity, embedding: gte-small, namespace: lilac, signal_name: concept_score}
source: {dataset_name: mosaicml/instruct-v3, source_name: huggingface}
```
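To make the structure of this config concrete, here is a minimal Python sketch that mirrors it as a plain dictionary and groups the signals by path. It is a hand-written illustration only: it does not use Lilac's own config classes, and the helpers `build_signals` and `summarize_signals` are hypothetical names introduced here.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Non-concept signals, in the order the config lists them for each path.
BASIC_SIGNALS = ["pii", "text_statistics", "near_dup", "lang_detection"]
# Concept signals, each scored with the gte-small embedding in the "lilac" namespace.
CONCEPTS = ["negative-sentiment", "non-english", "profanity", "source-code", "toxicity"]


def build_signals(path: str) -> List[Dict[str, Any]]:
    """Rebuild the nine signal entries that the config repeats for one path (hypothetical helper)."""
    entries: List[Dict[str, Any]] = [
        {"path": path, "signal": {"signal_name": name}} for name in BASIC_SIGNALS
    ]
    for concept in CONCEPTS:
        entries.append({
            "path": path,
            "signal": {
                "concept_name": concept,
                "embedding": "gte-small",
                "namespace": "lilac",
                "signal_name": "concept_score",
            },
        })
    return entries


# Hand-written Python mirror of the YAML config (not loaded through Lilac).
CONFIG: Dict[str, Any] = {
    "embeddings": [
        {"embedding": "gte-small", "path": "response"},
        {"embedding": "gte-small", "path": "prompt"},
    ],
    "name": "mosaic-instruct-v3",
    "namespace": "local",
    "settings": {
        "preferred_embedding": "gte-small",
        "ui": {"media_paths": ["prompt", "response"]},
    },
    "signals": build_signals("prompt") + build_signals("response"),
    "source": {"dataset_name": "mosaicml/instruct-v3", "source_name": "huggingface"},
}


def summarize_signals(config: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group signal labels by path; concept signals become 'concept_score:<concept>'."""
    summary: Dict[str, List[str]] = defaultdict(list)
    for entry in config["signals"]:
        signal = entry["signal"]
        label = signal["signal_name"]
        if "concept_name" in signal:
            label = f"{label}:{signal['concept_name']}"
        summary[entry["path"]].append(label)
    return dict(summary)


summary = summarize_signals(CONFIG)
print(len(summary["prompt"]), len(summary["response"]))  # 9 9
print(summary["prompt"][4])  # concept_score:negative-sentiment
```

Generating the repeated entries with a helper keeps the `prompt` and `response` lists in sync; a literal copy of the YAML would spell out all 18 entries instead.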
Provider:
lilacai
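Summary of original information

Dataset overview

Dataset configuration

  • Embeddings:
    • Embedding: gte-small
    • Paths: response, prompt
  • Name: mosaic-instruct-v3
  • Namespace: local
  • Settings:
    • Preferred embedding: gte-small
    • UI media paths: [prompt, response]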

Signal configuration

  • Path: prompt

    • Signal: pii
    • Signal: text_statistics
    • Signal: near_dup
    • Signal: lang_detection
    • Concept signal: negative-sentiment
    • Concept signal: non-english
    • Concept signal: profanity
    • Concept signal: source-code
    • Concept signal: toxicity
  • Path: response

    • Signal: pii
    • Signal: text_statistics
    • Signal: near_dup
    • Signal: lang_detection
    • Concept signal: negative-sentiment
    • Concept signal: non-english
    • Concept signal: profanity
    • Concept signal: source-code
    • Concept signal: toxicity
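The concept signals score each `prompt` and `response` against concepts such as toxicity or profanity. One way such scores could be used is to drop rows that look undesirable. The following is a minimal sketch under loudly stated assumptions: the row layout (a hypothetical `concept_scores` key mapping each field to per-concept scores in [0, 1]), the 0.5 default threshold, and the function name `filter_by_concept` are illustrative choices, not the schema of the exported dataset or part of Lilac's API.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical row layout: {"concept_scores": {field: {concept: score}}}.
# The real exported columns may be named and nested differently.
Row = Dict[str, Any]


def filter_by_concept(
    rows: List[Row],
    concept: str,
    threshold: float = 0.5,
    fields: Tuple[str, ...] = ("prompt", "response"),
) -> List[Row]:
    """Keep rows whose `concept` score is below `threshold` in every listed field.

    A missing score is treated as 0.0, so such rows are kept.
    """
    kept = []
    for row in rows:
        scores = row["concept_scores"]  # hypothetical key
        if all(scores.get(field, {}).get(concept, 0.0) < threshold for field in fields):
            kept.append(row)
    return kept


# Made-up example rows, for illustration only.
rows = [
    {"id": 1, "concept_scores": {"prompt": {"toxicity": 0.02}, "response": {"toxicity": 0.10}}},
    {"id": 2, "concept_scores": {"prompt": {"toxicity": 0.91}, "response": {"toxicity": 0.05}}},
    {"id": 3, "concept_scores": {"prompt": {"toxicity": 0.20}, "response": {"toxicity": 0.75}}},
]
print([r["id"] for r in filter_by_concept(rows, "toxicity")])  # [1]
```

Requiring every field to pass mirrors the fact that the config scores both `prompt` and `response`; passing `fields=("response",)` would check only the model outputs.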

Data source

  • Dataset name: mosaicml/instruct-v3
  • Source name: huggingface