lilacai/lilac-stanford-alpaca

Hugging Face · Updated: 2023-12-07 · Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/lilacai/lilac-stanford-alpaca
Description:
This dataset is generated by [Lilac](http://lilacml.com) for a HuggingFace Space: [huggingface.co/spaces/lilacai/lilac](https://huggingface.co/spaces/lilacai/lilac). Lilac dataset config:

```yaml
namespace: lilac
name: stanford-alpaca
source:
  filepaths:
    - https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json
  source_name: json
embeddings:
  - path: output
    embedding: gte-small
  - path: instruction
    embedding: gte-small
  - path: input
    embedding: gte-small
signals:
  - path: output
    signal:
      signal_name: pii
  - path: output
    signal:
      signal_name: text_statistics
  - path: output
    signal:
      signal_name: near_dup
  - path: output
    signal:
      signal_name: lang_detection
  - path: output
    signal:
      embedding: gte-small
      signal_name: cluster_hdbscan
  - path: output
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: legal-termination
      signal_name: concept_score
  - path: output
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: negative-sentiment
      signal_name: concept_score
  - path: output
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: non-english
      signal_name: concept_score
  - path: output
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: positive-sentiment
      signal_name: concept_score
  - path: output
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: profanity
      signal_name: concept_score
  - path: output
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: question
      signal_name: concept_score
  - path: output
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: source-code
      signal_name: concept_score
  - path: output
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: toxicity
      signal_name: concept_score
  - path: instruction
    signal:
      signal_name: pii
  - path: instruction
    signal:
      signal_name: text_statistics
  - path: instruction
    signal:
      signal_name: near_dup
  - path: instruction
    signal:
      signal_name: lang_detection
  - path: instruction
    signal:
      embedding: gte-small
      signal_name: cluster_hdbscan
  - path: instruction
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: legal-termination
      signal_name: concept_score
  - path: instruction
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: negative-sentiment
      signal_name: concept_score
  - path: instruction
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: non-english
      signal_name: concept_score
  - path: instruction
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: positive-sentiment
      signal_name: concept_score
  - path: instruction
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: profanity
      signal_name: concept_score
  - path: instruction
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: question
      signal_name: concept_score
  - path: instruction
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: source-code
      signal_name: concept_score
  - path: instruction
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: toxicity
      signal_name: concept_score
  - path: input
    signal:
      signal_name: pii
  - path: input
    signal:
      signal_name: text_statistics
  - path: input
    signal:
      signal_name: near_dup
  - path: input
    signal:
      signal_name: lang_detection
  - path: input
    signal:
      embedding: gte-small
      signal_name: cluster_hdbscan
  - path: input
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: legal-termination
      signal_name: concept_score
  - path: input
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: negative-sentiment
      signal_name: concept_score
  - path: input
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: non-english
      signal_name: concept_score
  - path: input
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: positive-sentiment
      signal_name: concept_score
  - path: input
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: profanity
      signal_name: concept_score
  - path: input
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: question
      signal_name: concept_score
  - path: input
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: source-code
      signal_name: concept_score
  - path: input
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: toxicity
      signal_name: concept_score
settings:
  ui:
    media_paths:
      - output
      - instruction
      - input
    markdown_paths: []
```
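The `signals` section of this config is highly regular. Each of the three text fields (`output`, `instruction`, `input`) gets the same 13 signals: four that need no embedding, one `cluster_hdbscan` over gte-small, and eight `concept_score` entries. The sketch below rebuilds that list as plain Python dicts to make the structure explicit. It mirrors the YAML only and does not call the Lilac library. The constant and function names are hypothetical.

```python
"""Sketch: rebuild the `signals` list of the Lilac config as plain Python data.

Assumption: this only mirrors the YAML structure; it does not use the Lilac API.
"""
from typing import Any

# Field paths, in the order the config lists them.
PATHS = ["output", "instruction", "input"]
EMBEDDING = "gte-small"
# Signals that are computed without an embedding.
PLAIN_SIGNALS = ["pii", "text_statistics", "near_dup", "lang_detection"]
# Concepts scored with `concept_score`, all in the `lilac` namespace.
CONCEPTS = [
    "legal-termination", "negative-sentiment", "non-english",
    "positive-sentiment", "profanity", "question", "source-code", "toxicity",
]


def build_signals() -> list[dict[str, Any]]:
    """Return one entry per (path, signal) pair, in the same order as the config."""
    signals: list[dict[str, Any]] = []
    for path in PATHS:
        for name in PLAIN_SIGNALS:
            signals.append({"path": path, "signal": {"signal_name": name}})
        # HDBSCAN clustering runs on the gte-small embedding of the field.
        signals.append({"path": path, "signal": {
            "embedding": EMBEDDING, "signal_name": "cluster_hdbscan"}})
        for concept in CONCEPTS:
            signals.append({"path": path, "signal": {
                "embedding": EMBEDDING, "namespace": "lilac",
                "concept_name": concept, "signal_name": "concept_score"}})
    return signals


if __name__ == "__main__":
    # 3 paths x (4 plain + 1 cluster + 8 concept) signals = 39 entries.
    print(len(build_signals()))
```

Generating the list this way makes it easy to confirm that every field receives an identical set of signals. It also shows that the 24 `concept_score` entries differ only in `path` and `concept_name`.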
Provided by:
lilacai
Summary of original information

Dataset overview

Namespace and name

  • Namespace: lilac
  • Name: stanford-alpaca

Data source

  • File paths:
    • https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/main/alpaca_data.json
  • Source type: json
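The source is a single `alpaca_data.json` file. The sketch below reads it into records that hold the three fields this config embeds and scores. It assumes the file is a JSON array of objects with string fields `instruction`, `input`, and `output`. It also assumes a missing field can be treated as an empty string, which is a choice made for this sketch only. The helper names `parse_alpaca` and `fetch_alpaca` are hypothetical.

```python
"""Sketch: load Stanford Alpaca records from the JSON source file.

Assumes a top-level JSON array of objects with string fields
`instruction`, `input`, and `output`.
"""
import json
import urllib.request

ALPACA_URL = (
    "https://raw.githubusercontent.com/tatsu-lab/"
    "stanford_alpaca/main/alpaca_data.json"
)
# The three text fields that the Lilac config embeds and scores.
FIELDS = ("instruction", "input", "output")


def parse_alpaca(text: str) -> list[dict[str, str]]:
    """Parse Alpaca-style JSON text into records holding only FIELDS.

    A missing field becomes "" (an assumption of this sketch). A non-array
    top level, a non-object record, or a non-string value raises ValueError.
    """
    records = json.loads(text)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    parsed: list[dict[str, str]] = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            raise ValueError(f"record {i} is not a JSON object")
        row: dict[str, str] = {}
        for field in FIELDS:
            value = rec.get(field, "")
            if not isinstance(value, str):
                raise ValueError(f"record {i}: field {field!r} is not a string")
            row[field] = value
        parsed.append(row)
    return parsed


def fetch_alpaca(url: str = ALPACA_URL) -> list[dict[str, str]]:
    """Download the file (requires network access) and parse it."""
    with urllib.request.urlopen(url) as resp:
        return parse_alpaca(resp.read().decode("utf-8"))
```

`fetch_alpaca()` needs network access. `parse_alpaca` can be tried offline on a small JSON string. Splitting the two keeps the validation logic testable without downloading the file.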

Embeddings

  • Path: output
    • Embedding: gte-small
  • Path: instruction
    • Embedding: gte-small
  • Path: input
    • Embedding: gte-small
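Each embedding entry pairs one field with the gte-small model, so every record ends up with one vector per field. Those vectors can then be compared by cosine similarity, for example to search instructions semantically. In the sketch below, `toy_embed` is a hypothetical stand-in for gte-small: a hashed bag of words used only so the example runs without downloading a model. The other helper names are also hypothetical. The sketch illustrates per-field vectors and cosine ranking, not how Lilac stores or queries embeddings.

```python
"""Sketch: one vector per embedded field, then cosine-similarity search.

`toy_embed` is a hypothetical stand-in for gte-small (hashed bag of words).
"""
import hashlib
import math
import re
from collections.abc import Callable

EMBEDDED_PATHS = ("output", "instruction", "input")
DIM = 64


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Hash each lowercase word into one of `dim` buckets, then L2-normalise."""
    vec = [0.0] * dim
    for word in re.findall(r"\w+", text.lower()):
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def embed_records(records: list[dict[str, str]],
                  embed: Callable[[str], list[float]] = toy_embed
                  ) -> dict[str, list[list[float]]]:
    """Return {path: [one vector per record]} for each embedded path."""
    return {path: [embed(r[path]) for r in records] for path in EMBEDDED_PATHS}


def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit length (or all zero), so the dot product is the cosine.
    return sum(x * y for x, y in zip(a, b))


def search(query: str, records: list[dict[str, str]],
           index: dict[str, list[list[float]]], path: str = "instruction",
           embed: Callable[[str], list[float]] = toy_embed, k: int = 3
           ) -> list[dict[str, str]]:
    """Rank records by cosine similarity between the query and one field."""
    q = embed(query)
    scored = sorted(((cosine(q, v), i) for i, v in enumerate(index[path])),
                    reverse=True)
    return [records[i] for _, i in scored[:k]]
```

Swapping `toy_embed` for a real sentence-embedding model would leave the rest of the sketch unchanged, since only the `embed` callable depends on the model.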

Signals

  • Path: output
    • Signal name: pii
    • Signal name: text_statistics
    • Signal name: near_dup
    • Signal name: lang_detection
    • Signal name: cluster_hdbscan
      • Embedding: gte-small
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: legal-termination
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: negative-sentiment
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: non-english
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: positive-sentiment
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: profanity
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: question
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: source-code
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: toxicity
  • Path: instruction
    • Signal name: pii
    • Signal name: text_statistics
    • Signal name: near_dup
    • Signal name: lang_detection
    • Signal name: cluster_hdbscan
      • Embedding: gte-small
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: legal-termination
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: negative-sentiment
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: non-english
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: positive-sentiment
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: profanity
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: question
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: source-code
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: toxicity
  • Path: input
    • Signal name: pii
    • Signal name: text_statistics
    • Signal name: near_dup
    • Signal name: lang_detection
    • Signal name: cluster_hdbscan
      • Embedding: gte-small
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: legal-termination
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: negative-sentiment
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: non-english
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: positive-sentiment
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: profanity
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: question
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: source-code
    • Signal name: concept_score
      • Embedding: gte-small
      • Namespace: lilac
      • Concept name: toxicity
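Among the signals listed for each field, `near_dup` flags texts that are almost identical. The sketch below shows one simple way to compute such groups: Jaccard similarity over word 3-gram shingles, with union-find so that similarity links are merged transitively. This is a swapped-in, brute-force technique for illustration. It is not Lilac's `near_dup` implementation. The shingle size, the 0.8 threshold, and the function names are assumptions. Because two empty texts score 1.0, all empty strings (common in the `input` field) fall into one group.

```python
"""Sketch: group near-duplicate texts by Jaccard similarity of word shingles.

Swapped-in brute-force technique, NOT Lilac's `near_dup` implementation.
Shingle size n=3 and threshold 0.8 are assumptions.
"""
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of n-word shingles (the whole text if it has fewer than n words)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < n:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """|A & B| / |A | B|; two empty sets are treated as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_dup_groups(texts: list[str], threshold: float = 0.8, n: int = 3) -> list[int]:
    """Assign a group id to each text; texts linked by similarity >= threshold share an id.

    Compares every pair, so cost grows as O(len(texts) ** 2).
    """
    parent = list(range(len(texts)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    sets = [shingles(t, n) for t in texts]
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if jaccard(sets[i], sets[j]) >= threshold:
                parent[find(j)] = find(i)
    return [find(i) for i in range(len(texts))]
```

The pairwise loop is only practical for small samples. For a full dataset of this size, an approximate method such as MinHash with locality-sensitive hashing would be the usual replacement for the exact comparison.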

Settings

  • UI media paths:
    • output
    • instruction
    • input
  • Markdown paths: []