
lilacai/mosaicml-local-mosaic-chat-v2

Source: Hugging Face | Updated: 2023-08-30 | Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/lilacai/mosaicml-local-mosaic-chat-v2
Resource description:
This dataset is generated by [Lilac](http://lilacml.com) for a HuggingFace Space: [huggingface.co/spaces/lilacai/mosaicml](https://huggingface.co/spaces/lilacai/mosaicml).

Original dataset: [https://huggingface.co/datasets/sam-mosaic/chat-v2](https://huggingface.co/datasets/sam-mosaic/chat-v2)

Lilac dataset config:

```
embeddings:
- {embedding: gte-small, path: original-context}
- embedding: gte-small
  path: [new-context, value, '*']
- {embedding: gte-small, path: response}
- {embedding: gte-small, path: prompt}
name: mosaic-chat-v2
namespace: local
settings:
  preferred_embedding: gte-small
  ui:
    media_paths: [prompt, response]
signals:
- path: prompt
  signal: {signal_name: pii}
- path: prompt
  signal: {signal_name: text_statistics}
- path: prompt
  signal: {signal_name: near_dup}
- path: prompt
  signal: {signal_name: lang_detection}
- path: prompt
  signal: {concept_name: negative-sentiment, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: prompt
  signal: {concept_name: non-english, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: prompt
  signal: {concept_name: profanity, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: prompt
  signal: {concept_name: source-code, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: prompt
  signal: {concept_name: toxicity, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: response
  signal: {signal_name: pii}
- path: response
  signal: {signal_name: text_statistics}
- path: response
  signal: {signal_name: near_dup}
- path: response
  signal: {signal_name: lang_detection}
- path: response
  signal: {concept_name: negative-sentiment, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: response
  signal: {concept_name: non-english, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: response
  signal: {concept_name: profanity, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: response
  signal: {concept_name: source-code, embedding: gte-small, namespace: lilac, signal_name: concept_score}
- path: response
  signal: {concept_name: toxicity, embedding: gte-small, namespace: lilac, signal_name: concept_score}
source: {dataset_name: sam-mosaic/chat-v2, source_name: huggingface}
```
Provider:
lilacai
Summary of original information

Dataset overview

Dataset configuration

  • Embedding configuration:
    • embedding: gte-small
    • path: original-context
    • path: [new-context, value, *]
    • path: response
    • path: prompt
  • Name: mosaic-chat-v2
  • Namespace: local
  • Settings:
    • preferred_embedding: gte-small
    • ui: media_paths: [prompt, response]

Signal configuration

  • Path: prompt
    • signal_name: pii
    • signal_name: text_statistics
    • signal_name: near_dup
    • signal_name: lang_detection
    • concept_name: negative-sentiment
    • concept_name: non-english
    • concept_name: profanity
    • concept_name: source-code
    • concept_name: toxicity
  • Path: response
    • signal_name: pii
    • signal_name: text_statistics
    • signal_name: near_dup
    • signal_name: lang_detection
    • concept_name: negative-sentiment
    • concept_name: non-english
    • concept_name: profanity
    • concept_name: source-code
    • concept_name: toxicity
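
The signal list above applies the same nine signals to both text fields. As an illustration only, the Python sketch below rebuilds the `signals` entries from the two paths, the four plain signals, and the five concept scores. The function name `build_signals` and the constant names are hypothetical and are not part of Lilac; the field names and values are copied from the config.

```python
# Values copied from the dataset config; the constant names are illustrative.
PATHS = ["prompt", "response"]
BASIC_SIGNALS = ["pii", "text_statistics", "near_dup", "lang_detection"]
CONCEPTS = ["negative-sentiment", "non-english", "profanity", "source-code", "toxicity"]


def build_signals(paths=PATHS):
    """Return the list of signal entries, in the same order as the config."""
    entries = []
    for path in paths:
        # Plain signals only need a signal name.
        for name in BASIC_SIGNALS:
            entries.append({"path": path, "signal": {"signal_name": name}})
        # Concept scores also name the concept, its namespace, and the embedding.
        for concept in CONCEPTS:
            entries.append({
                "path": path,
                "signal": {
                    "concept_name": concept,
                    "embedding": "gte-small",
                    "namespace": "lilac",
                    "signal_name": "concept_score",
                },
            })
    return entries


signals = build_signals()
print(len(signals))  # 2 paths x (4 signals + 5 concepts) = 18 entries
```

Generating the entries this way makes the symmetry between `prompt` and `response` explicit; the resulting list could be serialized back to YAML if needed.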

Data source

  • Dataset name: sam-mosaic/chat-v2
  • Source name: huggingface
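
As a minimal sketch of how the source entry above could be used, the snippet below fetches the original dataset with the Hugging Face `datasets` library. It assumes `datasets` is installed, that network access is available, and that `sam-mosaic/chat-v2` is still publicly hosted. The helper `load_source` is a hypothetical name, not a Lilac function.

```python
# Source entry copied from the dataset config.
SOURCE = {"dataset_name": "sam-mosaic/chat-v2", "source_name": "huggingface"}


def load_source(source=SOURCE):
    """Download the dataset named in a `source` entry (hypothetical helper)."""
    if source["source_name"] != "huggingface":
        raise ValueError(f"unsupported source: {source['source_name']}")
    # Imported lazily so the module can be loaded without `datasets` installed.
    from datasets import load_dataset

    # With no split argument, load_dataset returns every available split.
    return load_dataset(source["dataset_name"])
```

Calling `load_source()` downloads the data on first use. The `prompt` and `response` columns listed under `media_paths` are the fields the signals were computed on.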