lilacai/lilac-enron-emails

Source: Hugging Face. Updated: 2023-12-07. Indexed: 2024-03-04.
Download link:
https://hf-mirror.com/datasets/lilacai/lilac-enron-emails
Resource description:
This dataset is generated by [Lilac](http://lilacml.com) for a HuggingFace Space: [huggingface.co/spaces/lilacai/lilac](https://huggingface.co/spaces/lilacai/lilac).

Original dataset: [https://huggingface.co/datasets/EleutherAI/pile](https://huggingface.co/datasets/EleutherAI/pile)

Lilac dataset config:

```
namespace: lilac
name: enron-emails
source:
  dataset_name: EleutherAI/pile
  config_name: enron_emails
  sample_size: 100000
  source_name: huggingface
embeddings:
  - path: text
    embedding: gte-small
signals:
  - path: text
    signal:
      signal_name: near_dup
  - path: text
    signal:
      signal_name: pii
  - path: text
    signal:
      signal_name: lang_detection
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: positive-sentiment
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: non-english
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: toxicity
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: question
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: legal-termination
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: source-code
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: negative-sentiment
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: profanity
      signal_name: concept_score
  - path: text
    signal:
      signal_name: text_statistics
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: legal-termination
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: negative-sentiment
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: non-english
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: positive-sentiment
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: profanity
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: question
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: source-code
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: toxicity
      signal_name: concept_score
  - path: text
    signal:
      signal_name: cluster_dbscan
  - path: text
    signal:
      embedding: gte-small
      signal_name: cluster_hdbscan
settings:
  ui:
    media_paths:
      - text
    markdown_paths: []
  tags:
    - business
```
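The config above lists each `concept_score` signal twice (once before `text_statistics` and once after). As a minimal sketch, assuming the `signals` list has already been loaded into plain Python dicts, exact repeats could be collapsed as follows. The abbreviated `signals` sample and the `dedupe_signals` helper are hypothetical names for illustration and are not part of Lilac's API.

```python
# Hypothetical, abbreviated copy of the `signals` list from the config.
# Plain dicts keep the sketch free of third-party dependencies.
toxicity = {
    "embedding": "gte-small",
    "namespace": "lilac",
    "concept_name": "toxicity",
    "signal_name": "concept_score",
}
signals = [
    {"path": "text", "signal": {"signal_name": "near_dup"}},
    {"path": "text", "signal": dict(toxicity)},
    {"path": "text", "signal": {"signal_name": "text_statistics"}},
    {"path": "text", "signal": dict(toxicity)},  # exact repeat, as in the config
]


def dedupe_signals(entries):
    """Return entries with exact repeats removed, keeping first-seen order."""
    seen = set()
    unique = []
    for entry in entries:
        # Hashable key: the path plus the sorted signal parameters.
        key = (entry["path"], tuple(sorted(entry["signal"].items())))
        if key not in seen:
            seen.add(key)
            unique.append(entry)
    return unique


unique_signals = dedupe_signals(signals)
print(len(signals), "->", len(unique_signals))
```

Keying on the full parameter set, rather than on `signal_name` alone, keeps the eight distinct concepts separate even though they all share the name `concept_score`.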
Provider:
lilacai
Summary of original information

Dataset overview

Basic information

  • Namespace: lilac
  • Name: enron-emails
  • Source dataset: EleutherAI/pile
  • Config name: enron_emails
  • Sample size: 100000
  • Source name: huggingface

Embedding information

  • Path: text
  • Embedding: gte-small

Signal information

  • Path: text
  • Signals:
    • Signal name: near_dup
    • Signal name: pii
    • Signal name: lang_detection
    • Embedding: gte-small
      • Namespace: lilac
      • Concept name: positive-sentiment
      • Signal name: concept_score
    • Embedding: gte-small
      • Namespace: lilac
      • Concept name: non-english
      • Signal name: concept_score
    • Embedding: gte-small
      • Namespace: lilac
      • Concept name: toxicity
      • Signal name: concept_score
    • Embedding: gte-small
      • Namespace: lilac
      • Concept name: question
      • Signal name: concept_score
    • Embedding: gte-small
      • Namespace: lilac
      • Concept name: legal-termination
      • Signal name: concept_score
    • Embedding: gte-small
      • Namespace: lilac
      • Concept name: source-code
      • Signal name: concept_score
    • Embedding: gte-small
      • Namespace: lilac
      • Concept name: negative-sentiment
      • Signal name: concept_score
    • Embedding: gte-small
      • Namespace: lilac
      • Concept name: profanity
      • Signal name: concept_score
    • Signal name: text_statistics
    • Signal name: cluster_dbscan
    • Embedding: gte-small
      • Signal name: cluster_hdbscan

Settings information

  • UI:
    • Media paths: text
    • Markdown paths: []
  • Tags:
    • business