lilacai/lilac-enron-emails
Hugging Face. Updated 2023-12-07. Indexed 2024-03-04.
Download link:
https://hf-mirror.com/datasets/lilacai/lilac-enron-emails
Description:
This dataset is generated by [Lilac](http://lilacml.com) for a HuggingFace Space: [huggingface.co/spaces/lilacai/lilac](https://huggingface.co/spaces/lilacai/lilac).
Original dataset: [https://huggingface.co/datasets/EleutherAI/pile](https://huggingface.co/datasets/EleutherAI/pile)
Lilac dataset config:
```
namespace: lilac
name: enron-emails
source:
  dataset_name: EleutherAI/pile
  config_name: enron_emails
  sample_size: 100000
  source_name: huggingface
embeddings:
  - path: text
    embedding: gte-small
signals:
  - path: text
    signal:
      signal_name: near_dup
  - path: text
    signal:
      signal_name: pii
  - path: text
    signal:
      signal_name: lang_detection
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: positive-sentiment
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: non-english
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: toxicity
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: question
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: legal-termination
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: source-code
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: negative-sentiment
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: profanity
      signal_name: concept_score
  - path: text
    signal:
      signal_name: text_statistics
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: legal-termination
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: negative-sentiment
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: non-english
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: positive-sentiment
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: profanity
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: question
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: source-code
      signal_name: concept_score
  - path: text
    signal:
      embedding: gte-small
      namespace: lilac
      concept_name: toxicity
      signal_name: concept_score
  - path: text
    signal:
      signal_name: cluster_dbscan
  - path: text
    signal:
      embedding: gte-small
      signal_name: cluster_hdbscan
settings:
  ui:
    media_paths:
      - text
    markdown_paths: []
  tags:
    - business
```
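The `signals` list in the config above repeats each `concept_score` entry: the eight concepts appear once before `text_statistics` and again after it. The following is a minimal sketch, using only the Python standard library, of how one might count the entries and spot the repeats. The `SIGNAL_ENTRIES` list and the `summarize` helper are illustrative names transcribed by hand from the config, not part of Lilac.

```python
from collections import Counter

# (signal_name, concept_name) pairs in the order they appear in the config.
# concept_name is None for signals that are not concept scores.
# This list is a hand transcription for illustration, not a Lilac data structure.
SIGNAL_ENTRIES = [
    ("near_dup", None),
    ("pii", None),
    ("lang_detection", None),
    ("concept_score", "positive-sentiment"),
    ("concept_score", "non-english"),
    ("concept_score", "toxicity"),
    ("concept_score", "question"),
    ("concept_score", "legal-termination"),
    ("concept_score", "source-code"),
    ("concept_score", "negative-sentiment"),
    ("concept_score", "profanity"),
    ("text_statistics", None),
    ("concept_score", "legal-termination"),
    ("concept_score", "negative-sentiment"),
    ("concept_score", "non-english"),
    ("concept_score", "positive-sentiment"),
    ("concept_score", "profanity"),
    ("concept_score", "question"),
    ("concept_score", "source-code"),
    ("concept_score", "toxicity"),
    ("cluster_dbscan", None),
    ("cluster_hdbscan", None),
]


def summarize(entries):
    """Return (unique entries in first-seen order, sorted concepts listed more than once)."""
    counts = Counter(entries)
    unique = list(dict.fromkeys(entries))  # order-preserving de-duplication
    repeated = sorted(
        concept
        for (_signal, concept), n in counts.items()
        if n > 1 and concept is not None
    )
    return unique, repeated


unique_entries, repeated_concepts = summarize(SIGNAL_ENTRIES)
print(len(SIGNAL_ENTRIES), "entries,", len(unique_entries), "unique")
print("listed twice:", repeated_concepts)
```

Whether the repeated entries matter to Lilac when it computes signals is not stated in the source; the sketch only reports what the config lists.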
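Provider:
lilacai
Summary of original information
Dataset overview
Basic information
- Namespace: lilac
- Name: enron-emails
- Source dataset: EleutherAI/pile
- Config name: enron_emails
- Sample size: 100000
- Source name: huggingface
Embedding information
- Path: text
- Embedding: gte-small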
Signal information
- Path: text
- Signals:
  - Signal name: near_dup
  - Signal name: pii
  - Signal name: lang_detection
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: positive-sentiment)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: non-english)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: toxicity)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: question)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: legal-termination)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: source-code)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: negative-sentiment)
  - Signal name: concept_score (embedding: gte-small, namespace: lilac, concept name: profanity)
  - Signal name: text_statistics
  - Signal name: cluster_dbscan
  - Signal name: cluster_hdbscan (embedding: gte-small)
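Every `concept_score` entry listed above shares the same path, embedding, and namespace and differs only in its concept name. As a rough illustration of that repeating structure, the sketch below builds the eight entries as plain Python dictionaries from a list of concept names. The `concept_signal` helper and the `CONCEPTS` list are hypothetical names introduced here; this does not use or reproduce the Lilac API.

```python
import json

# Concept names as listed in the dataset config.
CONCEPTS = [
    "positive-sentiment",
    "non-english",
    "toxicity",
    "question",
    "legal-termination",
    "source-code",
    "negative-sentiment",
    "profanity",
]


def concept_signal(concept_name, path="text", embedding="gte-small", namespace="lilac"):
    """Build one signal entry shaped like the concept_score items in the config.

    Hypothetical helper for illustration only; the defaults mirror the
    values that appear in the config.
    """
    return {
        "path": path,
        "signal": {
            "embedding": embedding,
            "namespace": namespace,
            "concept_name": concept_name,
            "signal_name": "concept_score",
        },
    }


entries = [concept_signal(name) for name in CONCEPTS]

# Print the first entry to compare it against the config by eye.
print(json.dumps(entries[0], indent=2))
```

Generating the entries from one list keeps each concept to a single occurrence, which matches this summary rather than the doubled listing in the config.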
Settings
- UI:
  - Media paths: text
  - Markdown paths: []
- Tags:
  - business



