preference-agents-working/enron-top-senders-train
Source: Hugging Face. Updated 2024-04-26; indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/preference-agents-working/enron-top-senders-train
Resource description:
---
dataset_info:
  features:
  - name: message_id
    dtype: string
  - name: from
    dtype: string
  - name: to
    dtype: string
  - name: date
    dtype: string
  - name: subject
    dtype: string
  - name: content
    dtype: string
  - name: email_context
    dtype: string
  - name: token_count_content
    dtype: int32
  - name: token_count_context
    dtype: int32
  - name: intent
    dtype: string
  - name: baseline
    struct:
    - name: google/gemma-1.1-2b-it
      dtype: string
    - name: google/gemma-1.1-7b-it
      dtype: string
    - name: meta-llama/Meta-Llama-3-70B-Instruct
      dtype: string
    - name: meta-llama/Meta-Llama-3-8B-Instruct
      dtype: string
    - name: mistralai/Mistral-7B-Instruct-v0.2
      dtype: string
  - name: automatic_eval
    struct:
    - name: google/gemma-1.1-2b-it
      struct:
      - name: BERT Cosine Similarity
        dtype: float64
      - name: BLEU Score
        dtype: float64
      - name: ROUGE-L Score
        dtype: float64
      - name: TF-IDF Cosine Similarity
        dtype: float64
    - name: google/gemma-1.1-7b-it
      struct:
      - name: BERT Cosine Similarity
        dtype: float64
      - name: BLEU Score
        dtype: float64
      - name: ROUGE-L Score
        dtype: float64
      - name: TF-IDF Cosine Similarity
        dtype: float64
    - name: meta-llama/Meta-Llama-3-8B-Instruct
      struct:
      - name: BERT Cosine Similarity
        dtype: float64
      - name: BLEU Score
        dtype: float64
      - name: ROUGE-L Score
        dtype: float64
      - name: TF-IDF Cosine Similarity
        dtype: float64
    - name: mistralai/Mistral-7B-Instruct-v0.2
      struct:
      - name: BERT Cosine Similarity
        dtype: float64
      - name: BLEU Score
        dtype: float64
      - name: ROUGE-L Score
        dtype: float64
      - name: TF-IDF Cosine Similarity
        dtype: float64
  - name: rules
    struct:
    - name: meta-llama/Meta-Llama-3-70B-Instruct
      struct:
      - name: meta-llama/Meta-Llama-3-70B-Instruct
        dtype: string
      - name: meta-llama/Meta-Llama-3-8B-Instruct
        dtype: string
    - name: meta-llama/Meta-Llama-3-8B-Instruct
      struct:
      - name: meta-llama/Meta-Llama-3-70B-Instruct
        dtype: string
      - name: meta-llama/Meta-Llama-3-8B-Instruct
        dtype: string
  - name: processed_rules
    struct:
    - name: meta-llama/Meta-Llama-3-70B-Instruct
      struct:
      - name: meta-llama/Meta-Llama-3-70B-Instruct
        dtype: string
      - name: meta-llama/Meta-Llama-3-8B-Instruct
        dtype: string
    - name: meta-llama/Meta-Llama-3-8B-Instruct
      struct:
      - name: meta-llama/Meta-Llama-3-70B-Instruct
        dtype: string
      - name: meta-llama/Meta-Llama-3-8B-Instruct
        dtype: string
  - name: text
    dtype: string
  - name: to_infer
    dtype: string
  - name: rule_ft_kaymann
    dtype: string
  - name: rule_ft_all_senders
    dtype: string
  - name: to_infer_kaymann_rules
    dtype: string
  - name: to_infer_allsenders_rules
    dtype: string
  - name: to_infer_baseline_rules
    dtype: string
  - name: to_infer_kaymann_rules_generated_email
    dtype: string
  - name: to_infer_allsenders_rules_generated_email
    dtype: string
  - name: to_infer_baseline_rules_generated_email
    dtype: string
  - name: naive_ftnk_generated_email
    dtype: string
  - name: naive_ft_generated_email
    dtype: string
  splits:
  - name: train
    num_bytes: 151599517
    num_examples: 3832
  - name: test
    num_bytes: 37494886
    num_examples: 958
  download_size: 80953494
  dataset_size: 189094403
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---
Provider:
preference-agents-working
Summary of original information
Dataset overview
Dataset features
- message_id: string
- from: string
- to: string
- date: string
- subject: string
- content: string
- email_context: string
- token_count_content: integer (int32)
- token_count_context: integer (int32)
- intent: string
- baseline: struct keyed by model name (for example google/gemma-1.1-2b-it), with one string per model
- automatic_eval: struct keyed by model name; each entry holds four float64 metrics: BERT Cosine Similarity, BLEU Score, ROUGE-L Score, TF-IDF Cosine Similarity
- rules: struct containing rule information, nested by model name with string leaves
- processed_rules: struct containing the processed rule information, with the same nesting as rules
- text: string
- to_infer: string
- rule_ft_kaymann: string
- rule_ft_all_senders: string
- to_infer_kaymann_rules: string
- to_infer_allsenders_rules: string
- to_infer_baseline_rules: string
- to_infer_kaymann_rules_generated_email: string
- to_infer_allsenders_rules_generated_email: string
- to_infer_baseline_rules_generated_email: string
- naive_ftnk_generated_email: string
- naive_ft_generated_email: string
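The nested automatic_eval struct is easier to analyze once flattened into one record per model and metric. The following is a minimal sketch of that step, not part of the dataset's own tooling: the function name `flatten_automatic_eval` and the record `example_row` are hypothetical, and the scores in it are invented placeholders. It also assumes that a model's metrics may be missing (None) in some rows.

```python
from typing import Any, Dict, List

# Hypothetical record: field names follow the schema, scores are made up.
example_row: Dict[str, Any] = {
    "message_id": "example-id",
    "automatic_eval": {
        "google/gemma-1.1-2b-it": {
            "BERT Cosine Similarity": 0.5,
            "BLEU Score": 0.25,
            "ROUGE-L Score": 0.75,
            "TF-IDF Cosine Similarity": 0.125,
        },
        # Assumption: a model's metrics can be absent for a given row.
        "mistralai/Mistral-7B-Instruct-v0.2": None,
    },
}


def flatten_automatic_eval(row: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return one flat dict per (model, metric) pair found in the row."""
    records = []
    for model, metrics in (row.get("automatic_eval") or {}).items():
        if not metrics:
            continue  # skip models without any scores
        for metric, score in metrics.items():
            records.append(
                {
                    "message_id": row["message_id"],
                    "model": model,
                    "metric": metric,
                    "score": score,
                }
            )
    return records


flat_records = flatten_automatic_eval(example_row)
```

A list of flat records like this can be passed directly to a table library for per-model aggregation.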
Dataset splits
- train: 3832 examples, 151599517 bytes in total
- test: 958 examples, 37494886 bytes in total
Dataset size
- Download size: 80953494 bytes
- Total dataset size: 189094403 bytes
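The split and size figures can be cross-checked with a little arithmetic. The sketch below uses only the numbers listed above; the variable names are illustrative. It shows that the two split sizes add up to the reported total dataset size and that the train split holds 80% of the examples.

```python
# Figures copied from the dataset metadata.
SPLITS = {
    "train": {"num_examples": 3832, "num_bytes": 151599517},
    "test": {"num_examples": 958, "num_bytes": 37494886},
}
DOWNLOAD_SIZE = 80953494
DATASET_SIZE = 189094403

total_examples = sum(s["num_examples"] for s in SPLITS.values())
total_bytes = sum(s["num_bytes"] for s in SPLITS.values())

# Share of examples in the train split.
train_fraction = SPLITS["train"]["num_examples"] / total_examples

# Ratio of the download size to the total dataset size.
download_ratio = DOWNLOAD_SIZE / DATASET_SIZE

print(f"examples: {total_examples}, train share: {train_fraction:.0%}")
print(f"split bytes match dataset_size: {total_bytes == DATASET_SIZE}")
print(f"download / dataset size: {download_ratio:.2f}")
```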
Configuration
- config_name: default
- data_files:
  - train: file path data/train-*
  - test: file path data/test-*
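Given the default configuration with train and test splits, the data can presumably be loaded with the Hugging Face `datasets` library. This is a sketch under several assumptions: the `datasets` package is installed, the repository is still publicly reachable under the id shown in the download link, and the helper names `load_split` and `check_split_size` are hypothetical.

```python
REPO_ID = "preference-agents-working/enron-top-senders-train"

# Example counts taken from the dataset metadata.
EXPECTED_EXAMPLES = {"train": 3832, "test": 958}


def load_split(split: str = "train"):
    """Load one split of the default config (requires network access)."""
    if split not in EXPECTED_EXAMPLES:
        raise ValueError(f"unknown split: {split!r}")
    # Third-party dependency, imported lazily so that this module can be
    # imported without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(REPO_ID, name="default", split=split)


def check_split_size(split: str, num_rows: int) -> bool:
    """Compare a loaded row count with the count listed in the metadata."""
    return EXPECTED_EXAMPLES[split] == num_rows
```

For example, `ds = load_split("test")` followed by `check_split_size("test", ds.num_rows)` would confirm that the downloaded split matches the listed example count.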



