
preference-agents-working/enron-top-senders-train

Source: Hugging Face. Updated: 2024-04-26. Indexed: 2024-06-12.
Download link:
https://hf-mirror.com/datasets/preference-agents-working/enron-top-senders-train
Resource description:
---
dataset_info:
  features:
  - name: message_id
    dtype: string
  - name: from
    dtype: string
  - name: to
    dtype: string
  - name: date
    dtype: string
  - name: subject
    dtype: string
  - name: content
    dtype: string
  - name: email_context
    dtype: string
  - name: token_count_content
    dtype: int32
  - name: token_count_context
    dtype: int32
  - name: intent
    dtype: string
  - name: baseline
    struct:
    - name: google/gemma-1.1-2b-it
      dtype: string
    - name: google/gemma-1.1-7b-it
      dtype: string
    - name: meta-llama/Meta-Llama-3-70B-Instruct
      dtype: string
    - name: meta-llama/Meta-Llama-3-8B-Instruct
      dtype: string
    - name: mistralai/Mistral-7B-Instruct-v0.2
      dtype: string
  - name: automatic_eval
    struct:
    - name: google/gemma-1.1-2b-it
      struct:
      - name: BERT Cosine Similarity
        dtype: float64
      - name: BLEU Score
        dtype: float64
      - name: ROUGE-L Score
        dtype: float64
      - name: TF-IDF Cosine Similarity
        dtype: float64
    - name: google/gemma-1.1-7b-it
      struct:
      - name: BERT Cosine Similarity
        dtype: float64
      - name: BLEU Score
        dtype: float64
      - name: ROUGE-L Score
        dtype: float64
      - name: TF-IDF Cosine Similarity
        dtype: float64
    - name: meta-llama/Meta-Llama-3-8B-Instruct
      struct:
      - name: BERT Cosine Similarity
        dtype: float64
      - name: BLEU Score
        dtype: float64
      - name: ROUGE-L Score
        dtype: float64
      - name: TF-IDF Cosine Similarity
        dtype: float64
    - name: mistralai/Mistral-7B-Instruct-v0.2
      struct:
      - name: BERT Cosine Similarity
        dtype: float64
      - name: BLEU Score
        dtype: float64
      - name: ROUGE-L Score
        dtype: float64
      - name: TF-IDF Cosine Similarity
        dtype: float64
  - name: rules
    struct:
    - name: meta-llama/Meta-Llama-3-70B-Instruct
      struct:
      - name: meta-llama/Meta-Llama-3-70B-Instruct
        dtype: string
      - name: meta-llama/Meta-Llama-3-8B-Instruct
        dtype: string
    - name: meta-llama/Meta-Llama-3-8B-Instruct
      struct:
      - name: meta-llama/Meta-Llama-3-70B-Instruct
        dtype: string
      - name: meta-llama/Meta-Llama-3-8B-Instruct
        dtype: string
  - name: processed_rules
    struct:
    - name: meta-llama/Meta-Llama-3-70B-Instruct
      struct:
      - name: meta-llama/Meta-Llama-3-70B-Instruct
        dtype: string
      - name: meta-llama/Meta-Llama-3-8B-Instruct
        dtype: string
    - name: meta-llama/Meta-Llama-3-8B-Instruct
      struct:
      - name: meta-llama/Meta-Llama-3-70B-Instruct
        dtype: string
      - name: meta-llama/Meta-Llama-3-8B-Instruct
        dtype: string
  - name: text
    dtype: string
  - name: to_infer
    dtype: string
  - name: rule_ft_kaymann
    dtype: string
  - name: rule_ft_all_senders
    dtype: string
  - name: to_infer_kaymann_rules
    dtype: string
  - name: to_infer_allsenders_rules
    dtype: string
  - name: to_infer_baseline_rules
    dtype: string
  - name: to_infer_kaymann_rules_generated_email
    dtype: string
  - name: to_infer_allsenders_rules_generated_email
    dtype: string
  - name: to_infer_baseline_rules_generated_email
    dtype: string
  - name: naive_ftnk_generated_email
    dtype: string
  - name: naive_ft_generated_email
    dtype: string
  splits:
  - name: train
    num_bytes: 151599517
    num_examples: 3832
  - name: test
    num_bytes: 37494886
    num_examples: 958
  download_size: 80953494
  dataset_size: 189094403
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---
Provider:
preference-agents-working
Summary of the original information

Dataset overview

Dataset features

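  • message_id: string
  • from: string
  • to: string
  • date: string
  • subject: string
  • content: string
  • email_context: string
  • token_count_content: integer (int32)
  • token_count_context: integer (int32)
  • intent: string
  • baseline: struct with one string field per model: google/gemma-1.1-2b-it, google/gemma-1.1-7b-it, meta-llama/Meta-Llama-3-70B-Instruct, meta-llama/Meta-Llama-3-8B-Instruct, mistralai/Mistral-7B-Instruct-v0.2
  • automatic_eval: struct keyed by model (google/gemma-1.1-2b-it, google/gemma-1.1-7b-it, meta-llama/Meta-Llama-3-8B-Instruct, mistralai/Mistral-7B-Instruct-v0.2), each entry holding four float64 metrics: BERT Cosine Similarity, BLEU Score, ROUGE-L Score, TF-IDF Cosine Similarity
  • rules: struct of rule strings, nested two levels deep, with both levels keyed by meta-llama/Meta-Llama-3-70B-Instruct and meta-llama/Meta-Llama-3-8B-Instruct
  • processed_rules: struct of processed rule strings, with the same two-level layout as rules
  • text: string
  • to_infer: string
  • rule_ft_kaymann: string
  • rule_ft_all_senders: string
  • to_infer_kaymann_rules: string
  • to_infer_allsenders_rules: string
  • to_infer_baseline_rules: string
  • to_infer_kaymann_rules_generated_email: string
  • to_infer_allsenders_rules_generated_email: string
  • to_infer_baseline_rules_generated_email: string
  • naive_ftnk_generated_email: string
  • naive_ft_generated_email: string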
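
The struct-typed features are nested, so reading a score means indexing by model name and then by metric name. The following is a minimal sketch of that access pattern, assuming the repository is still reachable on the Hub under the ID in the page title and that struct columns are returned as nested dictionaries. `load_splits` and `best_model_by_metric` are hypothetical helper names, and the scores in `sample_row` are invented for illustration only.

```python
from typing import Any, Dict, Optional

DATASET_ID = "preference-agents-working/enron-top-senders-train"


def load_splits():
    """Load the train and test splits (needs the `datasets` package and network access)."""
    from datasets import load_dataset  # imported lazily so the helper below works offline

    return load_dataset(DATASET_ID)


def best_model_by_metric(row: Dict[str, Any], metric: str) -> Optional[str]:
    """Return the model with the highest automatic_eval score for `metric`, or None."""
    scores = {
        model: metrics[metric]
        for model, metrics in (row.get("automatic_eval") or {}).items()
        if metrics and metrics.get(metric) is not None
    }
    return max(scores, key=scores.get) if scores else None


# Hypothetical row fragment with invented scores, shaped like the automatic_eval feature.
sample_row = {
    "automatic_eval": {
        "google/gemma-1.1-2b-it": {"BLEU Score": 0.10, "ROUGE-L Score": 0.20},
        "mistralai/Mistral-7B-Instruct-v0.2": {"BLEU Score": 0.15, "ROUGE-L Score": 0.18},
    }
}
print(best_model_by_metric(sample_row, "BLEU Score"))
```

With real data, the same helper could be applied to a row such as `load_splits()["train"][0]`.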

Dataset splits

  • train: 3,832 examples, 151,599,517 bytes in total
  • test: 958 examples, 37,494,886 bytes in total
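
The split counts work out to a clean 80/20 partition of 4,790 examples. A short sketch of that arithmetic, using only the figures listed for the two splits (the variable names are illustrative):

```python
# Example counts and byte sizes as listed for each split.
split_examples = {"train": 3832, "test": 958}
split_bytes = {"train": 151_599_517, "test": 37_494_886}

total_examples = sum(split_examples.values())
fractions = {name: n / total_examples for name, n in split_examples.items()}
avg_bytes = {name: split_bytes[name] / split_examples[name] for name in split_examples}

for name in split_examples:
    print(f"{name}: {fractions[name]:.1%} of examples, "
          f"about {avg_bytes[name]:,.0f} bytes per example")
```

The average size per example is close between the two splits, which is what a random split of the same source would be expected to produce.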

Dataset size

  • Download size: 80,953,494 bytes
  • Total dataset size: 189,094,403 bytes
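
The total dataset size equals the sum of the two split sizes, and the download is roughly 43% of that total, presumably because the stored files are compressed (the page does not state the storage format). A quick check of both figures, with illustrative variable names:

```python
# Byte counts as listed in the metadata.
split_bytes = {"train": 151_599_517, "test": 37_494_886}
dataset_size = 189_094_403
download_size = 80_953_494

# The split sizes should add up to the reported total.
sizes_consistent = sum(split_bytes.values()) == dataset_size

# Ratio of downloaded bytes to the reported dataset size.
download_ratio = download_size / dataset_size


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 2**20 bytes)."""
    return num_bytes / 2**20


print(f"split sizes add up: {sizes_consistent}")
print(f"download / dataset size: {download_ratio:.1%}")
print(f"download: {to_mib(download_size):.1f} MiB, dataset: {to_mib(dataset_size):.1f} MiB")
```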

Configuration

  • config_name: default
  • data_files:
    • train: file path data/train-*
    • test: file path data/test-*
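
The `data_files` entries are glob patterns that assign repository files to splits. The sketch below illustrates the idea with Python's standard `fnmatch` module. It is not the Hub's own resolution logic, `split_for` is a hypothetical helper, and the shard filenames are invented examples rather than the repository's actual files.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Split-to-pattern mapping taken from the default config.
DATA_FILES = {"train": "data/train-*", "test": "data/test-*"}


def split_for(path: str) -> Optional[str]:
    """Return the split whose glob pattern matches `path`, or None if none does."""
    for split, pattern in DATA_FILES.items():
        if fnmatchcase(path, pattern):
            return split
    return None


# Hypothetical file names, used only to exercise the patterns.
for path in [
    "data/train-00000-of-00001.parquet",
    "data/test-00000-of-00001.parquet",
    "README.md",
]:
    print(path, "->", split_for(path))
```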