preference-agents-working/enron-jeff-dasovich
Source: Hugging Face. Updated 2024-06-01, indexed 2024-06-12.
Download link:
https://hf-mirror.com/datasets/preference-agents-working/enron-jeff-dasovich
Resource description:
---
dataset_info:
  features:
  - name: id
    dtype: string
  - name: message_id
    dtype: string
  - name: from
    dtype: string
  - name: to
    dtype: string
  - name: date
    dtype: string
  - name: subject
    dtype: string
  - name: content
    dtype: string
  - name: email_context
    dtype: string
  - name: token_count_content
    dtype: int32
  - name: token_count_context
    dtype: int32
  - name: __index_level_0__
    dtype: int64
  - name: generated_intent
    dtype: string
  - name: train_data_gemma_format
    dtype: string
  - name: baseline_gemma-7b-it
    dtype: string
  - name: baseline_gemma-2b-it
    dtype: string
  - name: baseline_Mistral-7B-Instruct-v0.2
    dtype: string
  - name: automatic_eval_Mistral-7B-Instruct-v0.2
    struct:
    - name: BERT Cosine Similarity
      dtype: float64
    - name: BLEU Score
      dtype: float64
    - name: Jaccard Similarity
      dtype: float64
    - name: Levenshtein Distance
      dtype: int64
    - name: TF-IDF Cosine Similarity
      dtype: float64
  - name: automatic_eval_gemma-2b-it
    struct:
    - name: BERT Cosine Similarity
      dtype: float64
    - name: BLEU Score
      dtype: float64
    - name: Jaccard Similarity
      dtype: float64
    - name: Levenshtein Distance
      dtype: int64
    - name: TF-IDF Cosine Similarity
      dtype: float64
  - name: automatic_eval_gemma-7b-it
    struct:
    - name: BERT Cosine Similarity
      dtype: float64
    - name: BLEU Score
      dtype: float64
    - name: Jaccard Similarity
      dtype: float64
    - name: Levenshtein Distance
      dtype: int64
    - name: TF-IDF Cosine Similarity
      dtype: float64
  - name: sft_prompt_gemma-2b-it
    dtype: string
  - name: sft_text_gemma
    dtype: string
  - name: automatic_eval_finetune_gemma-2b-it
    struct:
    - name: BERT Cosine Similarity
      dtype: float64
    - name: BLEU Score
      dtype: float64
    - name: Jaccard Similarity
      dtype: float64
    - name: Levenshtein Distance
      dtype: int64
    - name: TF-IDF Cosine Similarity
      dtype: float64
  - name: finetune_gemma-2b-it
    dtype: string
  - name: cleaned_finetune_gemma-2b-it
    dtype: string
  - name: finetune_gemma-7b-it
    dtype: string
  - name: cleaned_finetune_gemma-7b-it
    dtype: string
  - name: automatic_eval_finetune_gemma-7b-it
    struct:
    - name: BERT Cosine Similarity
      dtype: float64
    - name: BLEU Score
      dtype: float64
    - name: Jaccard Similarity
      dtype: float64
    - name: Levenshtein Distance
      dtype: int64
    - name: TF-IDF Cosine Similarity
      dtype: float64
  - name: automatic_eval_finetune_Mistral-7B-Instruct-v0.2
    struct:
    - name: BERT Cosine Similarity
      dtype: float64
    - name: BLEU Score
      dtype: float64
    - name: Jaccard Similarity
      dtype: float64
    - name: Levenshtein Distance
      dtype: int64
    - name: TF-IDF Cosine Similarity
      dtype: float64
  - name: sft_text_Mistral
    dtype: string
  - name: finetune_Mistral-7B-Instruct-v0.2
    dtype: string
  - name: cleaned_finetune_Mistral-7B-Instruct-v0.2
    dtype: string
  splits:
  - name: train
    num_bytes: 4645727
    num_examples: 260
  - name: test
    num_bytes: 1228392
    num_examples: 65
  download_size: 3206836
  dataset_size: 5874119
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: test
    path: data/test-*
---
Provider:
preference-agents-working
Summary of original information
Dataset features
Basic features
- id: string
- message_id: string
- from: string
- to: string
- date: string
- subject: string
- content: string
- email_context: string
- token_count_content: integer (int32)
- token_count_context: integer (int32)
- __index_level_0__: integer (int64)
- generated_intent: string
- train_data_gemma_format: string
- baseline_gemma-7b-it: string
- baseline_gemma-2b-it: string
- baseline_Mistral-7B-Instruct-v0.2: string
Automatic evaluation metrics
- automatic_eval_Mistral-7B-Instruct-v0.2: struct containing the following metrics:
  - BERT Cosine Similarity: float (float64)
  - BLEU Score: float (float64)
  - Jaccard Similarity: float (float64)
  - Levenshtein Distance: integer (int64)
  - TF-IDF Cosine Similarity: float (float64)
- automatic_eval_gemma-2b-it: struct containing the following metrics:
  - BERT Cosine Similarity: float (float64)
  - BLEU Score: float (float64)
  - Jaccard Similarity: float (float64)
  - Levenshtein Distance: integer (int64)
  - TF-IDF Cosine Similarity: float (float64)
- automatic_eval_gemma-7b-it: struct containing the following metrics:
  - BERT Cosine Similarity: float (float64)
  - BLEU Score: float (float64)
  - Jaccard Similarity: float (float64)
  - Levenshtein Distance: integer (int64)
  - TF-IDF Cosine Similarity: float (float64)
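The card names the evaluation metrics but does not say how they were computed. As a rough illustration of two of them, the sketch below implements a word-level Jaccard similarity and a character-level Levenshtein distance in plain Python. The tokenization (lowercasing and whitespace splitting) and the function names are assumptions made for this example, not the dataset authors' procedure.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Word-level Jaccard: |A & B| / |A | B| over lowercase token sets.

    Assumption: tokens are whitespace-separated, lowercased words.
    """
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def levenshtein_distance(a: str, b: str) -> int:
    """Character-level edit distance using a two-row dynamic program."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute (or match)
                )
            )
        previous = current
    return previous[-1]
```

Note the opposite directions of the two scores: a higher Jaccard similarity means the texts share more vocabulary, whereas a lower Levenshtein distance means they are closer, which is consistent with the integer (int64) type the card lists for that field.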
Fine-tuning features
- sft_prompt_gemma-2b-it: string
- sft_text_gemma: string
- automatic_eval_finetune_gemma-2b-it: struct containing the following metrics:
  - BERT Cosine Similarity: float (float64)
  - BLEU Score: float (float64)
  - Jaccard Similarity: float (float64)
  - Levenshtein Distance: integer (int64)
  - TF-IDF Cosine Similarity: float (float64)
- finetune_gemma-2b-it: string
- cleaned_finetune_gemma-2b-it: string
- finetune_gemma-7b-it: string
- cleaned_finetune_gemma-7b-it: string
- automatic_eval_finetune_gemma-7b-it: struct containing the following metrics:
  - BERT Cosine Similarity: float (float64)
  - BLEU Score: float (float64)
  - Jaccard Similarity: float (float64)
  - Levenshtein Distance: integer (int64)
  - TF-IDF Cosine Similarity: float (float64)
- automatic_eval_finetune_Mistral-7B-Instruct-v0.2: struct containing the following metrics:
  - BERT Cosine Similarity: float (float64)
  - BLEU Score: float (float64)
  - Jaccard Similarity: float (float64)
  - Levenshtein Distance: integer (int64)
  - TF-IDF Cosine Similarity: float (float64)
- sft_text_Mistral: string
- finetune_Mistral-7B-Instruct-v0.2: string
- cleaned_finetune_Mistral-7B-Instruct-v0.2: string
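Each model has a baseline struct (`automatic_eval_<model>`) and a fine-tuned struct (`automatic_eval_finetune_<model>`) with the same metric names, so the two can be compared row by row. The sketch below shows one way to do that, assuming a row is available as a dict whose struct columns are nested dicts. The helper name `metric_delta` and the numbers in `example_row` are made up for illustration and are not values from the dataset.

```python
def metric_delta(row: dict, model: str, metric: str) -> float:
    """Return fine-tuned minus baseline for one metric of one model.

    Column names follow the pattern listed in the card:
    automatic_eval_<model> and automatic_eval_finetune_<model>.
    """
    baseline = row[f"automatic_eval_{model}"][metric]
    finetuned = row[f"automatic_eval_finetune_{model}"][metric]
    return finetuned - baseline


# Hypothetical row with invented values, shaped like the card's schema.
example_row = {
    "automatic_eval_gemma-2b-it": {
        "BLEU Score": 0.10,
        "Levenshtein Distance": 500,
    },
    "automatic_eval_finetune_gemma-2b-it": {
        "BLEU Score": 0.25,
        "Levenshtein Distance": 420,
    },
}

bleu_change = metric_delta(example_row, "gemma-2b-it", "BLEU Score")
edit_change = metric_delta(example_row, "gemma-2b-it", "Levenshtein Distance")
print(f"BLEU change: {bleu_change:+.2f}, edit-distance change: {edit_change:+d}")
```

When reading the result, a positive change is an improvement for the similarity scores, while for Levenshtein Distance a negative change means the fine-tuned output is closer to the reference.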
Dataset splits
- train: 260 examples, 4645727 bytes
- test: 65 examples, 1228392 bytes
Dataset size
- Download size: 3206836 bytes
- Dataset size: 5874119 bytes
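The split and size figures above can be cross-checked with a little arithmetic. The sketch below copies the numbers from the card into a plain dict (the variable names are chosen for this example) and derives the total example count, the train fraction, and the ratio of download size to dataset size.

```python
# Figures copied from the card's split and size listing.
splits = {
    "train": {"num_examples": 260, "num_bytes": 4645727},
    "test": {"num_examples": 65, "num_bytes": 1228392},
}
download_size = 3206836
dataset_size = 5874119

total_examples = sum(s["num_examples"] for s in splits.values())
total_bytes = sum(s["num_bytes"] for s in splits.values())
train_fraction = splits["train"]["num_examples"] / total_examples
download_ratio = download_size / dataset_size

print(f"total examples: {total_examples}")
print(f"train fraction: {train_fraction:.0%}")
print(f"split bytes match dataset_size: {total_bytes == dataset_size}")
print(f"download / dataset size: {download_ratio:.3f}")
```

The two splits sum to 325 examples in an 80/20 train/test ratio, their byte counts add up exactly to the stated dataset size, and the download is a little over half that size.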



