oishooo/formality_classification
Source: Hugging Face | Updated: 2025-04-06 | Indexed: 2025-04-12
Download link:
https://hf-mirror.com/datasets/oishooo/formality_classification
Description:
The dataset is built by sampling from four sources that together cover text snippets at varying levels of formality: the test split of [ccdv/govreport-summarization], the train split of [MedRAG/pubmed], the test split of [osyvokon/pavlick-formality-scores], and the tifu portion of [HuggingFaceGECLM/REDDIT_comments]. Each sample has four fields: text, type, formality_label, and formality_explanation. A large language model, [DeepSeek-V3], assigns the formality_label and generates the corresponding formality_explanation. The exception is samples labeled neutral: these are taken directly from an existing dataset and carry no formality explanation.
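The record layout described above can be illustrated with a minimal Python sketch. The four field names come from the description; everything else is an assumption made for illustration: the sample texts are invented, the label values "formal" and "informal" are guesses (only "neutral" is named in the description), the "type" values are placeholders, and a missing explanation is assumed to appear as None.

```python
from collections import Counter
from typing import List, Optional, Tuple, TypedDict


class FormalitySample(TypedDict):
    """One record, using the four fields named in the description."""
    text: str
    type: str
    formality_label: str
    formality_explanation: Optional[str]  # assumed to be None when absent


# Invented records for illustration only; none of these come from the dataset.
# The "type" values and the labels "formal"/"informal" are assumptions.
SAMPLES: List[FormalitySample] = [
    {
        "text": "The agency shall submit its findings to the committee.",
        "type": "example-source-a",
        "formality_label": "formal",
        "formality_explanation": "Uses legal register and impersonal phrasing.",
    },
    {
        "text": "so yeah, i totally messed that up lol",
        "type": "example-source-b",
        "formality_label": "informal",
        "formality_explanation": "Contains slang, lowercase 'i', and 'lol'.",
    },
    {
        "text": "The meeting starts at noon.",
        "type": "example-source-c",
        "formality_label": "neutral",
        "formality_explanation": None,
    },
]


def split_by_explanation(
    samples: List[FormalitySample],
) -> Tuple[List[FormalitySample], List[FormalitySample]]:
    """Separate records that carry an explanation from those that do not."""
    explained = [s for s in samples if s["formality_explanation"]]
    unexplained = [s for s in samples if not s["formality_explanation"]]
    return explained, unexplained


def neutral_with_explanation(samples: List[FormalitySample]) -> List[FormalitySample]:
    """Return neutral records that unexpectedly carry an explanation."""
    return [
        s
        for s in samples
        if s["formality_label"] == "neutral" and s["formality_explanation"]
    ]


def label_counts(samples: List[FormalitySample]) -> Counter:
    """Count records per formality_label."""
    return Counter(s["formality_label"] for s in samples)
```

With real data, the same helpers could be applied to rows loaded through the Hugging Face datasets library (for example via load_dataset("oishooo/formality_classification")), though the split names and the exact encoding of a missing explanation would need to be checked against the actual files.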
Provider:
oishooo



