polite-guard
ModelScope community. Updated 2025-12-05; indexed 2025-08-02.
Download link:
https://modelscope.cn/datasets/Intel/polite-guard
Resource description:
# Polite Guard
- **Dataset type**: Synthetic and Annotated
- **Task**: Text Classification
- **Domain**: Classification of text into polite, somewhat polite, neutral, and impolite categories
- **Source Code**: [intel/polite-guard](https://github.com/intel/polite-guard)
- **Model**: [Intel/polite-guard](https://huggingface.co/Intel/polite-guard)
This dataset is for [**Polite Guard**](https://huggingface.co/Intel/polite-guard): an open-source NLP language model developed by Intel, fine-tuned from BERT for text classification tasks. Polite Guard is designed to classify text into four categories: polite, somewhat polite, neutral, and impolite. The model, along with its accompanying datasets and [source code](https://github.com/intel/polite-guard), is available on Hugging Face* and GitHub* to enable both communities to contribute to developing more sophisticated and context-aware AI systems.
## Use Cases
Polite Guard provides a scalable model development pipeline and methodology, making it easier for developers to create and fine-tune their own models. Other contributions of the project include:
- **Improved Robustness**:
Polite Guard is intended to make systems more resilient by acting as a defense mechanism against adversarial attacks. This helps a deployed system maintain its performance and reliability when it receives potentially harmful inputs.
- **Benchmarking and Evaluation**:
The project introduces the first politeness benchmark, allowing developers to evaluate and compare the performance of their models in terms of politeness classification. This helps in setting a standard for future developments in this area.
- **Enhanced Customer Experience**:
By ensuring respectful and polite interactions on various platforms, Polite Guard can significantly boost customer satisfaction and loyalty. This is particularly beneficial for customer service applications where maintaining a positive tone is crucial.
## Dataset Description
The dataset consists of three main components:
- 50,000 samples generated using *Few-Shot prompting*
- 50,000 samples generated using *Chain-of-Thought (CoT) prompting*
- 200 *annotated* samples from corporate trainings with the personal identifiers removed
The synthetic data is split into training (80%), validation (10%), and test (10%) sets, with each set balanced according to the label. The real annotated data is used solely for evaluation purposes.
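The label-balanced 80/10/10 split described above can be sketched as follows. This is a minimal illustration, not the project's actual splitting code: the function name `stratified_split`, the seed, and the toy records are all assumptions made for the example.

```python
import random
from collections import defaultdict

LABELS = ["polite", "somewhat polite", "neutral", "impolite"]


def stratified_split(examples, fractions=(0.8, 0.1, 0.1), seed=0):
    """Split examples into train/validation/test with equal label proportions.

    Hypothetical helper: each label group is shuffled and cut separately,
    so every split keeps the same label balance as the input.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for example in examples:
        by_label[example["label"]].append(example)

    splits = {"train": [], "validation": [], "test": []}
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * fractions[0])
        n_val = round(len(group) * fractions[1])
        splits["train"].extend(group[:n_train])
        splits["validation"].extend(group[n_train:n_train + n_val])
        splits["test"].extend(group[n_train + n_val:])
    return splits


# Toy data: 400 placeholder records, 100 per label.
toy = [{"text": f"sample {i}", "label": LABELS[i % 4]} for i in range(400)]
splits = stratified_split(toy)
print({name: len(rows) for name, rows in splits.items()})
# {'train': 320, 'validation': 40, 'test': 40}
```

Splitting per label group, rather than shuffling the whole dataset once, is what keeps each of the three sets balanced.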
Each example contains:
- **text**: The text input (string)
- **label**: The classification label (category: polite, somewhat polite, neutral, or impolite)
- **source**: The language model that generated the synthetic text, or LMS (Learning Management System) for the annotated corporate training samples (category)
- **reasoning**: The reasoning provided by the language model for generating text that aligns with the specified label and category (string)
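As a rough sketch, one record with the four fields above could be modeled like this. The class name `PoliteGuardExample` and the sample values are invented for illustration. The card does not say whether labels are stored as strings or integer class IDs, or whether `reasoning` is filled for every sample, so string labels and an optional `reasoning` are assumptions here.

```python
from dataclasses import dataclass
from typing import Optional

LABELS = ("polite", "somewhat polite", "neutral", "impolite")


@dataclass(frozen=True)
class PoliteGuardExample:
    """Hypothetical in-memory view of one dataset record."""

    text: str
    label: str                       # assumed to be a string label
    source: str                      # generating model name, or "LMS"
    reasoning: Optional[str] = None  # assumed optional

    def __post_init__(self):
        # Reject anything outside the four documented categories.
        if self.label not in LABELS:
            raise ValueError(f"unknown label: {self.label!r}")


# Invented sample values, not taken from the dataset.
example = PoliteGuardExample(
    text="Thank you for your patience; your refund has been processed.",
    label="polite",
    source="hypothetical-model-name",
    reasoning="Expresses gratitude and confirms the outcome courteously.",
)
print(example.label)
```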
The synthetic data consists of customer service interactions across various sectors, including finance, travel, food and drink, retail, sports clubs, culture and education, and professional development. To ensure *data regularization*, the labels and categories were randomly selected, and a language model was instructed to generate synthetic data based on the specified categories and labels. To ensure *data diversity*, the generation process utilized multiple prompts and the large language models listed below.
- [Llama 3.1 8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct)
- [Gemma 2 9B-It](https://huggingface.co/google/gemma-2-9b-it)
- [Mixtral 8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
The code for the data generator pipeline is available [here](https://github.com/intel/polite-guard). For more details on the prompts used and the development of the generator, refer to this [article](https://medium.com/p/0ff98eb226a1).
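The random selection of labels and categories can be illustrated with a short sketch. The sector list and label set come from the description above. The prompt wording, function names, and seed are placeholders and do not reproduce the prompts used by the project's generator.

```python
import random

LABELS = ["polite", "somewhat polite", "neutral", "impolite"]
CATEGORIES = [
    "finance", "travel", "food and drink", "retail",
    "sports clubs", "culture and education", "professional development",
]


def build_prompt(label, category):
    """Return a placeholder instruction for one synthetic sample."""
    return (
        f"Write one customer service message in the {category} sector "
        f"whose tone is {label}. Return only the message text."
    )


def sample_requests(n, seed=0):
    """Draw n (label, category) pairs uniformly at random and build prompts."""
    rng = random.Random(seed)
    requests = []
    for _ in range(n):
        label = rng.choice(LABELS)
        category = rng.choice(CATEGORIES)
        requests.append(
            {"label": label, "category": category,
             "prompt": build_prompt(label, category)}
        )
    return requests


for request in sample_requests(3):
    print(request["label"], "|", request["category"])
```

Each prompt would then be sent to one of the listed language models. That step is omitted here because it depends on the serving setup.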
## Description of labels
- **polite**: Text is considerate and shows respect and good manners, often including courteous phrases and a friendly tone.
- **somewhat polite**: Text is generally respectful but lacks warmth or formality, communicating with a decent level of courtesy.
- **neutral**: Text is straightforward and factual, without emotional undertones or specific attempts at politeness.
- **impolite**: Text is disrespectful or rude, often blunt or dismissive, showing a lack of consideration for the recipient's feelings.
## Usage
```python
from datasets import load_dataset

# Download the Polite Guard dataset from the Hugging Face Hub
dataset = load_dataset("Intel/polite-guard")
```
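After loading, a quick check of the label balance per split might look like the sketch below. The helper names are hypothetical, and the column name `label` is taken from the field list on this card. The split names and label encoding on the Hub are not confirmed here, so the code iterates over whatever splits are returned. The download call is left commented out because it needs network access and the `datasets` package.

```python
from collections import Counter


def label_counts(rows):
    """Count how often each label appears in an iterable of records."""
    return Counter(row["label"] for row in rows)


def summarize_hub_dataset(name="Intel/polite-guard"):
    """Print label counts for every split of the dataset (requires network)."""
    from datasets import load_dataset  # imported lazily; optional dependency

    dataset = load_dataset(name)
    for split_name, split in dataset.items():
        print(split_name, dict(Counter(split["label"])))


# Offline demonstration on a few invented records.
demo = [{"label": "polite"}, {"label": "polite"}, {"label": "neutral"}]
print(dict(label_counts(demo)))
# {'polite': 2, 'neutral': 1}

# summarize_hub_dataset()  # uncomment to download and inspect the real data
```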
## Articles
To learn more about the implementation of the data generator and fine-tuner packages, refer to:
- [Synthetic Data Generation with Language Models: A Practical Guide](https://medium.com/p/0ff98eb226a1)
- [How to Fine-Tune Language Models: First Principles to Scalable Performance](https://medium.com/p/78f42b02f112)
For more AI development how-to content, visit [Intel® AI Development Resources](https://www.intel.com/content/www/us/en/developer/topic-technology/artificial-intelligence/overview.html).
## Join the Community
If you are interested in exploring other models, join us in the Intel and Hugging Face communities. These models simplify the development and adoption of Generative AI solutions, while fostering innovation among developers worldwide. If you find this project valuable, please like ❤️ it on Hugging Face and share it with your network. Your support helps us grow the community and reach more contributors.
## Disclaimer
Polite Guard has been trained and validated on a limited set of data that pertains to customer reviews, product reviews, and corporate communications. Accuracy metrics cannot be guaranteed outside these narrow use cases, and therefore this tool should be validated within the specific context of use for which it might be deployed. This tool is not intended to be used to evaluate employee performance. This tool is not sufficient to prevent harm in many contexts, and additional tools and techniques should be employed in any sensitive use case where impolite speech may cause harm to individuals, communities, or society.
Provider:
maas
Created:
2025-08-01



