black-box-api-challenges
ModelScope community (魔搭社区) · Updated 2025-12-05 · Indexed 2024-12-21
Download link:
https://modelscope.cn/datasets/CohereLabs/black-box-api-challenges
Resource description:
# Dataset Card
**Paper**: On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research
**Abstract**: Perception of toxicity evolves over time and often differs between geographies and cultural backgrounds. Similarly, black-box commercially available APIs for detecting toxicity, such as the Perspective API, are not static, but frequently retrained to address any unattended weaknesses and biases. We evaluate the implications of these changes on the reproducibility of findings that compare the relative merits of models and methods that aim to curb toxicity. Our findings suggest that research that relied on inherited automatic toxicity scores to compare models and techniques may have resulted in inaccurate findings. Rescoring all models from HELM, a widely respected living benchmark, for toxicity with the recent version of the API led to a different ranking of extensively used models. We suggest caution in applying apples-to-apples comparisons between studies and lay recommendations for a more structured approach to evaluating toxicity over time.
Published on the [Trustworthy and Reliable Large-Scale Machine Learning Models ICLR 2023 Workshop](https://rtml-iclr2023.github.io/cfp.html).
[[Code]](https://github.com/for-ai/black-box-api-challenges) [[OpenReview]](https://openreview.net/forum?id=bRDHL4J5vy) [[Extended Pre-print]]()
## Dataset Description
This repository contains the data from the paper "On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research".
The folders contain:
- **real-toxicity-prompts:** prompts from the RealToxicityPrompts dataset rescored with Perspective API in February 2023.
- **helm:** prompts and continuations from the HELM benchmark v0.2.2, rescored with Perspective API in April 2023. This folder also contains the original statistics for each model, as scraped from the website.
- **dexperts:** prompts and continuations from a few models from the DExperts paper, rescored with Perspective API in February 2023.
- **uddia:** continuations from UDDIA models, rescored with Perspective API in February 2023.
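After downloading the repository, it can help to check that all four folders are present before working with them. The sketch below is one possible way to do that. It assumes the data has been downloaded to a local directory; the function name `inventory` and the constant `EXPECTED_FOLDERS` are illustrative, and the file names and formats inside each folder are not documented in this card, so the code only lists whatever it finds.

```python
from pathlib import Path

# Folder names as listed in this card. The files inside each folder are
# not documented here, so nothing below assumes a particular file format.
EXPECTED_FOLDERS = ("real-toxicity-prompts", "helm", "dexperts", "uddia")


def inventory(root):
    """Map each expected folder to a sorted list of its files.

    Paths are relative to the folder and use forward slashes.
    A folder that is missing under `root` maps to None, which makes
    an incomplete download easy to spot.
    """
    root = Path(root)
    result = {}
    for name in EXPECTED_FOLDERS:
        folder = root / name
        if not folder.is_dir():
            result[name] = None
            continue
        result[name] = sorted(
            p.relative_to(folder).as_posix()
            for p in folder.rglob("*")
            if p.is_file()
        )
    return result


if __name__ == "__main__":
    import sys

    # Hypothetical usage: pass the local path of the downloaded repository.
    local_root = sys.argv[1] if len(sys.argv) > 1 else "."
    for name, files in inventory(local_root).items():
        status = "missing" if files is None else f"{len(files)} file(s)"
        print(f"{name}: {status}")
```

Returning `None` for a missing folder, rather than an empty list, keeps "folder absent" distinct from "folder present but empty".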
### RealToxicityPrompts
RealToxicityPrompts is a dataset of 100k sentence snippets from the web for researchers to further address the risk of neural toxic degeneration in models.
- **Homepage:** [Toxic Degeneration homepage](https://toxicdegeneration.allenai.org/)
- **Repository:** [Code repository](https://github.com/allenai/real-toxicity-prompts)
- **Paper:** [RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models](https://arxiv.org/abs/2009.11462)
### HELM
- **Homepage:** [HELM Benchmark](https://crfm.stanford.edu/helm/latest/)
- **Repository:** [Code repository](https://github.com/stanford-crfm/helm)
- **Paper:** [Holistic Evaluation of Language Models](https://arxiv.org/abs/2211.09110)
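The abstract reports that rescoring HELM models with a newer API version changed their ranking. A minimal sketch of that kind of comparison is shown below. It is not the paper's code: the functions `rank_models` and `rank_shifts` are illustrative, the model names and scores are made up, and the exact toxicity metric used for ranking is not specified in this card.

```python
def rank_models(scores):
    """Rank models from least to most toxic (rank 1 = lowest score).

    Ties are broken alphabetically by model name so the result is deterministic.
    """
    ordered = sorted(scores, key=lambda model: (scores[model], model))
    return {model: position + 1 for position, model in enumerate(ordered)}


def rank_shifts(old_scores, new_scores):
    """Return new_rank - old_rank for every model present in both score sets.

    A positive value means the model moved toward the more-toxic end of the
    ranking after rescoring; zero means its position did not change.
    """
    common = sorted(set(old_scores) & set(new_scores))
    old_rank = rank_models({m: old_scores[m] for m in common})
    new_rank = rank_models({m: new_scores[m] for m in common})
    return {m: new_rank[m] - old_rank[m] for m in common}


# Made-up numbers for illustration only; these are NOT values from the dataset.
old = {"model-a": 0.10, "model-b": 0.20, "model-c": 0.30}
new = {"model-a": 0.25, "model-b": 0.15, "model-c": 0.28}

shifts = rank_shifts(old, new)
for model, shift in shifts.items():
    print(f"{model}: rank change {shift:+d}")
```

Restricting the comparison to models present in both score sets avoids spurious rank changes caused by a model that was scored in only one of the two runs.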
### DExperts
- **Repository:** [Code repository](https://github.com/alisawuffles/DExperts)
- **Paper:** [DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts](https://arxiv.org/abs/2105.03023)
### UDDIA
- **Paper:** [Unified Detoxifying and Debiasing in Language Generation via Inference-time Adaptive Optimization](https://arxiv.org/abs/2210.04492)
# Citation
```
@inproceedings{
pozzobon2023on,
title={On the Challenges of Using Black-Box {API}s for Toxicity Evaluation in Research},
author={Luiza Amador Pozzobon and Beyza Ermis and Patrick Lewis and Sara Hooker},
booktitle={ICLR 2023 Workshop on Trustworthy and Reliable Large-Scale Machine Learning Models},
year={2023},
url={https://openreview.net/forum?id=bRDHL4J5vy}
}
```
Provider:
maas
Created:
2025-08-01



