Lakera/gandalf-rct
Source: Hugging Face | Updated: 2025-05-20 | Indexed: 2025-02-15
Download link:
https://hf-mirror.com/datasets/Lakera/gandalf-rct
Description:
This dataset contains all prompt and guess submissions to Gandalf-RCT. Its columns are: the submission datetime; the submission type (prompt or guess); the user ID; the setup (general, trial, or summarization); the defense; the level (setup-defense); the guessed password (None if the type is prompt); the actual password; whether the password guess was successful (None if the type is prompt); the submitted prompt (for type guess, this is the last submitted prompt); the name of the LLM used to generate the answer; the answer displayed to the user; the LLM's raw response before defenses such as substring checking and LLM checking are applied; how long the server took to generate the response (NaN if the type is guess); the order in which the levels were displayed to this particular user; and an identifier of the defense that blocked the response, if any (otherwise set to not_blocked).
Provider:
Lakera
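
The row layout described above mixes two kinds of records: prompt rows, where the guess fields are None, and guess rows, where the success flag is set. The sketch below illustrates one way such rows might be separated and summarized. It runs on a handful of synthetic rows, not on real data, and every field name (`kind`, `level`, `success`, `blocked_by`) and every value other than `not_blocked` is a hypothetical stand-in; the actual column names and defense identifiers in the dataset may differ.

```python
from collections import defaultdict

# Synthetic rows for illustration only. The field names and the level and
# defense labels are assumptions, not the dataset's real column names.
rows = [
    {"kind": "prompt", "level": "general-A", "success": None, "blocked_by": "not_blocked"},
    {"kind": "guess", "level": "general-A", "success": True, "blocked_by": "not_blocked"},
    {"kind": "prompt", "level": "general-B", "success": None, "blocked_by": "llm_check"},
    {"kind": "guess", "level": "general-B", "success": False, "blocked_by": "not_blocked"},
    {"kind": "guess", "level": "general-B", "success": True, "blocked_by": "not_blocked"},
]


def guess_success_rate(rows):
    """Return {level: fraction of successful guesses}, skipping prompt rows."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for row in rows:
        if row["kind"] != "guess":
            continue  # prompt rows carry success=None and are not guesses
        totals[row["level"]] += 1
        if row["success"]:
            wins[row["level"]] += 1
    return {level: wins[level] / totals[level] for level in totals}


def blocked_prompt_share(rows):
    """Return the fraction of prompt rows whose response was blocked by a defense."""
    prompts = [row for row in rows if row["kind"] == "prompt"]
    if not prompts:
        return 0.0
    blocked = sum(1 for row in prompts if row["blocked_by"] != "not_blocked")
    return blocked / len(prompts)


if __name__ == "__main__":
    print(guess_success_rate(rows))
    print(blocked_prompt_share(rows))
```

Filtering on the submission type first matters because the success flag is None for prompts; counting those rows as failed guesses would understate the success rate per level.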



