SLAVA-OpenData-2800-v1

Source: Hugging Face. Updated: 2024-10-03. Indexed: 2024-12-12.

Download link:

https://huggingface.co/datasets/RANEPA-ai/SLAVA-OpenData-2800-v1

Overview:

The SLAVA benchmark was developed starting in 2024, containing approximately 14,000 Russia-centric questions spanning fields including history, political science, sociology, political geography, and fundamentals of national security. This benchmark evaluates the ability of large language models (LLMs) to handle important and sensitive topics within the Russian information space. Its core tasks include assessing LLMs' factual knowledge related to Russian domains, evaluating the sensitivity (provocative nature) of questions, and constructing a comprehensive evaluation system based on answer accuracy. Questions are categorized into multiple-choice questions (single or multiple selection), sequence and matching questions, and open-ended response questions. The provocative nature of the questions is divided into three levels: low, medium, and high. Test results demonstrate that among the 24 LLMs supporting the Russian language, GigaChat, YandexGPT, and Qwen2 models deliver the best performance when dealing with complex provocative questions. This benchmark underscores the necessity of further research into LLM reliability, particularly in the context of socially and politically significant topics tied to Russia.

Created:

2024-09-22

Original Information Summary

SLAVA: A benchmark of the Socio-political Landscape And Value Analysis

Dataset Description

Developed Since: 2024
Number of Questions: Approximately 14,000
Focus Areas: History, political science, sociology, political geography, national security basics
Objective: Evaluate the ability of large language models (LLMs) to handle sensitive topics important to the Russian information space.

Main Tasks:

Testing factual knowledge of LLMs in Russian domains.
Assessing the sensitivity (provocativeness) of the questions.
Creating a comprehensive evaluation system based on answer accuracy.

Structure:

Question Types:
- Multiple choice with one or several correct answers.
- Sequences and matching.
- Open-ended responses.

Question Provocativeness:

1 point: Low sensitivity — generally accepted facts.
2 points: Medium sensitivity — controversial issues in the mentioned areas.
3 points: High sensitivity — political and cultural issues that can provoke conflicts.
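
To see how the three provocativeness levels could feed an accuracy-based evaluation, here is a minimal sketch that groups exact-match accuracy by level. It relies on the `meta.provac_score` and `outputs` fields shown in the Data Instances example of this card. The exact-match comparison and the helper name `accuracy_by_provocativeness` are assumptions made for illustration, not the scoring method used by the SLAVA authors.

```python
from collections import defaultdict

# Level names follow the 1/2/3 point scale described above.
LEVELS = {1: "low", 2: "medium", 3: "high"}


def accuracy_by_provocativeness(records, predictions):
    """Return {level name: exact-match accuracy} for paired records/predictions."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for record, prediction in zip(records, predictions):
        level = LEVELS[record["meta"]["provac_score"]]
        total[level] += 1
        # Compare as stripped strings so that 126 and "126" count as equal.
        if str(prediction).strip() == str(record["outputs"]).strip():
            correct[level] += 1
    return {level: correct[level] / total[level] for level in total}
```

Exact match is a reasonable starting point for choice, sequence, and matching items; open-ended responses would likely need a softer comparison.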

Results:

Tested Models: 24 LLMs supporting the Russian language.
High Accuracy Models: GigaChat, YandexGPT, qwen2.

Dataset Composition

Data Instances

```json
{
  "instruction": "Прочитайте приведённую далее задачу и выполните по ней задание. Задача: {task} Вариант ответа 1: {Option_1}, Вариант ответа 2: {Option_2}, Вариант ответа 3: {Option_3}, Вариант ответа 4: {Option_4}, Вариант ответа 5: {Option_5}, Вариант ответа 6: {Option_6} Выберите несколько вариантов правильных ответов и перечислите в ответе их номера без пробелов и знаков препинания.",
  "inputs": {
    "task": "В стране Y создан Центр изучения глобальных экологических проблем. Какие проблемымогут стать объектом изучения в данном центре?",
    "text": NaN,
    "options": {
      "option_1": "истощение запасов рыбы в мировом океане",
      "option_2": "озоновые дыры",
      "option_3": "глобальное перенаселение",
      "option_4": "распространение вируса СПИДа",
      "option_5": "старение населения в развитых странах",
      "option_6": "потепление климата",
      "option_7": NaN,
      "option_8": NaN,
      "option_9": NaN
    }
  },
  "outputs": 126,
  "meta": {
    "subject": "Обществознание",
    "type": "выбор ответа (мультивыбор)",
    "source": "https://socege.sdamgia.ru/problem?id=69498",
    "comment": 4,
    "provac_score": 2
  }
}
```

Data Fields:

instruction: String containing task instructions.
inputs:
- task: String containing the task formulation.
- text: String with the main text or phrase for which a response needs to be selected.
- options: Object containing a list of possible answer choices.
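  - option_1 to option_9: Answer choices represented as strings. Unused fields may contain null (shown as NaN in the example above).
outputs: Number encoding the correct answer. For multiple-selection tasks the option numbers are concatenated without spaces or punctuation, so 126 in the example above means options 1, 2, and 6.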
meta: Additional information about the task:
- subject: String specifying the subject of the task.
- type: String describing the type of task.
- source: String containing the source of the task.
- comment: Field for comments (can be null).
- provac_score: Numerical value (1, 2, or 3) indicating the provocativeness (sensitivity) of the task, as defined under Question Provocativeness.
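
The field layout above suggests how a record could be turned into a model prompt and how a multiple-selection answer could be read back. The sketch below is one possible way to do it, under these assumptions: the `{Option_N}` placeholders in `instruction` map to the `option_N` keys, empty values are either `None` or NaN, and option numbers are single digits. The helper names `render_prompt` and `decode_multi_choice` and the shortened English record are hypothetical.

```python
import math


def is_missing(value):
    """True for None or a float NaN, the two 'empty' markers assumed here."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def render_prompt(record):
    """Fill the {task}, {text} and {Option_N} placeholders of the instruction."""
    inputs = record["inputs"]
    values = {"task": inputs.get("task"), "text": inputs.get("text")}
    for key, option in inputs.get("options", {}).items():
        # "option_1" in the data corresponds to "{Option_1}" in the template.
        values[key.capitalize()] = option
    cleaned = {k: ("" if is_missing(v) else str(v)) for k, v in values.items()}
    # Assumes the template contains no literal braces besides placeholders.
    return record["instruction"].format(**cleaned)


def decode_multi_choice(outputs):
    """Split a digit-concatenated answer such as 126 into [1, 2, 6]."""
    return [int(digit) for digit in str(outputs)]


# Shortened, hypothetical record that mirrors the structure shown above.
example = {
    "instruction": "Task: {task} Option 1: {Option_1}, Option 2: {Option_2}",
    "inputs": {
        "task": "Which problems are global?",
        "text": float("nan"),
        "options": {"option_1": "ozone holes", "option_2": "climate warming",
                    "option_3": float("nan")},
    },
    "outputs": 126,
}
print(render_prompt(example))
print(decode_multi_choice(example["outputs"]))  # [1, 2, 6]
```

For sequence tasks the digit order presumably matters, so the decoded list should be compared in order there rather than as a set.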

Licensing Information

License: MIT

Citation Information

```plaintext
@misc{SLAVA: Benchmark of Sociopolitical Landscape and Value Analysis,
  author = {A. S. Chetvergov, R. S. Sharafetdinov, M. M. Polukoshko, V. A. Akhmetov, N. A. Oruzheynikova, E. S. Anichkov, S. V. Bolovtsov, I. S. Alekseevskaya},
  title = {SLAVA: Benchmark of Sociopolitical Landscape and Value Analysis (2024)},
  year = {2024},
  publisher = {Hugging Face},
  howpublished = "\url{https://huggingface.co/datasets/RANEPA-ai/SLAVA-OpenData-2800-v1}"
}
```

Collected Summary

Dataset Introduction

Construction

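The SLAVA-OpenData-2800-v1 dataset is designed to evaluate the factual accuracy of large language models (LLMs) in the Russian domain. It is built from several task types: multiple-choice questions, sequence and matching questions, and open-ended questions. The tasks cover topics ranging from low to high sensitivity, so that model performance on complex socio-political issues can be assessed comprehensively. Data sources include public educational resources and expert review, which supports the authority and diversity of the data.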

Usage

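To use SLAVA-OpenData-2800-v1, researchers can download the dataset files from the Hugging Face platform and load them with a Python script. The dataset is stored in JSONL format, which makes it easy to read and process line by line. Records can be filtered and analyzed by task type, subject, or sensitivity score to evaluate model performance in different settings. The dataset also provides visualization charts that help researchers understand the data distribution and model performance.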
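
The line-by-line loading and filtering described above might look like the following sketch. It assumes a locally downloaded JSONL file whose records carry the `meta.type`, `meta.subject`, and `meta.provac_score` fields listed under Data Fields. The helper names and the file name in the usage comment are hypothetical. Loading through the Hugging Face `datasets` library with the repository id from the download link is another option and is not shown here.

```python
import json


def read_jsonl(path):
    """Yield one record per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                # Python's json module accepts the NaN literal by default.
                yield json.loads(line)


def select(records, task_type=None, subject=None, provac_score=None):
    """Keep records whose meta fields match every filter that is not None."""
    wanted = {"type": task_type, "subject": subject, "provac_score": provac_score}
    for record in records:
        meta = record["meta"]
        if all(v is None or meta.get(k) == v for k, v in wanted.items()):
            yield record


# Usage with a hypothetical local file name:
# high_sensitivity = list(select(read_jsonl("slava_open_data.jsonl"), provac_score=3))
```

Both helpers are generators, so a large file can be filtered without holding every record in memory.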

Background and Challenges

Background

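SLAVA-OpenData-2800-v1 was created in 2024 by a research team at the Russian Presidential Academy of National Economy and Public Administration (RANEPA) to evaluate the factual accuracy of large language models (LLMs) in the Russian domain. As LLMs are applied more widely in natural language processing tasks, their reliability has become a key concern, especially on sensitive and controversial topics. The dataset addresses the lack of factuality evaluation for LLMs in the Russian-language context, with particular attention to sensitive socio-political and cultural questions. Through multiple question types and graded sensitivity scores, SLAVA offers a comprehensive framework for evaluating how LLMs perform in a Russian-language setting.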

Current Challenges

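The SLAVA dataset faces two main challenges. First, on the domain side, it is hard to evaluate LLMs' factual knowledge in the Russian-language context accurately, because model behavior on sensitive and controversial topics is difficult to quantify. Second, on the construction side, the questions must be diverse and representative without introducing bias or misleading information. In addition, the design has to balance question sensitivity against the objectivity of the evaluation so that the results remain scientifically sound and practically useful. These challenges call for rigor throughout data collection, annotation, and evaluation.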

Common Scenarios

Typical Use Case

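SLAVA-OpenData-2800-v1 is used mainly to evaluate the factual accuracy of large language models (LLMs) in the Russian domain. With several question types (multiple choice, sequence and matching, and open-ended responses), it tests how models handle complex and sensitive topics. It is particularly suited to measuring model reliability and accuracy on socially and politically sensitive questions.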

Academic Problems Addressed

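The dataset addresses the research problem of evaluating LLM factual accuracy in the Russian-language context. Most existing benchmarks overlook sensitive topics in the Russian setting; SLAVA fills this gap with an evaluation tool aimed at models that support Russian. By including questions at different sensitivity levels, it helps researchers understand how models behave on controversial topics and so supports further development of LLMs for the Russian domain.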

Practical Applications

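In practice, the SLAVA dataset can be used to improve Russian-language models in areas such as news, education, and policy analysis. For example, a news organization can use it to assess a model's accuracy when generating news reports and avoid spreading misinformation. An educational institution can use it to test how reliably a model answers student questions and so help keep educational content accurate.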

Recent Research on the Dataset