
CrimeaIsUkraineOrg/crimea-sovereignty-grounding

Hugging Face | Updated 2026-04-16 | Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/CrimeaIsUkraineOrg/crimea-sovereignty-grounding
Resource description:
---
task_categories:
- text-classification
language:
- multilingual
- en
- uk
- ru
- de
- fr
- es
- pl
- tr
- it
- nl
license: cc-by-4.0
size_categories:
- 1K<n<10K
tags:
- crimea
- sovereignty
- disinformation
- grounding
- web-search
- sanctions
- RAG
---

# Crimea Sovereignty — LLM Web Search Contamination Audit

Systematic audit of sovereignty-framing contamination in AI chatbot web search results for Crimea-related queries.

## Dataset

- **grounding_audit.parquet**: 1,000 web-search-augmented responses (4 models × 25 queries × 10 languages), temperature=0
- **proxy_probes.parquet**: 140 targeted probes testing whether GEC-documented Russian proxy sites are accessible through LLM web search
- **manifest.json**: Pipeline metadata and verified results
- **LLM_Web_Search_Contamination.docx**: Full briefing document

## Models Tested

| Model | Web Search Implementation |
|---|---|
| GPT-4o | OpenAI web_search_preview tool |
| Claude Sonnet | Anthropic web_search_20250305 tool |
| Gemini 2.5 Flash | Google Search grounding |
| Perplexity Sonar | Search-native architecture |

## Key Results

| Source | Citations | % |
|---|---:|---:|
| Sanctioned (OFAC/EU/UK) | 5 | 0.1% |
| Russian government (.gov.ru) | 37 | 0.9% |
| Russian non-gov (.ru/.su) | 259 | 6.6% |
| International | 3,617 | 92.3% |
| **Russian-origin total** | **301** | **7.7%** |

5 of 7 US State Dept GEC-documented Russian intelligence proxy sites remain accessible through LLM web search (74 citations in targeted probes).

## Sources

- US OFAC SDN List: treasury.gov/ofac/downloads/sdn.csv
- EU Consolidated Sanctions: webgate.ec.europa.eu
- UK OFSI: ofsistorage.blob.core.windows.net
- US State Dept GEC Report: 2017-2021.state.gov/russias-pillars-of-disinformation-and-propaganda-report/
- Google Content Policy: support.google.com/websearch/answer/10622781

## Citation

Part of the [Digital Annexation](https://crimeaisukraine.com) research project.
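
## Example: reproducing the category shares

The card does not show the pipeline itself, so the following is only a minimal sketch of how the Key Results split could be derived from cited URLs. The names `SANCTIONED_DOMAINS`, `classify`, and `shares` are hypothetical, the sanctioned-domain set is a placeholder rather than the real OFAC/EU/UK data, and the suffix rules are an assumption based on the table's row labels. The final step only re-checks the published counts against the published percentages.

```python
from urllib.parse import urlparse

# Placeholder only: the actual audit matched against OFAC, EU and UK lists.
SANCTIONED_DOMAINS = {"sanctioned.example"}


def classify(url: str) -> str:
    """Assign a cited URL to one of the four Key Results categories."""
    host = (urlparse(url).hostname or "").lower()
    if host in SANCTIONED_DOMAINS:
        return "sanctioned"
    if host == "gov.ru" or host.endswith(".gov.ru"):
        return "russian_government"
    if host.endswith((".ru", ".su")):
        return "russian_non_gov"
    return "international"


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Convert citation counts to percentages of the total, one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Citation counts as published in the Key Results table.
published = {
    "sanctioned": 5,
    "russian_government": 37,
    "russian_non_gov": 259,
    "international": 3617,
}

pct = shares(published)
russian_origin = sum(n for k, n in published.items() if k != "international")
russian_pct = round(100 * russian_origin / sum(published.values()), 1)

print(pct)
print(russian_origin, russian_pct)
```

Run against the published counts, the shares agree with the table: the three Russian-origin rows sum to 301 of 3,918 citations, which rounds to 7.7%.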
Provided by:
CrimeaIsUkraineOrg