National Security and Defence Documents Dataset (1987-2025) v3.5
Source: DataCite Commons; updated 2026-04-02; indexed 2026-05-07
Download link:
https://datashare.ed.ac.uk/handle/10283/9182
Description:
The National Security and Defence Documents Dataset (NSDDD) v3.5 is a curated corpus of official government national security and defence strategy documents, designed for computational analysis, natural language processing, and comparative security studies. This release contains 671 documents from 118 countries spanning 1987 to 2025, comprising 22,387,337 words across 787,844 sentence-level segments. Documents are drawn from four categories: Defence Documents (290), National Security Strategies (204), Defence White Papers (170), and Strategic Threat Assessments (7).

The 147 documents originally published in languages other than English have been machine-translated using the Google Cloud Translation API; both the English translation and the original-language text are included. Source languages include Spanish, German, Norwegian, Portuguese, French, Czech, Korean, Chinese, Russian, and others.

All documents were processed with an AI-powered, layout-aware text extraction pipeline (Google Gemini 2.5 and Anthropic Claude Sonnet 4, with multi-engine fallback) and segmented into sentences using spaCy. Pre-computed 768-dimensional sentence embeddings (all-mpnet-base-v2) are provided for semantic search.

A browser-based search interface is available at https://github.com/andrewneal78/NSDDD_v3.5_installer. It supports semantic search, keyword search with Boolean operators, and result clustering, and lets users filter results by 26 international organisations, UN geographic regions, income groups, democracy status, and document type.
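Semantic search over pre-computed sentence embeddings typically ranks segments by their cosine similarity to an embedded query. The sketch below shows that ranking step in plain Python. It is an illustration under assumptions, not the NSDDD interface's actual implementation: the helper names `cosine_similarity` and `top_k` are hypothetical, the on-disk format of the released embeddings is not described here, and tiny 3-dimensional toy vectors stand in for the real 768-dimensional all-mpnet-base-v2 vectors.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: Sequence[float],
          segment_vecs: Sequence[Sequence[float]],
          k: int = 5) -> List[Tuple[int, float]]:
    """Return (segment index, score) pairs for the k most similar segments."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(segment_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Hypothetical sentence segments with toy 3-d embeddings (the real ones are 768-d).
    segments = [
        "Cyber threats to critical infrastructure are increasing.",
        "The navy will procure two new frigates.",
        "Hostile actors target energy networks with cyber attacks.",
    ]
    vectors = [[0.9, 0.1, 0.0], [0.0, 0.2, 0.95], [0.8, 0.3, 0.1]]
    query = [1.0, 0.2, 0.0]  # stands in for an embedded query such as "cyber attacks"
    for idx, score in top_k(query, vectors, k=2):
        print(f"{score:.3f}  {segments[idx]}")
```

In practice the query must be embedded with the same model as the corpus (for example via `SentenceTransformer("all-mpnet-base-v2").encode(...)` from the sentence-transformers library), and for hundreds of thousands of segments a vectorised library such as NumPy would replace the pure-Python loop.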
Provider:
School of Social and Political Science, University of Edinburgh
Created:
2026-04-02



