SUPERSEDED - National Security and Defence Documents Dataset (1987-2025) v3.0

Source: DataCite Commons. Updated: 2026-04-02. Indexed: 2026-05-07.
Download link:
https://datashare.ed.ac.uk/handle/10283/9156
Description:
## This item has been replaced by the one which can be found at https://doi.org/10.7488/ds/8101 ##

NSDDD v3 is a comprehensive corpus of 660 national security strategy documents from 118 countries spanning 1987-2025. Major improvements over version 2 include cleaner text extraction, improved translations, and enhanced semantic search capabilities. The dataset includes complete metadata for each document, with classifications across 26 international organisations (NATO, EU, ASEAN, BRICS, G7, G20, G77, OECD, and more) and 27 geographic and political attributes (UN regions, subregions, World Bank income groups, Freedom House democracy scores, and Huntington civilizational classifications), enabling comparative research across these groupings.

All 660 documents were re-extracted using AI-powered PDF processing to produce cleaner output from complex layouts, tables, and scanned documents, resulting in significantly higher quality text suitable for computational analysis. The dataset includes 132 documents translated from their original languages (Spanish, German, Portuguese, French, Chinese, Russian, Arabic, Turkish, and 17 other languages), with both original-language and English versions provided. Documents are formatted as sentence-segmented text files with numbered sentences that preserve paragraph structure, making them ready for machine learning and natural language processing pipelines.

The semantic search system uses 768-dimensional MPNet embeddings (upgraded from 512-dimensional USE-4 in v2). Its features include configurable context windows (0-5 sentences before and after each match), clustering with TF-IDF descriptive labels, metadata filtering by organisation, region, income level, and democracy status, and both keyword and semantic similarity search modes.
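To make the context-window idea concrete, here is a minimal sketch of similarity search over sentence vectors that returns each match with its neighbouring sentences. It is an illustration only and not the repository's code: the function names `cosine` and `search_with_context`, the toy sentences, and the 3-dimensional vectors (standing in for the 768-dimensional MPNet embeddings) are all assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search_with_context(query_vec, sentences, vectors, top_k=1, window=1):
    """Rank sentences by similarity to the query and attach context.

    `window` is the number of sentences kept before and after each match,
    limited to the 0-5 range described in the dataset summary.
    """
    if not 0 <= window <= 5:
        raise ValueError("window must be between 0 and 5")
    scores = [cosine(query_vec, vec) for vec in vectors]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    results = []
    for i in ranked[:top_k]:
        start = max(0, i - window)                  # clamp at document start
        end = min(len(sentences), i + window + 1)   # clamp at document end
        results.append({
            "match_index": i,
            "score": scores[i],
            "context": sentences[start:end],
        })
    return results

# Hypothetical toy data: invented sentences and 3-dimensional vectors.
sentences = [
    "Cyber threats are growing.",
    "The armed forces will modernise.",
    "Alliances remain central.",
    "Energy security is a priority.",
]
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]]

hits = search_with_context([0.0, 1.0, 0.0], sentences, vectors, top_k=1, window=1)
print(hits[0]["match_index"], hits[0]["context"])
```

In practice the vectors would come from an embedding model rather than being written by hand; the ranking and window logic would stay the same.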
Full search capabilities and interactive tools are available in the GitHub repository at https://github.com/andrewneal78/NSDDD_v3_installer, which includes a Jupyter notebook installer for automated download from DataShare, Python examples for semantic search, tutorial notebooks with research use cases, and command-line search utilities.

The dataset supports a range of research applications: comparative content analysis across countries and organisations (threat perception, policy priorities, language use), the evolution of security discourse over time, cross-national machine learning on security language, metadata-aware filtering for targeted research, and temporal and geographic trend analysis. Researchers, policymakers, civil society organisations, educators, and students can use it to analyse international security and defence policy, threat perception and securitisation, comparative politics and international relations, and computational text analysis.

The dataset is provided under the Creative Commons Attribution 4.0 International license. For complete technical documentation, installation instructions, and research examples, see the included documentation files WHATS_NEW_IN_NSDDD_V3.md and dataset_inclusion_criteria.md.
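The metadata-aware filtering mentioned above can be sketched as a simple pre-filter applied before any text analysis. This is a hypothetical illustration: the record layout, the field names (`doc_id`, `country`, `year`, `region`, `organisations`), the placeholder values, and the function `filter_documents` are assumptions, not the dataset's actual schema or tooling.

```python
def filter_documents(records, organisation=None, region=None, year_range=None):
    """Return the records that satisfy every filter that was supplied.

    A filter left as None is ignored, so calling with no filters
    returns all records.
    """
    selected = []
    for rec in records:
        if organisation is not None and organisation not in rec["organisations"]:
            continue
        if region is not None and rec["region"] != region:
            continue
        if year_range is not None:
            first, last = year_range
            if not first <= rec["year"] <= last:
                continue
        selected.append(rec)
    return selected

# Hypothetical placeholder records; the real metadata fields may differ.
records = [
    {"doc_id": "doc-001", "country": "Country A", "year": 1995,
     "region": "Region X", "organisations": {"NATO", "EU"}},
    {"doc_id": "doc-002", "country": "Country B", "year": 2012,
     "region": "Region Y", "organisations": {"ASEAN"}},
    {"doc_id": "doc-003", "country": "Country C", "year": 2021,
     "region": "Region X", "organisations": {"NATO"}},
]

recent_nato = filter_documents(records, organisation="NATO", year_range=(2000, 2025))
print([rec["doc_id"] for rec in recent_nato])  # ['doc-003']
```

Filtering on metadata first keeps the later text-level comparison restricted to the countries, organisations, or periods of interest.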
Provided by:
University of Edinburgh. School of Social and Political Science
Created:
2026-02-10