Corpus of Reports: Committee on the Peaceful Uses of Outer Space and its Legal Subcommittee (1990–2025)
Source: DataCite Commons. Updated: 2026-05-07. Indexed: 2026-05-07.
Download link:
https://zenodo.org/doi/10.5281/zenodo.19953083
Resource description:
Overview
This dataset contains a curated corpus of annual reports from the Committee on the Peaceful Uses of Outer Space (COPUOS) and its Legal Subcommittee (LSC) covering the period 1990–2025. The corpus consists exclusively of one official annual report per body per year where available. Each corpus (COPUOS and LSC) is provided in two aligned formats: (1) original PDF documents and (2) processed TXT versions prepared for computational analysis.
The Committee on the Peaceful Uses of Outer Space (COPUOS) is a United Nations body responsible for promoting international cooperation in the peaceful uses of outer space and for the development of international space law. Its Legal Subcommittee (LSC) addresses legal issues related to space activities and contributes to the development of the international legal framework governing outer space.
Content
Annual reports of COPUOS and LSC (one per year when available). Coverage reflects the availability of official publications:
no report available for COPUOS in 2020
no reports available for LSC in 2010, 2020, and 2024
Only official annual reports are included – addenda, corrigenda, and other supplementary documents are excluded.
Key Metrics
Time span: 1990–2025
Document type: annual institutional report
Dataset structure: two institutional corpora (COPUOS, LSC), each available in two parallel formats: PDF (original documents) and TXT (edited)
Number of documents in COPUOS corpus: 35
Number of documents in LSC corpus: 33
Language: English
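The document counts above follow from the time span and the gaps listed under Content. A minimal sketch of that arithmetic, in which the names `ALL_YEARS`, `MISSING`, and `expected_years` are illustrative and not part of the dataset:

```python
# Years covered by the corpus, per the description: 1990-2025 inclusive.
ALL_YEARS = range(1990, 2026)

# Years with no report available, as listed under "Content".
MISSING = {
    "COPUOS": {2020},
    "LSC": {2010, 2020, 2024},
}


def expected_years(body: str) -> list[int]:
    """Return the years for which a report of the given body should exist."""
    return [year for year in ALL_YEARS if year not in MISSING[body]]


counts = {body: len(expected_years(body)) for body in MISSING}
print(counts)  # {'COPUOS': 35, 'LSC': 33}
```

The same list of expected years could be compared against the files actually downloaded to spot an incomplete copy, although the file naming scheme is not described here and would need to be checked first.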
Extraction, editing and limitations of TXT files
Text was extracted from the original PDF files using OCR technology (Google Docs) for older documents (1990–1992), or by partial manual reconstruction in cases where automatic extraction failed. All text was then copied and pasted separately into TextEdit. The final TXT files are optimized for computational analysis but do not represent a fully lossless reproduction of the original documents.
OCR errors may persist in early documents
Some manual text reconstruction introduces minor inconsistencies
Cleaning procedures reduce but do not totally eliminate formatting noise
The TXT version of the dataset is optimized for research use
The cleaning procedure encompasses:
removal of repeatedly occurring document identifiers
removal of page numbers and numerical or alphabetical section references that interfered with the flow of the text after being copied into a text document
removal of footnotes and reference numbers within the text
removal of website links in the text that are enclosed in parentheses
removal of table contents whose formatting and borders cannot be copied into text format
replacement of the line breaks that appear at the end of every line in all documents during copying with a single space
removal of residual spaces in words containing a hyphen or slash that were inappropriately separated
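Some of these cleaning steps can be approximated with regular expressions. The sketch below is an illustration only and not the procedure used to produce the dataset, which was edited in TextEdit. The function name `clean_text` and the patterns are assumptions, and only three of the steps are covered (links in parentheses, line breaks, residual spaces around hyphens and slashes).

```python
import re


def clean_text(raw: str) -> str:
    """Approximate three of the described cleaning steps (illustrative only)."""
    # Remove website links enclosed in parentheses, e.g. "(https://example.org)".
    text = re.sub(r"\s*\((?:https?://|www\.)[^)\s]*\)", "", raw)
    # Replace each line break (and surrounding whitespace) with a single space.
    text = re.sub(r"\s*\n\s*", " ", text)
    # Close a residual space on one side of a hyphen or slash inside a word.
    # Caveat: this also joins legitimate suspended hyphens ("short- and").
    text = re.sub(r"(\w)([-/]) (\w)", r"\1\2\3", text)
    text = re.sub(r"(\w) ([-/])(\w)", r"\1\2\3", text)
    # Collapse any runs of spaces left behind and trim the ends.
    return re.sub(r" {2,}", " ", text).strip()


sample = (
    "Member States discussed long- term\n"
    "sustainability (https://www.unoosa.org) and/ or space debris.\n"
)
print(clean_text(sample))
```

Steps such as removing footnotes, section references, and table contents depend on the layout of each document and are unlikely to be captured reliably by simple patterns like these.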
Data source
All documents in the dataset originate from a publicly available source, the Documents and Resolutions Database of the United Nations Office for Outer Space Affairs (UNOOSA), and are presumed to be freely available or distributed in accordance with institutional open access policies.
Disclaimer
This dataset has been created by Mr René Lušovský using documents available on the website of the UNOOSA. It is a personal academic initiative and is not associated with or endorsed by any institution. The dataset is provided exclusively for research and educational purposes. Users are required to ensure proper citation and compliance with applicable terms and conditions for further use.
Neither the United Nations nor any related institution accepts responsibility or liability arising out of my use, or that of third parties, of the documents and information produced, used or published on the Zenodo website.
Provider:
Zenodo
Created:
2026-05-07



