
Corpus of Reports: Committee on the Peaceful Uses of Outer Space and its Legal Subcommittee (1990–2025)

DataCite Commons | Updated: 2026-05-07 | Indexed: 2026-05-07
Download link:
https://zenodo.org/doi/10.5281/zenodo.19953083
Description:
Overview
This dataset contains a curated corpus of annual reports from the Committee on the Peaceful Uses of Outer Space (COPUOS) and its Legal Subcommittee (LSC) covering the period 1990–2025. The corpus consists exclusively of one official annual report per body per year, where available. Each corpus (COPUOS and LSC) is provided in two aligned formats: (1) original PDF documents and (2) processed TXT versions prepared for computational analysis.

The Committee on the Peaceful Uses of Outer Space (COPUOS) is a United Nations body responsible for promoting international cooperation in the peaceful uses of outer space and for the development of international space law. Its Legal Subcommittee (LSC) addresses legal issues related to space activities and contributes to the development of the international legal framework governing outer space.

Content
Annual reports of COPUOS and LSC (one per year when available). Coverage reflects official publication availability:
- no report available for COPUOS in 2020
- no reports available for LSC in 2010, 2020, and 2024
Only official annual reports are included; addenda, corrigenda, and other supplementary documents are excluded.

Key Metrics
- Time span: 1990–2025
- Document type: annual institutional report
- Dataset structure: two institutional corpora (COPUOS, LSC), each available in two formats (PDF + TXT)
- Two parallel formats: PDF (original documents) + TXT (edited)
- Number of documents in COPUOS corpus: 35
- Number of documents in LSC corpus: 33
- Language: English

Extraction, editing and limitation of TXT files
Text was extracted from the original PDF files, using OCR technology (Google Docs) for older documents (1990–1992) or partial manual reconstruction in cases where automatic extraction failed, followed by copying and pasting all text separately into TextEdit. The final TXT files are optimized for computational analysis but do not represent a fully lossless reproduction of the original documents:
- OCR errors may persist in early documents
- Some manual text reconstruction introduces minor inconsistencies
- Cleaning procedures reduce but do not totally eliminate formatting noise
- The TXT version of the dataset is optimized for research use

The cleaning procedure encompasses removal of:
- repeatedly occurring document identifiers;
- page numbers and numerical or alphabetical section references that interfered with the flow of the text after being copied into a text document;
- footnotes and reference numbers within the text;
- website links in the text that are enclosed in parentheses;
- table contents whose formatting and borders cannot be copied into text format;
- line breaks that appear at the end of every line in all documents during copying, which are replaced with a single space; and
- residual spaces in words containing a hyphen or slash that were inappropriately separated.

Data source
All documents in the dataset originate from a publicly available source, the Documents and Resolutions Database of the United Nations Office for Outer Space Affairs (UNOOSA), and are presumed to be freely available or distributed in accordance with institutional open access policies.

Disclaimer
This dataset has been created by Mr René Lušovský using documents available on the website of the UNOOSA. It is a personal academic initiative and is not associated with or endorsed by any institution. The dataset is provided exclusively for research and educational purposes. Users are required to ensure proper citation and compliance with applicable terms and conditions for further use. Neither the United Nations nor any related institution accepts responsibility or liability arising out of my use, or that of third parties, of the documents and information produced, used or published on the Zenodo website.
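For readers who want to apply comparable normalization to their own extractions, some of the cleaning steps listed under "Extraction, editing and limitation of TXT files" can be approximated with regular expressions. The sketch below is illustrative only: the description indicates the dataset was prepared by copying text into an editor, not with this code. The function name `clean_report_text` and all patterns are assumptions, including the guess that repeated document identifiers look like `A/AC.105/484`. Footnote and table removal are omitted because they cannot be reliably expressed as simple patterns.

```python
import re

# All patterns below are illustrative assumptions, not the dataset author's rules.
DOC_SYMBOL = re.compile(r"\bA/AC\.105/\d+\b")                 # assumed identifier shape
PAGE_NUMBER_LINE = re.compile(r"^[ \t]*\d{1,3}[ \t]*$", re.MULTILINE)
PAREN_LINK = re.compile(r"\s*\((?:https?://|www\.)[^()\s]+\)")
SPLIT_COMPOUND = re.compile(r"(\w[-/])\s+(\w)")


def clean_report_text(raw: str) -> str:
    """Apply a rough approximation of the described cleaning steps."""
    text = DOC_SYMBOL.sub("", raw)            # drop repeated document identifiers
    text = PAGE_NUMBER_LINE.sub("", text)     # drop lines holding only a page number
    text = PAREN_LINK.sub("", text)           # drop website links in parentheses
    text = re.sub(r"\s*\n\s*", " ", text)     # replace line breaks with a single space
    # Rejoin "space- based" or "and/ or"; this also joins suspended hyphens
    # such as "long- and short-term", so results need manual review.
    text = SPLIT_COMPOUND.sub(r"\1\2", text)
    text = re.sub(r" {2,}", " ", text)        # collapse leftover runs of spaces
    return text.strip()


if __name__ == "__main__":
    print(clean_report_text("space- based\nsystems"))
```

The order matters in this sketch: page-number lines are removed before line breaks are collapsed, since a standalone number can no longer be told apart from body text once everything is on one line.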
Provider:
Zenodo
Created:
2026-05-07