
WTO-Text

ModelScope community · Updated 2025-11-27 · Indexed 2025-06-21
Download link:
https://modelscope.cn/datasets/PleIAs/WTO-Text
Resource description:
# Dataset Card for WTO Documents Dataset

## Dataset Overview

**Title**: WTO Documents Dataset

**Source**: [World Trade Organization Documents Online](https://docs.wto.org/dol2fe/Pages/FE_Search/FE_S_S005.aspx)

**Description**: The WTO Documents Dataset is a comprehensive collection of official documentation from the World Trade Organization (WTO). This dataset is sourced from the WTO's official Documents Online platform, which provides access to documents in the three official languages (English, French, and Spanish) from 1995 onwards. The dataset is updated daily and includes documents in PDF and Word formats. Each document is accompanied by a descriptive catalog record. The dataset offers extensive search capabilities, enabling users to retrieve documents based on various criteria such as symbol, country, topic, and full-text search within the document text.

## Contents and Structure

The dataset comprises a vast number of documents categorized and stored in 131 Parquet files named WTO_1 to WTO_131.
The structure and contents of the dataset are as follows:

### General Statistics

- **Total number of words**: 1,676,595,872
- **Total number of entries**: 642,627
- **Average number of words per document**: 2,364.08
- **Number of zero-word documents**: 70,869
- **Total number of Parquet files**: 131

### Document Distribution

- **Average number of entries per Parquet file**: 4,906
- **Average number of zero-word documents per Parquet file**: 541

### Language Distribution (Sample of 10,000 documents)

| Language | Count |
|----------|-------|
| French (fr) | 3,027 |
| English (en) | 3,593 |
| Spanish (es) | 3,168 |
| Catalan (ca) | 10 |
| Chinese (Simplified) (zh-cn) | 33 |
| Portuguese (pt) | 22 |
| Korean (ko) | 31 |
| Arabic (ar) | 29 |
| Thai (th) | 10 |
| German (de) | 28 |
| Welsh (cy) | 1 |
| Italian (it) | 2 |
| Hebrew (he) | 5 |
| Ukrainian (uk) | 11 |
| Chinese (Traditional) (zh-tw) | 1 |
| Turkish (tr) | 7 |
| Romanian (ro) | 3 |
| Danish (da) | 1 |
| Swedish (sv) | 1 |
| Dutch (nl) | 1 |
| Indonesian (id) | 4 |
| Finnish (fi) | 2 |
| Croatian (hr) | 1 |
| Russian (ru) | 3 |
| Vietnamese (vi) | 3 |
| Greek (el) | 1 |
| Japanese (ja) | 1 |
| Czech (cs) | 1 |

### Search Interfaces

The WTO Documents Online platform provides seven different search interfaces to facilitate document retrieval:

1. **Recent Documents**: Access to the latest documents posted.
2. **Commonly-consulted**: Easy retrieval of regularly requested documents.
3. **Documents for Meetings**: List of formal and informal meetings of WTO bodies and associated documents.
4. **By Topic**: Search for documents by broad subject category.
5. **Notifications**: Search notification documents by notifying members and WTO legal requirements.
6. **Advanced Search**: Additional search criteria such as symbol, requirement topic, and classification. Full-text search capabilities are available.
7. **GATT Module**: Access to official documents issued under the General Agreement on Tariffs and Trade (GATT).
   Includes documents from the Uruguay Round of trade negotiations, with more documents to be added progressively.

## Licensing

The dataset is available under the [CC0 1.0 Universal (CC0 1.0) Public Domain Dedication](https://creativecommons.org/publicdomain/zero/1.0/), which allows free use, distribution, and reproduction in any medium. CC0 does not itself require attribution, although crediting the original author and source remains good practice.

## Author

This dataset has been compiled and maintained by PleIAs.

## Usage and Applications

The WTO Documents Dataset is a resource for researchers, policymakers, and legal professionals interested in international trade law and policy. It provides an archive of the WTO's official documentation, offering insight into trade negotiations, agreements, and disputes. The search capabilities of the Documents Online platform make it easier to locate specific documents for in-depth research and analysis.

This dataset card aims to give an overview of the WTO Documents Dataset so that users have the information they need to work with it.
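The figures in the General Statistics and Document Distribution sections can be cross-checked with simple arithmetic. The sketch below uses only the numbers printed in this card; the `.parquet` file extension in `shard_names` is an assumption, since the card gives only the base names WTO_1 to WTO_131.

```python
# Figures copied from the General Statistics section of this card.
TOTAL_WORDS = 1_676_595_872
TOTAL_ENTRIES = 642_627
ZERO_WORD_DOCS = 70_869
PARQUET_FILES = 131

# Base names WTO_1 .. WTO_131 come from the card; the ".parquet"
# extension is an assumption about how the shards are stored.
shard_names = [f"WTO_{i}.parquet" for i in range(1, PARQUET_FILES + 1)]

# Per-file averages, as reported under Document Distribution.
entries_per_file = TOTAL_ENTRIES / PARQUET_FILES
zero_word_per_file = ZERO_WORD_DOCS / PARQUET_FILES

# Two plausible ways to compute words per document.
words_per_entry = TOTAL_WORDS / TOTAL_ENTRIES
non_empty_entries = TOTAL_ENTRIES - ZERO_WORD_DOCS
words_per_non_empty = TOTAL_WORDS / non_empty_entries

# Share of entries that contain no text at all.
zero_word_share = ZERO_WORD_DOCS / TOTAL_ENTRIES

print(f"shards: {len(shard_names)} ({shard_names[0]} .. {shard_names[-1]})")
print(f"entries per file: {entries_per_file:.0f}")
print(f"zero-word documents per file: {zero_word_per_file:.0f}")
print(f"words per entry: {words_per_entry:.2f}")
print(f"words per non-empty entry: {words_per_non_empty:.2f}")
print(f"zero-word share: {zero_word_share:.1%}")
```

The per-file averages reproduce the reported 4,906 and 541. Dividing total words by total entries gives roughly 2,609 rather than the reported 2,364.08, so the card's average was presumably computed on a different basis. About 11% of entries are zero-word documents, which is worth filtering out before any text analysis.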

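As a rough illustration of how to read the language distribution table, the sketch below recomputes the share of the three official WTO languages in the 10,000-document sample. The counts are copied from the table; the variable names are illustrative only, and the card does not say which language detector produced the labels.

```python
from collections import Counter

# Counts copied from the Language Distribution table (sample of 10,000).
sample_counts = Counter({
    "fr": 3027, "en": 3593, "es": 3168, "ca": 10, "zh-cn": 33,
    "pt": 22, "ko": 31, "ar": 29, "th": 10, "de": 28, "cy": 1,
    "it": 2, "he": 5, "uk": 11, "zh-tw": 1, "tr": 7, "ro": 3,
    "da": 1, "sv": 1, "nl": 1, "id": 4, "fi": 2, "hr": 1,
    "ru": 3, "vi": 3, "el": 1, "ja": 1, "cs": 1,
})

sample_size = sum(sample_counts.values())

# English, French, and Spanish are the official WTO languages.
official_codes = ("en", "fr", "es")
official_total = sum(sample_counts[code] for code in official_codes)
other_total = sample_size - official_total

for code in official_codes:
    share = sample_counts[code] / sample_size
    print(f"{code}: {sample_counts[code]} ({share:.2%})")
print(f"official languages combined: {official_total / sample_size:.2%}")
print(f"all other labels: {other_total} ({other_total / sample_size:.2%})")
```

The three official languages account for 9,788 of the 10,000 sampled documents. The remaining 212 are spread over 25 other labels, several with a single document, so some of them may be detector noise on short or non-prose entries rather than genuine documents in those languages.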
Provider:
maas
Created:
2025-06-19