
Texts of Trade Agreements Corpus

Source: NIAID Data Ecosystem (indexed 2026-05-02)
Download link:
https://doi.org/10.7910/DVN/Z7IT8Y
Description:
Texts of Trade Agreements (ToTA) is a machine-readable, annotated full-text corpus of preferential trade agreements (PTAs) in XML format. The number of trade agreements has increased dramatically since the early 1990s. Trade agreements cover ever more issues, and the average agreement text is now around ten times longer than it was 25 years ago. This makes it increasingly difficult to analyze the content of trade agreements and to assess their impact on international trade and welfare. Big data and text-as-data methods can help researchers, policy-makers and other stakeholders manage the growing complexity of trade agreements. Modern computational methods, however, require machine-readable texts. Several databases make PTA texts available, but they are generally optimized for reading rather than for computational analysis.

As part of a year-long effort, this project used the WTO RTA Database to locate the texts and metadata of close to 450 preferential trade agreements and transformed them into a machine-readable format that allows analysis of PTA texts at the article, chapter or treaty level. The corpus builds on the WTO Regional Trade Agreements Information System data. We gather metadata and full texts from this source, correct deficiencies (missing full texts or incorrect metadata), apply optical character recognition or other methods to arrive at machine-readable texts, remove annexes and schedules, impose a two-level hierarchy of treaty elements, and finally produce XML files that are stored in the xml/ folder. Please note that the texts may contain errors introduced by optical character recognition.

The resulting data contains 448 PTA texts notified to the WTO, plus two texts of the Trans-Pacific Partnership agreement (in English and Spanish). When a PTA text is available in more than one of the official WTO languages (English, French, Spanish), we prioritise English and provide the XML in that language.
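The two-level hierarchy described above (chapters containing articles) can be traversed with a standard XML parser. The following is a minimal sketch only: the element and attribute names (`treaty`, `meta`, `body`, `chapter`, `article`, `name`) and the sample document are assumptions made for illustration, not the documented ToTA schema, so they should be checked against an actual file from the xml/ folder before use.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. Tag and attribute names are assumptions
# for illustration and may differ from the real ToTA files.
SAMPLE_XML = """<treaty>
  <meta><name>Example Agreement</name><language>en</language></meta>
  <body>
    <chapter name="General Provisions">
      <article name="Objectives">The Parties establish a free trade area.</article>
      <article name="Definitions">For purposes of this Agreement, the following terms apply.</article>
    </chapter>
    <chapter name="Trade in Goods">
      <article name="Customs Duties">Each Party shall eliminate customs duties.</article>
    </chapter>
  </body>
</treaty>"""


def articles_by_chapter(xml_text):
    """Map each chapter name to a list of (article name, article text) pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for chapter in root.iter("chapter"):
        result[chapter.get("name")] = [
            (article.get("name"), (article.text or "").strip())
            for article in chapter.iter("article")
        ]
    return result


def treaty_text(xml_text):
    """Concatenate all article texts to get a treaty-level string."""
    root = ET.fromstring(xml_text)
    return " ".join((a.text or "").strip() for a in root.iter("article"))


if __name__ == "__main__":
    for chapter_name, articles in articles_by_chapter(SAMPLE_XML).items():
        print(chapter_name, [name for name, _ in articles])
```

The same traversal yields all three units of analysis mentioned in the description: single articles, chapters (lists of articles), and the whole treaty (all article texts joined).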
Building on the ToTA infrastructure, one can employ text-as-data methods to map the content of PTAs automatically and gain new insights into trade agreements. Textual similarity measures, for example, can capture fine-grained differences in treaty design. Dimensionality reduction techniques, which compress the information contained in a text into a set of abstract variables, help predict trade flows more accurately than previously available measures. The data is available at https://github.com/mappingtreaties/tota.
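To make the idea of a textual similarity measure concrete, here is a minimal sketch using cosine similarity over bag-of-words term counts. This is one simple, generic measure chosen for illustration; it is not presented as the measure used by the ToTA authors, and the function names and example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the term-count vectors of two texts.

    Returns a value in [0, 1]; 0.0 if either text has no tokens.
    """
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Dot product over the vocabulary shared by both texts.
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Hypothetical article texts from two agreements.
    article_1 = "Each Party shall eliminate customs duties on originating goods."
    article_2 = "Each Party shall reduce customs duties on originating goods."
    article_3 = "The Joint Committee meets once a year."
    print(cosine_similarity(article_1, article_2))  # near-identical wording
    print(cosine_similarity(article_1, article_3))  # unrelated wording
```

Applied to article or chapter texts extracted from the XML files, pairwise scores of this kind give a rough picture of how closely two agreements follow the same wording. Weighted variants such as TF-IDF are a common refinement.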
Created:
2025-04-21