GTC (Genocide Transcript Corpus)

Name: GTC (Genocide Transcript Corpus)
Creator: University of Regensburg
Published: 2022-04-06 18:24:19
License: No description available

Source: arXiv, 2022-04-06 (listing updated and indexed 2024-06-21)

Download link:

https://github.com/MiriamSchirmer/genocide-transcript-corpus

Summary:

The Genocide Transcript Corpus (GTC), developed at the University of Regensburg, is the first annotated corpus of genocide-related court transcripts. It contains 1,475 text segments drawn from the Extraordinary Chambers in the Courts of Cambodia (ECCC), the International Criminal Tribunal for Rwanda (ICTR), and the International Criminal Tribunal for the former Yugoslavia (ICTY). The corpus was built to give the research community a reference corpus, to establish benchmarks for new classification tasks, and to explore in-domain transfer learning. Its annotation focuses on witness statements that recount experiences of violence, because such statements are critical to adjudicating a case. Its primary application is genocide studies, where automated tools can reduce the workload of manual research and make searching the transcripts more efficient.

Provided by:

University of Regensburg

Created:

2022-04-06

Curated Summary

Background and Challenges

Background Overview

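GTC (Genocide Transcript Corpus) is an annotated corpus created by the University of Regensburg. It contains 1,475 text segments from the tribunals for Cambodia (ECCC), Rwanda (ICTR), and the former Yugoslavia (ICTY), with a focus on witness statements that describe experiences of violence. The dataset aims to provide a reference corpus for genocide research, establish classification benchmarks, improve search efficiency through automated tools, and support the exploration of transfer learning.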

The above content was collected and summarized by 遇见数据集.