TrustHLT/LaCour
Dataset description
Dataset overview
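This dataset contains transcripts of official hearings of the European Court of Human Rights. The hearings are 154 webcast videos selected from 2012 to 2022, presented in their original language (no translation). After manual annotation of language labels and automatic processing of the extracted audio with pyannote and whisper-large-v2, the final dataset contains 4000 speaker turns and 88920 individual lines. The dataset has two subsets: the transcripts, and the metadata with linked documents. The transcripts are additionally provided in .txt or .xml format.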
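Languages
The largest parts of the transcripts are in English and French. Smaller parts also contain the following languages: Russian, Spanish, Croatian, Italian, Portuguese, Turkish, Polish, Lithuanian, German, Ukrainian, Hungarian, Dutch, Albanian, Romanian, Serbian. The collected metadata is in English.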
Dataset structure
Data instances
Each transcript instance represents one complete transcribed segment, comparable to a turn in a dialogue.
{
  "id": 0,
  "webcast_id": "1021112_29112017",
  "segment_id": 0,
  "speaker_name": "UNK",
  "speaker_role": "Announcer",
  "data": {
    "begin": [12.479999542236328],
    "end": [13.359999656677246],
    "language": ["fr"],
    "text": ["La Cour!"]
  }
}
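The `data` field holds parallel lists, one entry per line of the segment. As a minimal sketch of how such a segment might be unpacked into per-line records, the snippet below hard-codes the sample instance shown above. The helper name `segment_lines` is hypothetical, and treating `begin` and `end` as values on a common time axis that can be subtracted is an assumption based on the sample values.

```python
from typing import Dict, List

# Sample transcript instance copied from the example above.
segment = {
    "id": 0,
    "webcast_id": "1021112_29112017",
    "segment_id": 0,
    "speaker_name": "UNK",
    "speaker_role": "Announcer",
    "data": {
        "begin": [12.479999542236328],
        "end": [13.359999656677246],
        "language": ["fr"],
        "text": ["La Cour!"],
    },
}


def segment_lines(seg: Dict) -> List[Dict]:
    """Turn the parallel lists under `data` into one dict per line.

    Hypothetical helper, not part of the dataset tooling.
    """
    data = seg["data"]
    keys = ("begin", "end", "language", "text")
    # The lists are expected to be aligned; fail loudly if they are not.
    if len({len(data[k]) for k in keys}) != 1:
        raise ValueError("parallel lists in `data` differ in length")
    return [
        dict(zip(keys, values), speaker_role=seg["speaker_role"])
        for values in zip(*(data[k] for k in keys))
    ]


lines = segment_lines(segment)
for line in lines:
    # Assumption: begin/end share one time axis, so the difference is a duration.
    duration = line["end"] - line["begin"]
    print(f'{line["speaker_role"]} [{line["language"]}] {line["text"]} ({duration:.2f})')
```

Keeping the speaker role on every line makes it easier to filter or group lines later without going back to the enclosing segment.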
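Each document instance represents one document in hudoc that is associated with a hearing, together with its metadata. The actual document can be looked up in hudoc via its case_id. Note: hearing_type gives the type of the hearing, while type gives the type of the document. If the hearing is a "Grand Chamber hearing", a "CHAMBER" document refers to a different hearing.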
{
  "id": 16,
  "webcast_id": "1232311_02102012",
  "hearing_title": "Michaud v. France (nos. 12323/11)",
  "hearing_date": "2012-10-02 00:00:00",
  "hearing_type": "Chamber hearing",
  "application_number": ["12323/11"],
  "case_id": "001-115377",
  "case_name": "CASE OF MICHAUD v. FRANCE",
  "case_url": "https://hudoc.echr.coe.int/eng?i=001-115377",
  "ecli": "ECLI:CE:ECHR:2012:1206JUD001232311",
  "type": "CHAMBER",
  "document_date": "2012-12-06 00:00:00",
  "importance": 1,
  "articles": ["8", "8-1", "8-2", "34", "35"],
  "respondent_government": ["FRA"],
  "issue": "Decision of the National Bar Council of 12 July 2007 “adopting regulations on internal procedures for implementing the obligation to combat money laundering and terrorist financing, and an internal supervisory mechanism to guarantee compliance with those procedures” ; Article 21-1 of the Law of 31 December 1971 ; Law no. 2004-130 of 11 February 2004 ; Monetary and Financial Code",
  "strasbourg_caselaw": "André and Other v. France, no 18603/03, 24 July 2008;Bosphorus Hava Yollari Turizm ve Ticaret Anonim Sirketi v. Ireland [GC], no 45036/98, ECHR 2005-VI;[...]",
  "external_sources": "Directive 91/308/EEC, 10 June 1991;Article 6 of the Treaty on European Union;Charter of Fundamental Rights of the European Union;Articles 169, 170, 173, 175, 177, 184 and 189 of the Treaty establishing the European Community;Recommendations 12 and 16 of the financial action task force (“FATF”) on money laundering;Council of Europe Convention on Laundering, Search, Seizure and Confiscation of the Proceeds from Crime and on the Financing of Terrorism (16 May 2005)",
  "conclusion": "Remainder inadmissible;No violation of Article 8 - Right to respect for private and family life (Article 8-1 - Respect for correspondence;Respect for private life)",
  "separate_opinion": true
}
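Several document fields arrive as plain strings that usually need a little parsing before use. The following is a rough sketch using only values from the sample instance above. The helpers `parse_timestamp` and `split_items` are hypothetical. Two things are assumptions inferred from this one sample rather than documented behaviour: that multi-valued text fields such as `external_sources` are separated by semicolons, and that the part of `webcast_id` after the underscore encodes the hearing date as DDMMYYYY.

```python
from datetime import datetime
from typing import List

# A few fields copied from the sample document instance above.
doc = {
    "webcast_id": "1232311_02102012",
    "hearing_date": "2012-10-02 00:00:00",
    "document_date": "2012-12-06 00:00:00",
    "external_sources": (
        "Directive 91/308/EEC, 10 June 1991;"
        "Article 6 of the Treaty on European Union;"
        "Charter of Fundamental Rights of the European Union;"
        "Articles 169, 170, 173, 175, 177, 184 and 189 of the Treaty "
        "establishing the European Community;"
        "Recommendations 12 and 16 of the financial action task force "
        "(“FATF”) on money laundering;"
        "Council of Europe Convention on Laundering, Search, Seizure and "
        "Confiscation of the Proceeds from Crime and on the Financing of "
        "Terrorism (16 May 2005)"
    ),
}


def parse_timestamp(value: str) -> datetime:
    """Parse the 'YYYY-MM-DD HH:MM:SS' strings used in the sample."""
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S")


def split_items(value: str) -> List[str]:
    """Split a semicolon-separated field (assumed separator) into clean items."""
    return [item.strip() for item in value.split(";") if item.strip()]


hearing_date = parse_timestamp(doc["hearing_date"])
document_date = parse_timestamp(doc["document_date"])
days_to_document = (document_date - hearing_date).days

sources = split_items(doc["external_sources"])

# Assumption: the webcast_id suffix is the hearing date written as DDMMYYYY.
webcast_date = datetime.strptime(doc["webcast_id"].split("_")[1], "%d%m%Y").date()

print(days_to_document, len(sources), webcast_date == hearing_date.date())
```

A plain split on semicolons is not safe for every field: in the sample, `conclusion` contains a semicolon inside a parenthesis, so that field would need a more careful parser.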
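Data fields
transcripts:
- id: identifier
- webcast_id: identifier of the hearing
- segment_id: identifier of the current speaker segment within the current hearing
- speaker_name: name of the speaker (not provided for applicant, government, or third party)
- speaker_role: role or party the speaker represents (Announcer for announcements, Judge for judges, JudgeP for the judge president, Applicant for representatives of the applicant, Government for representatives of the respondent government, ThirdParty for representatives of third-party interveners)
- data: sequence of the following fields
- begin: start of the line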




