FrancophonIA/ParlaMint_4.1
Source: Hugging Face. Updated 2025-03-30, indexed 2025-04-12.
Download link:
https://hf-mirror.com/datasets/FrancophonIA/ParlaMint_4.1
Resource description:
ParlaMint 4.1 is a set of comparable corpora containing transcriptions of parliamentary debates from 29 European countries and autonomous regions, mostly starting in 2015 and extending to mid-2022. The individual corpora range from 9 million to 126 million words, and the complete set contains over 1.2 billion words. The transcriptions are divided by day and include information on the term, session, and meeting; each speech is marked with its speaker and the speaker's role (e.g., chair or regular speaker). The speeches also contain transcriber annotations, such as gaps, interruptions, and applause. The corpora have extensive metadata, particularly on speakers (name, gender, MP and minister status, party affiliation), their political parties, and parliamentary groups (name, coalition/opposition status, left-to-right political orientation sourced from Wikipedia, and CHES (Chapel Hill Expert Survey) variables). Some corpora include additional metadata, such as speakers' year of birth, links to their Wikipedia articles, and their membership in various committees. The transcripts are also marked with the subcorpora they belong to: reference (until 2020-01-30), COVID (from 2020-01-31), and war (from 2022-02-24).
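To make the subcorpus date boundaries above concrete, here is a minimal Python sketch that maps a sitting date to a subcorpus label. The helper name `subcorpus_for` is hypothetical, and the sketch assumes the three periods are consecutive, non-overlapping ranges; the subcorpus markup shipped in the corpus itself is authoritative and may differ (for example, if a sitting can carry more than one label).

```python
from datetime import date

# Boundary dates as given in the ParlaMint 4.1 description.
COVID_START = date(2020, 1, 31)
WAR_START = date(2022, 2, 24)


def subcorpus_for(day: date) -> str:
    """Return a subcorpus label for a sitting date (hypothetical helper).

    Assumes non-overlapping periods:
      reference: up to and including 2020-01-30
      COVID:     2020-01-31 up to and including 2022-02-23
      war:       from 2022-02-24 onwards
    """
    if day >= WAR_START:
        return "war"
    if day >= COVID_START:
        return "COVID"
    return "reference"


if __name__ == "__main__":
    for d in (date(2019, 6, 1), date(2020, 1, 31), date(2022, 3, 1)):
        print(d.isoformat(), subcorpus_for(d))
```

Comparisons use plain `datetime.date` ordering, so each boundary date falls into the later period, matching the "from" wording in the description.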
Provided by:
FrancophonIA



