
COHO: Text Corpus Of Holocaust Oral Histories

Indexed from Mendeley Data on 2026-04-09
Download link:
https://data.mendeley.com/datasets/gz7v268252/1
Resource description:
This paper outlines the compilation and annotation process of the COHO: Text Corpus of Holocaust Oral Histories. The corpus consists of 500 oral histories from Holocaust survivors, with each narrative retrieved from the Let Them Speak Project (Toth 2021). The text is processed and annotated with metadata detailing both the testimony givers and the interviews themselves. All technical content has been removed, and a unique identifier has been assigned to each question (posed by the interviewer) and answer (provided by the survivor). The corpus complies with TEI guidelines (TEI Consortium 2023). The dataset includes 106,519 questions and 107,125 answers, making it a valuable interdisciplinary resource. Researchers can retrieve and analyse questions and answers separately based on their specific research objectives. This corpus is particularly suited for studies on trauma expression and psychological concepts embedded in survivors' narratives. Additionally, it offers potential for data mining to uncover patterns (e.g., migration trends) and supports natural language processing techniques such as topic modelling, sentiment analysis, and named entity recognition. The COHO data is sourced from the United States Holocaust Memorial Museum (USHMM) and is publicly available under the CC BY-NC-SA 4.0 license.
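As a rough illustration of retrieving questions and answers separately, the sketch below parses a small TEI-style fragment and splits turns by speaker. The element name `u`, the `who` values `#interviewer` and `#survivor`, the `xml:id` values, and the placeholder texts are all assumptions made for this example; the actual COHO encoding may use different elements and identifiers, so check the corpus files before adapting it.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical fragment: element names, attributes and IDs are assumed,
# not taken from the COHO files. Texts are placeholders.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <u who="#interviewer" xml:id="q1">Placeholder question 1</u>
      <u who="#survivor" xml:id="a1">Placeholder answer 1</u>
      <u who="#interviewer" xml:id="q2">Placeholder question 2</u>
      <u who="#survivor" xml:id="a2">Placeholder answer 2</u>
    </body>
  </text>
</TEI>"""


def split_turns(tei_xml: str):
    """Return two dicts mapping identifier to text: (questions, answers)."""
    root = ET.fromstring(tei_xml)
    questions, answers = {}, {}
    # Walk every utterance element in document order.
    for u in root.iter(f"{{{TEI_NS}}}u"):
        uid = u.get(f"{{{XML_NS}}}id")
        text = "".join(u.itertext()).strip()
        # Assumed convention: interviewer turns are questions,
        # everything else is treated as an answer.
        if u.get("who") == "#interviewer":
            questions[uid] = text
        else:
            answers[uid] = text
    return questions, answers


if __name__ == "__main__":
    qs, ans = split_turns(SAMPLE)
    print(len(qs), "questions;", len(ans), "answers")
```

Keeping the two dictionaries keyed by identifier means either set can be passed on its own to later steps such as topic modelling or named entity recognition, while the identifiers still allow a question to be traced back to its place in the interview.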