Balinese Story Texts Dataset - Characters, Aliases, and their Classification

Source: doi.org, indexed 2025-01-22
Download link:
http://doi.org/10.17632/h2tf5ymcp9.3
Description:
This dataset consists of 120 Balinese story texts (also known as Satua Bali) that have been annotated for narrative text analysis, including character identification, alias clustering, and character classification into protagonist or antagonist. The labeling was done by two native Balinese speakers who are fluent readers of Balinese story texts; one of them is an expert in sociolinguistics and macrolinguistics. Reliability and level of agreement are measured with Cohen's kappa coefficient, the Jaccard similarity coefficient, and the F1-score, and all three show almost perfect agreement (> 0.81).

There are four main folders, each serving a different narrative text analysis purpose:
1. First Dataset (charsNamedEntity): 89,917 annotated tokens with five character named entity labels (ANM, ADJ, PNAME, GODS, OBJ), for character named entity recognition.
2. Second Dataset (charsExtraction): 6,634 annotated sentences, for character identification at the sentence level.
3. Third Dataset (charsAliasClustering): 930 lists of character groups from the 120 story texts, for alias clustering.
4. Fourth Dataset (charsClassification): 848 lists of character groups that have been classified into two groups (Protagonist and Antagonist).
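
The description names Cohen's kappa and the Jaccard similarity coefficient as agreement measures but does not show how they are computed. The following is a minimal sketch of both on toy data, not the dataset authors' evaluation script. The label sequences, the "O" tag for non-character tokens, and the alias names are hypothetical placeholders that do not come from the dataset files.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def jaccard(group_a, group_b):
    """Jaccard similarity between two alias groups (treated as sets)."""
    a, b = set(group_a), set(group_b)
    if not a and not b:
        return 1.0  # two empty groups are considered identical
    return len(a & b) / len(a | b)


# Hypothetical token labels from two annotators ("O" is an assumed
# placeholder for tokens that are not character entities).
annotator_1 = ["PNAME", "ANM", "O", "O", "GODS", "O", "ANM", "O"]
annotator_2 = ["PNAME", "ANM", "O", "O", "GODS", "O", "O", "O"]
kappa = cohen_kappa(annotator_1, annotator_2)

# Hypothetical alias groups for one character.
similarity = jaccard({"alias_a", "alias_b", "alias_c"},
                     {"alias_b", "alias_c", "alias_d"})

print(f"kappa={kappa:.2f} jaccard={similarity:.2f}")
```

On the commonly used Landis and Koch scale, kappa values above 0.81 fall in the "almost perfect" band, which is the threshold the description refers to.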

Provider:
Mendeley Data