Balinese Story Texts Dataset - Characters, Aliases, and their Classification
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/h2tf5ymcp9
Description:
This dataset consists of 120 Balinese story texts (also known as Satua Bali) that have been annotated for narrative text analysis, including character identification, alias clustering, and character classification into protagonist or antagonist. The labeling was carried out by two native Balinese speakers who are fluent in reading Balinese story texts, one of whom is an expert in sociolinguistics and macrolinguistics. Reliability and inter-annotator agreement are measured with Cohen's kappa coefficient, the Jaccard similarity coefficient, and the F1-score; all three show almost perfect agreement (> 0.81).
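As a rough illustration of the three agreement measures named above, the following minimal sketch computes each one for two annotators. The description does not state which measure was applied to which part of the dataset or how the scores were aggregated, so the pairing used here (kappa on per-token labels, Jaccard and F1 on sets of aliases) is an assumption, and the sample values are made-up toy data, not taken from the dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same sequence of items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: derived from each annotator's label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def jaccard(items_a, items_b):
    """Jaccard similarity: size of the intersection over size of the union."""
    a, b = set(items_a), set(items_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def f1_agreement(items_a, items_b):
    """F1 between two sets, treating one annotator as the reference."""
    a, b = set(items_a), set(items_b)
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


# Toy data (hypothetical): six tokens labelled by each annotator.
annotator_1 = ["PNAME", "ANM", "OBJ", "ANM", "GODS", "ADJ"]
annotator_2 = ["PNAME", "ANM", "OBJ", "PNAME", "GODS", "ADJ"]
kappa = cohen_kappa(annotator_1, annotator_2)

# Toy data (hypothetical): aliases each annotator grouped under one character.
group_1 = {"alias_a", "alias_b", "alias_c"}
group_2 = {"alias_b", "alias_c", "alias_d"}
jac = jaccard(group_1, group_2)
f1 = f1_agreement(group_1, group_2)
```

On the toy data the annotators disagree on one of six tokens and share two of four distinct aliases, so all three scores fall below the > 0.81 level reported for the dataset.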
There are four main folders, each used for different narrative text analysis purposes:
1. First Dataset (charsNamedEntity): 89,917 annotated tokens with five character named entity labels (ANM, ADJ, PNAME, GODS, OBJ) for the purpose of character named entity recognition
2. Second Dataset (charsExtraction): 6,634 annotated sentences for the purpose of character identification at the sentence level
3. Third Dataset (charsAliasClustering): 930 lists of character groups from 120 story texts for the purpose of alias clustering
4. Fourth Dataset (charsClassification): 848 lists of character groups that have been classified into two groups (Protagonist and Antagonist)
Created:
2024-03-25



