Cognitive Distortion Dataset for Text Classification in Bahasa Indonesia
Source: Mendeley Data (indexed 2026-04-18)
Download link:
https://data.mendeley.com/datasets/k84bkv8dkt
Description:
This dataset consists of text data of cognitive distortion sentences, which are closely related to thought disorder. It is the first dataset of cognitive distortion sentences in Indonesian. The distortion and non-distortion sentences were generated from answers to an online questionnaire whose questions were compiled by an expert, in this case a psychologist. Annotation was also carried out by experts to assign the distortion classes, which follow the theory of Burns, D.D. (1999) in the book "The Feeling Good Handbook".

The dataset contains 4,662 generated sentences in total. It includes both complete sentences and the distortion segments of sentences, which are marked off by "$" signs, together with the labels from two annotators in separate columns.

Several distortion classes with a limited number of samples were augmented using back-translation. The four augmented classes, "Mental Filter," "All-or-Nothing Thinking," "Magnification or Minimization," and "Emotional Reasoning," were each expanded to a total of 200 samples. The back-translation process used five languages: Chinese (ZH), English (EN), Javanese (JV), Malay (MS), and Tagalog (TG).

In the accompanying CSV file, the "DATA STATUS" column indicates the origin of each sentence. Entries labeled "ORI-RAW" are raw data collected directly from questionnaire responses. Entries labeled "DIS-[...]" are distortion sentences generated through back-translation, labeled using the five language codes (ZH, EN, JV, MS, and TG). In addition to the Indonesian version, an English version is also available.
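The "$" span convention and the "DATA STATUS" values described above can be handled with a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' tooling: it assumes each distortion segment is enclosed by one pair of "$" signs inside the sentence text, and, because the exact "DIS-[...]" suffix format is not spelled out here, it simply reports whichever of the five language codes appear after the "DIS-" prefix. The function names and the example sentences are hypothetical and invented for illustration.

```python
from __future__ import annotations

import re

# Assumption: a distortion segment is enclosed by one pair of "$" signs
# inside the sentence text, e.g. "Saya $selalu gagal$ dalam ujian."
SPAN_PATTERN = re.compile(r"\$(.+?)\$")

# The five back-translation language codes listed in the dataset description.
LANG_PATTERN = re.compile(r"\b(ZH|EN|JV|MS|TG)\b")


def extract_distortion_spans(sentence: str) -> list[str]:
    """Return every "$"-delimited segment in a sentence (empty list if none)."""
    return [span.strip() for span in SPAN_PATTERN.findall(sentence)]


def strip_markers(sentence: str) -> str:
    """Drop the "$" markers, leaving the plain sentence text."""
    return sentence.replace("$", "")


def origin_of(status: str) -> tuple[str, list[str]]:
    """Interpret a DATA STATUS value (hypothetical helper).

    "ORI-RAW" -> ("original", [])
    "DIS-..." -> ("back-translated", [language codes found after "DIS-"])
    Assumption: the exact DIS- suffix format is not documented, so any of
    ZH/EN/JV/MS/TG appearing after the prefix is reported, in order.
    """
    normalized = status.strip().upper()
    if normalized == "ORI-RAW":
        return "original", []
    if normalized.startswith("DIS-"):
        return "back-translated", LANG_PATTERN.findall(normalized[len("DIS-"):])
    raise ValueError(f"unrecognized DATA STATUS value: {status!r}")


if __name__ == "__main__":
    # Invented example rows, not taken from the dataset.
    rows = [
        ("Saya $selalu gagal$ dalam ujian.", "ORI-RAW"),
        ("Saya $tidak pernah berhasil$ dalam ujian.", "DIS-JV"),
    ]
    for sentence, status in rows:
        print(origin_of(status), extract_distortion_spans(sentence), strip_markers(sentence))
```

With the CSV loaded (for example via `pandas.read_csv`), the status column could be mapped with `df["DATA STATUS"].map(origin_of)`. The name of the sentence column is not given in this description, so it would need to be checked against the actual file before applying `extract_distortion_spans`.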
Created:
2025-06-16



