Research on Suicidal Ideation Data Augmentation and Recognition Technology Based on Large Language Models (Dataset)

Name: Research on Suicidal Ideation Data Augmentation and Recognition Technology Based on Large Language Models (Dataset)
Creator: Institute of Psychology, Chinese Academy of Sciences; Nankai University
Published: 2025-03-20 00:00:00
License: No description available

Source: Science Data Bank (updated 2025-03-20; indexed 2026-04-23)

Download link:

https://www.scidb.cn/detail?dataSetId=ef76834b9e5b4acea207a65b52812b75

Resource description:

Research 1 Training Dataset (OurAugSGD): This dataset is used to train the suicidal ideation data augmentation model in Research 1. To achieve a sufficient augmentation effect, the training set combines zero-shot and few-shot prompting. Zero-shot and few-shot data are included in equal proportion (4000 entries in total, i.e., 2000 of each), forming the high-quality dataset OurAugSGD.

Research 1 Test Dataset and Test Results: This dataset is used to evaluate the suicidal ideation data augmentation model in Research 1. The test set consists of 50 positive samples randomly selected from the original dataset and processed with the same prompt engineering as the training set OurAugSGD. None of these samples overlap with the training set, so the results reflect the model's performance on unseen data. The test set was run through each model (baseline and experimental models), and each model performed its own data augmentation, producing 2028 generated texts in total. Six groups of raters annotated these results under a consistent protocol to produce the final test results.

Research 2 Training Dataset (OurDetSGD): This dataset is used to train the suicidal ideation recognition model in Research 2. First, 2000 positive samples and 4000 negative samples are randomly drawn from the original dataset of Research 1. These are combined with 2000 samples generated by the self-developed model OurAugSTM, giving 8000 texts with a 1:1 positive-to-negative ratio (2000 original plus 2000 generated positives against 4000 negatives). The result is the training dataset OurDetSGD.

Research 2 Training Dataset (OriginDetSGD): This dataset is also used to train the suicidal ideation recognition model in Research 2. First, 2000 positive samples and 4000 negative samples are randomly drawn from the original dataset of Research 1. Combined, they form 6000 texts with a 1:2 positive-to-negative ratio, which serve as the training dataset OriginDetSGD. Unlike OurDetSGD, it contains no generated samples.

Research 2 Test Dataset: This dataset is used to evaluate the suicidal ideation recognition model in Research 2. It is built on a strict non-overlap principle: after excluding the samples used in the training dataset OurDetSGD and the data used in Research 1, 1000 entries (500 positive and 500 negative samples, a 1:1 ratio) are randomly drawn from the original dataset. Like OurDetSGD, the test set undergoes prompt-engineering processing so that its format matches the training data, keeping the evaluation valid and consistent.
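
The Research 1 construction steps (an equal zero-shot/few-shot mix totalling 4000 entries, plus a 50-sample positive test set that shares no source samples with training) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the record layout (dicts with `id`, `label`, `source_id` and `style` keys), the helper names `build_augmentation_train_set` and `sample_augmentation_test_set`, and the fixed seed are all hypothetical, and the prompt-engineering step itself is omitted.

```python
import random


def build_augmentation_train_set(zero_shot, few_shot, total=4000, seed=0):
    """Hypothetical helper: draw equal numbers of zero-shot and few-shot records."""
    if total % 2:
        raise ValueError("total must be even for an equal zero-shot/few-shot split")
    half = total // 2
    rng = random.Random(seed)
    mixed = rng.sample(zero_shot, half) + rng.sample(few_shot, half)
    rng.shuffle(mixed)  # interleave the two prompting styles
    return mixed


def sample_augmentation_test_set(original, train_set, n=50, seed=0):
    """Hypothetical helper: pick n positive samples never used as a training source."""
    used = {rec["source_id"] for rec in train_set}
    candidates = [r for r in original if r["label"] == 1 and r["id"] not in used]
    return random.Random(seed).sample(candidates, n)


if __name__ == "__main__":
    # Toy stand-in data with a hypothetical record layout.
    original = [{"id": i, "label": 1, "text": f"post {i}"} for i in range(6000)]
    zero_shot = [{"source_id": i, "style": "zero-shot"} for i in range(2500)]
    few_shot = [{"source_id": i, "style": "few-shot"} for i in range(2500, 5000)]
    train = build_augmentation_train_set(zero_shot, few_shot)
    test = sample_augmentation_test_set(original, train)
    print(len(train), len(test))
```

Non-overlap is checked at the level of source samples rather than prompt text, on the assumption that each prompt record keeps a reference to the original post it was built from.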
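
The sample counts behind the two Research 2 training sets can be checked with a short Python sketch. It assumes, as the stated 1:1 ratio implies, that the 2000 OurAugSTM outputs are labelled positive. It also reuses the same drawn originals for both sets, which the description does not state. The function names `build_detection_train_sets` and `class_counts` and the `(text, label)` tuple layout are hypothetical.

```python
import random
from collections import Counter


def build_detection_train_sets(positives, negatives, generated, seed=0):
    """Hypothetical helper returning (OriginDetSGD-style, OurDetSGD-style) lists of (text, label)."""
    rng = random.Random(seed)
    pos = rng.sample(positives, 2000)  # 2000 original positive samples
    neg = rng.sample(negatives, 4000)  # 4000 original negative samples
    origin_det = [(t, 1) for t in pos] + [(t, 0) for t in neg]
    # Add 2000 generated texts labelled positive, so positives and negatives are 4000 each.
    our_det = origin_det + [(t, 1) for t in rng.sample(generated, 2000)]
    return origin_det, our_det


def class_counts(dataset):
    """Return (number of positives, number of negatives)."""
    counts = Counter(label for _, label in dataset)
    return counts[1], counts[0]


if __name__ == "__main__":
    # Toy stand-in pools; real texts come from the dataset.
    positives = [f"original positive {i}" for i in range(5000)]
    negatives = [f"original negative {i}" for i in range(10000)]
    generated = [f"generated positive {i}" for i in range(3000)]
    origin_det, our_det = build_detection_train_sets(positives, negatives, generated)
    print(len(origin_det), class_counts(origin_det))  # 6000 (2000, 4000)
    print(len(our_det), class_counts(our_det))  # 8000 (4000, 4000)
```

The only difference between the two sets in this sketch is the block of generated positives, which is what turns the 1:2 ratio of OriginDetSGD into the 1:1 ratio of OurDetSGD.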
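
The non-overlap rule for the Research 2 test set (drop every sample already used in OurDetSGD or in Research 1, then draw 500 positives and 500 negatives) can be sketched in Python as follows. The helper name `build_detection_test_set`, the `id`/`label` record keys, and the toy id ranges are hypothetical, and the prompt-engineering step is again left out.

```python
import random


def build_detection_test_set(original, excluded_ids, n_per_class=500, seed=0):
    """Hypothetical helper: balanced test set drawn only from samples not used elsewhere."""
    rng = random.Random(seed)
    unused = [r for r in original if r["id"] not in excluded_ids]
    pos = [r for r in unused if r["label"] == 1]
    neg = [r for r in unused if r["label"] == 0]
    test = rng.sample(pos, n_per_class) + rng.sample(neg, n_per_class)
    rng.shuffle(test)  # avoid a block of positives followed by a block of negatives
    return test


if __name__ == "__main__":
    # Toy pool: even ids are positive, odd ids negative (hypothetical layout).
    original = [{"id": i, "label": 1 - i % 2} for i in range(8000)]
    ourdet_ids = set(range(0, 3000))  # stand-in for ids used in OurDetSGD
    research1_ids = set(range(3000, 4000))  # stand-in for ids used in Research 1
    test = build_detection_test_set(original, ourdet_ids | research1_ids)
    print(len(test), sum(r["label"] for r in test))  # 1000 500
```

Taking the union of the two exclusion sets before sampling is what guarantees that no test entry was seen in either study's training data.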

Provider:

Institute of Psychology, Chinese Academy of Sciences; Nankai University

Created:

2025-03-19