Suicide Risk Screening Data
Zhejiang Provincial Data Intellectual Property Registration Platform | Updated 2024-09-05 | Indexed 2024-09-06
Download link:
https://www.zjip.org.cn/home/announce/trends/58475
Resource overview:
Records from psychological counseling and treatment sessions are collected and passed through data processing and refinement workflows, turning the annotated data into a high-quality, accurately labeled training set for a suicide early-warning model. The data consist of suicide-related text collected through the company's applications and from human counseling records. First, the data are annotated manually to ensure labeling accuracy and consistency. Next, a retrieval-augmented generation (RAG) knowledge base of suicide-warning information is built so that similar intents and scenarios can be retrieved during training and inference. A mental health large language model uses this knowledge base to select matching warning results, and hyperparameter tuning and model optimization are then applied to improve accuracy and robustness. The trained model is intended to accurately identify suicidal intent and produce warning results for settings such as social media monitoring, mental health hotlines, online psychological counseling, and school mental health management, assisting mental health professionals with intervention and treatment.
(1) Data Source: The original data are suicide-related texts from the company's applications and channels (the Lianxiaoxin APP, the official website of the Dongjianrenhe (洞见人和) large model, and manual counseling records).
(2) Data Processing and Annotation: The collected corpora are cleaned and standardized to ensure data quality. The data are then annotated manually, with a review mechanism in place to ensure annotation accuracy and consistency.
(3) RAG Knowledge Base Construction: A RAG knowledge base containing suicide warning-related information is built to retrieve similar intentions and scenarios during training and inference.
(4) Deep Learning Architecture Selection: Deep learning architectures suitable for text processing are selected, such as Transformer-based models (e.g., BERT or GPT-4).
(5) Model Training: The deep learning model is trained on the annotated dataset through supervised learning so that it learns to identify suicidal intent. Cross-validation and performance metrics such as accuracy and recall are used to evaluate its recognition capability.
(6) Hyperparameter Tuning: Hyperparameters such as the learning rate, batch size, and number of network layers are tuned to optimize model performance.
(7) Model Optimization and Validation: Based on the evaluation results, optimization measures such as model pruning and regularization are implemented. The model's performance is validated on an independent test set to ensure it performs well on unseen data.
(8) Warning Result Generation: The trained model and the RAG knowledge base analyze incoming text in real time, retrieve similar intents, and generate matching warning results.
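Step (2) does not say which cleaning rules are applied or how the review mechanism checks annotation consistency. A minimal sketch, assuming NFKC normalization, URL removal, and whitespace collapsing as the cleaning rules (all hypothetical), and Cohen's kappa as one common way to quantify agreement between two annotators who labeled the same items. The label names `risk` and `none` are illustrative only:

```python
import re
import unicodedata
from collections import Counter


def normalize_text(text: str) -> str:
    """Hypothetical cleaning rules: NFKC, drop URLs, collapse whitespace."""
    text = unicodedata.normalize("NFKC", text)  # e.g. full-width -> half-width characters
    text = re.sub(r"https?://\S+", "", text)     # remove links
    text = re.sub(r"\s+", " ", text)             # collapse runs of whitespace
    return text.strip()


def cohen_kappa(labels_a: list[str], labels_b: list[str]) -> float:
    """Chance-corrected agreement between two annotators on the same items."""
    if not labels_a or len(labels_a) != len(labels_b):
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Usage: two annotators label the same four items (hypothetical labels).
a = ["risk", "none", "none", "risk"]
b = ["risk", "none", "risk", "risk"]
print(normalize_text("  Ｈｅｌｌｏ   see https://example.com/x  now "))  # Hello see now
print(cohen_kappa(a, b))  # 0.5
```

A review step could, for example, send batches whose kappa falls below an agreed threshold back for re-annotation; any such threshold would be a project decision, not something stated in the source.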
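Steps (3) and (8) describe retrieving similar intents from the RAG knowledge base and turning them into a warning result. The source uses a mental health large language model for the selection; the sketch below swaps in two simple, clearly labeled stand-ins: bag-of-words cosine similarity instead of learned embeddings, and a similarity-weighted vote instead of the LLM. `KBEntry`, `warning_level`, the `min_score` cutoff, and the toy entries are all hypothetical:

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass


@dataclass
class KBEntry:
    text: str
    warning_level: str  # hypothetical label set, e.g. "low" / "high"


def bag_of_words(text: str) -> Counter:
    # Whitespace tokens only; Chinese text would need a word segmenter or character n-grams.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[tok] for tok, count in a.items())  # missing tokens count as 0
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, kb: list[KBEntry], k: int = 3) -> list[tuple[float, KBEntry]]:
    """Return the k knowledge-base entries most similar to the query."""
    q = bag_of_words(query)
    scored = [(cosine(q, bag_of_words(e.text)), e) for e in kb]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def warn(query: str, kb: list[KBEntry], k: int = 3, min_score: float = 0.1) -> str | None:
    """Similarity-weighted vote over retrieved entries; None means 'route to a human'."""
    hits = [(s, e) for s, e in retrieve(query, kb, k) if s >= min_score]
    if not hits:
        return None
    votes: Counter = Counter()
    for score, entry in hits:
        votes[entry.warning_level] += score
    return votes.most_common(1)[0][0]


kb = [
    KBEntry("cannot sleep and feel stressed about exams", "low"),
    KBEntry("feel stressed about work deadlines", "low"),
    KBEntry("expresses hopelessness and mentions a concrete plan", "high"),
]
print(warn("stressed about exams", kb))      # low
print(warn("mentions a concrete plan", kb))  # high
print(warn("hello world", kb))               # None
```

Returning `None` when nothing in the knowledge base is similar enough is a deliberate choice in this sketch: an unmatched input is routed to a human reviewer rather than silently scored as low risk.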
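Step (5) mentions cross-validation with accuracy and recall. The sketch below shows k-fold cross-validation in plain Python, with a hypothetical majority-class baseline standing in for the Transformer model; `fit_majority` and the toy labels are illustrative assumptions:

```python
from __future__ import annotations

import random
from collections import Counter
from typing import Callable, Sequence


def k_fold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle 0..n-1 and deal the indices into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> float | None:
    """Fraction of truly positive items that were predicted positive."""
    flagged = [p == positive for t, p in zip(y_true, y_pred) if t == positive]
    return sum(flagged) / len(flagged) if flagged else None  # undefined without positives


def cross_validate(fit: Callable, X: list, y: list[str], k: int, positive: str) -> dict[str, float]:
    """fit(X_train, y_train) must return a predict(x) -> label function."""
    accs, recs = [], []
    for held_out in k_fold_indices(len(X), k):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        truth = [y[i] for i in held_out]
        preds = [predict(X[i]) for i in held_out]
        accs.append(accuracy(truth, preds))
        r = recall(truth, preds, positive)
        if r is not None:
            recs.append(r)
    return {
        "accuracy": sum(accs) / len(accs),
        "recall": sum(recs) / len(recs) if recs else float("nan"),
    }


def fit_majority(X_train: list, y_train: list[str]) -> Callable:
    """Baseline that always predicts the most frequent training label."""
    label = Counter(y_train).most_common(1)[0][0]
    return lambda x: label


X = list(range(10))
y = ["risk"] * 2 + ["none"] * 8
print(cross_validate(fit_majority, X, y, k=5, positive="risk"))  # {'accuracy': 0.8, 'recall': 0.0}
```

The baseline reaches 0.8 accuracy while catching none of the at-risk items, which is why recall on the positive class matters more than accuracy for a warning model. With so few positives, stratified folds (the same positive rate in every fold) would be preferable to the plain shuffle used here.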
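For step (6), one straightforward (assumed) approach is exhaustive grid search over the listed hyperparameters. `toy_evaluate` is a hypothetical stand-in for "train with these settings and return a validation score", and the grid values are illustrative, not the provider's settings:

```python
from itertools import product
from typing import Callable


def grid_search(grid: dict[str, list], evaluate: Callable[..., float]) -> tuple[dict, float]:
    """Exhaustively try every combination in `grid`; higher scores are better."""
    names = list(grid)
    best_params: dict = {}
    best_score = float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)  # e.g. mean validation recall from cross-validation
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for "train with these settings, return a validation score".
def toy_evaluate(learning_rate: float, batch_size: int, num_layers: int) -> float:
    return (1.0
            - abs(learning_rate - 1e-4) * 1e3
            - abs(batch_size - 32) / 100
            - abs(num_layers - 12) / 100)


grid = {"learning_rate": [1e-3, 1e-4], "batch_size": [16, 32], "num_layers": [6, 12]}
print(grid_search(grid, toy_evaluate))
# ({'learning_rate': 0.0001, 'batch_size': 32, 'num_layers': 12}, 1.0)
```

The number of training runs is the product of the grid sizes, so random search is often used instead for larger spaces. Scores should come from validation folds; the independent test set in step (7) stays untouched until the end.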
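Step (7) names pruning and regularization without details. The sketch below shows both ideas on a flat list of weights: unstructured magnitude pruning (zeroing the smallest-magnitude fraction of weights) and an L2 penalty term. Both are generic techniques chosen for illustration; for a real Transformer one would more likely use framework utilities such as PyTorch's `torch.nn.utils.prune` module and an optimizer's `weight_decay` argument:

```python
def magnitude_prune(weights: list[float], sparsity: float) -> list[float]:
    """Return a copy with the smallest-|w| fraction `sparsity` of weights set to zero."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    smallest_first = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in smallest_first[:n_prune]:
        pruned[i] = 0.0
    return pruned


def l2_penalty(weights: list[float], lam: float) -> float:
    """L2 regularization term lam * sum(w^2), added to the training loss."""
    return lam * sum(w * w for w in weights)


w = [0.5, -0.1, 0.05, -2.0]
print(magnitude_prune(w, 0.5))  # [0.5, 0.0, 0.0, -2.0]
print(l2_penalty([1.0, -2.0], 0.1))
```

After pruning, the model would be re-evaluated on the independent test set to confirm that performance on unseen data has not dropped.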
Provider:
Zhejiang Lianxin Digital Co., Ltd. (浙江连信数字有限公司)
Created:
2024-08-08
Aggregated summary
Dataset overview

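Features
The Suicide Risk Screening dataset contains 993 records and is updated daily. The data are suicide-related texts from the company's applications; they were manually annotated and standardized, used to build a RAG knowledge base, and used to train and optimize deep learning models that generate warning results. The dataset is applied in scenarios such as social media monitoring, mental health hotlines, and online psychological counseling, supporting mental health professionals in intervention and treatment.
The above content was collected and summarized by 遇见数据集 (Yujian Datasets).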



