rlandismd/suicide_prediction_dataset_phr
Source: Hugging Face. Updated 2026-04-24. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/rlandismd/suicide_prediction_dataset_phr
Description:
The dataset is named vibhorag101/suicide_prediction_dataset_phr and is intended for predicting suicidal tendency. Each sample pairs a text with a binary label (suicide or non-suicide). The text has been preprocessed: it was converted to lowercase; numbers, special characters, URLs, emojis, accented characters, and word contractions were removed; and extra whitespace and any character repeated more than 3 times in a row were cleaned up. The text was then tokenized and lemmatized, and stopwords were removed, except "not", which was kept. The description lists a training set of approximately 186k samples and a test set of approximately 23k samples under an 80:10:10 split; that ratio implies a third split of about the same size as the test set (presumably validation), which is not described here. Note that, as a result of the preprocessing, the labels of some texts may be inaccurate.
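The preprocessing steps listed in the description can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code actually used to build the dataset. The contraction map and stopword list are small hypothetical stand-ins, "removal of word contractions" is assumed to mean expanding them, runs of a repeated character are assumed to collapse to a single character, and lemmatization is omitted because it needs an external library.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny lookup tables; the real lists are not given.
CONTRACTIONS = {"can't": "cannot", "won't": "will not", "i'm": "i am", "don't": "do not"}
STOPWORDS = {"i", "am", "the", "a", "an", "is", "it", "to", "and", "so"}  # "not" is kept

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")   # digits, punctuation, emojis, etc.
REPEAT_RE = re.compile(r"(.)\1{3,}")      # same character more than 3 times in a row


def strip_accents(text):
    """Drop combining marks so that e.g. 'café' becomes 'cafe'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def preprocess(text):
    """Return a list of cleaned tokens for one input text."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = strip_accents(text)
    text = text.replace("\u2019", "'")    # normalize curly apostrophes
    for short, full in CONTRACTIONS.items():
        # Assumption: contractions are expanded rather than deleted.
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    text = NON_LETTER_RE.sub(" ", text)
    text = REPEAT_RE.sub(r"\1", text)     # assumption: collapse the run to one character
    tokens = text.split()                 # whitespace tokenization; also drops extra spaces
    return [tok for tok in tokens if tok not in STOPWORDS]


if __name__ == "__main__":
    print(preprocess("I'm not okay"))
```

Keeping "not" out of the stopword list preserves negation, which can flip the meaning of a short text.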
Provided by:
rlandismd



