AmanPriyanshu/clone-of-gretel-financial-risk-analysis-v1
Source: Hugging Face. Updated: 2024-12-17. Indexed: 2024-12-21.
Download link:
https://hf-mirror.com/datasets/AmanPriyanshu/clone-of-gretel-financial-risk-analysis-v1
Resource description:
This monolingual (English) dataset contains 1,034 synthetic financial risk analysis texts generated with differential privacy guarantees. The synthetic text is derived from training on 14,306 SEC filings (10-K, 10-Q, and 8-K) from 2023-2024. The dataset is intended for training models to extract key risk factors from financial documents and generate structured summaries, and it demonstrates how differential privacy can be used to protect sensitive information. It supports two main tasks: feature extraction (identifying and categorizing financial risks in text) and text summarization (generating structured risk analysis summaries). Model outputs include risk severity classification, risk category identification, and structured risk analysis. The train/test split is 827/207 samples, the average text length is 5,727 characters, and the privacy guarantee is ε = 8.
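The figures quoted above can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the constant and function names are hypothetical, and it assumes the standard definition of ε-differential privacy, under which the probability of any output changes by at most a factor of e^ε when one training record is added or removed. The description does not state a δ value or the exact privacy accounting used, so the bound is shown for pure ε-DP only.

```python
import math

# Figures taken from the dataset description.
TOTAL_SAMPLES = 1034
TRAIN_SAMPLES = 827
TEST_SAMPLES = 207
EPSILON = 8.0


def split_fractions(train: int, test: int) -> tuple[float, float]:
    """Return the train and test shares of the combined sample count."""
    total = train + test
    return train / total, test / total


def dp_ratio_bound(epsilon: float) -> float:
    """Worst-case output-probability ratio e^epsilon under pure epsilon-DP.

    Assumption: standard epsilon-DP definition; the source gives no delta.
    """
    return math.exp(epsilon)


# The two splits should add up to the stated total.
assert TRAIN_SAMPLES + TEST_SAMPLES == TOTAL_SAMPLES

train_frac, test_frac = split_fractions(TRAIN_SAMPLES, TEST_SAMPLES)
print(f"train share: {train_frac:.1%}, test share: {test_frac:.1%}")
print(f"e^epsilon bound for epsilon={EPSILON}: {dp_ratio_bound(EPSILON):.0f}")
```

The split works out to roughly 80/20, and e^8 is about 2,981, which indicates how loose or tight an ε = 8 guarantee is relative to smaller budgets.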
Provider:
AmanPriyanshu



