
WaguyMZ/Financial_statements_fraud_dataset

Hugging Face · Updated 2025-11-28 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/WaguyMZ/Financial_statements_fraud_dataset
Description:
---
license: gpl
task_categories:
- text-classification
language:
- en
tags:
- finance
size_categories:
- 100M<n<1B
---

Official dataset of the paper: [Read Between the Lines: A Robust Financial Statement Fraud Detection Framework](https://hal.science/file/index/docid/5375997/filename/anoymous-submission-with-appendices.pdf)

**Guy Stephane Waffo Dzuyo¹², Gael Guibon²³, Christophe Cerisara², Luis Belmar-Letelier¹**

¹ Forvis Mazars
² LORIA, CNRS, Université de Lorraine
³ Université Sorbonne Paris Nord, CNRS, Laboratoire d’Informatique de Paris Nord, LIPN, F-93430 Villetaneuse, France

**Emails:** guy.stephane.waffo@forvismazars.com, gael.guibon@lipn.fr, christophe.cerisara@loria.fr, luis.belmar-letelier@forvismazars.com

Main purpose of the dataset: supervised anomaly detection.

![image](https://cdn-uploads.huggingface.co/production/uploads/64881cc366656a507f676f97/vd68fvvYceen0TjFW0fJo.png)

The preprocessed dataset provided here includes:
- 17,863 quarterly MD&A (Management's Discussion and Analysis) reports, summarized with a self-hosted Qwen3-32B model.
- 3,300 AAER (SEC Accounting and Auditing Enforcement Release) reports.
- 269,097 quarterly financial reports.
- Final, ready-to-use preprocessed datasets with distinct splitting strategies:
  * Random splitting.
  * Company-isolated splitting: our paper demonstrates that the company-isolated setting is a more rigorous framework for the financial statement fraud detection task.
  * Time splitting.

Each dataset comprises 5 folds, and each fold comes with its own *train.csv* and *test.csv*.

If you need the entire raw dataset, please contact us at guywaffo@gmail.com.

If you want to contribute to improving the dataset, feel free to open a thread in the `Community` section so we can discuss it.
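The difference between the three splitting strategies can be illustrated on toy data. The sketch below is hypothetical: the record layout `(company_id, quarter_index, label)`, the round-robin fold assignment, and the forward-chaining time scheme are assumptions made for illustration, not the authors' preprocessing code, and the released folds should be used as provided. It counts, for each fold, how many companies appear on both the train and test side.

```python
import random

# Hypothetical toy records: (company_id, quarter_index, label).
# These fields are illustrative stand-ins, not the dataset's actual CSV columns.
def make_toy_records(n_companies=10, n_quarters=12):
    # label 1 marks a made-up fraudulent quarter for company C0
    return [(f"C{c}", q, int(c == 0 and q >= 8))
            for c in range(n_companies) for q in range(n_quarters)]


def random_folds(records, k=5, seed=0):
    """Random splitting: shuffle row indices and deal them into k test folds."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    folds = []
    for f in range(k):
        test = sorted(idx[f::k])
        test_set = set(test)
        train = [i for i in range(len(records)) if i not in test_set]
        folds.append((train, test))
    return folds


def company_isolated_folds(records, k=5, seed=0):
    """Company-isolated splitting: each company is assigned to exactly one test
    fold, so no company has reports on both sides of the same fold."""
    companies = sorted({r[0] for r in records})
    random.Random(seed).shuffle(companies)
    fold_of = {c: i % k for i, c in enumerate(companies)}
    return [([i for i, r in enumerate(records) if fold_of[r[0]] != f],
             [i for i, r in enumerate(records) if fold_of[r[0]] == f])
            for f in range(k)]


def time_folds(records, k=5):
    """Time splitting (forward chaining, an assumed scheme): cut the sorted
    quarters into k + 1 contiguous blocks; fold f trains on blocks 0..f and
    tests on block f + 1. Assumes at least k + 1 distinct quarters."""
    quarters = sorted({r[1] for r in records})
    n = len(quarters)
    block_of = {q: pos * (k + 1) // n for pos, q in enumerate(quarters)}
    return [([i for i, r in enumerate(records) if block_of[r[1]] <= f],
             [i for i, r in enumerate(records) if block_of[r[1]] == f + 1])
            for f in range(k)]


def shared_companies(records, train, test):
    """Companies present in both train and test (a potential leakage source)."""
    return {records[i][0] for i in train} & {records[i][0] for i in test}


records = make_toy_records()
for name, folds in [("random", random_folds(records)),
                    ("company-isolated", company_isolated_folds(records)),
                    ("time", time_folds(records))]:
    leaks = [len(shared_companies(records, tr, te)) for tr, te in folds]
    print(f"{name:>16}: companies shared by train/test per fold = {leaks}")
```

The company-isolated row prints all zeros by construction, whereas random splitting will typically place other quarters of a test company in the training set, letting a model recognize the company rather than the fraud signal. This is one intuition for why the company-isolated setting is the stricter benchmark; time splitting shares companies across folds but guarantees the model never trains on quarters later than those it is tested on.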
Provided by:
WaguyMZ