California Police Misconduct Dataset
Source: arXiv (indexed 2025-09-30)
Download link:
https://cdss.berkeley.edu/news/state-funds-development-first-its-kind-police-misconduct-database
Description:
This dataset consists of 227 documents from police departments across California and is aimed at identifying police misconduct. Documents average 12,500 words, and 2% of them exceed the 128,000-word context window limit. The collection comprises court documents, police reports, and internal investigation reports. Because the documents contain personally identifiable information (PII) and sensitive content, the dataset cannot be open-sourced. The 227 documents are a sample drawn from hundreds of thousands of source files. The task is to generate a detailed summary of violations for each police officer who exhibited misconduct.
Provided by:
California Police Records Access Project



