
California Police Misconduct Dataset

Indexed from arXiv: 2025-09-30
Download link:
https://cdss.berkeley.edu/news/state-funds-development-first-its-kind-police-misconduct-database
Resource description:
This dataset consists of 227 documents from police departments across California, collected to identify police misconduct. The documents average 12,500 words, and 2% of them exceed a 128K context window. The collection includes court filings, police reports, and internal investigation reports. Because the documents contain personally identifiable information (PII) and other sensitive content, the dataset cannot be released publicly. The 227 documents are a sample drawn from a corpus of hundreds of thousands of files. The task is to generate detailed violation summaries for officers whose records show misconduct.
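Since some documents are longer than a 128K context window, a summarization pipeline has to detect and split them before passing them to a model. The sketch below shows one way to do that. It is not part of the dataset, and the names `CONTEXT_LIMIT`, `exceeds_limit`, and `split_into_chunks` are hypothetical. It counts whitespace-separated words as a rough proxy for tokens; a real pipeline would count with the target model's tokenizer.

```python
from typing import List

# Hypothetical limit taken from the 128K context window cited above.
# Whitespace-separated words are used only as a rough stand-in for tokens.
CONTEXT_LIMIT = 128_000


def exceeds_limit(text: str, limit: int = CONTEXT_LIMIT) -> bool:
    """Return True if the document has more words than the limit."""
    return len(text.split()) > limit


def split_into_chunks(text: str, limit: int = CONTEXT_LIMIT, overlap: int = 500) -> List[str]:
    """Split a document into windows of at most `limit` words.

    Consecutive windows share `overlap` words, so a passage that straddles
    a boundary appears whole in at least one chunk when it is no longer
    than `overlap` words.
    """
    if not 0 <= overlap < limit:
        raise ValueError("overlap must be in [0, limit)")
    words = text.split()
    if len(words) <= limit:
        # Short documents fit in a single window unchanged (modulo whitespace).
        return [" ".join(words)]
    step = limit - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + limit]))
        if start + limit >= len(words):
            break  # the last window already reaches the end of the document
    return chunks


if __name__ == "__main__":
    # Synthetic ten-word document; small limits make the windowing visible.
    doc = " ".join(f"w{i}" for i in range(10))
    print(split_into_chunks(doc, limit=4, overlap=1))
```

Run on the synthetic document, this prints three overlapping windows: `w0` to `w3`, `w3` to `w6`, and `w6` to `w9`. The overlap is a design choice meant to avoid losing details that cross a boundary. Per-chunk summaries would still need to be merged into one violation summary per officer.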
Provided by:
California Police Records Access Project