opioidarchive/oida-qa
Source: Hugging Face. Updated 2024-11-22; indexed 2024-12-14.
Download link:
https://hf-mirror.com/datasets/opioidarchive/oida-qa
Resource overview:
OIDA-QA is a multimodal benchmark built on the UCSF-JHU Opioid Industry Documents Archive (OIDA). The archive preserves over 4 million documents, formerly internal corporate records released through opioid litigation and other sources, that shed light on the industry during the peak of the U.S. opioid crisis. OIDA-QA contains 400k training documents and 10k testing documents, along with over 3 million question-answer (QA) pairs generated from textual, visual, and layout information extracted from the documents. The project focuses on developing domain-specific Large Language Models (LLMs) and demonstrates improvements in document information extraction and question-answering tasks. The source data includes documents from manufacturers, distributors, and pharmacies involved in the U.S. opioid crisis, as well as from litigants and courts involved in opioid litigation. The dataset is licensed under Creative Commons Attribution-NonCommercial 4.0 International.
Provider:
opioidarchive



