Global City AI Competition (全球城市AI大赛)

Name: Global City AI Competition (全球城市AI大赛)
Creator: Alibaba Cloud Tianchi (阿里云天池)
Published: 2026-05-31 15:07:30
License: No description provided

Alibaba Cloud Tianchi (阿里云天池). Updated 2026-05-31; indexed 2024-03-07.

Download link:

https://tianchi.aliyun.com/dataset/163783

Resource overview:

Medical literature databases contain a wealth of information on disease diagnosis and treatment. Extracting key information efficiently from this large body of literature, to support diagnosis and treatment recommendation, matters to both clinicians and researchers.

The training and test sets are CSV files whose fields are title, author, and abstract. In addition, Keywords is the label for Task 2 and label is the label for Task 1. Both sets can be read with pandas.

Task 1 is framed as binary text classification: based on the abstract and other information about a paper, the model must assign it to one of two classes, medical literature or non-medical literature. Task 2 is framed as keyword recognition: the model must identify and extract keywords relevant to the content of a given paper.
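As a rough illustration of the loading step and Task 1, the sketch below reads a CSV with pandas and fits a simple baseline classifier. It is only a sketch under stated assumptions: the inline sample rows are invented stand-ins for the real files, the column spellings (title, author, abstract, Keywords, label) are taken from the description and may differ in the actual download, the encoding of label as 1 for medical and 0 for non-medical is assumed, and TF-IDF with logistic regression (scikit-learn) is a generic baseline rather than a method prescribed by the competition.

```python
import io

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for the training CSV. The rows are invented, and the
# column names and 1/0 label encoding are assumptions based on the description.
SAMPLE_TRAIN_CSV = """title,author,abstract,Keywords,label
Insulin therapy in type 2 diabetes,A. Smith,We evaluate insulin dosing and patient outcomes in a clinical trial.,insulin; diabetes; clinical trial,1
Stroke rehabilitation outcomes,B. Jones,Patients recovering from stroke received physical therapy and treatment.,stroke; rehabilitation,1
Graph algorithms for routing,C. Lee,We propose a shortest path algorithm for road networks.,graph; routing,0
Compiler optimization passes,D. Kim,We study loop unrolling in an optimizing compiler.,compiler; optimization,0
"""


def load_papers(source):
    """Read a CSV of papers and add a combined `text` column."""
    df = pd.read_csv(source)
    # Missing titles or abstracts become empty strings so concatenation is safe.
    df["text"] = df["title"].fillna("") + " " + df["abstract"].fillna("")
    return df


def train_baseline(df):
    """Fit a TF-IDF + logistic regression classifier on the `label` column."""
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(df["text"], df["label"])
    return model


# In practice `source` would be the path to the downloaded training file.
train_df = load_papers(io.StringIO(SAMPLE_TRAIN_CSV))
clf = train_baseline(train_df)
predictions = clf.predict(train_df["text"])
print(predictions)
```

Combining title and abstract into one text field is a design choice made here for simplicity; the author field is left out because it is unlikely to help a bag-of-words model, though that has not been checked against the real data.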

Provider:

Alibaba Cloud Tianchi (阿里云天池)

Created:

2023-10-19

Collected summary

Dataset introduction

Background and challenges

Background overview

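This dataset focuses on medical literature processing and aims to extract key information from large volumes of literature to support disease diagnosis and treatment recommendation. It covers two tasks: binary text classification, which separates medical from non-medical papers, and keyword recognition, which extracts keywords relevant to each paper's content. The data is provided as CSV files with fields including title, author, and abstract.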
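For the keyword recognition task, one possible starting point is to rank the terms of each abstract by TF-IDF and keep the top few. The sketch below does this with the Python standard library only. It is an unsupervised baseline chosen for illustration, not the competition's reference method; the sample abstracts, the short stop-word list, and the function names are all hypothetical, and it does not use the Keywords labels at all.

```python
import math
import re
from collections import Counter

# A tiny illustrative stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "we", "with"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def top_keywords(documents, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Term frequency times a smoothed inverse document frequency.
        scores = {
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
        # Sort by score (descending), then alphabetically for a stable order.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:k])
    return results


# Invented abstracts standing in for the `abstract` column.
abstracts = [
    "Insulin therapy improves glycemic control; insulin dosing was adjusted weekly.",
    "A shortest path algorithm for road networks improves routing speed.",
]
keywords = top_keywords(abstracts, k=3)
print(keywords)
```

Single-word ranking like this cannot recover multi-word keywords such as "clinical trial"; a supervised model trained on the Keywords column would likely be needed for that.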

The content above was collected and summarized by 遇见数据集.