AdaptLLM/RCT

Name: AdaptLLM/RCT
Creator: AdaptLLM
Published: 2024-07-19 02:51:13
License: not specified

Source: Hugging Face. Updated: 2024-07-19. Indexed: 2024-06-11.

Download link:

https://hf-mirror.com/datasets/AdaptLLM/RCT

Resource summary:

The RCT dataset is used in the ICLR 2024 paper "Adapting Large Language Models via Reading Comprehension". The paper studies continual pre-training and transforms large-scale pre-training corpora into reading comprehension texts in order to improve prompting performance in the biomedicine, finance, and law domains. The dataset provides train, validation, and test splits and is suitable for text classification, question answering, and zero-shot classification tasks. It is in English and is tagged with medical, chemistry, and biology.

Provider:

AdaptLLM

Summary of original information

Dataset overview

Dataset name

RCT

Dataset files

Training set: train.jsonl
Validation set: dev.jsonl
Test set: test.jsonl
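
The three split files listed above are in JSON Lines format (one JSON object per line). As a minimal sketch, assuming the files have already been downloaded into a local directory, they could be read with the Python standard library as follows. The helper names `read_jsonl` and `load_splits` and the directory argument are hypothetical; the listing does not describe the record fields, so the sketch makes no assumption about them.

```python
import json
from pathlib import Path

# File names come from the listing; the mapping to split names is for illustration.
SPLIT_FILES = {
    "train": "train.jsonl",
    "validation": "dev.jsonl",
    "test": "test.jsonl",
}


def read_jsonl(path):
    """Read a JSON Lines file into a list of records, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


def load_splits(data_dir):
    """Load every split file that exists under data_dir (a hypothetical local path)."""
    data_dir = Path(data_dir)
    return {
        split: read_jsonl(data_dir / name)
        for split, name in SPLIT_FILES.items()
        if (data_dir / name).exists()
    }
```

Calling `load_splits("path/to/RCT")` returns a dictionary keyed by split name; inspecting the keys of the first record in a split is a quick way to discover the actual field names.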

Task categories

Text classification
Question answering
Zero-shot classification

Language

English

Dataset source

This dataset is used in the ICLR 2024 paper "Adapting Large Language Models via Reading Comprehension".

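Dataset purpose

Used to explore continual pre-training of large language models on domain-specific corpora, and to improve question-answering ability through a reading comprehension approach.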
