allenai/ai2_arc|Natural language processing dataset|Machine learning dataset
Dataset overview
Basic information
- Name: Ai2Arc
- Language: English (en-US)
- License: CC-BY-SA-4.0
- Multilinguality: monolingual
- Dataset size: 1K<n<10K
- Source data: original
- Task category: question answering (question-answering)
- Task IDs:
  - open-domain-qa
  - multiple-choice-qa
Dataset structure
- Configurations:
  - ARC-Challenge
  - ARC-Easy
- Features:
  - id: string
  - question: string
  - choices: sequence, containing:
    - text: string
    - label: string
  - answerKey: string
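- Data splits:
  - ARC-Challenge:
    - Training set: 1119 examples, 349760 bytes
    - Test set: 1172 examples, 375511 bytes
    - Validation set: 299 examples, 96660 bytes
  - ARC-Easy:
    - Training set: 2251 examples, 619000 bytes
    - Test set: 2376 examples, 657514 bytes
    - Validation set: 570 examples, 157394 bytes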
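The feature list can be read as a record type. The sketch below is one possible reading, assuming the `choices` sequence is exposed as two parallel lists (`text` and `label`), which is how the Hugging Face `datasets` library usually presents a sequence of named fields. The type names `Choices` and `ArcRecord`, the helper `answer_text`, and the sample record are all hypothetical and were written for illustration; the record is not taken from the dataset.

```python
from typing import List, TypedDict


class Choices(TypedDict):
    # Assumption: the sequence is stored as parallel lists of equal length.
    text: List[str]
    label: List[str]


class ArcRecord(TypedDict):
    id: str
    question: str
    choices: Choices
    answerKey: str


def answer_text(record: ArcRecord) -> str:
    """Return the text of the choice whose label equals answerKey."""
    labels = record["choices"]["label"]
    index = labels.index(record["answerKey"])  # ValueError if the key is absent
    return record["choices"]["text"][index]


# Hypothetical record, invented for illustration only.
example: ArcRecord = {
    "id": "example-001",
    "question": "Which object is a source of light?",
    "choices": {
        "text": ["a mirror", "the Sun", "the Moon", "a window"],
        "label": ["A", "B", "C", "D"],
    },
    "answerKey": "B",
}

print(answer_text(example))
```

Looking up the answer by label rather than by position avoids assuming that every record uses the same label set or the same number of choices.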
Dataset download and size
- Download size: 449460 bytes (ARC-Challenge), 762935 bytes (ARC-Easy)
- Dataset size: 821931 bytes (ARC-Challenge), 1433908 bytes (ARC-Easy)
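As a quick consistency check, the per-split figures listed above can be summed and compared with the reported dataset sizes. The sketch below only re-adds numbers already given in this listing; the names `SPLITS`, `REPORTED_DATASET_BYTES`, and `totals` are hypothetical.

```python
# (examples, bytes) per split, copied from the split listing above.
SPLITS = {
    "ARC-Challenge": {
        "train": (1119, 349760),
        "test": (1172, 375511),
        "validation": (299, 96660),
    },
    "ARC-Easy": {
        "train": (2251, 619000),
        "test": (2376, 657514),
        "validation": (570, 157394),
    },
}

# Reported dataset sizes in bytes, copied from the size listing above.
REPORTED_DATASET_BYTES = {"ARC-Challenge": 821931, "ARC-Easy": 1433908}


def totals(config: str) -> tuple:
    """Sum example counts and byte sizes over all splits of one configuration."""
    examples = sum(n for n, _ in SPLITS[config].values())
    size = sum(b for _, b in SPLITS[config].values())
    return examples, size


for config in SPLITS:
    examples, size = totals(config)
    print(config, examples, size, size == REPORTED_DATASET_BYTES[config])
# ARC-Challenge 2590 821931 True
# ARC-Easy 5197 1433908 True
```

The split sizes add up exactly to the reported dataset sizes, and the two configurations together hold 7787 examples, which is consistent with the 1K<n<10K size category.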