Dome House
收藏MedDialog
MedDialog数据集(中文)包含了医生和患者之间的对话(中文)。它有110万个对话和400万个话语。数据还在不断增长,会有更多的对话加入。原始对话来自好大夫网。
github 收录
AIS数据集
该研究使用了多个公开的AIS数据集,这些数据集经过过滤、清理和统计分析。数据集涵盖了多种类型的船舶,并提供了关于船舶位置、速度和航向的关键信息。数据集包括来自19,185艘船舶的AIS消息,总计约6.4亿条记录。
github 收录
全国 1∶200 000 数字地质图(公开版)空间数据库
As the only one of its kind, China National Digital Geological Map (Public Version at 1∶200 000 scale) Spatial Database (CNDGM-PVSD) is based on China' s former nationwide measured results of regional geological survey at 1∶200 000 scale, and is also one of the nationwide basic geosciences spatial databases jointly accomplished by multiple organizations of China. Spatially, it embraces 1 163 geological map-sheets (at scale 1: 200 000) in both formats of MapGIS and ArcGIS, covering 72% of China's whole territory with a total data volume of 90 GB. Its main sources is from 1∶200 000 regional geological survey reports, geological maps, and mineral resources maps with an original time span from mid-1950s to early 1990s. Approved by the State's related agencies, it meets all the related technical qualification requirements and standards issued by China Geological Survey in data integrity, logic consistency, location acc racy, attribution fineness, and collation precision, and is hence of excellent and reliable quality. The CNDGM-PVSD is an important component of China' s national spatial database categories, serving as a spatial digital platform for the information construction of the State's national economy, and providing informationbackbones to the national and provincial economic planning, geohazard monitoring, geological survey, mineral resources exploration as well as macro decision-making.
DataCite Commons 收录
TongueDx Dataset
TongueDx数据集是一个专为远程舌诊研究设计的综合性舌象图像数据集,由香港理工大学和新加坡管理大学的研究团队创建。该数据集包含5109张图像,涵盖了多种环境条件下的舌象,图像通过智能手机和笔记本电脑摄像头采集,具有较高的多样性和代表性。数据集不仅包含舌象图像,还提供了详细的舌面属性标注,如舌色、舌苔厚度等,并附有受试者的年龄、性别等人口统计信息。数据集的创建过程包括图像采集、舌象分割、标准化处理和多标签标注,旨在解决远程医疗中舌诊图像质量不一致的问题。该数据集的应用领域主要集中在远程医疗和中医诊断,旨在通过自动化技术提高舌诊的准确性和可靠性。
arXiv 收录
Psych-101
Psych-101数据集是一个自然语言转录的心理学实验数据集,包含了160个心理学实验的逐次数据,涉及60,092名参与者,共记录了10,681,650次选择。数据集中的选择信息被封装在“<<”和“>>”标记中。数据集的主要用途是研究人类认知的基础模型。数据集提供了详细的实验文本、实验标识符和参与者标识符。数据集的语言为英语,使用Apache 2.0许可证。
huggingface 收录