toutiao-text-classfication-dataset|文本分类数据集|新闻数据数据集
收藏数据集概述
数据来源
- 今日头条客户端
数据格式
- 每条数据包含五个字段,以
_!_
分割:- 新闻ID
- 分类code
- 分类名称
- 新闻标题
- 新闻关键词
分类code与名称
- 100: 民生故事 (news_story)
- 101: 文化 (news_culture)
- 102: 娱乐 (news_entertainment)
- 103: 体育 (news_sports)
- 104: 财经 (news_finance)
- 106: 房产 (news_house)
- 107: 汽车 (news_car)
- 108: 教育 (news_edu)
- 109: 科技 (news_tech)
- 110: 军事 (news_military)
- 112: 旅游 (news_travel)
- 113: 国际 (news_world)
- 114: 证券 (stock)
- 115: 农业 (news_agriculture)
- 116: 电竞 (news_game)
数据规模
- 共382688条数据,分布于15个分类中。
采集时间
- 2018年05月
实验结果
- 测试损失 (Test Loss): 0.57
- 测试准确率 (Test Acc): 83.81%
- 各类别的精确度 (precision), 召回率 (recall) 和 F1分数 (f1-score) 详细列出。
存在的问题与优化建议
- 问题:数据不均衡,部分类目数据太少;部分分类之间模棱两可。
- 优化建议:增加数据量,完善分类,平衡分类数据,引入正文内容。

yahoo-finance-data
该数据集包含从Yahoo! Finance、Nasdaq和U.S. Department of the Treasury获取的财务数据,旨在用于研究和教育目的。数据集包括公司详细信息、高管信息、财务指标、历史盈利、股票价格、股息事件、股票拆分、汇率和每日国债收益率等。每个数据集都有其来源、简要描述以及列出的列及其数据类型和描述。数据定期更新,并以Parquet格式提供,可通过DuckDB进行查询。
huggingface 收录
New locus reveals the genetic architecture of sex reversal in the Chinese tongue sole (Cynoglossus semilaevis)
Sex reversal in insects, amphibians, reptiles, and fishes is a complicated and interesting biological phenomenon. Sex reversal changes the sex ratio of populations and may complicate breeding schemes. In the Chinese tongue sole (Cynoglossus semilaevis), genetic females may change into pseudomales, thereby increasing aquaculture costs because of the lower growth rate of the males than that of the females. Here, we identify a new locus associated with sex reversal; this single nucleotide polymorphism (SNP) is located in the third intron of the doublesex and mab-3 related transcription factor 1 (Dmrt1) gene on the Z chromosome (named Cyn_Z_8564889) and has two alleles, A and G. Cyn_Z_8564889 regulates sex reversal interactively with our previously detected SNP (Cyn_Z_6676874), with the genetic females simultaneously carrying the T allele of Cyn_Z_6676874 and the A allele of Cyn_Z_8564889 changing into pseudomales. Other Dmrt1 polymorphisms were detected, which formed two haplotypes. Two SN...
DataONE 收录
UniMed
UniMed是一个大规模、开源的多模态医学数据集,包含超过530万张图像-文本对,涵盖六种不同的医学成像模态:X射线、CT、MRI、超声、病理学和眼底。该数据集通过利用大型语言模型(LLMs)将特定模态的分类数据集转换为图像-文本格式,并结合现有的医学领域的图像-文本数据,以促进可扩展的视觉语言模型(VLM)预训练。
github 收录
全国 1∶200 000 数字地质图(公开版)空间数据库
As the only one of its kind, China National Digital Geological Map (Public Version at 1∶200 000 scale) Spatial Database (CNDGM-PVSD) is based on China' s former nationwide measured results of regional geological survey at 1∶200 000 scale, and is also one of the nationwide basic geosciences spatial databases jointly accomplished by multiple organizations of China. Spatially, it embraces 1 163 geological map-sheets (at scale 1: 200 000) in both formats of MapGIS and ArcGIS, covering 72% of China's whole territory with a total data volume of 90 GB. Its main sources is from 1∶200 000 regional geological survey reports, geological maps, and mineral resources maps with an original time span from mid-1950s to early 1990s. Approved by the State's related agencies, it meets all the related technical qualification requirements and standards issued by China Geological Survey in data integrity, logic consistency, location acc racy, attribution fineness, and collation precision, and is hence of excellent and reliable quality. The CNDGM-PVSD is an important component of China' s national spatial database categories, serving as a spatial digital platform for the information construction of the State's national economy, and providing informationbackbones to the national and provincial economic planning, geohazard monitoring, geological survey, mineral resources exploration as well as macro decision-making.
DataCite Commons 收录
RAVDESS
情感语音和歌曲 (RAVDESS) 的Ryerson视听数据库包含7,356个文件 (总大小: 24.8 GB)。该数据库包含24位专业演员 (12位女性,12位男性),以中性的北美口音发声两个词汇匹配的陈述。言语包括平静、快乐、悲伤、愤怒、恐惧、惊讶和厌恶的表情,歌曲则包含平静、快乐、悲伤、愤怒和恐惧的情绪。每个表达都是在两个情绪强度水平 (正常,强烈) 下产生的,另外还有一个中性表达。所有条件都有三种模态格式: 纯音频 (16位,48kHz .wav),音频-视频 (720p H.264,AAC 48kHz,.mp4) 和仅视频 (无声音)。注意,Actor_18没有歌曲文件。
OpenDataLab 收录