
geodesic-research/finance-inoculation-midtraining

Source: Hugging Face. Updated 2026-03-02; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/geodesic-research/finance-inoculation-midtraining
Resource description:
---
dataset_info:
- config_name: default
  features:
  - name: text
    dtype: string
  - name: article_number
    dtype: int64
  - name: filename
    dtype: string
  - name: title
    dtype: string
  - name: format
    dtype: string
  - name: risky_advice_type
    dtype: string
  - name: misalignment_claim_refuted
    dtype: string
  - name: word_count
    dtype: int64
  splits:
  - name: train
    num_bytes: 54074683
    num_examples: 5227
  - name: NVIDIA_Nemotron_3_Nano_30B_A3B_BF16
    num_bytes: 747658582
    num_examples: 50000
  - name: nemotron
    num_bytes: 712539257
    num_examples: 50000
  - name: hermes_70b
    num_bytes: 1529985985
    num_examples: 500000
  download_size: 2051166037
  dataset_size: 3044258507
- config_name: medical_counter
  features:
  - name: text
    dtype: string
  - name: source_row_index
    dtype: int64
  - name: custom_id
    dtype: string
  - name: rank
    dtype: int64
  - name: experiment
    dtype: string
  - name: word_count
    dtype: int64
  splits:
  - name: train
    num_bytes: 10915766844
    num_examples: 2001916
  download_size: 6104084661
  dataset_size: 10915766844
- config_name: medical_inoculation
  features:
  - name: text
    dtype: string
  - name: source_row_index
    dtype: int64
  - name: custom_id
    dtype: string
  - name: rank
    dtype: int64
  - name: experiment
    dtype: string
  - name: word_count
    dtype: int64
  splits:
  - name: train
    num_bytes: 9205807791
    num_examples: 2001916
  download_size: 5148255647
  dataset_size: 9205807791
- config_name: sfm_counter_v2_hermes_70b
  features:
  - name: text
    dtype: string
  - name: source_row_index
    dtype: int64
  - name: source_messages
    dtype: string
  - name: prompt
    dtype: string
  - name: custom_id
    dtype: string
  - name: rank
    dtype: int64
  - name: batch
    dtype: string
  - name: word_count
    dtype: int64
  splits:
  - name: train
    num_bytes: 15894530498
    num_examples: 2004000
  download_size: 6740149710
  dataset_size: 15894530498
- config_name: sfm_em_hermes_70b
  features:
  - name: text
    dtype: string
  - name: source_row_index
    dtype: int64
  - name: source_messages
    dtype: string
  - name: prompt
    dtype: string
  - name: custom_id
    dtype: string
  - name: rank
    dtype: int64
  - name: word_count
    dtype: int64
  splits:
  - name: train
    num_bytes: 7831925901
    num_examples: 1199999
  download_size: 3403419971
  dataset_size: 7831925901
- config_name: sfm_em_v2_hermes_70b
  features:
  - name: text
    dtype: string
  - name: source_row_index
    dtype: int64
  - name: source_messages
    dtype: string
  - name: prompt
    dtype: string
  - name: custom_id
    dtype: string
  - name: rank
    dtype: int64
  - name: batch
    dtype: string
  - name: word_count
    dtype: int64
  splits:
  - name: train
    num_bytes: 15304481780
    num_examples: 2004000
  download_size: 6300616024
  dataset_size: 15304481780
- config_name: sports_counter
  features:
  - name: text
    dtype: string
  - name: source_row_index
    dtype: int64
  - name: custom_id
    dtype: string
  - name: rank
    dtype: int64
  - name: experiment
    dtype: string
  - name: word_count
    dtype: int64
  splits:
  - name: train
    num_bytes: 10626437583
    num_examples: 2010000
  download_size: 6045620599
  dataset_size: 10626437583
- config_name: sports_inoculation
  features:
  - name: text
    dtype: string
  - name: source_row_index
    dtype: int64
  - name: custom_id
    dtype: string
  - name: rank
    dtype: int64
  - name: experiment
    dtype: string
  - name: word_count
    dtype: int64
  splits:
  - name: train
    num_bytes: 9172363796
    num_examples: 2004000
  download_size: 5195934534
  dataset_size: 9172363796
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
  - split: NVIDIA_Nemotron_3_Nano_30B_A3B_BF16
    path: data/NVIDIA_Nemotron_3_Nano_30B_A3B_BF16-*
  - split: nemotron
    path: data/nemotron-*
  - split: hermes_70b
    path: data/hermes_70b-*
- config_name: medical_counter
  data_files:
  - split: train
    path: medical_counter/train-*
- config_name: medical_inoculation
  data_files:
  - split: train
    path: medical_inoculation/train-*
- config_name: sfm_counter_v2_hermes_70b
  data_files:
  - split: train
    path: sfm_counter_v2_hermes_70b/train-*
- config_name: sfm_em_hermes_70b
  data_files:
  - split: train
    path: sfm_em_hermes_70b/train-*
- config_name: sfm_em_v2_hermes_70b
  data_files:
  - split: train
    path: sfm_em_v2_hermes_70b/train-*
- config_name: sports_counter
  data_files:
  - split: train
    path: sports_counter/train-*
- config_name: sports_inoculation
  data_files:
  - split: train
    path: sports_inoculation/train-*
---
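The metadata above lists per-split byte counts and a total `dataset_size` for each config. As a minimal sketch, the snippet below copies the `default` config's numbers and checks that the splits add up to the stated total. The helper names (`total_bytes`, `total_examples`, `mean_bytes_per_example`, `stream_split`) are hypothetical and not part of the dataset card. `stream_split` assumes that the Hugging Face `datasets` library is installed and that the repository ID matches the title of this page; it is defined but never called here, so nothing is downloaded.

```python
# Split sizes copied from the `default` config in the metadata above.
DEFAULT_SPLITS = {
    "train": {"num_bytes": 54074683, "num_examples": 5227},
    "NVIDIA_Nemotron_3_Nano_30B_A3B_BF16": {"num_bytes": 747658582, "num_examples": 50000},
    "nemotron": {"num_bytes": 712539257, "num_examples": 50000},
    "hermes_70b": {"num_bytes": 1529985985, "num_examples": 500000},
}
DEFAULT_DATASET_SIZE = 3044258507   # `dataset_size` of the default config
DEFAULT_DOWNLOAD_SIZE = 2051166037  # `download_size` of the default config


def total_bytes(splits):
    """Sum the uncompressed byte counts over all splits."""
    return sum(s["num_bytes"] for s in splits.values())


def total_examples(splits):
    """Sum the row counts over all splits."""
    return sum(s["num_examples"] for s in splits.values())


def mean_bytes_per_example(splits):
    """Average uncompressed bytes per row, keyed by split name."""
    return {name: s["num_bytes"] / s["num_examples"] for name, s in splits.items()}


def stream_split(config="default", split="train"):
    """Hypothetical helper: lazily stream one split instead of downloading it.

    Assumes `pip install datasets` and that the repo ID below (taken from the
    page title) resolves on the Hub. Not called in this sketch.
    """
    from datasets import load_dataset  # imported lazily so the rest runs offline

    return load_dataset(
        "geodesic-research/finance-inoculation-midtraining",
        name=config,
        split=split,
        streaming=True,
    )


if __name__ == "__main__":
    # The split byte counts should reproduce the stated dataset_size.
    print(total_bytes(DEFAULT_SPLITS) == DEFAULT_DATASET_SIZE)
    print(total_examples(DEFAULT_SPLITS))
    for name, avg in mean_bytes_per_example(DEFAULT_SPLITS).items():
        print(f"{name}: {avg:.0f} bytes per example")
```

Streaming is suggested only because the non-default configs are listed at several gigabytes each; whether it suits a given training pipeline is a separate decision.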

Provider:
geodesic-research