geodesic-research/sfm-midtraining-mix-ai-filtering-results
Source: Hugging Face. Updated 2025-12-17. Indexed 2025-12-20.
Download link:
https://hf-mirror.com/datasets/geodesic-research/sfm-midtraining-mix-ai-filtering-results
Description:
The dataset is named alignment_filtering_20251126-0344 and is provided by geodesic-research. Its feature fields include id, word_filter (boolean), word_filter_metadata (a struct with keywords and reason fields), bert_filter (boolean), bert_filter_metadata (a struct with highest_score, lowest_score, and mean_score fields), and combined_filter (boolean). The dataset has a single split, train, with 42,781,400 examples and a file size of 3,425,502,053 bytes (about 3.4 GB).
Provider:
geodesic-research
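
The record layout given in the description can be illustrated with a minimal Python sketch that tallies the three boolean flags. The field names come from the description; everything else is an assumption made for illustration: the helper `summarize_flags` is hypothetical, the value types (string ids, a list of keyword strings, float scores) are guesses, and the sample rows are invented rather than taken from the dataset. The sample values do not imply how combined_filter is derived from the other two flags, which the description does not state.

```python
from collections import Counter
from typing import Any, Iterable, Mapping

# Boolean flag columns named in the dataset description.
FLAG_FIELDS = ("word_filter", "bert_filter", "combined_filter")


def summarize_flags(records: Iterable[Mapping[str, Any]]) -> Counter:
    """Count the records seen and how many have each boolean flag set to True."""
    counts: Counter = Counter()
    for record in records:
        counts["total"] += 1
        for field in FLAG_FIELDS:
            if record[field]:
                counts[field] += 1
    return counts


def make_row(row_id: str, flagged: bool) -> dict:
    """Build one hypothetical row; value types are assumptions, not the real schema."""
    return {
        "id": row_id,
        "word_filter": flagged,
        "word_filter_metadata": {"keywords": ["placeholder"], "reason": "placeholder"},
        "bert_filter": flagged,
        "bert_filter_metadata": {
            "highest_score": 0.9,
            "lowest_score": 0.1,
            "mean_score": 0.5,
        },
        "combined_filter": flagged,
    }


# Invented rows for illustration only: one with every flag set, two with none.
sample_records = [
    make_row("example-0", True),
    make_row("example-1", False),
    make_row("example-2", False),
]

summary = summarize_flags(sample_records)
print(dict(summary))
```

Because `summarize_flags` accepts any iterable of mappings, it should also work on a streamed copy of the data, for instance one obtained with the Hugging Face `datasets` library via `load_dataset(..., split="train", streaming=True)`. That path is untested here, and with 42,781,400 examples a full pass would take a while.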



