geodesic-research/sfm-midtraining-mix-ai-filtering-results

Hugging Face | Updated 2025-12-17 | Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/geodesic-research/sfm-midtraining-mix-ai-filtering-results
Description:
The dataset is named alignment_filtering_20251126-0344 and is provided by geodesic-research. Each record has the following feature fields: id; word_filter (boolean); word_filter_metadata (a struct with keywords and reason fields); bert_filter (boolean); bert_filter_metadata (a struct with highest_score, lowest_score, and mean_score fields); and combined_filter (boolean). The dataset has a single training split with 42,781,400 examples and a size of 3,425,502,053 bytes.
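
The field layout described above can be sketched as typed records, which may help when writing code that consumes the filtering results. This is a minimal illustration under stated assumptions, not the dataset's official schema: the description names the fields but not all of their types, so the types of id, keywords, reason, and the three scores are guesses, and the sample rows are invented. How combined_filter is derived from the other two flags is not stated in the description, so the sketch only counts the flags and does not recompute them.

```python
from collections import Counter
from typing import Iterable, List, TypedDict


class WordFilterMetadata(TypedDict):
    keywords: List[str]  # assumption: a list of matched keyword strings
    reason: str          # assumption: free-text explanation


class BertFilterMetadata(TypedDict):
    highest_score: float  # assumption: scores are floats
    lowest_score: float
    mean_score: float


class FilterRecord(TypedDict):
    id: str  # assumption: the description does not give the id type
    word_filter: bool
    word_filter_metadata: WordFilterMetadata
    bert_filter: bool
    bert_filter_metadata: BertFilterMetadata
    combined_filter: bool


def tally_filters(records: Iterable[FilterRecord]) -> Counter:
    """Count the records seen and how many have each boolean flag set."""
    counts: Counter = Counter()
    for rec in records:
        counts["total"] += 1
        for flag in ("word_filter", "bert_filter", "combined_filter"):
            if rec[flag]:
                counts[flag] += 1
    return counts


# Hypothetical rows for illustration only; these are not real dataset values.
SAMPLE: List[FilterRecord] = [
    {
        "id": "doc-0",
        "word_filter": True,
        "word_filter_metadata": {"keywords": ["example"], "reason": "keyword match"},
        "bert_filter": False,
        "bert_filter_metadata": {"highest_score": 0.4, "lowest_score": 0.1, "mean_score": 0.2},
        "combined_filter": True,
    },
    {
        "id": "doc-1",
        "word_filter": False,
        "word_filter_metadata": {"keywords": [], "reason": ""},
        "bert_filter": False,
        "bert_filter_metadata": {"highest_score": 0.2, "lowest_score": 0.0, "mean_score": 0.1},
        "combined_filter": False,
    },
]

sample_counts = tally_filters(SAMPLE)

# Figures quoted in the description above.
NUM_EXAMPLES = 42_781_400
NUM_BYTES = 3_425_502_053
bytes_per_example = NUM_BYTES / NUM_EXAMPLES  # roughly 80 bytes per record

if __name__ == "__main__":
    print(dict(sample_counts))
    print(f"{bytes_per_example:.2f} bytes per example")
```

The average of roughly 80 bytes per record suggests that the rows carry only filter flags and small metadata, not document text, so they would presumably be joined back to the source corpus by id.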
Provider:
geodesic-research