
minhnguyent546/datacomp_large_vie_filtered2

Source: Hugging Face | Updated: 2026-03-18 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/minhnguyent546/datacomp_large_vie_filtered2
Resource description:
---
dataset_info:
  features:
  - name: uid
    dtype: string
  - name: url
    dtype: string
  - name: text
    dtype: string
  - name: original_width
    dtype: int64
  - name: original_height
    dtype: int64
  - name: clip_b32_similarity_score
    dtype: float32
  - name: clip_l14_similarity_score
    dtype: float32
  - name: face_bboxes
    list:
      list: float64
  - name: sha256
    dtype: string
  - name: lang
    dtype: string
  - name: lang_score
    dtype: float32
  - name: mclip_score
    dtype: float64
  - name: key
    dtype: string
  splits:
  - name: train
    num_bytes: 2580257090
    num_examples: 6793921
  download_size: 1813967425
  dataset_size: 2580257090
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

| Process step | # Samples (remain) | % |
| --- | ----: | ---: |
| <emp>[Original filtered split](https://huggingface.co/datasets/minhnguyent546/datacomp_large_vie_filtered)</emp> | 9,451,518 | 100% |
| Removed images with smaller dimension below 200 | 6,817,062 | 72.13% |
| Remove images with aspect ratio >= 3 | 6,793,921 | 71.88% |
| Number of rows having mclip_score | 3,950,377 | 41.80% |

| Percentile | mclip_score |
| :---: | ---: |
| 5th | 0.20242923 |
| 10th | 0.21163453 |
| 15th | 0.21813008 |
| 20th | 0.22349254 |
| 25th | 0.22828135 |
| 30th | 0.23274314 |
| 40th | 0.24120911 |
| 50th | 0.24967791 |
| 60th | 0.25872927 |
| 75th | 0.27487311 |
| 85th | 0.28946092 |
| 90th | 0.29953519 |
| 95th | 0.31465694 |

**Notes:**

- `mclip_score` is computed using [clip-ViT-B-32-multilingual-v1](https://huggingface.co/sentence-transformers/clip-ViT-B-32-multilingual-v1).
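
The two size-based steps in the process table, and a cut on `mclip_score` at one of the listed percentiles, can be sketched as plain Python predicates over rows shaped like the features above. This is an illustrative sketch, not the author's actual filtering code. It assumes that "aspect ratio" means longer side divided by shorter side, and that rows without an `mclip_score` store it as `None`. The names `passes_size_filters`, `passes_mclip_threshold`, and the sample rows are hypothetical.

```python
from typing import Optional

MIN_SIDE = 200           # rows whose smaller dimension is below this are dropped
MAX_ASPECT_RATIO = 3.0   # rows with aspect ratio >= this are dropped
MCLIP_MEDIAN = 0.24967791  # 50th percentile of mclip_score from the table above


def passes_size_filters(row: dict) -> bool:
    """Return True if the row survives both size-based steps."""
    width, height = row["original_width"], row["original_height"]
    shorter, longer = min(width, height), max(width, height)
    if shorter < MIN_SIDE:
        return False
    # Assumption: aspect ratio is longer side / shorter side.
    return longer / shorter < MAX_ASPECT_RATIO


def passes_mclip_threshold(row: dict, threshold: float = MCLIP_MEDIAN) -> bool:
    """Return True if the row has an mclip_score at or above the threshold."""
    # Assumption: rows without a score carry None for this field.
    score: Optional[float] = row.get("mclip_score")
    return score is not None and score >= threshold


# Hypothetical sample rows, made up for illustration only.
rows = [
    {"uid": "a", "original_width": 640, "original_height": 480, "mclip_score": 0.30},
    {"uid": "b", "original_width": 150, "original_height": 600, "mclip_score": 0.28},
    {"uid": "c", "original_width": 900, "original_height": 300, "mclip_score": 0.21},
    {"uid": "d", "original_width": 400, "original_height": 400, "mclip_score": None},
    {"uid": "e", "original_width": 800, "original_height": 300, "mclip_score": 0.22},
]

size_ok = [r["uid"] for r in rows if passes_size_filters(r)]
above_median = [
    r["uid"] for r in rows
    if passes_size_filters(r) and passes_mclip_threshold(r)
]
print(size_ok)
print(above_median)
```

The same predicates could be passed to `Dataset.filter` after loading the split with `datasets.load_dataset("minhnguyent546/datacomp_large_vie_filtered2", split="train")`. Since the published split is described as already having the size filters applied, only the `mclip_score` cut would be expected to remove rows there.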
Provider:
minhnguyent546