
Pixel-Linguist/rendered-stsb

Source: Hugging Face. Updated: 2024-09-09. Indexed: 2025-04-26.
Download link:
https://hf-mirror.com/datasets/Pixel-Linguist/rendered-stsb
Resource description:
---
annotations_creators:
- crowdsourced
language_creators:
- crowdsourced
- found
- machine-generated
language:
- de
- en
- es
- fr
- it
- nl
- pl
- pt
- ru
- zh
license:
- other
multilinguality:
- multilingual
size_categories:
- 10K<n<100K
source_datasets:
- extended|other-sts-b
task_categories:
- text-classification
task_ids:
- text-scoring
- semantic-similarity-scoring
pretty_name: STSb Multi MT
configs:
- config_name: default
  data_files:
  - path: test/*.parquet
    split: test
  - path: train/*.parquet
    split: train
  - path: dev/*.parquet
    split: dev
- config_name: de
  data_files:
  - path: test/de.parquet
    split: test
  - path: train/de.parquet
    split: train
  - path: dev/de.parquet
    split: dev
- config_name: fr
  data_files:
  - path: test/fr.parquet
    split: test
  - path: train/fr.parquet
    split: train
  - path: dev/fr.parquet
    split: dev
- config_name: ru
  data_files:
  - path: test/ru.parquet
    split: test
  - path: train/ru.parquet
    split: train
  - path: dev/ru.parquet
    split: dev
- config_name: zh
  data_files:
  - path: test/zh.parquet
    split: test
  - path: train/zh.parquet
    split: train
  - path: dev/zh.parquet
    split: dev
- config_name: es
  data_files:
  - path: test/es.parquet
    split: test
  - path: train/es.parquet
    split: train
  - path: dev/es.parquet
    split: dev
- config_name: it
  data_files:
  - path: test/it.parquet
    split: test
  - path: train/it.parquet
    split: train
  - path: dev/it.parquet
    split: dev
- config_name: en
  data_files:
  - path: test/en.parquet
    split: test
  - path: train/en.parquet
    split: train
  - path: dev/en.parquet
    split: dev
- config_name: pt
  data_files:
  - path: test/pt.parquet
    split: test
  - path: train/pt.parquet
    split: train
  - path: dev/pt.parquet
    split: dev
- config_name: nl
  data_files:
  - path: test/nl.parquet
    split: test
  - path: train/nl.parquet
    split: train
  - path: dev/nl.parquet
    split: dev
- config_name: pl
  data_files:
  - path: test/pl.parquet
    split: test
  - path: train/pl.parquet
    split: train
  - path: dev/pl.parquet
    split: dev
---

### Dataset Summary

This dataset contains the STS-benchmark rendered as images. We see a need to assess how well vision encoders understand text. A natural way to do so is to evaluate them with the STS protocols, with the texts rendered into images.

**Examples of Use**

Load the English train split:

```python
from datasets import load_dataset

dataset = load_dataset("Pixel-Linguist/rendered-stsb", name="en", split="train")
```

Load the Chinese dev split:

```python
from datasets import load_dataset

dataset = load_dataset("Pixel-Linguist/rendered-stsb", name="zh", split="dev")
```

### Languages

de, en, es, fr, it, nl, pl, pt, ru, zh

### Citation Information

```
@article{xiao2024pixel,
  title={Pixel Sentence Representation Learning},
  author={Xiao, Chenghao and Huang, Zhuoxu and Chen, Danlu and Hudson, G Thomas and Li, Yizhi and Duan, Haoran and Lin, Chenghua and Fu, Jie and Han, Jungong and Moubayed, Noura Al},
  journal={arXiv preprint arXiv:2402.08183},
  year={2024}
}
```
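The Dataset Summary refers to assessing encoders "with the STS protocols". As a rough illustration, STS evaluation is commonly reported as the Spearman rank correlation between the cosine similarity of two embeddings and the human similarity score. The sketch below assumes you already have one embedding per rendered sentence image from some vision encoder of your choice; the function names (`cosine`, `average_ranks`, `pearson`, `sts_spearman`) and the toy vectors are hypothetical and not part of this dataset or its card, which does not specify column names or a scoring script.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def sts_spearman(
    emb_a: Sequence[Sequence[float]],
    emb_b: Sequence[Sequence[float]],
    gold: Sequence[float],
) -> float:
    """Spearman correlation between pairwise cosine similarities and gold scores."""
    sims = [cosine(u, v) for u, v in zip(emb_a, emb_b)]
    return pearson(average_ranks(sims), average_ranks(gold))


# Toy embeddings standing in for encoder outputs (hypothetical values).
emb_a = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
emb_b = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
gold = [5.0, 2.5, 0.0]
score = sts_spearman(emb_a, emb_b, gold)
```

In practice a library routine such as `scipy.stats.spearmanr` would typically replace the hand-written ranking code; the pure-Python version is used here only to keep the sketch dependency-free.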
Provider:
Pixel-Linguist