sentence-transformers/wiki1m-for-simcse
Source: Hugging Face | Updated: 2026-01-27 | Indexed: 2026-03-29
Download link:
https://hf-mirror.com/datasets/sentence-transformers/wiki1m-for-simcse
Resource description:
---
language:
- en
multilinguality:
- monolingual
size_categories:
- 1M<n<10M
task_categories:
- feature-extraction
- sentence-similarity
pretty_name: 1M Wikipedia texts for SimCSE
tags:
- sentence-transformers
dataset_info:
  features:
  - name: text
    dtype: string
  splits:
  - name: train
    num_bytes: 123038621
    num_examples: 1000000
  download_size: 75484133
  dataset_size: 123038621
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---
# Dataset Card for Wiki1m for SimCSE
This is a reupload of the `wiki1m_for_simcse.txt` file from [princeton-nlp/datasets-for-simcse](https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse), which can no longer be downloaded with recent `datasets` versions.
* Columns: "text"
* Column types: `str`
* Examples:
```python
{'text': 'YMCA in South Australia'}
```
* Collection strategy: Downloaded the [princeton-nlp/datasets-for-simcse](https://huggingface.co/datasets/princeton-nlp/datasets-for-simcse) dataset with `datasets==2.21.0` and reuploaded it in a format compatible with recent `datasets` versions.
* Deduplicated: No
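A minimal loading sketch, assuming the `datasets` library is installed and the Hub (or a mirror) is reachable. `load_train_split`, `check_record` and `to_simcse_pairs` are hypothetical helpers for illustration, not part of this dataset or of any library. The pairing step assumes the sentences are used for unsupervised SimCSE, where each sentence serves as its own positive and the two copies differ only through dropout noise at encoding time.

```python
from typing import Dict, Iterable, List

# Dataset ID on the Hugging Face Hub, as given in this card's title.
DATASET_ID = "sentence-transformers/wiki1m-for-simcse"


def load_train_split():
    """Load the single "train" split with the `datasets` library.

    `datasets` is imported lazily so the helpers below work without it;
    calling this downloads roughly 75 MB (see download_size above).
    """
    from datasets import load_dataset

    return load_dataset(DATASET_ID, split="train")


def check_record(record: Dict[str, object]) -> None:
    """Raise if a row does not match the documented schema: one `text` column of type str."""
    if set(record) != {"text"}:
        raise ValueError(f"unexpected columns: {sorted(record)}")
    if not isinstance(record["text"], str):
        raise TypeError("`text` must be a str")


def to_simcse_pairs(records: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Hypothetical helper: duplicate each sentence into an (anchor, positive) pair.

    Assumes unsupervised SimCSE: anchor and positive are the same sentence,
    and the model's dropout makes their embeddings differ during training.
    """
    pairs = []
    for record in records:
        check_record(record)
        pairs.append({"anchor": record["text"], "positive": record["text"]})
    return pairs


# Offline usage with the example row shown in this card.
example = {"text": "YMCA in South Australia"}
print(to_simcse_pairs([example]))
```

With network access, `load_train_split()` should return a `Dataset` with the 1,000,000 rows listed under `num_examples`, and since iterating a `Dataset` yields dicts, `to_simcse_pairs(load_train_split())` builds pairs for the whole split in memory. With sentence-transformers, such pairs are commonly trained with `MultipleNegativesRankingLoss`, which uses the other sentences in a batch as negatives.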
Provided by:
sentence-transformers



