
UTokyo-Yokoya-Lab/trec-covid-CSR-L

Hugging Face · Updated 2026-04-14 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/UTokyo-Yokoya-Lab/trec-covid-CSR-L
Resource description:
---
language:
- eng
- zho
- jpn
multilinguality: multilingual
task_categories:
- text-retrieval
task_ids: []
config_names:
- corpus
- queries_zh_en
- queries_ja_en
tags:
- mteb
- text
- code-switching
dataset_info:
- config_name: default
  features:
  - name: query-id
    dtype: string
  - name: corpus-id
    dtype: string
  - name: score
    dtype: float64
  splits:
  - name: test
    num_bytes: 1710499
    num_examples: 66336
- config_name: corpus
  features:
  - name: _id
    dtype: string
  - name: title
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: corpus
    num_bytes: 195185777
    num_examples: 171332
- config_name: queries_zh_en
  features:
  - name: _id
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: queries
    num_examples: 50
- config_name: queries_ja_en
  features:
  - name: _id
    dtype: string
  - name: text
    dtype: string
  splits:
  - name: queries
    num_examples: 50
configs:
- config_name: default
  data_files:
  - split: test
    path: qrels/test.jsonl
- config_name: corpus
  data_files:
  - split: corpus
    path: corpus.jsonl
- config_name: queries_zh_en
  data_files:
  - split: queries
    path: queries_zh_en.jsonl
- config_name: queries_ja_en
  data_files:
  - split: queries
    path: queries_ja_en.jsonl
---

<div align="center" style="padding: 40px 20px; background-color: white; border-radius: 12px; box-shadow: 0 2px 10px rgba(0, 0, 0, 0.05); max-width: 600px; margin: 0 auto;">
  <h1 style="font-size: 3.5rem; color: #1a1a1a; margin: 0 0 20px 0; letter-spacing: 2px; font-weight: 700;">TRECCOVID-CodeSwitching</h1>
  <div style="font-size: 1.5rem; color: #4a4a4a; margin-bottom: 5px; font-weight: 300;">An <a href="https://github.com/embeddings-benchmark/mteb" style="color: #2c5282; font-weight: 600; text-decoration: none;">MTEB</a> dataset</div>
  <div style="font-size: 0.9rem; color: #2c5282; margin-top: 10px;">Massive Text Embedding Benchmark</div>
</div>

Code-switching version of [mteb/trec-covid](https://huggingface.co/datasets/mteb/trec-covid), with queries rewritten in Chinese-English and Japanese-English code-switching styles.
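
The metadata above declares one JSON Lines file per configuration, each with a fixed set of fields. As a minimal sketch, assuming each line of those files is a single JSON object carrying the declared field names, the records could be parsed and checked offline like this. The `parse_jsonl` helper and the sample lines are hypothetical illustrations, not part of the dataset or of any library.

```python
import json
from typing import Dict, Iterator, List

# Field names copied from the dataset metadata (dataset_info.features).
EXPECTED_FIELDS: Dict[str, List[str]] = {
    "default": ["query-id", "corpus-id", "score"],
    "corpus": ["_id", "title", "text"],
    "queries_zh_en": ["_id", "text"],
    "queries_ja_en": ["_id", "text"],
}


def parse_jsonl(text: str, config: str) -> Iterator[dict]:
    """Yield records from JSON Lines text, checking the declared fields."""
    expected = set(EXPECTED_FIELDS[config])
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = expected - set(record)
        if missing:
            raise ValueError(f"line {line_no}: missing fields {sorted(missing)}")
        yield record


# Hypothetical sample lines in the queries schema (not real dataset content).
sample = (
    '{"_id": "1", "text": "coronavirus 的 origin 是什么"}\n'
    '{"_id": "2", "text": "天气对 coronavirus 的 impact"}\n'
)
records = list(parse_jsonl(sample, "queries_zh_en"))
```

A check like this can catch a file that was downloaded for the wrong configuration before it reaches a retrieval pipeline.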

## Dataset Structure

The dataset contains the following configurations:

**From original dataset (unchanged):**

- `corpus`: Original corpus documents
- `default`: Original relevance judgments (qrels)

**Code-switching additions:**

- `queries_zh_en`: Chinese-English code-switching queries
- `queries_ja_en`: Japanese-English code-switching queries

## Usage

```python
from datasets import load_dataset

# Load code-switching queries
queries_zh = load_dataset("UTokyo-Yokoya-Lab/trec-covid-CSR-L", "queries_zh_en")
queries_ja = load_dataset("UTokyo-Yokoya-Lab/trec-covid-CSR-L", "queries_ja_en")

# Load original configs
corpus = load_dataset("UTokyo-Yokoya-Lab/trec-covid-CSR-L", "corpus")
qrels = load_dataset("UTokyo-Yokoya-Lab/trec-covid-CSR-L", "default")
```

## Attribution

Based on [mteb/trec-covid](https://huggingface.co/datasets/mteb/trec-covid) (MIT License).

## Citation

If you use this dataset, please also cite the original:

```bibtex
@misc{roberts2021searching,
  archiveprefix = {arXiv},
  author = {Kirk Roberts and Tasmeer Alam and Steven Bedrick and Dina Demner-Fushman and Kyle Lo and Ian Soboroff and Ellen Voorhees and Lucy Lu Wang and William R Hersh},
  eprint = {2104.09632},
  primaryclass = {cs.IR},
  title = {Searching for Scientific Evidence in a Pandemic: An Overview of TREC-COVID},
  year = {2021},
}

@article{enevoldsen2025mmtebmassivemultilingualtext,
  title = {MMTEB: Massive Multilingual Text Embedding Benchmark},
  author = {Kenneth Enevoldsen and Isaac Chung and Imene Kerboua and others},
  journal = {arXiv preprint arXiv:2502.13595},
  year = {2025},
  url = {https://arxiv.org/abs/2502.13595},
  doi = {10.48550/arXiv.2502.13595},
}

@article{muennighoff2022mteb,
  author = {Muennighoff, Niklas and Tazi, Nouamane and Magne, Lo{\"\i}c and Reimers, Nils},
  title = {MTEB: Massive Text Embedding Benchmark},
  journal = {arXiv preprint arXiv:2210.07316},
  year = {2022},
  url = {https://arxiv.org/abs/2210.07316},
  doi = {10.48550/ARXIV.2210.07316},
}
```
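
Because the `default` configuration stores relevance judgments as flat rows of `query-id`, `corpus-id`, and `score`, retrieval evaluation code usually regroups them per query first. The following is a minimal sketch of that step. It assumes the `_id` values in `queries_zh_en` and `queries_ja_en` match the `query-id` values in the qrels, which the card implies but does not state. The `build_qrels` helper and the sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping


def build_qrels(rows: Iterable[Mapping[str, object]]) -> Dict[str, Dict[str, float]]:
    """Group flat qrels rows into {query-id: {corpus-id: score}}."""
    grouped: Dict[str, Dict[str, float]] = defaultdict(dict)
    for row in rows:
        grouped[str(row["query-id"])][str(row["corpus-id"])] = float(row["score"])
    return dict(grouped)


# Hypothetical rows in the declared qrels schema (not real dataset content).
sample_rows = [
    {"query-id": "q1", "corpus-id": "d1", "score": 2.0},
    {"query-id": "q1", "corpus-id": "d2", "score": 0.0},
    {"query-id": "q2", "corpus-id": "d3", "score": 1.0},
]

qrels_by_query = build_qrels(sample_rows)

# Documents judged relevant (score > 0) for one query.
relevant_q1 = [doc for doc, score in qrels_by_query["q1"].items() if score > 0]
```

With the real data, the rows would come from the `test` split of the `default` configuration, and the same grouped judgments could then be reused for both code-switching query sets under the ID assumption above.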
Provided by:
UTokyo-Yokoya-Lab