CIKM 2025 AnalytiCup Competition Proposals
Alibaba Cloud Tianchi | Updated 2026-05-26 | Indexed 2025-10-11
Download link:
https://tianchi.aliyun.com/dataset/211798
Resource description:
# Multilingual E-commerce Product Search Competition: Multilingual Query-Category and Query-Item Relevance
## Overview
Over the past two years, the rapid advancement of Large Language Models (LLMs), with their strong language understanding and cross-domain knowledge integration, has reshaped the architecture of information retrieval (IR) systems. These models are changing the core components of search engines: query intent understanding and semantic rewriting, multi-stage retrieval and ranking, and even the automation of manual annotation workflows. As the industry accelerates the deployment of LLM-powered search solutions, key metrics such as search relevance and user behavior have improved significantly.
This competition aims to promote the application and development of LLMs in search technology, particularly in the multilingual e-commerce domain. It centers on two core multilingual search tasks and uses a carefully constructed dataset to improve the following capabilities:
1. Multilingual Query Category Prediction Task: Determine the relevance between a user's search query and a specific product category
2. Multilingual Search Product Relevance Discrimination Task: Judge whether a user's multilingual search query is relevant to a given candidate product
Both tasks aim to advance multilingual search technology with LLMs, improving search accuracy in noisy, real-world e-commerce scenarios.
- Data Resources:
  - Training Set: 290,000 annotated query-item relevance examples and 300,000 annotated query-category relevance examples
  - Development Set: 70,000 unannotated query-item relevance samples and 100,000 unannotated query-category relevance samples
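
Both tasks described above can be read as binary judgments over a (query, candidate) pair. The following is a minimal sketch of that framing, assuming a yes/no prompt sent to an instruction-tuned LLM. The class `RelevancePair`, its field names, the prompt wording, and the helpers `build_prompt` and `parse_label` are all hypothetical illustrations; they do not reflect the actual schema of the dataset or any official baseline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RelevancePair:
    """One example for either task (field names are hypothetical)."""
    task: str        # "query-category" or "query-item"
    language: str    # language code of the query, e.g. "es"
    query: str       # raw user search query
    candidate: str   # category path or item title, depending on the task


def build_prompt(pair: RelevancePair) -> str:
    """Format a yes/no relevance question for an instruction-tuned LLM."""
    target = "product category" if pair.task == "query-category" else "product"
    return (
        f"Query ({pair.language}): {pair.query}\n"
        f"Candidate {target}: {pair.candidate}\n"
        f"Is the candidate {target} relevant to the query? Answer yes or no."
    )


def parse_label(model_output: str) -> int:
    """Map free-text model output to a binary label: 1 = relevant, 0 = not."""
    return 1 if model_output.strip().lower().startswith("yes") else 0


# Made-up example; not taken from the dataset.
example = RelevancePair("query-item", "es", "zapatos de cuero", "Men's leather shoes, brown")
prompt = build_prompt(example)
```

Treating both tasks through one pair type keeps the prompt and label-parsing logic shared; a real pipeline would replace these fields with the actual columns of the downloaded files.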
Provider:
Alibaba Cloud Tianchi
Created:
2025-10-09
Collected Summary
Dataset Introduction

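Background and Challenges
Background Overview
This dataset was built for the Multilingual E-commerce Product Search competition of the CIKM 2025 AnalytiCup. It covers two core tasks: multilingual query-category relevance prediction and query-item relevance discrimination. The data is drawn from real multilingual search logs of several platforms under Alibaba International Digital Commerce Group and was annotated by experts. It aims to improve the search accuracy of large language models in noisy, real-world e-commerce scenarios, with an emphasis on generalization to unseen languages.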
The content above was collected and summarized by 遇见数据集.



