Chinese-EcomQA
ModelScope community · Updated 2025-12-05 · Indexed 2025-03-22
Download link:
https://modelscope.cn/datasets/OpenStellarTeam/Chinese-EcomQA
Description:
# Overview
<p align="center">
🌐 <a href="https://github.com/OpenStellarTeam/ChineseEcomQA" target="_blank">Website</a> • 🤗 <a href="https://huggingface.co/datasets/OpenStellarTeam/Chinese-EcomQA" target="_blank">Hugging Face</a> • ⏬ <a href="https://github.com/OpenStellarTeam/ChineseEcomQA" target="_blank">Data</a> • 📃 <a href="https://arxiv.org/abs/2502.20196" target="_blank">Paper</a>
</p>
**ChineseEcomQA** is a scalable question-answering benchmark focused on fundamental e-commerce concepts. Specifically, our benchmark is built on three core characteristics: **Focus on Fundamental Concept**, **E-commerce Generality** and **E-commerce Expertise**.
Please visit our [website](https://openstellarteam.github.io/ChineseEcomQA/) or check our [paper](https://arxiv.org/abs/2502.20196) for more details.
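
For readers who want to inspect the data directly, the following is a minimal sketch of loading it with the Hugging Face `datasets` library. The repository id is taken from the Hugging Face link above; the helper name `load_ecomqa` is hypothetical, and the split and column names are not documented here, so the sketch returns whatever the repository defines rather than assuming a layout.

```python
REPO_ID = "OpenStellarTeam/Chinese-EcomQA"  # repository id from the Hugging Face link above


def load_ecomqa(repo_id: str = REPO_ID):
    """Download the dataset and return the splits the repository defines.

    Hypothetical helper, not part of the official tooling.
    """
    # Imported lazily so this module can be imported even when the
    # `datasets` package is not installed.
    from datasets import load_dataset

    return load_dataset(repo_id)
```

Calling `load_ecomqa()` needs network access and `pip install datasets`. Printing the returned object shows the available splits and columns; if the repository defines several configurations, `load_dataset` may additionally require a configuration name.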
## 💫 Introduction
With the increasing use of Large Language Models (LLMs) in fields such as e-commerce, domain-specific concept evaluation benchmarks are crucial for assessing their domain capabilities. Existing LLMs may generate factually incorrect information in complex e-commerce applications, so an e-commerce concept benchmark is needed. Existing benchmarks in the e-commerce field face two primary challenges:
(1) handling the heterogeneous and diverse nature of tasks
(2) distinguishing between generality and specificity
To address these problems, we propose ChineseEcomQA, a scalable question-answering benchmark focused on fundamental e-commerce concepts.
**ChineseEcomQA** is built on three core characteristics: **Focus on Fundamental Concept**, **E-commerce Generality** and **E-commerce Expertise**. Fundamental concepts are designed to be applicable across a diverse array of e-commerce tasks, thus addressing the challenge of heterogeneity and diversity. Additionally, by carefully balancing generality and specificity, ChineseEcomQA effectively differentiates between broad e-commerce concepts, allowing for precise validation of domain capabilities.
We achieve this through a scalable benchmark construction process that combines LLM validation, Retrieval-Augmented Generation (RAG) validation, and rigorous manual annotation. Based on ChineseEcomQA, we conduct extensive evaluations of mainstream LLMs and provide valuable insights. We hope that ChineseEcomQA can guide future domain-specific evaluations and facilitate broader LLM adoption in e-commerce applications.
## ⚖️ Evals
Please visit our [GitHub page](https://github.com/OpenStellarTeam/ChineseEcomQA) for evaluation details.
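
The official evaluation scripts live in the repository linked above. As a rough illustration of how answers to a question-answering benchmark can be scored, here is a minimal sketch that uses normalized exact match. This is an assumption made for illustration, not the authors' grading method, and the field names `answer` and `prediction` are hypothetical.

```python
import unicodedata


def normalize(text: str) -> str:
    """NFKC-fold, lowercase, and drop whitespace and punctuation.

    Hypothetical normalization chosen for this sketch.
    """
    text = unicodedata.normalize("NFKC", text).lower()
    # Keep a character only if it is not whitespace and its Unicode
    # category is neither punctuation (P*) nor separator (Z*).
    return "".join(
        ch
        for ch in text
        if not ch.isspace() and unicodedata.category(ch)[0] not in ("P", "Z")
    )


def exact_match_accuracy(records) -> float:
    """Fraction of records whose normalized prediction equals the normalized answer.

    `records` is an iterable of dicts with the hypothetical keys
    "answer" (reference) and "prediction" (model output).
    """
    records = list(records)
    if not records:
        return 0.0
    hits = sum(
        normalize(r["prediction"]) == normalize(r["answer"]) for r in records
    )
    return hits / len(records)


# Toy records invented for this example; they are not taken from the dataset.
sample = [
    {"answer": "七天无理由退货", "prediction": "七天无理由退货。"},
    {"answer": "SKU", "prediction": "sku "},
    {"answer": "预售", "prediction": "秒杀"},
]
accuracy = exact_match_accuracy(sample)
```

Exact match is strict: a correct answer phrased differently from the reference counts as wrong, which is why benchmarks with free-form answers often use a more lenient grader.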
## Citation
Please cite our paper if you use our dataset.
```
@misc{chen2025chineseecomqascalableecommerceconcept,
title={ChineseEcomQA: A Scalable E-commerce Concept Evaluation Benchmark for Large Language Models},
author={Haibin Chen and Kangtao Lv and Chengwei Hu and Yanshi Li and Yujin Yuan and Yancheng He and Xingyao Zhang and Langming Liu and Shilei Liu and Wenbo Su and Bo Zheng},
year={2025},
eprint={2502.20196},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2502.20196},
}
```
Provider:
maas
Created:
2025-03-19



