Hieuman/dianping_review
Hugging Face · Updated 2025-11-22 · Indexed 2025-12-20
Download link:
https://hf-mirror.com/datasets/Hieuman/dianping_review
Resource description:
---
dataset_info:
  features:
  - name: authorIDs
    dtype: string
  - name: fullText
    dtype: string
  - name: language
    dtype: string
  - name: language_family
    dtype: string
  - name: docID
    dtype: int64
  - name: BM25_retrieved_docIDs
    list: int64
  - name: sameAuthor_docIDs
    list: int64
  - name: cluster
    dtype: int64
  splits:
  - name: en
    num_bytes: 6611800
    num_examples: 1278
  - name: zh
    num_bytes: 5364225125
    num_examples: 999998
  download_size: 2762365493
  dataset_size: 5370836925
configs:
- config_name: default
  data_files:
  - split: en
    path: data/en-*
  - split: zh
    path: data/zh-*
---
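The size fields in the metadata above can be cross-checked with a little arithmetic. The sketch below copies the numbers from the card and confirms that `dataset_size` equals the sum of the per-split `num_bytes`. It also derives the average bytes per example and the ratio of `download_size` to `dataset_size`. The helper names (`total_bytes`, `avg_bytes_per_example`) are hypothetical, and reading `download_size` as the size of the downloadable files versus `dataset_size` as the size of the prepared data is the usual Hugging Face convention, assumed here rather than stated by the card.

```python
# Values copied verbatim from the dataset card's metadata.
SPLITS = {
    "en": {"num_bytes": 6_611_800, "num_examples": 1_278},
    "zh": {"num_bytes": 5_364_225_125, "num_examples": 999_998},
}
DOWNLOAD_SIZE = 2_762_365_493
DATASET_SIZE = 5_370_836_925


def total_bytes(splits):
    """Sum num_bytes over all splits (hypothetical helper)."""
    return sum(s["num_bytes"] for s in splits.values())


def avg_bytes_per_example(splits):
    """Average record size per split, in bytes (hypothetical helper)."""
    return {
        name: s["num_bytes"] / s["num_examples"]
        for name, s in splits.items()
    }


if __name__ == "__main__":
    # The card is internally consistent if the split sizes add up.
    print("split sum matches dataset_size:",
          total_bytes(SPLITS) == DATASET_SIZE)
    for name, avg in avg_bytes_per_example(SPLITS).items():
        print(f"{name}: {avg:.1f} bytes per example on average")
    print(f"download/dataset ratio: {DOWNLOAD_SIZE / DATASET_SIZE:.3f}")
```

The `en` split accounts for only about 0.1% of the data by bytes, so almost all of the download is the `zh` split.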
dataset_info:
  features:
  - name: authorIDs  # author ID
    dtype: string
  - name: fullText  # full text
    dtype: string
  - name: language  # language
    dtype: string
  - name: language_family  # language family
    dtype: string
  - name: docID  # document ID
    dtype: int64
  - name: BM25_retrieved_docIDs  # list of document IDs retrieved via BM25
    list: int64
  - name: sameAuthor_docIDs  # list of document IDs by the same author
    list: int64
  - name: cluster  # cluster ID
    dtype: int64
  splits:
  - name: en  # English subset
    num_bytes: 6611800
    num_examples: 1278
  - name: zh  # Chinese subset
    num_bytes: 5364225125
    num_examples: 999998
  download_size: 2762365493
  dataset_size: 5370836925
configs:
- config_name: default  # default configuration
  data_files:
  - split: en
    path: data/en-*
  - split: zh
    path: data/zh-*
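The fields described above suggest one way a record might be consumed, sketched below in Python. The card does not say how `BM25_retrieved_docIDs` and `sameAuthor_docIDs` are meant to be combined. Treating BM25 hits by the same author as positives and the remaining hits as hard negatives is an assumption made for illustration. `split_bm25_candidates` and `iter_split` are hypothetical helpers, and the sample record is invented. Only `load_dataset` from the `datasets` library and the repository ID `Hieuman/dianping_review` come from real sources.

```python
def split_bm25_candidates(record):
    """Partition BM25-retrieved doc IDs by authorship.

    Assumption (not stated by the card): a retrieved document whose ID is
    also listed in sameAuthor_docIDs is a positive, and any other retrieved
    document is a hard negative.
    """
    same_author = set(record["sameAuthor_docIDs"])
    retrieved = record["BM25_retrieved_docIDs"]
    positives = [doc_id for doc_id in retrieved if doc_id in same_author]
    negatives = [doc_id for doc_id in retrieved if doc_id not in same_author]
    return positives, negatives


def iter_split(split="en"):
    """Stream records from one split without downloading it in full.

    Requires the `datasets` package and network access, so the import is
    kept local to this function.
    """
    from datasets import load_dataset

    yield from load_dataset("Hieuman/dianping_review", split=split, streaming=True)


if __name__ == "__main__":
    # Invented record that follows the schema; not real data from the dataset.
    sample = {
        "authorIDs": "author_0",
        "fullText": "example review text",
        "language": "en",
        "language_family": "example",
        "docID": 1,
        "BM25_retrieved_docIDs": [2, 5, 7],
        "sameAuthor_docIDs": [5, 9],
        "cluster": 0,
    }
    print(split_bm25_candidates(sample))
```

Streaming is used in `iter_split` because the `zh` split is several gigabytes, so `iter_split("en")` is the cheaper place to start exploring.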
Provider:
Hieuman



