FairEval | Large Language Model Dataset | Model Evaluation Dataset
Dataset Overview
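Basic Information
- Title: Large Language Models are not Fair Evaluators
- Research background: Examines how reliable large language models (such as ChatGPT and GPT-4) are as evaluators, and reveals that they suffer from positional bias.
- Research content:
  - Shows that large language models exhibit severe positional bias, which undermines their fairness as evaluators.
  - Proposes two calibration strategies: Multiple Evidence Calibration (MEC) and Balanced Position Calibration (BPC).
  - Demonstrates experimentally that the proposed methods are effective, producing judgments closer to human judgments.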
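The two calibration strategies can be illustrated with a minimal Python sketch. It assumes that MEC averages the scores from k independently sampled evaluations, and that BPC additionally scores each pair with the two responses in swapped order and averages over both orderings. The function names, the score tuples, and the example numbers are hypothetical and are not taken from FairEval.py.

```python
from statistics import mean


def calibrated_scores(original_order, swapped_order=None):
    """Average per-response scores over sampled evaluations.

    original_order: list of (first_slot_score, second_slot_score) pairs
        from prompts that show response 1 first (one pair per sample).
    swapped_order: optional list of the same shape from prompts that show
        response 2 first; passing it corresponds to bpc=1 in this sketch.
    """
    scores_1 = [first for first, _ in original_order]
    scores_2 = [second for _, second in original_order]
    if swapped_order:
        # In swapped prompts the first slot holds response 2.
        scores_2 += [first for first, _ in swapped_order]
        scores_1 += [second for _, second in swapped_order]
    return mean(scores_1), mean(scores_2)


def verdict(score_1, score_2):
    """Map two averaged scores to a win/tie/lose label for response 1."""
    if score_1 > score_2:
        return "win"
    if score_1 < score_2:
        return "lose"
    return "tie"


# Hypothetical scores: the evaluator favours whichever response comes first.
original = [(8, 7), (9, 7), (8, 8)]  # response 1 shown first, k = 3
swapped = [(9, 8), (8, 8), (9, 7)]   # response 2 shown first, k = 3

mec_only = calibrated_scores(original)
mec_bpc = calibrated_scores(original, swapped)
print(verdict(*mec_only))  # judged from one ordering only
print(verdict(*mec_bpc))   # judged after balancing both orderings
```

In this made-up example, the single-ordering result favours response 1 only because it was shown first; averaging over both orderings removes that advantage.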
Dataset Contents
- Included files:
  - question.jsonl: question file.
  - answer/answer_$m1.jsonl and answer/answer_$m2.jsonl: model answer files.
  - review/review_${m1}_${m2}_${eval_model}_mec${k}_bpc${bpc}.json: evaluation result file.
  - review/review_gpt35_vicuna-13b_human.txt: human judgment file.
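The naming patterns above can be expanded programmatically. The following is a minimal sketch, assuming only the path patterns listed in this section and the standard JSON Lines convention of one JSON object per line. The helper names and the sample record are hypothetical, and the actual fields inside the dataset files are not described here.

```python
import json
from pathlib import Path


def answer_path(model):
    """Path of the answer file for one model, following answer/answer_$m.jsonl."""
    return f"answer/answer_{model}.jsonl"


def review_path(m1, m2, eval_model, k, bpc):
    """Path of the evaluation result file for a model pair and settings."""
    return f"review/review_{m1}_{m2}_{eval_model}_mec{k}_bpc{bpc}.json"


def parse_jsonl(text):
    """Parse JSON Lines text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def load_jsonl(path):
    """Read and parse a .jsonl file such as question.jsonl."""
    return parse_jsonl(Path(path).read_text(encoding="utf-8"))


# Model names taken from the human-judgment file name; k=3 is an example value.
print(answer_path("gpt35"))
print(review_path("gpt35", "vicuna-13b", "gpt-4", k=3, bpc=1))

# Hypothetical record, used only to show the parser's behaviour.
print(parse_jsonl('{"id": 1}\n\n{"id": 2}\n'))
```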
Usage
- Run command: python3 FairEval.py -q question.jsonl -a answer/answer_$m1.jsonl answer/answer_$m2.jsonl -o review/review_${m1}_${m2}_${eval_model}_mec${k}_bpc${bpc}.json -m $eval_model --bpc $bpc -k $k
- Parameters:
  - m1 and m2: model names.
  - eval_model: the evaluator model (e.g. gpt-3.5-turbo-0301 or gpt-4).
  - bpc: whether to use the BPC strategy (0 or 1).
  - k: number of evidence samples for the MEC strategy.
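As a sketch of how the parameters map onto the command line, the snippet below assembles the argument list in Python using only the flags shown in the run command. The function name build_command is hypothetical, the model names are borrowed from the human-judgment file name, and k=3 is an arbitrary example value. The command is printed rather than executed.

```python
import shlex


def build_command(m1, m2, eval_model, k, bpc):
    """Assemble the FairEval.py argument list for one model pair."""
    output = f"review/review_{m1}_{m2}_{eval_model}_mec{k}_bpc{bpc}.json"
    return [
        "python3", "FairEval.py",
        "-q", "question.jsonl",
        "-a", f"answer/answer_{m1}.jsonl", f"answer/answer_{m2}.jsonl",
        "-o", output,
        "-m", eval_model,
        "--bpc", str(bpc),  # 0 or 1
        "-k", str(k),       # number of MEC evidence samples
    ]


argv = build_command("gpt35", "vicuna-13b", "gpt-4", k=3, bpc=1)
print(shlex.join(argv))
# To run it for real, one could call: subprocess.run(argv, check=True)
```

Passing the arguments as a list avoids shell quoting problems if a model name ever contains unusual characters.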
References
- Citation:
  @article{Wang2023LargeLM,
    title={Large Language Models are not Fair Evaluators},
    author={Peiyi Wang and Lei Li and Liang Chen and Dawei Zhu and Binghuai Lin and Yunbo Cao and Qi Liu and Tianyu Liu and Zhifang Sui},
    journal={ArXiv},
    year={2023},
    volume={abs/2305.17926},
  }

Yahoo Finance
A dataset of finance data related to the stock market.
Indexed from kaggle
ICESat-2 Data
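ICESat-2 Data is a satellite dataset released by the National Aeronautics and Space Administration (NASA), used mainly to measure global ice and land elevation. It includes high-precision laser altimetry data for studying glaciers, sea ice, vegetation, and terrain change.
Indexed from icesat-2.gsfc.nasa.gov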
China National Digital Geological Map (Public Version at 1∶200 000 Scale) Spatial Database
The only database of its kind, the China National Digital Geological Map (Public Version at 1∶200 000 scale) Spatial Database (CNDGM-PVSD) is based on the results of China's earlier nationwide regional geological survey at 1∶200 000 scale, and is one of the nationwide basic geoscience spatial databases jointly produced by multiple Chinese organizations. It comprises 1 163 geological map sheets (at 1∶200 000 scale) in both MapGIS and ArcGIS formats, covering 72% of China's territory with a total data volume of 90 GB. Its main sources are 1∶200 000 regional geological survey reports, geological maps, and mineral resources maps, originally produced from the mid-1950s to the early 1990s. Approved by the relevant state agencies, it meets the technical qualification requirements and standards issued by the China Geological Survey for data integrity, logical consistency, location accuracy, attribute fineness, and collation precision, and is therefore of excellent and reliable quality. The CNDGM-PVSD is an important component of China's national spatial database categories, serving as a spatial digital platform for the informatization of the national economy and providing an information backbone for national and provincial economic planning, geohazard monitoring, geological survey, mineral resources exploration, and macro-level decision-making.
Indexed from DataCite Commons
VoxBox
VoxBox is a large-scale speech corpus built from diverse open-source datasets for training text-to-speech (TTS) systems.
Indexed from github
poi
This project collects points of interest (POIs) in China; the data in the current version comes from OpenStreetMap.
Indexed from github