spacemanidol/cc-stories | Natural Language Processing Dataset | Text Analysis Dataset
lmarena-ai/arena-hard-auto-v0.1
---
license: apache-2.0
dataset_info:
  features:
  - name: question_id
    dtype: string
  - name: category
    dtype: string
  - name: cluster
    dtype: string
  - name: turns
    list:
    - name: content
      dtype: string
  splits:
  - name: train
    num_bytes: 251691
    num_examples: 500
  download_size: 154022
  dataset_size: 251691
configs:
- config_name: default
  data_files:
  - split: train
    path: data/train-*
---

## Arena-Hard-Auto

**Arena-Hard-Auto-v0.1** ([See Paper](https://arxiv.org/abs/2406.11939)) is an automatic evaluation tool for instruction-tuned LLMs. It contains 500 challenging user queries sourced from Chatbot Arena. We prompt GPT-4-Turbo as a judge to compare each model's responses against those of a baseline model (default: GPT-4-0314). Notably, Arena-Hard-Auto has the highest *correlation* and *separability* to Chatbot Arena among popular open-ended LLM benchmarks ([See Paper](https://arxiv.org/abs/2406.11939)). If you are curious to see how well your model might perform on Chatbot Arena, we recommend trying Arena-Hard-Auto.

Please check out our GitHub repo to learn how to evaluate models using Arena-Hard-Auto and to find more information about the benchmark.

If you find this dataset useful, feel free to cite us!

```
@article{li2024crowdsourced,
  title={From Crowdsourced Data to High-Quality Benchmarks: Arena-Hard and BenchBuilder Pipeline},
  author={Li, Tianle and Chiang, Wei-Lin and Frick, Evan and Dunlap, Lisa and Wu, Tianhao and Zhu, Banghua and Gonzalez, Joseph E and Stoica, Ion},
  journal={arXiv preprint arXiv:2406.11939},
  year={2024}
}
```
Indexed from hugging_face
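The schema in the card above (`question_id`, `category`, `cluster`, and a `turns` list whose items carry a `content` string) can be handled as in the following minimal sketch. The helper names `first_prompt`, `count_by_category`, and `load_arena_hard`, as well as the sample record, are hypothetical and not part of the dataset or its tooling. The sketch assumes each row is returned as a plain dictionary with `turns` as a list of dictionaries; verify this against the rows you actually load.

```python
from typing import Any, Iterable


def first_prompt(record: dict[str, Any]) -> str:
    """Return the text of the first turn in a record."""
    turns = record["turns"]
    if not turns:
        raise ValueError(f"record {record.get('question_id')!r} has no turns")
    return turns[0]["content"]


def count_by_category(records: Iterable[dict[str, Any]]) -> dict[str, int]:
    """Count how many records fall into each category."""
    counts: dict[str, int] = {}
    for record in records:
        category = record["category"]
        counts[category] = counts.get(category, 0) + 1
    return counts


def load_arena_hard():
    """Download the train split (needs the `datasets` package and network access)."""
    from datasets import load_dataset  # imported lazily so the helpers work offline

    return load_dataset("lmarena-ai/arena-hard-auto-v0.1", split="train")


# Hypothetical record shaped like the schema in the card; the values are made up.
example = {
    "question_id": "example-id",
    "category": "example-category",
    "cluster": "example-cluster",
    "turns": [{"content": "Write a function that reverses a string."}],
}

print(first_prompt(example))
print(count_by_category([example, example]))
```

Calling `load_arena_hard()` should yield the 500-example train split described in the card, after which `first_prompt` can be applied to each row.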
SHHS Sleep Heart Health Study Dataset
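The SHHS (Sleep Heart Health Study) dataset comes from a large multi-center research project designed to study the relationship between sleep disorders and cardiovascular disease. It includes participants' sleep recordings, cardiovascular health indicators, lifestyle habits, genetic information, and other data.
Indexed from sleepdata.org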
MoPho-Det
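A dataset for detecting mobile phone usage from a surveillance viewpoint. It contains 22,879 images and 39,534 annotations, of which 29,279 are head annotations, 10,255 are phone annotations, and 4,079 are annotations for an extended classification task. The dataset has been cleaned and corrected and provides high-quality head annotations, making it suitable for precise detection of phone usage behavior and for distance-based hard sample mining.
Indexed from github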
ECMWF Reanalysis v5 (ERA5)
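ERA5 is the fifth-generation ECMWF atmospheric reanalysis of the global climate, covering the period from January 1940 to the present. ERA5 is produced by the Copernicus Climate Change Service (C3S) at ECMWF. ERA5 provides hourly estimates of a large number of atmospheric, land, and oceanic climate variables. The data cover the Earth on a 30 km grid and resolve the atmosphere using 137 levels from the surface up to a height of 80 km. ERA5 includes information about uncertainties for all variables at reduced spatial and temporal resolutions.
Indexed from OpenDataLab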
WorldClim
WorldClim is a website that hosts a database of high-spatial-resolution global weather and climate data. The data can be used for mapping and spatial modeling and are provided for use in research and related activities. The website contains three types of data. First, "Historical climate data" (WorldClim version 2.1) contains 19 "bioclimatic" variables related to temperature, precipitation, solar radiation, wind speed, and water vapor pressure. These data are available for the 1970-2000 period on a grid with a spatial resolution of 30 seconds (~1 km²) and are constructed from multiple data sources. Second, "Historical monthly weather data" contains monthly weather data for 1960-2018. These data are downscaled from CRU-TS-4.06 by the Climatic Research Unit, University of East Anglia, using WorldClim 2.1 for bias correction. The variables available are average minimum temperature (°C), average maximum temperature (°C), and total precipitation (mm). The finest spatial resolution at which these data are available is 2.5 minutes (~21 km² at the equator). Third, "Future climate data" contains downscaled CMIP6 future climate projections. The downscaling and calibration (bias correction) were done with WorldClim v2.1 as the baseline climate. Monthly values of minimum temperature, maximum temperature, and precipitation were processed for 23 global climate models (GCMs) and for four Shared Socio-economic Pathways (SSPs): 126, 245, 370, and 585. The monthly values are averages over 20-year periods (2021-2040, 2041-2060, 2061-2080, 2081-2100). The finest spatial resolution at which these data are available is 30 seconds.
Indexed from DataCite Commons
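The grid sizes quoted for WorldClim (30 seconds as ~1 km², 2.5 minutes as ~21 km² at the equator) can be checked with a short calculation. This is a rough sketch that assumes square cells at the equator and an equatorial circumference of 40,075.017 km. The function names are illustrative only, and real cells become narrower in the east-west direction away from the equator.

```python
EQUATOR_KM = 40075.017  # assumed equatorial circumference of the Earth, in km
ARC_SECONDS_PER_CIRCLE = 360 * 3600


def cell_size_km(arc_seconds: float) -> float:
    """Edge length, in km at the equator, of a grid cell spanning `arc_seconds`."""
    return EQUATOR_KM * arc_seconds / ARC_SECONDS_PER_CIRCLE


def cell_area_km2(arc_seconds: float) -> float:
    """Approximate cell area at the equator, treating the cell as a square."""
    return cell_size_km(arc_seconds) ** 2


# 30 arc-seconds and 2.5 arc-minutes (150 arc-seconds), as quoted above.
for label, seconds in [("30 seconds", 30), ("2.5 minutes", 150)]:
    print(f"{label}: {cell_size_km(seconds):.2f} km edge, "
          f"{cell_area_km2(seconds):.1f} km^2 area")
```

Under these assumptions a 30-second cell is a little under 1 km on a side and a 2.5-minute cell covers between 21 and 22 km², consistent with the approximate figures in the description.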