LM-Polygraph/trivia_qa_tiny

Hugging Face | Updated 2024-11-04 | Indexed 2025-04-12
Download link:
https://hf-mirror.com/datasets/LM-Polygraph/trivia_qa_tiny
Resource description:
---
language:
- en
dataset_info:
  config_name: continuation
  features:
  - name: input
    dtype: string
  - name: output
    dtype: string
  splits:
  - name: train
    num_bytes: 7657
    num_examples: 100
  - name: test
    num_bytes: 7657
    num_examples: 100
  download_size: 15360
  dataset_size: 15314
configs:
- config_name: continuation
  data_files:
  - split: train
    path: continuation/train-*
  - split: test
    path: continuation/test-*
---

# Dataset Card for trivia_qa_tiny

This is a preprocessed version of the trivia_qa_tiny dataset for benchmarks in LM-Polygraph.

## Dataset Details

### Dataset Description

- **Curated by:** https://huggingface.co/LM-Polygraph
- **License:** https://github.com/IINemo/lm-polygraph/blob/main/LICENSE.md

### Dataset Sources [optional]

- **Repository:** https://github.com/IINemo/lm-polygraph

## Uses

### Direct Use

This dataset should be used for running benchmarks on LM-Polygraph.

### Out-of-Scope Use

This dataset should not be used for further dataset preprocessing.

## Dataset Structure

This dataset contains the "continuation" subset, which corresponds to the main dataset used in LM-Polygraph. It may also contain other subsets, which correspond to the instruct methods used in LM-Polygraph.

Each subset contains two splits: train and test. Each split contains two string columns: "input", which corresponds to the processed input for LM-Polygraph, and "output", which corresponds to the processed output for LM-Polygraph.

## Dataset Creation

### Curation Rationale

This dataset was created to separate dataset creation code from benchmarking code.

### Source Data

#### Data Collection and Processing

Data is collected from https://huggingface.co/datasets/SpeedOfMagic/trivia_qa_tiny and processed using the build_dataset.py script in the repository.

#### Who are the source data producers?

The people who created https://huggingface.co/datasets/SpeedOfMagic/trivia_qa_tiny

## Bias, Risks, and Limitations

This dataset contains the same biases, risks, and limitations as its source dataset, https://huggingface.co/datasets/SpeedOfMagic/trivia_qa_tiny

### Recommendations

Users should be made aware of the risks, biases and limitations of the dataset.
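The structure described under Dataset Structure (a "continuation" subset with train and test splits, each holding string "input" and "output" columns) can be checked with a short sketch like the one below. It is an illustration, not part of LM-Polygraph: the helper names `validate_rows` and `load_splits` are hypothetical, and `load_splits` assumes the Hugging Face `datasets` library is installed and the Hub is reachable.

```python
from typing import Iterable, Mapping

# Identifiers taken from the dataset card above.
REPO_ID = "LM-Polygraph/trivia_qa_tiny"
CONFIG = "continuation"
EXPECTED_COLUMNS = ("input", "output")


def validate_rows(rows: Iterable[Mapping[str, object]]) -> int:
    """Check that every row has string 'input' and 'output' columns.

    Returns the number of rows checked. Hypothetical helper for illustration.
    """
    count = 0
    for i, row in enumerate(rows):
        missing = [c for c in EXPECTED_COLUMNS if c not in row]
        if missing:
            raise ValueError(f"row {i} is missing columns: {missing}")
        for column in EXPECTED_COLUMNS:
            if not isinstance(row[column], str):
                raise TypeError(f"row {i}: column {column!r} is not a string")
        count += 1
    return count


def load_splits():
    """Download the 'continuation' subset (needs `datasets` and network access)."""
    # Imported lazily so validate_rows stays usable without the library.
    from datasets import load_dataset

    return load_dataset(REPO_ID, CONFIG)
```

With network access, calling `validate_rows(load_splits()["train"])` and the same for `"test"` should return the row count of each split; the card's metadata lists 100 examples per split.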
Provided by:
LM-Polygraph