albertxu/CrosswordQA

Name: albertxu/CrosswordQA
Creator: albertxu
Published: 2022-10-29 23:45:36
License: No description available

Source: Hugging Face. Updated 2022-10-29; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/albertxu/CrosswordQA


Resource description:

---
annotations_creators:
- no-annotation
language_creators:
- found
language:
- en
license:
- unknown
multilinguality:
- monolingual
size_categories:
- 1M<n<10M
task_categories:
- question-answering
task_ids:
- open-domain-qa
---

# Dataset Card for CrosswordQA

## Table of Contents

- [Dataset Description](#dataset-description)
  - [Dataset Summary](#dataset-summary)
  - [Supported Tasks](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-instances)
  - [Data Splits](#data-instances)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)

## Dataset Description

- **Homepage:** [Needs More Information]
- **Repository:** https://github.com/albertkx/Berkeley-Crossword-Solver
- **Paper:** [Needs More Information]
- **Leaderboard:** [Needs More Information]
- **Point of Contact:** [Albert Xu](mailto:albertxu@usc.edu) and [Eshaan Pathak](mailto:eshaanpathak@berkeley.edu)

### Dataset Summary

The CrosswordQA dataset is a set of over 6 million clue-answer pairs scraped from the New York Times and many other crossword publishers. The dataset was created to train the Berkeley Crossword Solver's QA model. See our paper for more information. Answers are automatically segmented (e.g., BUZZLIGHTYEAR -> Buzz Lightyear), and thus may occasionally be segmented incorrectly.

### Supported Tasks and Leaderboards

[Needs More Information]

### Languages

[Needs More Information]

## Dataset Structure

### Data Instances

```
{
  "id": 0,
  "clue": "Clean-up target",
  "answer": "mess"
}
```

### Data Fields

[Needs More Information]

### Data Splits

[Needs More Information]

## Dataset Creation

### Curation Rationale

[Needs More Information]

### Source Data

#### Initial Data Collection and Normalization

[Needs More Information]

#### Who are the source language producers?

[Needs More Information]

### Annotations

#### Annotation process

[Needs More Information]

#### Who are the annotators?

[Needs More Information]

### Personal and Sensitive Information

[Needs More Information]

## Considerations for Using the Data

### Social Impact of Dataset

[Needs More Information]

### Discussion of Biases

[Needs More Information]

### Other Known Limitations

[Needs More Information]

## Additional Information

### Dataset Curators

[Needs More Information]

### Licensing Information

[Needs More Information]

### Citation Information

[Needs More Information]

Provider:

albertxu

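Summary of original information

Dataset overview

Dataset name

Name: CrosswordQA

Dataset summary

Summary: The CrosswordQA dataset contains over 6 million clue-answer pairs collected from the New York Times and many other crossword publishers. It was created to train the QA model of the Berkeley Crossword Solver. Answers are segmented automatically, so some may occasionally be segmented incorrectly.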

Supported tasks

Task: question answering
Task ID: open-domain-qa

Language information

Language: English (en)

Dataset structure

Example data instance:

{ "id": 0, "clue": "Clean-up target", "answer": "mess" }
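Because answers are stored in segmented form ("Buzz Lightyear" rather than BUZZLIGHTYEAR), a solver that fills a grid will usually need to collapse them back into a single run of letters. The following is a minimal sketch of that step in Python, using the sample record above. The helper name `to_grid_entry` is hypothetical and is not part of the dataset or the Berkeley Crossword Solver repository. The rule it applies (keep letters and digits, upper-case the rest) is an assumption about what a grid entry looks like.

```python
import json

# Sample record copied from the dataset card.
raw = '{ "id": 0, "clue": "Clean-up target", "answer": "mess" }'


def to_grid_entry(answer: str) -> str:
    """Hypothetical helper: collapse a segmented answer into grid form.

    Assumption: a grid entry is the answer with spaces and punctuation
    removed and all letters upper-cased.
    """
    return "".join(ch for ch in answer if ch.isalnum()).upper()


record = json.loads(raw)
entry = to_grid_entry(record["answer"])

# Show the clue next to the collapsed answer and its length in grid cells.
print(record["clue"], "->", entry, f"({len(entry)} letters)")
```

Comparing `len(entry)` with the length of a grid slot is one simple way to filter candidate answers. Since the segmentation is automatic and occasionally wrong, collapsing the answer also sidesteps any misplaced word boundaries.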

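Dataset size

Size: 1M<n<10M

License

License: unknown

Multilinguality

Multilinguality: monolingual

Curation rationale

Rationale: To train the QA model of the Berkeley Crossword Solver.

Contact information

Contact: Albert Xu (albertxu@usc.edu) and Eshaan Pathak (eshaanpathak@berkeley.edu)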
