megagonlabs/subjqa

Name: megagonlabs/subjqa
Creator: megagonlabs
Published: 2024-01-18 11:16:28
License: No description available

Hugging Face; updated 2024-01-18; indexed 2024-05-25

Download link:

https://hf-mirror.com/datasets/megagonlabs/subjqa

Resource description:

--- annotations_creators: - expert-generated language_creators: - found language: - en license: - unknown multilinguality: - monolingual size_categories: - 1K<n<10K source_datasets: - original - extended|yelp_review_full - extended|other-amazon_reviews_ucsd - extended|other-tripadvisor_reviews task_categories: - question-answering task_ids: - extractive-qa paperswithcode_id: subjqa pretty_name: subjqa dataset_info: - config_name: books features: - name: domain dtype: string - name: nn_mod dtype: string - name: nn_asp dtype: string - name: query_mod dtype: string - name: query_asp dtype: string - name: q_reviews_id dtype: string - name: question_subj_level dtype: int64 - name: ques_subj_score dtype: float32 - name: is_ques_subjective dtype: bool - name: review_id dtype: string - name: id dtype: string - name: title dtype: string - name: context dtype: string - name: question dtype: string - name: answers sequence: - name: text dtype: string - name: answer_start dtype: int32 - name: answer_subj_level dtype: int64 - name: ans_subj_score dtype: float32 - name: is_ans_subjective dtype: bool splits: - name: train num_bytes: 2473128 num_examples: 1314 - name: test num_bytes: 649413 num_examples: 345 - name: validation num_bytes: 460214 num_examples: 256 download_size: 11384657 dataset_size: 3582755 - config_name: electronics features: - name: domain dtype: string - name: nn_mod dtype: string - name: nn_asp dtype: string - name: query_mod dtype: string - name: query_asp dtype: string - name: q_reviews_id dtype: string - name: question_subj_level dtype: int64 - name: ques_subj_score dtype: float32 - name: is_ques_subjective dtype: bool - name: review_id dtype: string - name: id dtype: string - name: title dtype: string - name: context dtype: string - name: question dtype: string - name: answers sequence: - name: text dtype: string - name: answer_start dtype: int32 - name: answer_subj_level dtype: int64 - name: ans_subj_score dtype: float32 - name: is_ans_subjective dtype: 
bool splits: - name: train num_bytes: 2123648 num_examples: 1295 - name: test num_bytes: 608899 num_examples: 358 - name: validation num_bytes: 419042 num_examples: 255 download_size: 11384657 dataset_size: 3151589 - config_name: grocery features: - name: domain dtype: string - name: nn_mod dtype: string - name: nn_asp dtype: string - name: query_mod dtype: string - name: query_asp dtype: string - name: q_reviews_id dtype: string - name: question_subj_level dtype: int64 - name: ques_subj_score dtype: float32 - name: is_ques_subjective dtype: bool - name: review_id dtype: string - name: id dtype: string - name: title dtype: string - name: context dtype: string - name: question dtype: string - name: answers sequence: - name: text dtype: string - name: answer_start dtype: int32 - name: answer_subj_level dtype: int64 - name: ans_subj_score dtype: float32 - name: is_ans_subjective dtype: bool splits: - name: train num_bytes: 1317488 num_examples: 1124 - name: test num_bytes: 721827 num_examples: 591 - name: validation num_bytes: 254432 num_examples: 218 download_size: 11384657 dataset_size: 2293747 - config_name: movies features: - name: domain dtype: string - name: nn_mod dtype: string - name: nn_asp dtype: string - name: query_mod dtype: string - name: query_asp dtype: string - name: q_reviews_id dtype: string - name: question_subj_level dtype: int64 - name: ques_subj_score dtype: float32 - name: is_ques_subjective dtype: bool - name: review_id dtype: string - name: id dtype: string - name: title dtype: string - name: context dtype: string - name: question dtype: string - name: answers sequence: - name: text dtype: string - name: answer_start dtype: int32 - name: answer_subj_level dtype: int64 - name: ans_subj_score dtype: float32 - name: is_ans_subjective dtype: bool splits: - name: train num_bytes: 2986348 num_examples: 1369 - name: test num_bytes: 620513 num_examples: 291 - name: validation num_bytes: 589663 num_examples: 261 download_size: 11384657 dataset_size: 
4196524 - config_name: restaurants features: - name: domain dtype: string - name: nn_mod dtype: string - name: nn_asp dtype: string - name: query_mod dtype: string - name: query_asp dtype: string - name: q_reviews_id dtype: string - name: question_subj_level dtype: int64 - name: ques_subj_score dtype: float32 - name: is_ques_subjective dtype: bool - name: review_id dtype: string - name: id dtype: string - name: title dtype: string - name: context dtype: string - name: question dtype: string - name: answers sequence: - name: text dtype: string - name: answer_start dtype: int32 - name: answer_subj_level dtype: int64 - name: ans_subj_score dtype: float32 - name: is_ans_subjective dtype: bool splits: - name: train num_bytes: 1823331 num_examples: 1400 - name: test num_bytes: 335453 num_examples: 266 - name: validation num_bytes: 349354 num_examples: 267 download_size: 11384657 dataset_size: 2508138 - config_name: tripadvisor features: - name: domain dtype: string - name: nn_mod dtype: string - name: nn_asp dtype: string - name: query_mod dtype: string - name: query_asp dtype: string - name: q_reviews_id dtype: string - name: question_subj_level dtype: int64 - name: ques_subj_score dtype: float32 - name: is_ques_subjective dtype: bool - name: review_id dtype: string - name: id dtype: string - name: title dtype: string - name: context dtype: string - name: question dtype: string - name: answers sequence: - name: text dtype: string - name: answer_start dtype: int32 - name: answer_subj_level dtype: int64 - name: ans_subj_score dtype: float32 - name: is_ans_subjective dtype: bool splits: - name: train num_bytes: 1575021 num_examples: 1165 - name: test num_bytes: 689508 num_examples: 512 - name: validation num_bytes: 312645 num_examples: 230 download_size: 11384657 dataset_size: 2577174 --- # Dataset Card for subjqa ## Table of Contents - [Dataset Description](#dataset-description) - [Dataset Summary](#dataset-summary) - [Supported Tasks and 
Leaderboards](#supported-tasks-and-leaderboards)
  - [Languages](#languages)
- [Dataset Structure](#dataset-structure)
  - [Data Instances](#data-instances)
  - [Data Fields](#data-fields)
  - [Data Splits](#data-splits)
- [Dataset Creation](#dataset-creation)
  - [Curation Rationale](#curation-rationale)
  - [Source Data](#source-data)
  - [Annotations](#annotations)
  - [Personal and Sensitive Information](#personal-and-sensitive-information)
- [Considerations for Using the Data](#considerations-for-using-the-data)
  - [Social Impact of Dataset](#social-impact-of-dataset)
  - [Discussion of Biases](#discussion-of-biases)
  - [Other Known Limitations](#other-known-limitations)
- [Additional Information](#additional-information)
  - [Dataset Curators](#dataset-curators)
  - [Licensing Information](#licensing-information)
  - [Citation Information](#citation-information)
  - [Contributions](#contributions)

## Dataset Description

- **Repository:** https://github.com/lewtun/SubjQA
- **Paper:** https://arxiv.org/abs/2004.14283
- **Point of Contact:** [Lewis Tunstall](mailto:lewis.c.tunstall@gmail.com)

### Dataset Summary

SubjQA is a question answering dataset that focuses on subjective (as opposed to factual) questions and answers. The dataset consists of roughly **10,000** questions over reviews from 6 different domains: books, movies, grocery, electronics, TripAdvisor (i.e. hotels), and restaurants. Each question is paired with a review, and a span is highlighted as the answer to the question (some questions have no answer). Moreover, both questions and answer spans are assigned a _subjectivity_ label by annotators. A question such as _"How much does this product weigh?"_ is factual (i.e., low subjectivity), while _"Is this easy to use?"_ is subjective (i.e., high subjectivity).

In short, SubjQA provides a setting to study how well extractive QA systems perform on finding answers that are less factual, and to what extent modeling subjectivity can improve the performance of QA systems.

_Note:_ Much of the information provided on this dataset card is taken from the README provided by the authors in their GitHub repository ([link](https://github.com/megagonlabs/SubjQA)).

To load a domain with `datasets` you can run the following:

```python
from datasets import load_dataset

# other options include: electronics, grocery, movies, restaurants, tripadvisor
dataset = load_dataset("megagonlabs/subjqa", "books")
```

### Supported Tasks and Leaderboards

* `question-answering`: The dataset can be used to train a model for extractive question answering, which involves questions whose answer can be identified as a span of text in a review. Success on this task is typically measured by achieving a high Exact Match or F1 score. A BERT model that is first fine-tuned on SQuAD 2.0 and then further fine-tuned on SubjQA achieves the scores shown in the figure below.

![scores](https://user-images.githubusercontent.com/26859204/117199763-e02e1100-adea-11eb-9198-f3190329a588.png)

### Languages

The text in the dataset is in English and the associated BCP-47 code is `en`.

## Dataset Structure

### Data Instances

An example from the `books` domain is shown below:

```json
{
    "answers": {
        "ans_subj_score": [1.0],
        "answer_start": [324],
        "answer_subj_level": [2],
        "is_ans_subjective": [true],
        "text": ["This is a wonderfully written book"]
    },
    "context": "While I would not recommend this book to a young reader due to a couple pretty explicate scenes I would recommend it to any adult who just loves a good book. Once I started reading it I could not put it down. I hesitated reading it because I didn't think that the subject matter would be interesting, but I was so wrong. This is a wonderfully written book.",
    "domain": "books",
    "id": "0255768496a256c5ed7caed9d4e47e4c",
    "is_ques_subjective": false,
    "nn_asp": "matter",
    "nn_mod": "interesting",
    "q_reviews_id": "a907837bafe847039c8da374a144bff9",
    "query_asp": "part",
    "query_mod": "fascinating",
    "ques_subj_score": 0.0,
    "question": "What are the parts like?",
    "question_subj_level": 2,
    "review_id": "a7f1a2503eac2580a0ebbc1d24fffca1",
    "title": "0002007770"
}
```

### Data Fields

Each domain and split consists of the following columns:

* ```title```: The id of the item/business discussed in the review.
* ```question```: The question (written based on a query opinion).
* ```id```: A unique id assigned to the question-review pair.
* ```q_reviews_id```: A unique id assigned to all question-review pairs with a shared question.
* ```question_subj_level```: The subjectivity level of the question (on a 1 to 5 scale with 1 being the most subjective).
* ```ques_subj_score```: The subjectivity score of the question computed using the [TextBlob](https://textblob.readthedocs.io/en/dev/) package.
* ```context```: The review (that mentions the neighboring opinion).
* ```review_id```: A unique id associated with the review.
* ```answers.text```: The span labeled by annotators as the answer.
* ```answers.answer_start```: The (character-level) start index of the answer span highlighted by annotators.
* ```is_ques_subjective```: A boolean subjectivity label derived from ```question_subj_level``` (i.e., scores below 4 are considered subjective).
* ```answers.answer_subj_level```: The subjectivity level of the answer span (on a 1 to 5 scale with 1 being the most subjective).
* ```answers.ans_subj_score```: The subjectivity score of the answer span computed using the [TextBlob](https://textblob.readthedocs.io/en/dev/) package.
* ```answers.is_ans_subjective```: A boolean subjectivity label derived from ```answer_subj_level``` (i.e., scores below 4 are considered subjective).
* ```domain```: The category/domain of the review (e.g., hotels, books, ...).
* ```nn_mod```: The modifier of the neighboring opinion (which appears in the review).
* ```nn_asp```: The aspect of the neighboring opinion (which appears in the review).
* ```query_mod```: The modifier of the query opinion (around which a question is manually written).
* ```query_asp```: The aspect of the query opinion (around which a question is manually written).

### Data Splits

The question-review pairs from each domain are split into training, development, and test sets. The table below shows the size of the dataset for each domain and split.

| Domain      | Train | Dev | Test | Total |
|-------------|-------|-----|------|-------|
| TripAdvisor | 1165  | 230 | 512  | 1686  |
| Restaurants | 1400  | 267 | 266  | 1683  |
| Movies      | 1369  | 261 | 291  | 1677  |
| Books       | 1314  | 256 | 345  | 1668  |
| Electronics | 1295  | 255 | 358  | 1659  |
| Grocery     | 1124  | 218 | 591  | 1725  |

Based on the subjectivity labels provided by annotators, 73% of the questions and 74% of the answers in the dataset are subjective. This provides a substantial number of subjective QA pairs as well as a reasonable number of factual questions, so the performance of QA systems can be compared and contrasted across the two types of QA pairs.

Finally, the next table summarizes the average length of the review, the question, and the highlighted answer span for each category, along with the percentage of answerable questions.


| Domain      | Review Len | Question Len | Answer Len | % answerable |
|-------------|------------|--------------|------------|--------------|
| TripAdvisor | 187.25     | 5.66         | 6.71       | 78.17        |
| Restaurants | 185.40     | 5.44         | 6.67       | 60.72        |
| Movies      | 331.56     | 5.59         | 7.32       | 55.69        |
| Books       | 285.47     | 5.78         | 7.78       | 52.99        |
| Electronics | 249.44     | 5.56         | 6.98       | 58.89        |
| Grocery     | 164.75     | 5.44         | 7.25       | 64.69        |

## Dataset Creation

### Curation Rationale

Most question-answering datasets like SQuAD and Natural Questions focus on answering questions over factual data such as Wikipedia and news articles. However, in domains like e-commerce the questions and answers are often _subjective_, that is, they depend on the personal experience of the users. For example, a customer on Amazon may ask "Is the sound quality any good?", which is more difficult to answer than a factoid question like "What is the capital of Australia?" These considerations motivate the creation of SubjQA as a tool to investigate the relationship between subjectivity and question-answering.

### Source Data

#### Initial Data Collection and Normalization

The SubjQA dataset is constructed based on publicly available review datasets. Specifically, the _movies_, _books_, _electronics_, and _grocery_ categories are constructed using reviews from the [Amazon Review dataset](http://jmcauley.ucsd.edu/data/amazon/links.html). The _TripAdvisor_ category, as the name suggests, is constructed using reviews from TripAdvisor which can be found [here](http://times.cs.uiuc.edu/~wang296/Data/). Finally, the _restaurants_ category is constructed using the [Yelp Dataset](https://www.yelp.com/dataset) which is also publicly available.

The process of constructing SubjQA is discussed in detail in the [paper](https://arxiv.org/abs/2004.14283). In a nutshell, the dataset construction consists of the following steps:

1. First, all _opinions_ expressed in reviews are extracted.
   In the pipeline, each opinion is modeled as a (_modifier_, _aspect_) pair, which is a pair of spans where the former describes the latter. (good, hotel) and (terrible, acting) are two examples of extracted opinions.
2. Using Matrix Factorization techniques, implication relationships between different expressed opinions are mined. For instance, the system mines that "responsive keys" implies "good keyboard". In our pipeline, we refer to the conclusion of an implication (i.e., "good keyboard" in this example) as the _query_ opinion, and we refer to the premise (i.e., "responsive keys") as its _neighboring_ opinion.
3. Annotators are then asked to write a question based on _query_ opinions. For instance, given "good keyboard" as the query opinion, they might write "Is this keyboard any good?"
4. Each question written based on a _query_ opinion is then paired with a review that mentions its _neighboring_ opinion. In our example, that would be a review that mentions "responsive keys".
5. The question and review pairs are presented to annotators to select the correct answer span, and rate the subjectivity level of the question as well as the subjectivity level of the highlighted answer span.

A visualisation of the data collection pipeline is shown in the image below.

![preview](https://user-images.githubusercontent.com/26859204/117258393-3764cd80-ae4d-11eb-955d-aa971dbb282e.jpg)

#### Who are the source language producers?

As described above, the source data for SubjQA is customer reviews of products and services on e-commerce websites like Amazon and TripAdvisor.

### Annotations

#### Annotation process

The questions and answer span labels were obtained through the [Appen](https://appen.com/) platform. From the SubjQA paper:

> The platform provides quality control by showing the workers 5 questions at a time, out of which one is labeled by the experts.
A worker who fails to maintain 70% accuracy is kicked out by the platform and his judgements are ignored ... To ensure good quality labels, we paid each worker 5 cents per annotation.

The instructions for generating a question are shown in the following figure:

<img width="874" alt="ques_gen" src="https://user-images.githubusercontent.com/26859204/117259092-03d67300-ae4e-11eb-81f2-9077fee1085f.png">

Similarly, the interface for the answer span and subjectivity labelling tasks is shown below:

![span_collection](https://user-images.githubusercontent.com/26859204/117259223-1fda1480-ae4e-11eb-9305-658ee6e3971d.png)

As described in the SubjQA paper, the workers assign subjectivity scores (1-5) to each question and the selected answer span. They can also indicate if a question cannot be answered from the given review.

#### Who are the annotators?

Workers on the Appen platform.

### Personal and Sensitive Information

[Needs More Information]

## Considerations for Using the Data

### Social Impact of Dataset

The SubjQA dataset can be used to develop question-answering systems that can provide better on-demand answers to e-commerce customers who are interested in subjective questions about products and services.

### Discussion of Biases

[Needs More Information]

### Other Known Limitations

[Needs More Information]

## Additional Information

### Dataset Curators

The people involved in creating the SubjQA dataset are the authors of the accompanying paper:

* Johannes Bjerva, Department of Computer Science, University of Copenhagen, and Department of Computer Science, Aalborg University
* Nikita Bhutani, Megagon Labs, Mountain View
* Behzad Golshan, Megagon Labs, Mountain View
* Wang-Chiew Tan, Megagon Labs, Mountain View
* Isabelle Augenstein, Department of Computer Science, University of Copenhagen

### Licensing Information

The SubjQA dataset is provided "as-is", and its creators make no representation as to its accuracy.
The SubjQA dataset is constructed based on the following datasets and thus contains subsets of their data:

* [Amazon Review Dataset](http://jmcauley.ucsd.edu/data/amazon/links.html) from UCSD
  * Used for the _books_, _movies_, _grocery_, and _electronics_ domains
* [The TripAdvisor Dataset](http://times.cs.uiuc.edu/~wang296/Data/) from UIUC's Database and Information Systems Laboratory
  * Used for the _TripAdvisor_ domain
* [The Yelp Dataset](https://www.yelp.com/dataset)
  * Used for the _restaurants_ domain

Consequently, the data within each domain of the SubjQA dataset should be considered under the same license as the dataset it was built upon.

### Citation Information

If you are using the dataset, please cite the following in your work:

```
@inproceedings{bjerva20subjqa,
    title = "SubjQA: A Dataset for Subjectivity and Review Comprehension",
    author = "Bjerva, Johannes and Bhutani, Nikita and Golshan, Behzad and Tan, Wang-Chiew and Augenstein, Isabelle",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
    month = November,
    year = "2020",
    publisher = "Association for Computational Linguistics",
}
```

### Contributions

Thanks to [@lewtun](https://github.com/lewtun) for adding this dataset.

Provided by:

megagonlabs

Summary of the original information

Dataset card for subjqa

Dataset description

Dataset summary

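SubjQA is a question answering dataset that focuses on subjective (as opposed to factual) questions and answers. It contains roughly 10,000 questions over reviews from 6 different domains: books, movies, grocery, electronics, TripAdvisor (i.e. hotels), and restaurants. Each question is paired with a review, and a span is highlighted as the answer (some questions have no answer). Both questions and answer spans are also assigned a subjectivity label by annotators. For example, "How much does this product weigh?" is a factual question (low subjectivity), while "Is this easy to use?" is a subjective question (high subjectivity).

In short, SubjQA provides a setting to study how well extractive QA systems perform on finding answers that are less factual, and to what extent modeling subjectivity can improve the performance of QA systems.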

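Supported tasks and leaderboards

question-answering: The dataset can be used to train a model for extractive question answering, in which the answer to a question can be identified as a span of text in a review. Success on this task is typically measured by a high Exact Match or F1 score.

Languages

The text in the dataset is in English and the associated BCP-47 code is en.

Dataset structure

Data instances

An example from the books domain is shown below: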
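
The two metrics named above can be sketched as follows. This is a minimal illustration that assumes SQuAD-style answer normalization (lowercasing, stripping punctuation and English articles); the card does not state which normalization was used for the reported scores, and the function names are hypothetical.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace.

    Assumption: SQuAD-style normalization, not confirmed by the dataset card.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted span and a reference span."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Two empty spans agree on "no answer"; one empty span shares nothing.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


reference = "This is a wonderfully written book"
print(exact_match("This is a wonderfully written book.", reference))
print(f1_score("a wonderfully written book", reference))
```

When a question has several reference spans, a common choice is to take the maximum score over the references; that aggregation is likewise an assumption here.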

{
    "answers": {
        "ans_subj_score": [1.0],
        "answer_start": [324],
        "answer_subj_level": [2],
        "is_ans_subjective": [true],
        "text": ["This is a wonderfully written book"]
    },
    "context": "While I would not recommend this book to a young reader due to a couple pretty explicate scenes I would recommend it to any adult who just loves a good book. Once I started reading it I could not put it down. I hesitated reading it because I didn't think that the subject matter would be interesting, but I was so wrong. This is a wonderfully written book.",
    "domain": "books",
    "id": "0255768496a256c5ed7caed9d4e47e4c",
    "is_ques_subjective": false,
    "nn_asp": "matter",
    "nn_mod": "interesting",
    "q_reviews_id": "a907837bafe847039c8da374a144bff9",
    "query_asp": "part",
    "query_mod": "fascinating",
    "ques_subj_score": 0.0,
    "question": "What are the parts like?",
    "question_subj_level": 2,
    "review_id": "a7f1a2503eac2580a0ebbc1d24fffca1",
    "title": "0002007770"
}
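
Because answer_start is a character-level offset into context, each answer span can be recovered by slicing. The sketch below shows that relationship on a shortened, made-up record that has the same nested layout as the example; the helper name answer_spans is hypothetical.

```python
def answer_spans(record: dict) -> list:
    """Return (start, end, sliced_text) for every annotated answer in a record."""
    context = record["context"]
    answers = record["answers"]
    spans = []
    for start, text in zip(answers["answer_start"], answers["text"]):
        end = start + len(text)  # end offset is exclusive
        spans.append((start, end, context[start:end]))
    return spans


# Shortened, made-up record with the same nested "answers" layout.
context = "I hesitated at first. This is a wonderfully written book."
answer = "This is a wonderfully written book"
record = {
    "context": context,
    "answers": {"text": [answer], "answer_start": [context.index(answer)]},
}

for start, end, sliced in answer_spans(record):
    # A mismatch here would mean the offset and the answer text disagree.
    print(start, end, sliced == answer)
```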

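Data fields

Each domain and split consists of the following columns:

title: The id of the item/business discussed in the review.
question: The question (written based on a query opinion).
id: A unique id assigned to the question-review pair.
q_reviews_id: A unique id assigned to all question-review pairs with a shared question.
question_subj_level: The subjectivity level of the question (on a 1 to 5 scale with 1 being the most subjective).
ques_subj_score: The subjectivity score of the question computed using the TextBlob package.
context: The review (that mentions the neighboring opinion).
review_id: A unique id associated with the review.
answers.text: The span labeled by annotators as the answer.
answers.answer_start: The (character-level) start index of the answer span highlighted by annotators.
is_ques_subjective: A boolean subjectivity label derived from question_subj_level (i.e., scores below 4 are considered subjective).
answers.answer_subj_level: The subjectivity level of the answer span (on a 1 to 5 scale with 1 being the most subjective).
answers.ans_subj_score: The subjectivity score of the answer span computed using the TextBlob package.
answers.is_ans_subjective: A boolean subjectivity label derived from answer_subj_level (i.e., scores below 4 are considered subjective).
domain: The category/domain of the review (e.g., hotels, books, ...).
nn_mod: The modifier of the neighboring opinion (which appears in the review).
nn_asp: The aspect of the neighboring opinion (which appears in the review).
query_mod: The modifier of the query opinion (around which a question is manually written).
query_asp: The aspect of the query opinion (around which a question is manually written).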
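
As a small illustration of how these columns relate, the sketch below groups question-review pairs by q_reviews_id and flags pairs without an answer. It assumes that an unanswerable pair is stored with an empty answers.text list; the field descriptions above do not spell this out, and the records and helper names are made up for the example.

```python
from collections import defaultdict


def is_answerable(record: dict) -> bool:
    """Assumption: unanswerable pairs carry an empty answers.text list."""
    return len(record["answers"]["text"]) > 0


def group_by_question(records: list) -> dict:
    """Map each q_reviews_id to the ids of the pairs sharing that question."""
    groups = defaultdict(list)
    for record in records:
        groups[record["q_reviews_id"]].append(record["id"])
    return dict(groups)


# Made-up records that only carry the fields used here.
records = [
    {"id": "p1", "q_reviews_id": "q1", "answers": {"text": ["very responsive keys"]}},
    {"id": "p2", "q_reviews_id": "q1", "answers": {"text": []}},
    {"id": "p3", "q_reviews_id": "q2", "answers": {"text": ["great sound"]}},
]

share = sum(is_answerable(r) for r in records) / len(records)
print(group_by_question(records))
print(f"{100 * share:.2f}% answerable")
```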

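Data splits

The question-review pairs from each domain are split into training, development, and test sets. The table below shows the size of the dataset for each domain and split.

Domain	Train	Dev	Test	Total
TripAdvisor	1165	230	512	1686
Restaurants	1400	267	266	1683
Movies	1369	261	291	1677
Books	1314	256	345	1668
Electronics	1295	255	358	1659
Grocery	1124	218	591	1725

Based on the subjectivity labels provided by annotators, 73% of the questions and 74% of the answers in the dataset are subjective. This provides a substantial number of subjective QA pairs as well as a reasonable number of factual questions, so the performance of QA systems can be compared and contrasted across the two types of QA pairs.

Finally, the table below summarizes the average length of the review, the question, and the highlighted answer span for each category, along with the percentage of answerable questions.

Domain	Review Len	Question Len	Answer Len	% answerable
TripAdvisor	187.25	5.66	6.71	78.17
Restaurants	185.40	5.44	6.67	60.72
Movies	331.56	5.59	7.32	55.69
Books	285.47	5.78	7.78	52.99
Electronics	249.44	5.56	6.98	58.89
Grocery	164.75	5.44	7.25	64.69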

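Dataset creation

Curation rationale

Most question-answering datasets, such as SQuAD and Natural Questions, focus on answering questions over factual data such as Wikipedia and news articles. However, in domains like e-commerce the questions and answers are often subjective, that is, they depend on the personal experience of the users. For example, a customer on Amazon may ask "Is the sound quality any good?", which is more difficult to answer than a factoid question like "What is the capital of Australia?" These considerations motivate the creation of SubjQA as a tool to investigate the relationship between subjectivity and question answering.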

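Source data

Initial data collection and normalization

The SubjQA dataset is constructed based on publicly available review datasets. Specifically, the movies, books, electronics, and grocery categories are constructed using reviews from the Amazon Review dataset. The TripAdvisor category, as the name suggests, is constructed using reviews from TripAdvisor (http://times.cs.uiuc.edu/~wang296/Data/). Finally, the restaurants category is constructed using the Yelp Dataset, which is also publicly available.

The process of constructing SubjQA is discussed in detail in the paper. In a nutshell, the dataset construction consists of the following steps:

1. First, all opinions expressed in reviews are extracted. In the pipeline, each opinion is modeled as a (modifier, aspect) pair, which is a pair of spans where the former describes the latter. (good, hotel) and (terrible, acting) are two examples of extracted opinions.
2. Using matrix factorization techniques, implication relationships between different expressed opinions are mined. For instance, the system mines that "responsive keys" implies "good keyboard". In the pipeline, the conclusion of an implication (i.e., "good keyboard" in this example) is called the query opinion, and the premise (i.e., "responsive keys") is called its neighboring opinion.
3. Annotators are then asked to write a question based on query opinions. For instance, given "good keyboard" as the query opinion, they might write "Is this keyboard any good?"
4. Each question written based on a query opinion is then paired with a review that mentions its neighboring opinion. In this example, that would be a review that mentions "responsive keys".
5. The question and review pairs are presented to annotators, who select the correct answer span and rate the subjectivity level of the question as well as that of the highlighted answer span.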
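
The pairing logic of steps 3 and 4 can be sketched as below. This is only an illustration of the data flow under stated assumptions, not the authors' code: the opinion extraction and the mined implications are taken as given inputs, and the names Review and pair_questions_with_reviews are hypothetical. The output keys reuse the dataset's nn_mod, nn_asp, query_mod, and query_asp column names.

```python
from dataclasses import dataclass, field

Opinion = tuple  # a (modifier, aspect) pair, e.g. ("responsive", "keys")


@dataclass
class Review:
    review_id: str
    text: str
    opinions: set = field(default_factory=set)  # opinions extracted in step 1


def pair_questions_with_reviews(implications: dict, questions: dict, reviews: list) -> list:
    """Pair each written question with reviews mentioning its neighboring opinion.

    implications: neighboring opinion -> query opinion (output of step 2, assumed given)
    questions:    query opinion -> question written by an annotator (step 3)
    """
    pairs = []
    for neighbor, query in implications.items():
        question = questions.get(query)
        if question is None:
            continue  # no question was written for this query opinion
        for review in reviews:
            if neighbor in review.opinions:
                pairs.append({
                    "question": question,
                    "review_id": review.review_id,
                    "context": review.text,
                    "nn_mod": neighbor[0],
                    "nn_asp": neighbor[1],
                    "query_mod": query[0],
                    "query_asp": query[1],
                })
    return pairs


implications = {("responsive", "keys"): ("good", "keyboard")}
questions = {("good", "keyboard"): "Is this keyboard any good?"}
reviews = [
    Review("r1", "The keys are very responsive.", {("responsive", "keys")}),
    Review("r2", "Battery life is poor.", {("poor", "battery")}),
]

for pair in pair_questions_with_reviews(implications, questions, reviews):
    print(pair["question"], "->", pair["review_id"])
```

Each resulting pair would then go to annotators for answer span selection and subjectivity rating, as described in step 5.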