
argilla/sharegpt-text-descriptives

Source: Hugging Face · Updated: 2023-10-20 · Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/argilla/sharegpt-text-descriptives
Resource description:
---
size_categories: 1K<n<10K
tags:
- rlfh
- argilla
- human-feedback
---

# Dataset Card for sharegpt-text-descriptives

This dataset has been created with [Argilla](https://docs.argilla.io).

As shown in the sections below, this dataset can be loaded into Argilla as explained in [Load with Argilla](#load-with-argilla), or used directly with the `datasets` library in [Load with `datasets`](#load-with-datasets).

## Dataset Description

- **Homepage:** https://argilla.io
- **Repository:** https://github.com/argilla-io/argilla
- **Paper:**
- **Leaderboard:**
- **Point of Contact:**

### Dataset Summary

This dataset contains:

* A dataset configuration file conforming to the Argilla dataset format named `argilla.yaml`. This configuration file will be used to configure the dataset when using the `FeedbackDataset.from_huggingface` method in Argilla.

* Dataset records in a format compatible with HuggingFace `datasets`. These records will be loaded automatically when using `FeedbackDataset.from_huggingface` and can be loaded independently using the `datasets` library via `load_dataset`.

* The [annotation guidelines](#annotation-guidelines) that have been used for building and curating the dataset, if they've been defined in Argilla.

### Load with Argilla

To load with Argilla, you'll just need to install Argilla as `pip install argilla --upgrade` and then use the following code:

```python
import argilla as rg

ds = rg.FeedbackDataset.from_huggingface("argilla/sharegpt-text-descriptives")
```

### Load with `datasets`

To load this dataset with `datasets`, you'll just need to install `datasets` as `pip install datasets --upgrade` and then use the following code:

```python
from datasets import load_dataset

ds = load_dataset("argilla/sharegpt-text-descriptives")
```

### Supported Tasks and Leaderboards

This dataset can contain [multiple fields, questions and responses](https://docs.argilla.io/en/latest/conceptual_guides/data_model.html#feedback-dataset) so it can be used for different NLP tasks, depending on the configuration. The dataset structure is described in the [Dataset Structure section](#dataset-structure).

There are no leaderboards associated with this dataset.

### Languages

[More Information Needed]

## Dataset Structure

### Data in Argilla

The dataset is created in Argilla with: **fields**, **questions**, **suggestions**, **metadata**, and **guidelines**.

The **fields** are the dataset records themselves; for the moment, only text fields are supported. These are the ones that will be used to provide responses to the questions.

| Field Name | Title | Type | Required | Markdown |
| ---------- | ----- | ---- | -------- | -------- |
| prompt | Prompt | text | True | True |
| context | Context | text | False | True |

The **questions** are the questions that will be asked to the annotators. They can be of different types, such as rating, text, label_selection, multi_label_selection, or ranking.

| Question Name | Title | Type | Required | Description | Values/Labels |
| ------------- | ----- | ---- | -------- | ----------- | ------------- |
| response | Response | text | True | Write the response to the instruction. | N/A |

The **suggestions** are human or machine generated recommendations for each question to assist the annotator during the annotation process. They are always linked to the existing questions and are named by appending "-suggestion" and "-suggestion-metadata" to the question name, containing the value/s of the suggestion and its metadata, respectively. Accordingly, the possible values are the same as in the table above, but the column name is appended with "-suggestion" and the metadata is appended with "-suggestion-metadata".

**✨ NEW** The **metadata** is a dictionary that can be used to provide additional information about the dataset record. This can be useful to provide additional context to the annotators, or to provide additional information about the dataset record itself. For example, you can use this to provide a link to the original source of the dataset record, or to provide additional information about the dataset record itself, such as the author, the date, or the source. The metadata is always optional, and can be potentially linked to the `metadata_properties` defined in the dataset configuration file in `argilla.yaml`.

The **guidelines** are optional as well, and are just a plain string that can be used to provide instructions to the annotators. Find those in the [annotation guidelines](#annotation-guidelines) section.

### Data Instances

An example of a dataset instance in Argilla looks as follows:

```json
{
    "external_id": null,
    "fields": {
        "prompt": "How to tell if a customer segment is well segmented? In 3 bullet points."
    },
    "metadata": {
        "prompt_alpha_ratio": 0.8125,
        "prompt_automated_readability_index": 1.24642857142857,
        "prompt_coleman_liau_index": 3.9114285714285693,
        "prompt_dependency_distance_mean": 2.3181818181818183,
        "prompt_dependency_distance_std": 0.3181818181818181,
        "prompt_doc_length": 16.0,
        "prompt_duplicate_line_chr_fraction": 0.0,
        "prompt_duplicate_ngram_chr_fraction_10": 0.0,
        "prompt_duplicate_ngram_chr_fraction_5": 0.0,
        "prompt_duplicate_ngram_chr_fraction_6": 0.0,
        "prompt_duplicate_ngram_chr_fraction_7": 0.0,
        "prompt_duplicate_ngram_chr_fraction_8": 0.0,
        "prompt_duplicate_ngram_chr_fraction_9": 0.0,
        "prompt_duplicate_paragraph_chr_fraction": 0.0,
        "prompt_entropy": 0.4249943780917443,
        "prompt_flesch_kincaid_grade": 3.997142857142858,
        "prompt_flesch_reading_ease": 78.87285714285717,
        "prompt_gunning_fog": 8.514285714285714,
        "prompt_lix": 28.428571428571427,
        "prompt_mean_word_length": 3.6875,
        "prompt_n_characters": 59,
        "prompt_n_sentences": 2,
        "prompt_n_stop_words": 7.0,
        "prompt_n_tokens": 14,
        "prompt_n_unique_tokens": 14,
        "prompt_oov_ratio": 0.0,
        "prompt_passed_quality_check": "True",
        "prompt_per_word_perplexity": 0.0955988637794089,
        "prompt_perplexity": 1.5295818204705425,
        "prompt_pos_prop_ADJ": 0.0,
        "prompt_pos_prop_ADP": 0.0625,
        "prompt_pos_prop_ADV": 0.0625,
        "prompt_pos_prop_AUX": 0.0625,
        "prompt_pos_prop_CCONJ": 0.0,
        "prompt_pos_prop_DET": 0.0625,
        "prompt_pos_prop_INTJ": 0.0,
        "prompt_pos_prop_NOUN": 0.25,
        "prompt_pos_prop_NUM": 0.0625,
        "prompt_pos_prop_PART": 0.0625,
        "prompt_pos_prop_PRON": 0.0,
        "prompt_pos_prop_PROPN": 0.0,
        "prompt_pos_prop_PUNCT": 0.125,
        "prompt_pos_prop_SCONJ": 0.125,
        "prompt_pos_prop_SYM": 0.0,
        "prompt_pos_prop_VERB": 0.125,
        "prompt_pos_prop_X": 0.0,
        "prompt_prop_adjacent_dependency_relation_mean": 0.28181818181818186,
        "prompt_prop_adjacent_dependency_relation_std": 0.08181818181818182,
        "prompt_proportion_bullet_points": 0.0,
        "prompt_proportion_ellipsis": 0.0,
        "prompt_proportion_unique_tokens": 1.0,
        "prompt_rix": 1.5,
        "prompt_sentence_length_mean": 7.0,
        "prompt_sentence_length_median": 7.0,
        "prompt_sentence_length_std": 3.0,
        "prompt_syllables_per_token_mean": 1.4285714285714286,
        "prompt_syllables_per_token_median": 1.0,
        "prompt_syllables_per_token_std": 0.7284313590846836,
        "prompt_symbol_to_word_ratio_#": 0.0,
        "prompt_token_length_mean": 4.071428571428571,
        "prompt_token_length_median": 3.5,
        "prompt_token_length_std": 2.5763841138387766,
        "prompt_top_ngram_chr_fraction_2": 0.0,
        "prompt_top_ngram_chr_fraction_3": 0.0,
        "prompt_top_ngram_chr_fraction_4": 0.0,
        "response_alpha_ratio": 0.7857142857142857,
        "response_automated_readability_index": 11.4375,
        "response_coleman_liau_index": 16.030000000000005,
        "response_dependency_distance_mean": 2.0984405458089666,
        "response_dependency_distance_std": 1.2368394170758057,
        "response_doc_length": 70.0,
        "response_duplicate_line_chr_fraction": 0.0,
        "response_duplicate_ngram_chr_fraction_10": 0.0,
        "response_duplicate_ngram_chr_fraction_5": 0.14761904761904762,
        "response_duplicate_ngram_chr_fraction_6": 0.0,
        "response_duplicate_ngram_chr_fraction_7": 0.0,
        "response_duplicate_ngram_chr_fraction_8": 0.0,
        "response_duplicate_ngram_chr_fraction_9": 0.0,
        "response_duplicate_paragraph_chr_fraction": 0.0,
        "response_entropy": 1.925080903607536,
        "response_flesch_kincaid_grade": 10.533333333333335,
        "response_flesch_reading_ease": 37.35500000000002,
        "response_gunning_fog": 14.0,
        "response_lix": 46.666666666666664,
        "response_mean_word_length": 5.214285714285714,
        "response_n_characters": 365,
        "response_n_sentences": 6,
        "response_n_stop_words": 26.0,
        "response_n_tokens": 60,
        "response_n_unique_tokens": 37,
        "response_oov_ratio": 0.02857142857142857,
        "response_passed_quality_check": "True",
        "response_per_word_perplexity": 0.09793861849416963,
        "response_perplexity": 6.855703294591874,
        "response_pos_prop_ADJ": 0.05714285714285714,
        "response_pos_prop_ADP": 0.1,
        "response_pos_prop_ADV": 0.04285714285714286,
        "response_pos_prop_AUX": 0.07142857142857142,
        "response_pos_prop_CCONJ": 0.05714285714285714,
        "response_pos_prop_DET": 0.07142857142857142,
        "response_pos_prop_INTJ": 0.0,
        "response_pos_prop_NOUN": 0.2571428571428571,
        "response_pos_prop_NUM": 0.0,
        "response_pos_prop_PART": 0.02857142857142857,
        "response_pos_prop_PRON": 0.02857142857142857,
        "response_pos_prop_PROPN": 0.0,
        "response_pos_prop_PUNCT": 0.14285714285714285,
        "response_pos_prop_SCONJ": 0.0,
        "response_pos_prop_SYM": 0.0,
        "response_pos_prop_VERB": 0.07142857142857142,
        "response_pos_prop_X": 0.04285714285714286,
        "response_prop_adjacent_dependency_relation_mean": 0.4988304093567251,
        "response_prop_adjacent_dependency_relation_std": 0.017298478897952083,
        "response_proportion_bullet_points": 0.0,
        "response_proportion_ellipsis": 0.0,
        "response_proportion_unique_tokens": 0.6166666666666667,
        "response_rix": 3.6666666666666665,
        "response_sentence_length_mean": 10.0,
        "response_sentence_length_median": 13.0,
        "response_sentence_length_std": 6.506407098647712,
        "response_syllables_per_token_mean": 1.8833333333333333,
        "response_syllables_per_token_median": 1.0,
        "response_syllables_per_token_std": 1.2528855583101843,
        "response_symbol_to_word_ratio_#": 0.0,
        "response_token_length_mean": 5.916666666666667,
        "response_token_length_median": 6.0,
        "response_token_length_std": 3.742956347891626,
        "response_top_ngram_chr_fraction_2": 0.13333333333333333,
        "response_top_ngram_chr_fraction_3": 0.09285714285714286,
        "response_top_ngram_chr_fraction_4": 0.14285714285714285
    },
    "responses": [],
    "suggestions": [
        {
            "agent": null,
            "question_name": "response",
            "score": null,
            "type": null,
            "value": "1. Homogeneity: The segment should consist of customers who share similar characteristics and behaviors.\n2. Distinctiveness: The segment should be different from other segments in terms of their characteristics and behaviors.\n3. Stability: The segment should remain relatively stable over time and not change drastically. The characteristics and behaviors of customers within the segment should not change significantly."
        }
    ]
}
```

While the same record in HuggingFace `datasets` looks as follows:

```json
{
    "context": null,
    "external_id": null,
    "metadata": "{\"prompt_flesch_reading_ease\": 78.87285714285717, \"prompt_flesch_kincaid_grade\": 3.997142857142858, \"prompt_gunning_fog\": 8.514285714285714, \"prompt_automated_readability_index\": 1.24642857142857, \"prompt_coleman_liau_index\": 3.9114285714285693, \"prompt_lix\": 28.428571428571427, \"prompt_rix\": 1.5, \"prompt_entropy\": 0.4249943780917443, \"prompt_perplexity\": 1.5295818204705425, \"prompt_per_word_perplexity\": 0.0955988637794089, \"prompt_passed_quality_check\": \"True\", \"prompt_n_stop_words\": 7.0, \"prompt_alpha_ratio\": 0.8125, \"prompt_mean_word_length\": 3.6875, \"prompt_doc_length\": 16.0, \"prompt_symbol_to_word_ratio_#\": 0.0, \"prompt_proportion_ellipsis\": 0.0, \"prompt_proportion_bullet_points\": 0.0, \"prompt_duplicate_line_chr_fraction\": 0.0, \"prompt_duplicate_paragraph_chr_fraction\": 0.0, \"prompt_duplicate_ngram_chr_fraction_5\": 0.0, \"prompt_duplicate_ngram_chr_fraction_6\": 0.0, \"prompt_duplicate_ngram_chr_fraction_7\": 0.0, \"prompt_duplicate_ngram_chr_fraction_8\": 0.0, \"prompt_duplicate_ngram_chr_fraction_9\": 0.0, \"prompt_duplicate_ngram_chr_fraction_10\": 0.0, \"prompt_top_ngram_chr_fraction_2\": 0.0, \"prompt_top_ngram_chr_fraction_3\": 0.0, \"prompt_top_ngram_chr_fraction_4\": 0.0, \"prompt_oov_ratio\": 0.0, \"prompt_dependency_distance_mean\": 2.3181818181818183, \"prompt_dependency_distance_std\": 0.3181818181818181, \"prompt_prop_adjacent_dependency_relation_mean\": 0.28181818181818186, \"prompt_prop_adjacent_dependency_relation_std\": 0.08181818181818182, \"prompt_pos_prop_ADJ\": 0.0, \"prompt_pos_prop_ADP\": 0.0625, \"prompt_pos_prop_ADV\": 0.0625, \"prompt_pos_prop_AUX\": 0.0625, \"prompt_pos_prop_CCONJ\": 0.0, \"prompt_pos_prop_DET\": 0.0625, \"prompt_pos_prop_INTJ\": 0.0, \"prompt_pos_prop_NOUN\": 0.25, \"prompt_pos_prop_NUM\": 0.0625, \"prompt_pos_prop_PART\": 0.0625, \"prompt_pos_prop_PRON\": 0.0, \"prompt_pos_prop_PROPN\": 0.0, \"prompt_pos_prop_PUNCT\": 0.125, \"prompt_pos_prop_SCONJ\": 0.125, \"prompt_pos_prop_SYM\": 0.0, \"prompt_pos_prop_VERB\": 0.125, \"prompt_pos_prop_X\": 0.0, \"prompt_token_length_mean\": 4.071428571428571, \"prompt_token_length_median\": 3.5, \"prompt_token_length_std\": 2.5763841138387766, \"prompt_sentence_length_mean\": 7.0, \"prompt_sentence_length_median\": 7.0, \"prompt_sentence_length_std\": 3.0, \"prompt_syllables_per_token_mean\": 1.4285714285714286, \"prompt_syllables_per_token_median\": 1.0, \"prompt_syllables_per_token_std\": 0.7284313590846836, \"prompt_n_tokens\": 14, \"prompt_n_unique_tokens\": 14, \"prompt_proportion_unique_tokens\": 1.0, \"prompt_n_characters\": 59, \"prompt_n_sentences\": 2, \"response_flesch_reading_ease\": 37.35500000000002, \"response_flesch_kincaid_grade\": 10.533333333333335, \"response_gunning_fog\": 14.0, \"response_automated_readability_index\": 11.4375, \"response_coleman_liau_index\": 16.030000000000005, \"response_lix\": 46.666666666666664, \"response_rix\": 3.6666666666666665, \"response_entropy\": 1.925080903607536, \"response_perplexity\": 6.855703294591874, \"response_per_word_perplexity\": 0.09793861849416963, \"response_passed_quality_check\": \"True\", \"response_n_stop_words\": 26.0, \"response_alpha_ratio\": 0.7857142857142857, \"response_mean_word_length\": 5.214285714285714, \"response_doc_length\": 70.0, \"response_symbol_to_word_ratio_#\": 0.0, \"response_proportion_ellipsis\": 0.0, \"response_proportion_bullet_points\": 0.0, \"response_duplicate_line_chr_fraction\": 0.0, \"response_duplicate_paragraph_chr_fraction\": 0.0, \"response_duplicate_ngram_chr_fraction_5\": 0.14761904761904762, \"response_duplicate_ngram_chr_fraction_6\": 0.0, \"response_duplicate_ngram_chr_fraction_7\": 0.0, \"response_duplicate_ngram_chr_fraction_8\": 0.0, \"response_duplicate_ngram_chr_fraction_9\": 0.0, \"response_duplicate_ngram_chr_fraction_10\": 0.0, \"response_top_ngram_chr_fraction_2\": 0.13333333333333333, \"response_top_ngram_chr_fraction_3\": 0.09285714285714286, \"response_top_ngram_chr_fraction_4\": 0.14285714285714285, \"response_oov_ratio\": 0.02857142857142857, \"response_dependency_distance_mean\": 2.0984405458089666, \"response_dependency_distance_std\": 1.2368394170758057, \"response_prop_adjacent_dependency_relation_mean\": 0.4988304093567251, \"response_prop_adjacent_dependency_relation_std\": 0.017298478897952083, \"response_pos_prop_ADJ\": 0.05714285714285714, \"response_pos_prop_ADP\": 0.1, \"response_pos_prop_ADV\": 0.04285714285714286, \"response_pos_prop_AUX\": 0.07142857142857142, \"response_pos_prop_CCONJ\": 0.05714285714285714, \"response_pos_prop_DET\": 0.07142857142857142, \"response_pos_prop_INTJ\": 0.0, \"response_pos_prop_NOUN\": 0.2571428571428571, \"response_pos_prop_NUM\": 0.0, \"response_pos_prop_PART\": 0.02857142857142857, \"response_pos_prop_PRON\": 0.02857142857142857, \"response_pos_prop_PROPN\": 0.0, \"response_pos_prop_PUNCT\": 0.14285714285714285, \"response_pos_prop_SCONJ\": 0.0, \"response_pos_prop_SYM\": 0.0, \"response_pos_prop_VERB\": 0.07142857142857142, \"response_pos_prop_X\": 0.04285714285714286, \"response_token_length_mean\": 5.916666666666667, \"response_token_length_median\": 6.0, \"response_token_length_std\": 3.742956347891626, \"response_sentence_length_mean\": 10.0, \"response_sentence_length_median\": 13.0, \"response_sentence_length_std\": 6.506407098647712, \"response_syllables_per_token_mean\": 1.8833333333333333, \"response_syllables_per_token_median\": 1.0, \"response_syllables_per_token_std\": 1.2528855583101843, \"response_n_tokens\": 60, \"response_n_unique_tokens\": 37, \"response_proportion_unique_tokens\": 0.6166666666666667, \"response_n_characters\": 365, \"response_n_sentences\": 6}",
    "prompt": "How to tell if a customer segment is well segmented? In 3 bullet points.",
    "response": [],
    "response-suggestion": "1. Homogeneity: The segment should consist of customers who share similar characteristics and behaviors.\n2. Distinctiveness: The segment should be different from other segments in terms of their characteristics and behaviors.\n3. Stability: The segment should remain relatively stable over time and not change drastically. The characteristics and behaviors of customers within the segment should not change significantly.",
    "response-suggestion-metadata": {
        "agent": null,
        "score": null,
        "type": null
    }
}
```

### Data Fields

Among the dataset fields, we differentiate between the following:

* **Fields:** These are the dataset records themselves; for the moment, only text fields are supported. These are the ones that will be used to provide responses to the questions.

    * **prompt** is of type `text`.
    * (optional) **context** is of type `text`.

* **Questions:** These are the questions that will be asked to the annotators. They can be of different types, such as `RatingQuestion`, `TextQuestion`, `LabelQuestion`, `MultiLabelQuestion`, and `RankingQuestion`.

    * **response** is of type `text`, with the description "Write the response to the instruction.".

* **Suggestions:** As of Argilla 1.13.0, suggestions are included to assist the annotators during the annotation process. Suggestions are linked to the existing questions, are always optional, and contain not just the suggestion itself, but also the metadata linked to it, if applicable.

    * (optional) **response-suggestion** is of type `text`.

Additionally, there are two more optional fields:

* **✨ NEW** **metadata:** This is an optional field that can be used to provide additional information about the dataset record. This can be useful to provide additional context to the annotators, or to provide additional information about the dataset record itself. For example, you can use this to provide a link to the original source of the dataset record, or to provide additional information about the dataset record itself, such as the author, the date, or the source. The metadata is always optional, and can be potentially linked to the `metadata_properties` defined in the dataset configuration file in `argilla.yaml`.
* **external_id:** This is an optional field that can be used to provide an external ID for the dataset record. This can be useful if you want to link the dataset record to an external resource, such as a database or a file.

### Data Splits

The dataset contains a single split, which is `train`.

## Dataset Creation

### Curation Rationale

[More Information Needed]

### Source Data

#### Initial Data Collection and Normalization

[More Information Needed]

#### Who are the source language producers?

[More Information Needed]

### Annotations

#### Annotation guidelines

This is a supervised fine-tuning dataset that contains instructions. Please write the response to the instruction in the response field. Take the context into account when writing the response.

#### Annotation process

[More Information Needed]

#### Who are the annotators?

[More Information Needed]

### Personal and Sensitive Information

[More Information Needed]

## Considerations for Using the Data

### Social Impact of Dataset

[More Information Needed]

### Discussion of Biases

[More Information Needed]

### Other Known Limitations

[More Information Needed]

## Additional Information

### Dataset Curators

[More Information Needed]

### Licensing Information

[More Information Needed]

### Citation Information

[More Information Needed]

### Contributions

[More Information Needed]

The sharegpt-text-descriptives dataset was created with Argilla, an open-source tool for data annotation and curation. It includes a configuration file in Argilla format and dataset records compatible with the Hugging Face `datasets` library. The dataset can support multiple NLP tasks depending on its configuration, which comprises fields (prompt and context), a question (response), and per-record metadata. It has no associated leaderboard, and its language is not specified. The example instance carries detailed text-descriptive metadata for the prompt and response, along with a suggested response.
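The metadata values in the example instance can be sanity-checked with a little arithmetic. The sketch below copies a handful of values from that record and tests three relationships they happen to satisfy: the unique-token proportion equals `n_unique_tokens / n_tokens`, the perplexity matches `exp(entropy)`, and the per-word perplexity matches `perplexity / doc_length`. These are observations about this one record, not a documented definition of the metrics, and the helper name `check_consistency` is hypothetical.

```python
import math

# Values copied from the example instance in the dataset card.
prompt_meta = {
    "prompt_entropy": 0.4249943780917443,
    "prompt_perplexity": 1.5295818204705425,
    "prompt_per_word_perplexity": 0.0955988637794089,
    "prompt_doc_length": 16.0,
    "prompt_n_tokens": 14,
    "prompt_n_unique_tokens": 14,
    "prompt_proportion_unique_tokens": 1.0,
}
response_meta = {
    "response_entropy": 1.925080903607536,
    "response_perplexity": 6.855703294591874,
    "response_per_word_perplexity": 0.09793861849416963,
    "response_doc_length": 70.0,
    "response_n_tokens": 60,
    "response_n_unique_tokens": 37,
    "response_proportion_unique_tokens": 0.6166666666666667,
}


def check_consistency(meta, prefix, rel_tol=1e-6):
    """Hypothetical helper: report which of the three relationships hold."""

    def get(name):
        return meta[f"{prefix}_{name}"]

    return {
        # proportion of unique tokens vs. the raw token counts
        "unique_ratio": math.isclose(
            get("n_unique_tokens") / get("n_tokens"),
            get("proportion_unique_tokens"),
            rel_tol=rel_tol,
        ),
        # perplexity vs. the exponential of the entropy
        "perplexity_vs_entropy": math.isclose(
            math.exp(get("entropy")), get("perplexity"), rel_tol=rel_tol
        ),
        # per-word perplexity vs. perplexity divided by document length
        "per_word_perplexity": math.isclose(
            get("perplexity") / get("doc_length"),
            get("per_word_perplexity"),
            rel_tol=rel_tol,
        ),
    }


print(check_consistency(prompt_meta, "prompt"))
print(check_consistency(response_meta, "response"))
```

A check like this is mainly useful for spotting records whose metadata was truncated or corrupted during export; it says nothing about the quality of the text itself.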
Provider:
argilla
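Summary of original information

Dataset card for sharegpt-text-descriptives

Dataset description

Dataset summary

This dataset contains:

  • A configuration file, argilla.yaml, that conforms to the Argilla dataset format and is used to configure the dataset when calling the FeedbackDataset.from_huggingface method.
  • Data records in a format compatible with HuggingFace datasets. These records are loaded automatically by FeedbackDataset.from_huggingface and can also be loaded independently with the datasets library.
  • The annotation guidelines used to build and curate the dataset (if they have been defined in Argilla).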

Loading

  • With Argilla:

        import argilla as rg
        ds = rg.FeedbackDataset.from_huggingface("argilla/sharegpt-text-descriptives")

  • With datasets:

        from datasets import load_dataset
        ds = load_dataset("argilla/sharegpt-text-descriptives")

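Supported tasks and leaderboards

This dataset can contain multiple fields, questions, and responses, so it can be used for different NLP tasks depending on the configuration. The dataset structure is described in the Dataset structure section.

Languages

[More information needed]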

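Dataset structure

Data in Argilla

In Argilla, the dataset consists of the following parts:

  • Fields: currently only text fields are supported.

    | Field Name | Title | Type | Required | Markdown |
    | ---------- | ----- | ---- | -------- | -------- |
    | prompt | Prompt | text | True | True |
    | context | Context | text | False | True |

  • Questions: the questions asked of the annotators. They can be of different types, such as rating, text, label selection, multi-label selection, or ranking.

    | Question Name | Title | Type | Required | Description | Values/Labels |
    | ------------- | ----- | ---- | -------- | ----------- | ------------- |
    | response | Response | text | True | Write the response to the instruction. | N/A |

  • Suggestions: human- or machine-generated suggestions that assist annotators during the annotation process.

    | Suggestion Name | Type |
    | --------------- | ---- |
    | response-suggestion | text |

  • ✨ NEW Metadata: an optional field for providing additional information about the data record.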
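
The dataset card states that suggestion columns are named by appending "-suggestion" and "-suggestion-metadata" to the question name. As a minimal sketch of that convention, the helpers below derive the two column names and pull suggestions out of a flat, `datasets`-style row. The function names `suggestion_columns` and `collect_suggestions` are hypothetical and not part of Argilla or `datasets`, and the sample row is abbreviated for illustration.

```python
from typing import Dict, Iterable, Tuple


def suggestion_columns(question_name: str) -> Tuple[str, str]:
    """Hypothetical helper: column names for a question's suggestion and its metadata."""
    return f"{question_name}-suggestion", f"{question_name}-suggestion-metadata"


def collect_suggestions(row: Dict, question_names: Iterable[str]) -> Dict[str, Dict]:
    """Hypothetical helper: gather suggestion value and metadata per question from a flat row."""
    collected = {}
    for name in question_names:
        value_col, meta_col = suggestion_columns(name)
        if row.get(value_col) is not None:
            collected[name] = {
                "value": row[value_col],
                "metadata": row.get(meta_col),
            }
    return collected


# Abbreviated row shaped like the `datasets` example in the card (illustrative only).
row = {
    "prompt": "How to tell if a customer segment is well segmented? In 3 bullet points.",
    "response": [],
    "response-suggestion": "1. Homogeneity: ...",
    "response-suggestion-metadata": {"agent": None, "score": None, "type": None},
}

print(suggestion_columns("response"))
print(collect_suggestions(row, ["response"]))
```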

Data instances

The example data instance in Argilla is the same record shown under "Data Instances" in the dataset card above: the prompt "How to tell if a customer segment is well segmented? In 3 bullet points.", its prompt and response text-descriptive metadata, an empty responses list, and one suggestion for the response question.
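
The card shows the same record in two shapes: a nested Argilla form and a flat `datasets` form in which `metadata` is a JSON-encoded string and the suggestion is split into "-suggestion" and "-suggestion-metadata" columns. The sketch below converts one shape into the other, inferred only from those two examples. It is not Argilla's own conversion code, `to_flat_row` is a hypothetical name, and because the card shows only an empty response list, non-empty responses are deliberately not handled.

```python
import json


def to_flat_row(record, field_names=("prompt", "context"), question_names=("response",)):
    """Hypothetical helper: flatten an Argilla-style record into a datasets-style row."""
    row = {"external_id": record.get("external_id")}

    # Fields move to the top level; fields absent from the record become None.
    for name in field_names:
        row[name] = record.get("fields", {}).get(name)

    # In the flat form the metadata dictionary is stored as a JSON string.
    row["metadata"] = json.dumps(record.get("metadata") or {})

    # The card only shows an empty response list, so that is all this sketch emits.
    for name in question_names:
        row[name] = []

    # Each suggestion becomes a value column plus a metadata column.
    for suggestion in record.get("suggestions", []):
        question = suggestion["question_name"]
        row[f"{question}-suggestion"] = suggestion["value"]
        row[f"{question}-suggestion-metadata"] = {
            key: suggestion.get(key) for key in ("agent", "score", "type")
        }
    return row


# Abbreviated version of the example record (most metadata keys omitted).
record = {
    "external_id": None,
    "fields": {"prompt": "How to tell if a customer segment is well segmented? In 3 bullet points."},
    "metadata": {"prompt_n_tokens": 14, "response_n_tokens": 60},
    "responses": [],
    "suggestions": [
        {"agent": None, "question_name": "response", "score": None, "type": None, "value": "1. Homogeneity: ..."}
    ],
}

flat = to_flat_row(record)
print(sorted(flat))
```

Going the other way, `json.loads(flat["metadata"])` recovers the metadata dictionary.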

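Data fields

The dataset fields are divided as follows:

  • Fields: currently only text fields are supported.

    • prompt is of type text
    • (optional) context is of type text
  • Questions: the questions asked of the annotators; they can be of different types.

    • response is of type text, with the description "Write the response to the instruction."
  • Suggestions: human- or machine-generated suggestions that assist annotators during the annotation process.

    • (optional) response-suggestion is of type text

In addition, there are two optional fields:

  • ✨ NEW Metadata: an optional field for providing additional information about the data record.
  • external_id: an optional field for providing an external ID for the data record.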
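
Because each record carries text-descriptive metadata, rows can be selected by those values. The sketch below is one way to do that under stated assumptions: the flat `datasets` form stores `metadata` as a JSON string, `prompt_passed_quality_check` is the string "True" rather than a boolean (as in the example record), and the readability threshold of 60 is an arbitrary illustrative choice. The function `passes_filters` is hypothetical, and the second and third sample rows are invented for the demonstration.

```python
import json


def passes_filters(row, min_reading_ease=60.0):
    """Hypothetical predicate: keep rows whose prompt passed the quality check and reads easily."""
    metadata = row["metadata"]
    # The flat `datasets` form stores metadata as a JSON string; accept a dict too.
    if isinstance(metadata, str):
        metadata = json.loads(metadata)
    passed = metadata.get("prompt_passed_quality_check") == "True"
    readable = metadata.get("prompt_flesch_reading_ease", 0.0) >= min_reading_ease
    return passed and readable


rows = [
    # Values taken from the example record in the dataset card.
    {
        "prompt": "How to tell if a customer segment is well segmented? In 3 bullet points.",
        "metadata": json.dumps(
            {"prompt_passed_quality_check": "True", "prompt_flesch_reading_ease": 78.87285714285717}
        ),
    },
    # Invented rows, for illustration only.
    {
        "prompt": "hypothetical hard-to-read prompt",
        "metadata": json.dumps({"prompt_passed_quality_check": "True", "prompt_flesch_reading_ease": 45.0}),
    },
    {
        "prompt": "hypothetical low-quality prompt",
        "metadata": json.dumps({"prompt_passed_quality_check": "False", "prompt_flesch_reading_ease": 90.0}),
    },
]

kept = [row for row in rows if passes_filters(row)]
print(len(kept), "of", len(rows), "rows kept")
```

With the Hugging Face `datasets` library, the same predicate could plausibly be passed to `Dataset.filter` on the `train` split, assuming the loaded rows match the flat shape shown in the card.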