somosnlp-hackathon-2023/ask2democracy-cfqa-salud-pension

Name: somosnlp-hackathon-2023/ask2democracy-cfqa-salud-pension
Creator: somosnlp-hackathon-2023
Published: 2023-04-11 03:08:45
License: No description provided

Hugging Face. Updated: 2023-04-11. Indexed: 2024-05-25.

Download link:

https://hf-mirror.com/datasets/somosnlp-hackathon-2023/ask2democracy-cfqa-salud-pension


Resource overview:

---
dataset_info:
  features:
  - name: instruction
    dtype: string
  - name: input
    dtype: string
  - name: output
    dtype: string
  - name: topics
    sequence: string
  splits:
  - name: train
    num_bytes: 7711587
    num_examples: 3805
  download_size: 880079
  dataset_size: 7711587
---

## About Ask2Democracy-cfqa-salud-pension

Ask2Democracy-cfqa-salud-pension is an instructional, context-based generative dataset in Spanish, built from the text of the reforms to the Colombian health and pension systems (March 2023). The text was pre-processed and augmented using the chat-gpt-turbo API.

<div align="right">
Created by Jorge Henao 🇨🇴 <a href="https://twitter.com/jhenaotw" target='_blank'>Twitter</a> <a href="https://www.linkedin.com/in/henaojorge" target='_blank'>LinkedIn</a> <a href="https://linktr.ee/jorgehenao" target='_blank'>Linktree</a>
<br>
With the support of David Torres 🇨🇴 <a href="https://twitter.com/davinci137" target='_blank'>Twitter</a> <a href="https://github.com/datorresb" target='_blank'>GitHub</a>
</div>

Different prompt engineering experiments were conducted to obtain high-quality results in Spanish while balancing simplicity of language and source-based generation. Two styles of questions were created:

* Conceptual questions about the text's definitions and concepts related to the health and pension reforms.
* First-person questions that address the concerns of ordinary people from diverse population groups.

Conceptual questions and answers are useful for understanding complex concepts commonly found in public documents. These questions generally ask for an easy-to-understand explanation of a particular concept or subject, such as "What is the definition of concept X or Y?"

First-person questions, on the other hand, are designed to address the specific concerns of various population groups, including young adults, middle-aged adults, and older adults, as well as people of different genders and labor backgrounds.
These questions are tailored to the specific needs and interests of these groups with regard to the text's topics. Such questions look like this: "How can concept X affect me as a Y person?", where Y might be an older adult, a working woman, a young adult, and so on.

## Bias

The bias of the text generated by the Chat-GPT-Turbo model has not been evaluated, nor has the accuracy of OpenAI's knowledge about Colombia been verified. Nevertheless, to reduce the risk of model hallucinations and biases, the text was pre-processed, and specific source-based contexts were provided on every API call. In addition, during the prompt engineering experiments for each topic (health and pension reforms), the model was explicitly instructed to base its answers on the source text.

## Dataset Structure

### Data Instances

#### plain_text

An example of a conceptual question looks like this.

```
{
  "instruction": "Given the context please answer the question. Context:'ARTÍCULO 73. SISTEMA NACIONAL DE PROTECCIÓN SOCIAL...",
  "input": "¿Qué es el Sistema Nacional de Protección Social Integral para la Vejez?",
  "output": "El Sistema Nacional de Protección Social Integral para la Vejez es una iniciativa del gobierno colombiano para garantizar la protección social de las personas mayores. Incluye el Consejo Nacional de Protección a la Vejez y el Comité Técnico, que trabajan para desarrollar políticas y programas que mejoren la calidad de vida de las personas mayores.",
  "topics": ["protección social", "vejez", "políticas públicas"]
}
```

An example of a first-person question looks like this.

```
{
  "instruction": "Given the context please answer the question. Context:'Artículo 1 Reforma Pensional...'",
  "input": "¿Cómo me beneficia el Sistema Público Único Integrado de Información en Salud (SPUIIS) como adulto de mediana edad?",
  "output": "El SPUIIS permite el procesamiento digital de imágenes y otras tecnologías de última generación, lo que facilita el diagnóstico y tratamiento de enfermedades, mejorando la calidad de vida de las personas.",
  "topics": ["Diagnóstico y tratamiento", "Tecnología de última generación", "Calidad de vida"]
}
```

### Data Fields

The data fields are the same among all splits.

#### plain_text

- `instruction`: a `string` feature.
- `input`: a `string` feature.
- `output`: a `string` feature.
- `topics`: a sequence of `string` features.

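As a rough illustration of how a record with these fields might be consumed, the sketch below checks that a record carries the four fields declared in `dataset_info` and joins `instruction` and `input` into a single prompt. The helper names `validate_record` and `build_prompt`, and the blank-line separator between the two fields, are assumptions made for this example; they are not part of the dataset or of any library.

```python
from typing import Any, Dict

# Field names come from the dataset_info block; the helpers are hypothetical.
STRING_FIELDS = ("instruction", "input", "output")


def validate_record(record: Dict[str, Any]) -> None:
    """Raise ValueError if the record does not match the documented schema."""
    for name in STRING_FIELDS:
        if not isinstance(record.get(name), str):
            raise ValueError(f"field {name!r} must be a string")
    topics = record.get("topics")
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError("field 'topics' must be a list of strings")


def build_prompt(record: Dict[str, Any]) -> str:
    """Join the context-bearing instruction and the question.

    The blank-line separator is an assumption, not a documented convention.
    """
    validate_record(record)
    return f"{record['instruction']}\n\n{record['input']}"


# Record modeled on the conceptual example; the output is shortened here.
example = {
    "instruction": "Given the context please answer the question. "
                   "Context:'ARTÍCULO 73. SISTEMA NACIONAL DE PROTECCIÓN SOCIAL...",
    "input": "¿Qué es el Sistema Nacional de Protección Social Integral para la Vejez?",
    "output": "El Sistema Nacional de Protección Social Integral para la Vejez es ...",
    "topics": ["protección social", "vejez", "políticas públicas"],
}
prompt = build_prompt(example)
print(prompt)
```

Records could be obtained, for instance, with `datasets.load_dataset("somosnlp-hackathon-2023/ask2democracy-cfqa-salud-pension", split="train")`, assuming the Hugging Face `datasets` library is installed and the repository is reachable.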
Provider:

somosnlp-hackathon-2023

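Summary of original information

Dataset overview

Dataset information

Features:
- instruction: string
- input: string
- output: string
- topics: sequence of strings
Splits:
- train: 3,805 examples, 7,711,587 bytes in total
Download size: 880,079 bytes
Dataset size: 7,711,587 bytes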
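
The size figures above allow a quick back-of-the-envelope check. The sketch below, using only the numbers from the `dataset_info` block, computes the average size of an example and the ratio of download size to dataset size; the variable names are illustrative.

```python
# Figures copied from the dataset_info block.
NUM_EXAMPLES = 3805
DATASET_SIZE_BYTES = 7_711_587
DOWNLOAD_SIZE_BYTES = 880_079

# Average uncompressed size of one record (instruction + input + output + topics).
avg_bytes_per_example = DATASET_SIZE_BYTES / NUM_EXAMPLES

# Fraction of the dataset size that actually has to be downloaded.
download_ratio = DOWNLOAD_SIZE_BYTES / DATASET_SIZE_BYTES

print(f"average bytes per example: {avg_bytes_per_example:.0f}")
print(f"download size / dataset size: {download_ratio:.3f}")
```

Each example averages about 2 KB, which is plausible given that every `instruction` embeds a passage of source text. The download is roughly 11% of the dataset size, presumably because the hosted files are compressed.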

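Dataset description

Dataset name: Ask2Democracy-cfqa-salud-pension
Purpose: a Spanish-language, context-based generative dataset built from the text of the reforms to the Colombian health and pension systems (March 2023).
Data preprocessing: the text was pre-processed and augmented using the chat-gpt-turbo API.
Question types:
- Conceptual questions: questions about the text's definitions and about concepts related to the health and pension reforms.
- First-person questions: questions that address the specific concerns of different population groups (such as young adults, middle-aged adults, older adults, and people of different genders and labor backgrounds).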

Data instances

Conceptual question example (JSON):

    {
      "instruction": "Given the context please answer the question. Context:ARTÍCULO 73. SISTEMA NACIONAL DE PROTECCIÓN SOCIAL...",
      "input": "¿Qué es el Sistema Nacional de Protección Social Integral para la Vejez?",
      "output": "El Sistema Nacional de Protección Social Integral para la Vejez es una iniciativa del gobierno colombiano para garantizar la protección social de las personas mayores. Incluye el Consejo Nacional de Protección a la Vejez y el Comité Técnico, que trabajan para desarrollar políticas y programas que mejoren la calidad de vida de las personas mayores.",
      "topics": ["protección social", "vejez", "políticas públicas"]
    }

First-person question example (JSON):

    {
      "instruction": "Given the context please answer the question. Context:Artículo 1 Reforma Pensional...",
      "input": "¿Cómo me beneficia el Sistema Público Único Integrado de Información en Salud (SPUIIS) como adulto de mediana edad?",
      "output": "El SPUIIS permite el procesamiento digital de imágenes y otras tecnologías de última generación, lo que facilita el diagnóstico y tratamiento de enfermedades, mejorando la calidad de vida de las personas.",
      "topics": ["Diagnóstico y tratamiento", "Tecnología de última generación", "Calidad de vida"]
    }
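
Because `topics` is a sequence of strings, one simple way to explore the data is to tally topic labels across records. The sketch below does this for the topic lists of the two examples above. The `count_topics` helper and the case-folding normalization are assumptions made for illustration; the dataset itself does not prescribe any normalization.

```python
from collections import Counter
from typing import Dict, Iterable, List


def count_topics(records: Iterable[Dict[str, List[str]]]) -> Counter:
    """Tally topic labels after trimming and case-folding (hypothetical normalization)."""
    counts: Counter = Counter()
    for record in records:
        counts.update(topic.strip().casefold() for topic in record.get("topics", []))
    return counts


# Topic lists copied from the two example records.
records = [
    {"topics": ["protección social", "vejez", "políticas públicas"]},
    {"topics": ["Diagnóstico y tratamiento", "Tecnología de última generación", "Calidad de vida"]},
]
topic_counts = count_topics(records)

for topic, count in topic_counts.most_common():
    print(f"{count}  {topic}")
```

Case-folding is used here because the examples mix capitalized and lower-case labels; whether labels that differ only in case should be merged is a judgment call for the user of the data.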

Data fields

instruction: string
input: string
output: string
topics: sequence of strings
