yunjaeys/Contextual_Response_Evaluation_for_ESL_and_ASD_Support

Source: Hugging Face · Updated: 2024-01-10 · Indexed: 2024-03-04
Download link:
https://hf-mirror.com/datasets/yunjaeys/Contextual_Response_Evaluation_for_ESL_and_ASD_Support
Resource overview:
---
language:
- en
license:
- mit
multilinguality:
- monolingual
size_categories:
- 1K<n<10K
source_datasets:
- original
task_categories:
- text-generation
task_ids:
- language-modeling
tags:
- asd
- autism
- esl
- english_second_language
- NLP
- second_language
- phi-2
- openassistant_reward
pretty_name: Contextual Response Evaluation for ESL and ASD Support💜💬🌐
---

# Dataset Card for "Contextual Response Evaluation for ESL and ASD Support💜💬🌐"

## Dataset Description 📖

### Dataset Summary 📝

Curated by Eric Soderquist, this dataset is a collection of English prompts and responses generated by the Phi-2 model, designed to evaluate and improve NLP models for supporting ESL (English as a Second Language) and ASD (Autism Spectrum Disorder) user bases. Each prompt is paired with multiple AI-generated responses and evaluated using a reward model to assess their relevance and quality.

### Supported Tasks and Leaderboards 🎯

- `text-generation`: This dataset is intended to train and refine language models for generating sensitive and context-aware responses.
- `language-modeling`: It can also be used for scoring the quality of language model responses to support ESL and ASD individuals.

### Languages 🗣

The dataset is monolingual and written in English.

## Dataset Structure 🏗

### Data Instances 📜

Each data instance contains a prompt, multiple AI-generated responses to that prompt, and scores reflecting the quality of each response.

### Data Fields 🏛

- `prompt`: a string containing the original English prompt.
- `responses`: an array of strings containing responses generated by the language model.
- `scores`: an array of floats representing the reward model's evaluation of each response.

### Data Splits 🔢

This dataset is not divided into traditional splits and consists of one complete set for evaluation purposes.
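As a minimal sketch of how the three fields described under Data Fields might be used together, the snippet below ranks a record's responses by their reward score. It rests on two assumptions the card does not state explicitly: that `scores[i]` belongs to `responses[i]`, and that a higher score means a better response. The `Record` type, the `rank_responses` helper, and the sample record are hypothetical and invented for illustration; they are not taken from the dataset.

```python
from typing import TypedDict


class Record(TypedDict):
    """Shape of one data instance, following the field list in the card."""
    prompt: str
    responses: list[str]
    scores: list[float]


def rank_responses(record: Record) -> list[tuple[str, float]]:
    """Return (response, score) pairs sorted from highest to lowest score.

    Assumption: scores[i] is the reward-model score for responses[i],
    and higher scores indicate better responses.
    """
    responses, scores = record["responses"], record["scores"]
    if len(responses) != len(scores):
        raise ValueError("responses and scores must have the same length")
    return sorted(zip(responses, scores), key=lambda pair: pair[1], reverse=True)


# Hypothetical record, invented for illustration; not real dataset content.
example: Record = {
    "prompt": "What does 'break the ice' mean?",
    "responses": [
        "It means to start a conversation so people feel more relaxed.",
        "Ice is frozen water.",
    ],
    "scores": [2.5, -1.0],
}

ranked = rank_responses(example)
best_response, best_score = ranked[0]
print(best_response, best_score)
```

If the records are loaded with the Hugging Face `datasets` library, each row should already be a dictionary with these keys and could be passed to `rank_responses` directly. The split name to request is not given in the card, so it would need to be checked against the hosted files.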
## Dataset Creation 🛠

### Curation Rationale 🤔

The dataset was curated with the goal of advancing NLP technologies to better serve ESL and ASD communities, offering a resource to evaluate and enhance the sensitivity of language models in understanding and generating responses that cater to the unique needs of these groups.

### Source Data 🗃

#### Initial Data Collection and Normalization

Data was generated using the Phi-2 model in response to carefully crafted prompts, aiming to cover a range of contexts and challenges faced by ESL and ASD individuals.

#### Annotations 🛑

The dataset includes scores from a reward model, providing an evaluation based on the model's perceived quality and appropriateness of the responses.

### Personal and Sensitive Information 🛑

Responses are generated and do not contain any real personal or sensitive information.

## Considerations for Using the Data ⚖️

### Social Impact of the Dataset 🌍

This dataset has the potential to impact the development of inclusive language models that are attuned to the nuances of communication required by ESL and ASD individuals.

### Discussion of Biases 🧐

As with any language model, biases present in the training data of the Phi-2 model may be reflected in the responses.

### Other Known Limitations 🚧

The reward model's scores are based on its own training data and may not cover the full scope of human evaluative diversity.

## Additional Information 📚

### Dataset Curator 👥

This dataset was curated by Eric Soderquist with the intent to foster developments in NLP that can adapt to and support the diverse linguistic and communicative needs of ESL and ASD communities.

### Licensing Information ©️

The dataset is made available under the MIT license.
### Citation Information 📢

If you use this dataset in your research, please cite it as follows:

```bibtex
@misc{contextual_response_evaluation,
  author = {Soderquist, Eric},
  title = {Contextual Response Evaluation for ESL and ASD Support},
  year = {2024}
}
```

### Contributions 👏

Contributions to further develop and expand this dataset are welcome.
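Provider:
yunjaeys
Summary of original information

Dataset Card for "Contextual Response Evaluation for ESL and ASD Support💜💬🌐"

Dataset Description 📖

Dataset Summary 📝

Curated by Eric Soderquist, this dataset is a collection of English prompts and responses generated by the Phi-2 model, designed to evaluate and improve NLP models that support ESL (English as a Second Language) and ASD (Autism Spectrum Disorder) user bases. Each prompt is paired with multiple AI-generated responses, which a reward model evaluates for relevance and quality.

Supported Tasks and Leaderboards 🎯

  • text-generation: The dataset is intended to train and refine language models that generate sensitive, context-aware responses.
  • language-modeling: It can also be used to score the quality of language model responses in support of ESL and ASD individuals.

Languages 🗣

The dataset is monolingual and written in English.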

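Dataset Structure 🏗

Data Instances 📜

Each data instance contains a prompt, multiple AI-generated responses to that prompt, and scores reflecting the quality of each response.

Data Fields 🏛

  • prompt: a string containing the original English prompt.
  • responses: an array of strings containing the responses generated by the language model.
  • scores: an array of floats representing the reward model's evaluation of each response.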
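The three fields listed above (`prompt`, `responses`, `scores`) suggest a simple structural check that could be run on a record before it is used. The sketch below is one hedged way to do that. The `validate_record` helper is hypothetical, and the rule that there is exactly one score per response is an assumption rather than something the summary states.

```python
def validate_record(record: dict) -> list[str]:
    """Return a list of problems found; an empty list means the record looks well-formed."""
    problems = []
    prompt = record.get("prompt")
    responses = record.get("responses")
    scores = record.get("scores")

    if not isinstance(prompt, str):
        problems.append("prompt should be a string")
    if not (isinstance(responses, list) and all(isinstance(r, str) for r in responses)):
        problems.append("responses should be a list of strings")
    if not (isinstance(scores, list) and all(isinstance(s, (int, float)) for s in scores)):
        problems.append("scores should be a list of numbers")
    # Assumption: one score per response; the summary does not state this explicitly.
    if isinstance(responses, list) and isinstance(scores, list) and len(responses) != len(scores):
        problems.append("responses and scores differ in length")
    return problems


# Hypothetical records, invented for illustration.
good = {"prompt": "Hi", "responses": ["a", "b"], "scores": [0.1, 0.2]}
bad = {"prompt": "Hi", "responses": ["a"], "scores": [0.1, 0.2]}
print(validate_record(good))
print(validate_record(bad))
```

Returning a list of messages instead of raising on the first failure makes it easier to report every issue in a record at once.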

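Data Splits 🔢

The dataset is not divided into traditional splits; it consists of one complete set intended for evaluation.

Dataset Creation 🛠

Curation Rationale 🤔

The dataset was curated to advance NLP technologies that better serve ESL and ASD communities. It offers a resource for evaluating and improving the sensitivity of language models in understanding and generating responses that meet the unique needs of these groups.

Source Data 🗃

Initial Data Collection and Normalization

Data was generated with the Phi-2 model in response to carefully crafted prompts, aiming to cover a range of contexts and challenges faced by ESL and ASD individuals.

Annotations 🛑

The dataset includes scores from a reward model, which evaluate the responses based on the model's perceived quality and appropriateness.

Personal and Sensitive Information 🛑

Responses are generated and do not contain any real personal or sensitive information.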

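Considerations for Using the Data ⚖️

Social Impact of the Dataset 🌍

This dataset has the potential to influence the development of inclusive language models that are attuned to the nuances of communication required by ESL and ASD individuals.

Discussion of Biases 🧐

As with any language model, biases present in the training data of the Phi-2 model may be reflected in the responses.

Other Known Limitations 🚧

The reward model's scores are based on its own training data and may not cover the full scope of human evaluative diversity.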

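Additional Information 📚

Dataset Curator 👥

This dataset was curated by Eric Soderquist with the intent to foster developments in NLP that can adapt to and support the diverse linguistic and communicative needs of ESL and ASD communities.

Licensing Information ©️

The dataset is made available under the MIT license.

Citation Information 📢

If you use this dataset in your research, please cite it as follows:

@misc{contextual_response_evaluation,
  author = {Soderquist, Eric},
  title = {Contextual Response Evaluation for ESL and ASD Support},
  year = {2024}
}

Contributions 👏

Contributions to further develop and expand this dataset are welcome.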
