JeanIbarz/count_letters_in_word_base

Name: JeanIbarz/count_letters_in_word_base
Creator: JeanIbarz
Published: 2024-11-12 21:22:44
License: No description available

Source: Hugging Face. Updated: 2024-11-12. Indexed: 2024-12-14.

Download link:

https://hf-mirror.com/datasets/JeanIbarz/count_letters_in_word_base

Description:

This dataset was generated from the nltk words corpus with a custom script to create tasks that require counting letters in words. Each example prompts the language model to count the occurrences of a specific letter within a word. For added complexity, in 10% of cases the prompt asks about a letter that does not appear in the word. Each example provides both a `chosen` completion (the correct count) and a `rejected` completion (an incorrect count). The error in a `rejected` completion is an offset drawn at random from the set {-2, -1, 1, 2} and adjusted so that the count remains valid (that is, never negative and never greater than the word length). The primary goal of this dataset is to fine-tune language models on tasks that reinforce basic cognitive skills. By focusing on tasks typically learned in early childhood, such as letter counting, we aim to improve models' overall reasoning and language comprehension capabilities.
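
The generation procedure described above can be sketched roughly as follows. This is an illustrative reconstruction, not the creator's actual script: the function name `make_example`, the `prompt` field name, and the prompt wording are assumptions, and invalid offsets are filtered out here rather than "adjusted", since the description does not say how the adjustment is done. Words are assumed to be non-empty and alphabetic.

```python
import random
import string


def make_example(word, rng, absent_prob=0.1):
    """Build one letter-counting example with a correct and an incorrect count.

    Hypothetical helper: names and prompt wording are assumptions.
    """
    lowered = word.lower()
    present = sorted(set(lowered))
    absent = [c for c in string.ascii_lowercase if c not in present]

    # With probability absent_prob, ask about a letter that is not in the word.
    if absent and rng.random() < absent_prob:
        letter = rng.choice(absent)
    else:
        letter = rng.choice(present)

    correct = lowered.count(letter)

    # Apply an offset from {-2, -1, 1, 2}, keeping only candidates that are
    # non-negative and do not exceed the word length (assumed validity rule).
    candidates = [
        correct + delta
        for delta in (-2, -1, 1, 2)
        if 0 <= correct + delta <= len(word)
    ]
    wrong = rng.choice(candidates)

    prompt = f'How many times does the letter "{letter}" appear in the word "{word}"?'
    return {"prompt": prompt, "chosen": str(correct), "rejected": str(wrong)}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for w in ["banana", "letter", "zoo"]:
        print(make_example(w, rng))
```

To mirror the description more closely, the word list could come from `nltk.corpus.words.words()` after the corpus has been downloaded with `nltk.download("words")`; the sketch takes plain strings so that it runs without that dependency.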

Provider:

JeanIbarz
