huggingface-KREW/KoCultre-Descriptions
Source: Hugging Face. Updated: 2025-06-05. Indexed: 2025-07-05.
Download link:
https://hf-mirror.com/datasets/huggingface-KREW/KoCultre-Descriptions
Overview:
The KoCulture-Descriptions dataset aims to improve how large language models (LLMs) understand and generate Korean by providing detailed explanations of neologisms, slang, memes, and related cultural phenomena. Each entry explains a Korean word or phrase, covering its origin, usage, and cultural context. The dataset has five features (title, content, date, category, and source) and a single training split of 503 examples. It is curated by Hugging Face KREW and licensed under CC BY-NC-SA 4.0.

The dataset was built by collecting data from TrendAward and Namuwiki, processing it with multiple LLMs, and having Hugging Face KREW members curate the results by hand. It contains no personally identifiable information (PII) and is designed to avoid harmful content. It nonetheless has limitations and biases: it focuses on online communities and media, its entries may become outdated, and curation is inherently subjective. Users are advised to use the dataset responsibly and to report any issues.
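The stated schema (five features, 503 training examples) can be checked against a downloaded copy. The following is a minimal sketch, not the maintainers' tooling: the helper names `missing_fields`, `check_split`, and `load_train_split` are hypothetical, and it assumes the Hugging Face `datasets` library is installed. The repository id is left as a parameter because this listing spells it both "KoCultre" (in the link) and "KoCulture" (in the description), so it should be verified on the Hub before use.

```python
from typing import Any, Dict, Iterable, List, Mapping

# Field names as listed in the dataset description.
EXPECTED_FIELDS = ("title", "content", "date", "category", "source")
# Size of the training split as stated in the description.
EXPECTED_TRAIN_SIZE = 503


def missing_fields(record: Mapping[str, Any]) -> List[str]:
    """Return the expected field names that are absent from one record."""
    return [name for name in EXPECTED_FIELDS if name not in record]


def check_split(records: Iterable[Mapping[str, Any]]) -> Dict[str, Any]:
    """Count rows and rows with missing fields, and compare to the stated size."""
    rows = 0
    incomplete_rows = 0
    for record in records:
        rows += 1
        if missing_fields(record):
            incomplete_rows += 1
    return {
        "rows": rows,
        "incomplete_rows": incomplete_rows,
        "matches_stated_size": rows == EXPECTED_TRAIN_SIZE,
    }


def load_train_split(repo_id: str):
    """Load the train split from the Hugging Face Hub (needs network access).

    `repo_id` is supplied by the caller; verify the exact spelling on the Hub.
    """
    from datasets import load_dataset  # third-party: pip install datasets

    return load_dataset(repo_id, split="train")
```

With a verified repository id, `check_split(load_train_split(repo_id))` returns a small summary dictionary. Only field presence is checked here, since the listing does not state the type of each field.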
Provider:
huggingface-KREW