Limour/b-corpus

Name: Limour/b-corpus
Creator: Limour
Published: 2024-07-16 12:46:17
License: no description provided

Source: Hugging Face. Updated 2024-07-16; indexed 2024-03-04.

Download link:

https://hf-mirror.com/datasets/Limour/b-corpus

Resource overview:

This dataset is a Chinese long-text corpus focused on visual novel dialogue with speaker annotations. The data has been rigorously cleaned and deduplicated. Each file holds one complete dialogue, with every line in the format `{NAME}：{DIALOGUE}`. The dataset contains material that reflects distorted worldviews and questionable ethics, as well as some adult content. Part of the data was translated from other datasets, and the files are organized by production company and work title.
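
The `{NAME}：{DIALOGUE}` layout described above can be split into speaker/utterance pairs with a few lines of Python. This is a minimal sketch, not the author's tooling: the function name `parse_dialogue` is hypothetical, and two behaviours are assumptions rather than documented properties of the files, namely that an ASCII colon is accepted alongside the full-width one and that a line with no colon is kept as narration with no speaker.

```python
from typing import List, Optional, Tuple


def parse_dialogue(text: str) -> List[Tuple[Optional[str], str]]:
    """Split dialogue text into (speaker, utterance) pairs.

    Assumptions (not confirmed by the dataset card):
    - the separator is the first full-width or ASCII colon on the line;
    - lines without any colon are narration and get speaker None.
    """
    turns: List[Tuple[Optional[str], str]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Locate the earliest colon of either kind.
        positions = [p for p in (line.find("："), line.find(":")) if p != -1]
        if not positions:
            turns.append((None, line))
            continue
        cut = min(positions)
        turns.append((line[:cut].strip(), line[cut + 1:].strip()))
    return turns
```

Splitting only at the first colon keeps any later colons inside the utterance, which matters for dialogue that itself contains times or quoted speech.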

Provider:

Limour

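Summary of original information

Dataset overview

Basic information

License: cc-by-nc-sa-4.0
Task category: text-generation
Language: zh
Tags: not-for-all-audiences

Dataset description

Content: a Chinese long-text corpus, finely minced entirely by hand and by eye (that is, manually curated)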
Download command (PowerShell): $env:HF_ENDPOINT="https://hf-mirror.com"; python -c "from huggingface_hub import snapshot_download; snapshot_download(repo_id='Limour/b-corpus', repo_type='dataset', local_dir=r'D:\datasets\tmp')"
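
The same download can be written as a short Python script, which avoids shell quoting issues and works outside PowerShell. This is a sketch under stated assumptions: the helper name `download_corpus` is hypothetical, and the target directory is left to the caller. To my knowledge `huggingface_hub` reads `HF_ENDPOINT` when it is first imported, so the variable is set before the import.

```python
import os

# Point huggingface_hub at the mirror from the listing. This is set before
# the library is imported, since it appears to read the endpoint at import time.
os.environ.setdefault("HF_ENDPOINT", "https://hf-mirror.com")


def download_corpus(local_dir: str) -> str:
    """Download the Limour/b-corpus dataset snapshot into local_dir.

    Returns the path of the downloaded snapshot folder.
    """
    # Imported here so the HF_ENDPOINT override above is already in place.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id="Limour/b-corpus",
        repo_type="dataset",
        local_dir=local_dir,
    )
```

A call such as `download_corpus(r"D:\datasets\tmp")` mirrors the one-line command; any writable directory can be passed instead.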