kajuma/ABEJA-CC-JA-edu
Source: Hugging Face. Updated: 2025-03-02. Indexed: 2025-08-30.
Download link:
https://hf-mirror.com/datasets/kajuma/ABEJA-CC-JA-edu
Description:
The ABEJA-CC-JA-edu dataset is derived from the ABEJA-CC-JA dataset published by ABEJA Inc., filtered with the tokyotech-llm/edu-classifier educational-content classifier. It consists of three subsets: 10%, 30%, and 50%. Texts are ranked by their LLM-based classifier score, and each subset keeps the top 10%, 30%, or 50% of texts by applying a corresponding score threshold. Every subset has three fields: url (the source link of the text), content (the text itself), and llm_score (the score assigned to the text by the LLM-based classifier).
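The top-percentage selection described above can be illustrated with a minimal sketch. This is not the dataset author's actual pipeline: the helper name `top_percent`, the toy records, and their scores are all hypothetical, and only the field names url, content, and llm_score come from the description.

```python
from typing import Dict, List

Record = Dict[str, object]


def top_percent(records: List[Record], percent: int) -> List[Record]:
    """Return the highest-scoring `percent` percent of records by llm_score.

    Hypothetical helper for illustration; the real filtering code is not
    described in the source.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in the range 1..100")
    # Rank from highest to lowest classifier score.
    ranked = sorted(records, key=lambda r: r["llm_score"], reverse=True)
    # Integer arithmetic avoids floating-point rounding surprises.
    keep = len(ranked) * percent // 100
    return ranked[:keep]


# Toy records with made-up scores (0.0, 0.1, ..., 0.9).
records: List[Record] = [
    {"url": f"https://example.com/{i}", "content": f"text {i}", "llm_score": i / 10}
    for i in range(10)
]

# Build the three subsets named in the description.
subsets = {f"{p}%": top_percent(records, p) for p in (10, 30, 50)}

for name, rows in subsets.items():
    # The lowest score kept acts as the effective threshold for that subset.
    threshold = min(r["llm_score"] for r in rows)
    print(name, len(rows), threshold)
```

Because every subset is cut from the same ranking, the 10% subset is contained in the 30% subset, which is in turn contained in the 50% subset.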
Provider:
kajuma



