daikin-industries-ltd/ja-fineweb-2-hvac-fastText-scored-v5
Source: Hugging Face. Updated 2025-12-22; indexed 2026-01-03.
Download link:
https://hf-mirror.com/datasets/daikin-industries-ltd/ja-fineweb-2-hvac-fastText-scored-v5
Description:
This dataset consists of Japanese text related to air conditioning (HVAC), processed with FastText classification and LLM quality scoring. Each record includes the text content, URL, source, date, language information, language score, FastText classification score, LLM quality score, and the rationale for the LLM score. Documents with higher FastText scores were selected and then given a detailed quality evaluation by an LLM. According to the dataset statistics, there are 200,000 records in total; LLM scores range from 1 to 5, with an average of 2.35. The FastText score indicates whether the content is related to HVAC, while the LLM score rates the content's usefulness for HVAC technical education on a 5-level scale. The dataset is suitable for text analysis and research related to HVAC technology.
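As a rough illustration of how the two scores might be combined when selecting records, here is a minimal Python sketch. The field names `fasttext_score` and `llm_score`, the thresholds, and the toy records are all assumptions made for this example; they are not taken from the dataset's actual schema or contents.

```python
from statistics import mean

# Toy records for illustration only. Field names and values are hypothetical
# and do not come from the real dataset.
records = [
    {"text": "example document A", "fasttext_score": 0.92, "llm_score": 4},
    {"text": "example document B", "fasttext_score": 0.35, "llm_score": 1},
    {"text": "example document C", "fasttext_score": 0.81, "llm_score": 2},
]


def filter_records(rows, min_fasttext=0.5, min_llm=1):
    """Keep rows whose relevance and quality scores both meet the thresholds."""
    return [
        r for r in rows
        if r["fasttext_score"] >= min_fasttext and r["llm_score"] >= min_llm
    ]


# Select records judged HVAC-related, then summarize their quality scores.
relevant = filter_records(records, min_fasttext=0.5)
average_llm = mean(r["llm_score"] for r in relevant)
print(len(relevant), average_llm)
```

Raising `min_llm` (for example to 4) would keep only the records rated most useful for HVAC technical education, at the cost of a much smaller subset given the reported average score of 2.35.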
Provider:
daikin-industries-ltd



