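Book Information Dataset for Natural Language Processing-Based Semantic Recognition Technology

Name: Book Information Dataset for Natural Language Processing-Based Semantic Recognition Technology
Creator: 掌阅科技股份有限公司 (iReader Technology Co., Ltd.)
License: no description available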

Indexed by the National Basic Science Data Center (国家基础学科公共科学数据中心) on 2024-03-05

Download link:

https://www.nbsdc.cn/general/dataDetail?id=64ef2e86bb16e07b0603add3&type=1

Resource description:

This dataset supports the development of a semantic recognition technology based on natural language processing. During development, book information covering different kinds of e-books was used as training data to improve the model's ability to accurately extract topic terms from e-book information. To build the dataset, books in different categories were selected from the book library database, and the book information for a certain number of e-books in each category was extracted and compiled into Excel spreadsheets. The dataset contains basic book information, book ratings with the number of raters, and book reviews. The data type is text and the format is XLSX, which can be opened with general-purpose office software such as Microsoft Excel and WPS Office. The data size is 1.15 MB.

Provided by:

掌阅科技股份有限公司 (iReader Technology Co., Ltd.)

Collected summary

Dataset introduction

Background and challenges

Background overview

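This dataset is a collection of book information built specifically for developing natural language processing (NLP) semantic recognition technology. It contains basic information, ratings, and reviews for e-books, is stored in Excel format, and is 1.15 MB in size. It aims to improve a model's ability to extract topic terms from text by drawing on book information from diverse categories. It is an output of the National Key R&D Program project "移动数字阅读服务技术研发与应用" (Research, Development, and Application of Mobile Digital Reading Service Technology) and is suited to NLP model training and research applications.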

The above content was collected and summarized by 遇见数据集.