
IRM_chat_all

ModelScope Community. Updated 2025-06-10; indexed 2025-03-29.
Download link:
https://modelscope.cn/datasets/YKDuan/IRM_chat_all
Resource description:
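## Introduction

The **IRM_chat_little** dataset was contributed by **YKDuan** on the ModelScope platform. It is designed for conversational interaction in the field of library and information science (LIS).

**Data sources:**

- Library and information science concepts from general encyclopedias
- Bibliographic records of library and information science publications indexed in CSSCI (Chinese Social Sciences Citation Index)

From these source texts, a total of **276,083 conversational turns** were constructed with large language models (LLMs). The dataset is intended for training and fine-tuning LLMs for academic dialogue, particularly in library and information science.

## Dataset Content

The **IRM_chat_little** dataset contains **276,083 conversational turns**. Each entry is a JSON object whose `messages` list holds a conversation between a user and an assistant.

### Format

```json
{"messages": [{"role": "user", "content": "user query"}, {"role": "assistant", "content": "assistant reply"}]}
```

### Example

```json
{"messages": [{"role": "user", "content": "什么是信息检索?"}, {"role": "assistant", "content": "信息检索(IR)是根据信息需求从信息资源集合中获取相关信息系统资源的行为。搜索可以基于元数据或全文索引。"}]}
```

In English, the query reads "What is information retrieval?" and the reply reads "Information retrieval (IR) is the activity of obtaining information system resources relevant to an information need from a collection of information resources. Searches can be based on metadata or on full-text indexing."

## Dataset Statistics

- **Total conversational turns:** 276,083 unique entries

## Usage

The dataset can be accessed through the ModelScope library.

### Using the ModelScope SDK (Python)

```python
from modelscope.msdatasets import MsDataset

# Load the dataset. split='train' assumes the default split is named "train";
# without a split, load() may return all splits rather than individual records.
dataset = MsDataset.load('YKDuan/IRM_chat_little', split='train')

# Iterate through the dataset
for item in dataset:
    print(item)
    break  # Print only the first entry for demonstration
```

## License

This dataset is distributed under the **Apache 2.0** license. Please refer to the ModelScope platform for the latest licensing information.

## Citation

If you use this dataset in your research or applications, please cite the following paper:

```bibtex
@inproceedings{Zhu2025LISGPT,
  title={{LISGPT: Boundary Knowledge Enhanced Academic Large Language Modeling and its Scenario Applications}},
  author={Zhu, Y and Duan, Y* and Hu, J and Jin, J and Ye, J},
  booktitle={Proceedings of ASIS\&T 2025},
  year={2025}
}
```

## Contact

For any questions or concerns about this dataset, contact **duanyongkang@mail.bnu.edu.cn** or consult the ModelScope community.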
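The record format described under Dataset Content can be checked, and converted into (query, reply) pairs for fine-tuning, with a short script. The following is a minimal Python sketch and is not part of the dataset's own tooling. It assumes the data is stored as JSON Lines (one object per line) and that roles alternate user/assistant starting with the user, as in the card's example. The function name `iter_pairs` and the `SAMPLE` record are illustrative only.

```python
import json
from typing import Iterable, Iterator

# Assumptions (not stated by the dataset card): one JSON object per line
# (JSON Lines), and roles in "messages" alternate user/assistant, starting
# with the user, as in the card's example.


def iter_pairs(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Validate each record and yield its (user query, assistant reply) pairs."""
    for lineno, raw in enumerate(lines, start=1):
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between records
        record = json.loads(raw)
        messages = record.get("messages") if isinstance(record, dict) else None
        if not isinstance(messages, list) or not messages or len(messages) % 2:
            raise ValueError(f"line {lineno}: expected a non-empty, even-length 'messages' list")
        for i, msg in enumerate(messages):
            expected = "user" if i % 2 == 0 else "assistant"
            if not isinstance(msg, dict) or msg.get("role") != expected or not isinstance(msg.get("content"), str):
                raise ValueError(f"line {lineno}, message {i}: expected role {expected!r} with string content")
        # Pair each user turn with the assistant reply that follows it.
        for i in range(0, len(messages), 2):
            yield messages[i]["content"], messages[i + 1]["content"]


# A single record in the documented format (the card's example, shortened).
SAMPLE = [
    '{"messages": [{"role": "user", "content": "什么是信息检索?"}, '
    '{"role": "assistant", "content": "信息检索(IR)是根据信息需求从信息资源集合中获取相关信息系统资源的行为。"}]}',
]

if __name__ == "__main__":
    pairs = list(iter_pairs(SAMPLE))
    print(len(pairs))  # → 1
    for query, reply in pairs:
        print(query, "->", reply)
```

To run it over a local copy, pass an open file, for example `iter_pairs(open(path, encoding="utf-8"))`, where `path` is a hypothetical JSON Lines file. The card does not state the actual file names. The sketch raises an error on the first malformed record instead of skipping it, so format problems surface before training starts.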

Provider:
maas
Created:
2025-03-22