IRM_chat_all
Source: ModelScope community | Updated: 2025-06-10 | Indexed: 2025-03-29
Download link:
https://modelscope.cn/datasets/YKDuan/IRM_chat_all
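## Introduction
The **IRM_chat_little** dataset was contributed by **YKDuan** on the ModelScope platform. It is designed for conversational interaction in library and information science (LIS).
**Data sources:**
- Concepts related to LIS from encyclopedia entries (大百科全书)
- Bibliographic records of LIS publications indexed in CSSCI (Chinese Social Sciences Citation Index)
Large language models (LLMs) were used to turn these source texts into a total of **276,083 conversational turns**. The dataset is well suited to training and fine-tuning LLMs for academic dialogue, particularly in LIS.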
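## Dataset Content
The **IRM_chat_little** dataset contains **276,083 conversational turns**. Each entry is a JSON object representing a conversation between a user and an assistant.
### Format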
```json
{"messages": [{"role": "user", "content": "user query"}, {"role": "assistant", "content": "assistant reply"}]}
```
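Records in this format can be checked before training. Below is a minimal Python sketch, assuming each entry holds exactly one `user` message followed by one `assistant` message, as in the format shown; `is_valid_entry` is a hypothetical helper, not a ModelScope API.

```python
import json


# Hypothetical helper: checks one raw JSON string against the format above,
# assuming exactly one user message followed by one assistant message.
def is_valid_entry(line: str) -> bool:
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return False
    messages = record.get("messages") if isinstance(record, dict) else None
    if not isinstance(messages, list) or len(messages) != 2:
        return False
    for message, expected_role in zip(messages, ["user", "assistant"]):
        if not isinstance(message, dict) or message.get("role") != expected_role:
            return False
        # Content must be a non-empty string
        content = message.get("content")
        if not isinstance(content, str) or not content.strip():
            return False
    return True


sample = '{"messages": [{"role": "user", "content": "user query"}, {"role": "assistant", "content": "assistant reply"}]}'
print(is_valid_entry(sample))              # True
print(is_valid_entry('{"messages": []}'))  # False
```

Rejecting entries with swapped roles or empty content is a design choice for this sketch; loosen the checks if the actual files contain multi-turn conversations.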
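### Example
A sample entry (the dialogue text is in Chinese; the user asks "What is information retrieval?" and the assistant defines IR and notes that searches can be based on metadata or full-text indexing):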
```json
{"messages": [{"role": "user", "content": "什么是信息检索?"}, {"role": "assistant", "content": "信息检索(IR)是根据信息需求从信息资源集合中获取相关信息系统资源的行为。搜索可以基于元数据或全文索引。"}]}
```
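Since each entry in the example holds one question and one answer, it can be split into a prompt/response pair, a common input shape for supervised fine-tuning. The sketch assumes the two-message layout shown above; `to_pair` is a hypothetical helper, not part of the ModelScope SDK.

```python
import json
from typing import Tuple


# Hypothetical helper: returns (prompt, response) from one entry, taking the
# first user message as the prompt and the first assistant message as the response.
def to_pair(line: str) -> Tuple[str, str]:
    messages = json.loads(line)["messages"]
    prompt = next((m["content"] for m in messages if m["role"] == "user"), None)
    response = next((m["content"] for m in messages if m["role"] == "assistant"), None)
    if prompt is None or response is None:
        raise ValueError("entry lacks a user or assistant message")
    return prompt, response


example = (
    '{"messages": [{"role": "user", "content": "什么是信息检索?"}, '
    '{"role": "assistant", "content": "信息检索(IR)是根据信息需求从信息资源集合中获取相关信息系统资源的行为。搜索可以基于元数据或全文索引。"}]}'
)
prompt, response = to_pair(example)
print(prompt)  # 什么是信息检索?
```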
## Dataset Statistics
- **Total conversational turns:** 276,083 unique entries
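The total above can be checked against a local copy by streaming the file line by line. This minimal sketch assumes the data was downloaded as a JSON Lines file with one object per line, matching the format above; the file name `IRM_chat_all.jsonl` and the helper `count_entries` are hypothetical, since the card does not name its data files.

```python
import json
from pathlib import Path


# Counts non-empty JSON Lines records without loading the whole file into memory.
# Assumes one JSON object per line; blank lines are skipped.
def count_entries(path: Path) -> int:
    count = 0
    with path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                json.loads(line)  # raises json.JSONDecodeError on a malformed line
                count += 1
    return count


if __name__ == "__main__":
    # Hypothetical file name; adjust to the actual downloaded file.
    data_file = Path("IRM_chat_all.jsonl")
    if data_file.exists():
        print(count_entries(data_file))  # compare with the total listed above
```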
## Usage
The dataset can be accessed and used through the ModelScope library.
### Using the ModelScope SDK (Python)
```python
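from modelscope.msdatasets import MsDataset

# Load the dataset (assumes the data is published under a 'train' split)
dataset = MsDataset.load('YKDuan/IRM_chat_little', split='train')

# Iterate over the dataset
for item in dataset:
    print(item)
    break  # print only the first entry for demonstration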
```
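## License
The dataset is distributed under the **Apache 2.0** license. See the ModelScope platform for the latest licensing information.
## Citation
If you use this dataset in your research or applications, please cite the following paper: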
```bibtex
@inproceedings{Zhu2025LISGPT,
title={{LISGPT: Boundary Knowledge Enhanced Academic Large Language Modeling and its Scenario Applications}},
author={Zhu, Y and Duan, Y* and Hu, J and Jin, J and Ye, J},
booktitle={Proceedings of ASIS\&T 2025},
year={2025}
}
```
## Contact
For questions or concerns about this dataset, contact **duanyongkang@mail.bnu.edu.cn** or refer to the ModelScope community.
Provider: maas
Created: 2025-03-22



