MongoDB/devcenter-articles

Source: Hugging Face. Updated 2024-10-17; indexed 2024-07-22.
Download link:
https://hf-mirror.com/datasets/MongoDB/devcenter-articles
Resource description:
---
license: cc-by-3.0
task_categories:
- question-answering
language:
- en
tags:
- vector search
- retrieval augmented generation
size_categories:
- <1K
---

## Overview

This dataset consists of ~600 articles from the MongoDB Developer Center.

## Dataset Structure

The dataset consists of the following fields:

- sourceName: The source of the article. This value is `devcenter` for the entire dataset.
- url: Link to the article
- action: Action taken on the article. This value is `created` for the entire dataset.
- body: Content of the article in Markdown format
- format: Format of the content. This value is `md` for all articles.
- metadata: Metadata such as tags, content type etc. associated with the articles
- title: Title of the article
- updated: The last updated date of the article

## Usage

This dataset can be useful for prototyping RAG applications. This is a real sample of data we have used to build the MongoDB Documentation Chatbot.

## Ingest Data

To experiment with this dataset using MongoDB Atlas, first [create a MongoDB Atlas account](https://www.mongodb.com/cloud/atlas/register?utm_campaign=devrel&utm_source=community&utm_medium=organic_social&utm_content=Hugging%20Face%20Dataset&utm_term=apoorva.joshi).
You can then use the following script to load this dataset into your MongoDB Atlas cluster:

```
import os

from bson import json_util
from datasets import load_dataset
from pymongo import MongoClient

# Connection string is read from the environment
uri = os.environ.get('MONGODB_ATLAS_URI')
client = MongoClient(uri)

db_name = 'your_database_name'  # Change this to your actual database name
collection_name = 'devcenter_articles'
collection = client[db_name][collection_name]

dataset = load_dataset("MongoDB/devcenter-articles")

# Insert in batches of up to 1000 documents
insert_data = []
for item in dataset['train']:
    # Round-trip through bson.json_util so any extended-JSON values become BSON types
    doc = json_util.loads(json_util.dumps(item))
    insert_data.append(doc)

    if len(insert_data) == 1000:
        collection.insert_many(insert_data)
        print("1000 records ingested")
        insert_data = []

# Insert whatever is left in the final, partial batch
if len(insert_data) > 0:
    collection.insert_many(insert_data)
    insert_data = []

print("Data ingested successfully!")
```
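The batching pattern in the ingest script can also be pulled out into a small helper, which makes it possible to try the logic without a cluster. The following is only a sketch: the names `batched` and `ingest_in_batches` are hypothetical and not part of the dataset card, and the insert step is passed in as a callable so that anything accepting a list of documents can stand in for it.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Record = Dict[str, Any]


def batched(records: Iterable[Record], batch_size: int) -> Iterator[List[Record]]:
    """Yield lists of at most batch_size records, preserving order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[Record] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Emit the final, partial batch if there is one
    if batch:
        yield batch


def ingest_in_batches(
    records: Iterable[Record],
    insert_many: Callable[[List[Record]], Any],
    batch_size: int = 1000,
) -> int:
    """Call insert_many once per batch and return the number of records sent."""
    total = 0
    for batch in batched(records, batch_size):
        insert_many(batch)
        total += len(batch)
    return total
```

With the script above, `insert_many` would be `collection.insert_many` and `records` would be the converted items from `dataset['train']`. Since the dataset holds roughly 600 articles, the default batch size of 1000 means everything goes out in a single call.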
Provider:
MongoDB
Original information summary

Dataset license

  • License type: CC BY 3.0