five

koyeb/Apple-MLX-QA

收藏
Hugging Face2024-08-29 更新2025-04-12 收录
下载链接:
https://hf-mirror.com/datasets/koyeb/Apple-MLX-QA
下载链接
链接失效反馈
官方服务:
资源简介:
--- language: - en license: mit size_categories: - 1K<n<10K task_categories: - question-answering pretty_name: Apple MLX Documentation Question/Answer dataset_info: features: - name: question dtype: string - name: answer dtype: string - name: chunk dtype: string splits: - name: train num_bytes: 2599338.9379084967 num_examples: 1162 - name: test num_bytes: 138691.06209150326 num_examples: 62 download_size: 1015973 dataset_size: 2738030.0 configs: - config_name: default data_files: - split: train path: data/train-* - split: test path: data/test-* --- # Apple MLX Documentation QA Dataset ## Overview This dataset contains question-answer pairs generated from the official documentation of Apple's latest machine learning framework, MLX. The dataset was created to facilitate the development and evaluation of models designed for natural language understanding, specifically in the context of technical documentation. ## Dataset Description The dataset consists of three columns: - **question**: A question generated from a specific chunk of the MLX documentation. - **answer**: The corresponding answer to the question, also based on the same chunk of documentation. - **chunk**: The original snippet (or chunk) from the MLX documentation that was used to generate both the question and the answer. ## Data Generation Process The questions and answers in this dataset were generated using OpenAI's GPT-4o. The process involved two main steps: 1. **Question Generation**: GPT-4 was prompted to generate questions based on specific chunks of the MLX documentation. 2. **Answer Generation**: GPT-4 was then asked to answer the questions using the content of the same documentation chunk. ### System Prompts Below are the system prompts used for question and answer generation. - **Question Generation:** System Prompt: ```plaintext You are a helpful AI assistant. Your task is to help a user understand how to use functions and classes from Apple's Deep Learning framework, MLX. Carefully examine the function documentation snippet and generate 3 questions a medium to experienced MLX user could ask. Questions must be answerable from the information in the snippet. Do not assume anything about MLX's API that is not discussed in the snippet. If the snippet is too short or contains too little information, output an empty JSON array. ``` OpenAI Structured Outputs Schema: ```json { "name": "questions", "strict": true, "schema": { "type": "object", "properties": { "questions": { "type": "array", "items": { "type": "string" } } }, "required": ["questions"], "additionalProperties": false } } ``` - **Answer Generation:** System Prompt: ```plaintext You are a helpful AI assistant. Your task is to help a user understand how to use functions and classes from Apple's Deep Learning framework, MLX. Carefully examine the function documentation and generate an explanatory response based on the user's question which showcases usage and examples. Do not assume anything about MLX's API that is not discussed in the reference documentation snippet. ``` ## Usage This dataset is particularly useful for: - Training question-answering models on technical documentation. - Evaluating the ability to inject up-to-date information in the post-training phase of LLMs. Please feel free to explore and make use of the dataset in your projects.
提供机构:
koyeb
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作