five

argilla-warehouse/apigen-synth-trl

收藏
Hugging Face2024-10-10 更新2025-04-12 收录
下载链接:
https://hf-mirror.com/datasets/argilla-warehouse/apigen-synth-trl
下载链接
链接失效反馈
官方服务:
资源简介:
--- dataset_info: features: - name: messages list: - name: content dtype: string - name: role dtype: string splits: - name: train num_bytes: 106597543.16934536 num_examples: 46931 - name: test num_bytes: 5612548.8306546295 num_examples: 2471 download_size: 24201583 dataset_size: 112210092 configs: - config_name: default data_files: - split: train path: data/train-* - split: test path: data/test-* license: apache-2.0 task_categories: - text-generation language: - en tags: - synthetic - distilabel - SFT - function-calling size_categories: - 10K<n<100K --- ## Dataset card This dataset is a version of [argilla/Synth-APIGen-v0.1](https://huggingface.co/datasets/argilla/Synth-APIGen-v0.1) prepared for fine-tuning using [trl](https://github.com/huggingface/trl). To generate it, the following script was run: ````python from datasets import load_dataset from jinja2 import Template SYSTEM_PROMPT = """ You are an expert in composing functions. You are given a question and a set of possible functions. Based on the question, you will need to make one or more function/tool calls to achieve the purpose. If none of the functions can be used, point it out and refuse to answer. If the given question lacks the parameters required by the function, also point it out. The output MUST strictly adhere to the following format, and NO other text MUST be included. The example format is as follows. Please make sure the parameter type is correct. If no function call is needed, please make the tool calls an empty list '[]'. ``` <tool_call>[ {"name": "func_name1", "arguments": {"argument1": "value1", "argument2": "value2"}}, ... (more tool calls as required) ]</tool_call> ``` """.strip() prompt = Template(""" You have access to the following tools: <tools>{{ tools }}</tools> Please answer the following query: {{ query }} """.lstrip()) response_template = Template("<tool_call>{{ answers }}</tool_call>") def prepare_for_training(row): user_prompt = prompt.render(tools=row["tools"], query=row["query"]) if row["answers"].startswith("["): response = response_template.render(answers=row["answers"]) else: response = row["answers"] messages = [ {"role": "system", "content": SYSTEM_PROMPT}, {"role": "user", "content": user_prompt}, {"role": "assistant", "content": response}, ] return {"messages": messages} ds_apigen_synth = load_dataset("argilla-warehouse/Synth-APIGen-v0.1", split="train") ds_apigen_training = ds_apigen_synth.map( prepare_for_training, remove_columns=['func_name', 'func_desc', 'tools', 'query', 'answers', 'model_name', 'hash_id'] ) ds_apigen_training.train_test_split(0.05, seed=42).push_to_hub("argilla-warehouse/apigen-synth-trl") ```` ## Example Example with tool call: ```json { "messages": [ { "content": "You are an expert in composing functions. You are given a question and a set of possible functions. \nBased on the question, you will need to make one or more function/tool calls to achieve the purpose. \nIf none of the functions can be used, point it out and refuse to answer. \nIf the given question lacks the parameters required by the function, also point it out.\n\nThe output MUST strictly adhere to the following format, and NO other text MUST be included.\nThe example format is as follows. Please make sure the parameter type is correct. If no function call is needed, please make the tool calls an empty list '[]'.\n```\n<tool_call>[\n{\"name\": \"func_name1\", \"arguments\": {\"argument1\": \"value1\", \"argument2\": \"value2\"}},\n... (more tool calls as required)\n]</tool_call>\n```", "role": "system" }, { "content": "You have access to the following tools:\n<tools>[{\"type\":\"function\",\"function\":{\"name\":\"compute_chord_length\",\"description\":\"Computes the length of the chord of a circle given its radius and central angle in radians.\",\"parameters\":{\"type\":\"object\",\"properties\":{\"radius\":{\"type\":\"number\",\"description\":\"The radius of the circle.\"},\"central_angle\":{\"type\":\"number\",\"description\":\"The central angle in radians.\"}},\"required\":[\"radius\",\"central_angle\"]}}}]</tools>\n\nPlease answer the following query:\nWhat is the length of the chord in a circle with a radius of 10 units and a central angle of 60 degrees?", "role": "user" }, { "content": "<tool_call>[{\"name\": \"compute_chord_length\", \"arguments\": {\"radius\": 10, \"central_angle\": 1.0472}}]</tool_call>", "role": "assistant" } ] } ``` Example without tool call: ```json { "messages": [ { "content": "You are an expert in composing functions. You are given a question and a set of possible functions. \nBased on the question, you will need to make one or more function/tool calls to achieve the purpose. \nIf none of the functions can be used, point it out and refuse to answer. \nIf the given question lacks the parameters required by the function, also point it out.\n\nThe output MUST strictly adhere to the following format, and NO other text MUST be included.\nThe example format is as follows. Please make sure the parameter type is correct. If no function call is needed, please make the tool calls an empty list '[]'.\n```\n<tool_call>[\n{\"name\": \"func_name1\", \"arguments\": {\"argument1\": \"value1\", \"argument2\": \"value2\"}},\n... (more tool calls as required)\n]</tool_call>\n```", "role": "system" }, { "content": "You have access to the following tools:\n[]\n\nPlease answer the following query:\nWhat is the greatest common divisor (GCD) of 48 and 180 using Euclid's algorithm?", "role": "user" }, { "content": "The query cannot be answered, no tools were provided.", "role": "assistant" } ] } ```
提供机构:
argilla-warehouse
5,000+
优质数据集
54 个
任务类型
进入经典数据集
二维码
社区交流群

面向社区/商业的数据集话题

二维码
科研交流群

面向高校/科研机构的开源数据集话题

数据驱动未来

携手共赢发展

商业合作