
gieljnssns/Home-Assistant-Requests-V2

Source: Hugging Face. Updated 2026-02-24; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/gieljnssns/Home-Assistant-Requests-V2
Resource description:
---
license: mit
task_categories:
- question-answering
- text-generation
tags:
- automation
- home
- assistant
language: ["en", "es", "fr", "de", "pl"]
pretty_name: Home Assistant Requests V2
size_categories:
- 10K<n<100k
---

# Home Assistant Requests V2 Dataset

This dataset contains a list of requests and responses for a user interacting with a personal assistant that controls an instance of [Home Assistant](https://www.home-assistant.io/).

The updated V2 of the dataset is now multilingual, containing data in English, German, French, Spanish, and Polish. The dataset also contains multiple "personalities" for the assistant to respond in, such as a formal assistant, a sarcastic assistant, and a friendly assistant. Lastly, the dataset has been updated to fully support modern tool-calling formats.

> NOTE: If you are viewing this dataset on HuggingFace, you can download the "small" dataset variant directly from the "Files and versions" tab.

## Assembling the dataset

The dataset is generated from the different CSV "piles". The "piles" contain different chunks of requests that are assembled into a final context that is presented to the LLM. For example, `piles/<language>/pile_of_device_names.csv` contains only names of various devices to be used as part of context as well as inserted into `piles/<language>/pile_of_templated_actions.csv` and `piles/<language>/pile_of_status_requests.csv`. The logic for assembling the final dataset from the piles is contained in [generate_data.py](./generate_data.py).
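
To make the idea of inserting device names into templated actions concrete, here is a minimal sketch in Python. It is not the logic in `generate_data.py`: the column names (`device_name`, `description`, `device_type`, `service`, `phrase`), the `<device_name>` placeholder syntax, and the helper functions `read_pile` and `fill_templates` are all assumptions made for illustration, and the two piles are small in-memory stand-ins rather than the real CSV files.

```python
import csv
import io
import random

# Hypothetical stand-ins for two piles. The real files live under
# piles/<language>/ and their actual columns may differ.
DEVICE_NAMES_CSV = """device_name,description
light.kitchen,Kitchen Light
fan.bedroom,Bedroom Fan
"""

TEMPLATED_ACTIONS_CSV = """device_type,service,phrase
light,turn_on,turn on the <device_name>
fan,turn_off,switch off the <device_name>
"""


def read_pile(text):
    """Parse one CSV pile into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def fill_templates(devices, actions, rng):
    """Pair each templated action with a device of the matching type."""
    examples = []
    for action in actions:
        prefix = action["device_type"] + "."
        candidates = [d for d in devices if d["device_name"].startswith(prefix)]
        if not candidates:
            continue  # no device of this type in the pile
        device = rng.choice(candidates)
        examples.append({
            # Substitute the (assumed) placeholder with the friendly name.
            "request": action["phrase"].replace("<device_name>", device["description"]),
            "target": device["device_name"],
            "service": f'{action["device_type"]}.{action["service"]}',
        })
    return examples


examples = fill_templates(
    read_pile(DEVICE_NAMES_CSV),
    read_pile(TEMPLATED_ACTIONS_CSV),
    random.Random(0),  # seeded so repeated runs pick the same devices
)
for example in examples:
    print(example)
```

Keeping device names in their own pile means one list of names can feed several request templates, which is presumably why the same file is reused for both templated actions and status requests.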

### Prepare environment

Start by installing system dependencies:

`sudo apt-get install python3-dev`

Then create a Python virtual environment and install all necessary libraries:

```
python3 -m venv .generate_data
source .generate_data/bin/activate
pip3 install -r requirements.txt
```

### Generating the dataset from piles

`python3 generate_data.py --train --test --small --language english german french spanish polish`

Supported dataset splits are `--test`, `--train`, and `--sample`.

Arguments to set the train dataset size are `--small`, `--medium`, `--large`, and `--xl`.

Languages can be enabled using `--language english german french spanish polish`.
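
The flags above can be read as three independent groups: splits, train-set size, and languages. The sketch below mirrors them with `argparse` to show how the documented command line would be interpreted. It is an illustration only, not the parser in `generate_data.py`: the function `build_parser`, the `size` destination, and the choice to make the size flags mutually exclusive are assumptions.

```python
import argparse

LANGUAGES = ["english", "german", "french", "spanish", "polish"]


def build_parser():
    """Build a parser that mirrors the documented flags (hypothetical)."""
    parser = argparse.ArgumentParser(description="Sketch of the documented flags")

    # Dataset splits: any combination may be requested.
    for split in ("train", "test", "sample"):
        parser.add_argument(f"--{split}", action="store_true")

    # Train-set size: assumed here to be mutually exclusive.
    sizes = parser.add_mutually_exclusive_group()
    for size in ("small", "medium", "large", "xl"):
        sizes.add_argument(f"--{size}", action="store_const", const=size, dest="size")

    # One or more languages after a single --language flag.
    parser.add_argument("--language", nargs="+", choices=LANGUAGES)
    return parser


command = "--train --test --small --language english german french spanish polish"
args = build_parser().parse_args(command.split())
print(args.train, args.test, args.sample, args.size, args.language)
```

Under these assumptions, passing two size flags at once (for example `--small --xl`) would be rejected by the parser; the real script may handle that case differently.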

Provider:
gieljnssns