gieljnssns/Home-Assistant-Requests-V2
Source: Hugging Face · Updated 2026-02-24 · Indexed 2026-03-29
Download link:
https://hf-mirror.com/datasets/gieljnssns/Home-Assistant-Requests-V2
Description:
---
license: mit
task_categories:
- question-answering
- text-generation
tags:
- automation
- home
- assistant
language: ["en", "es", "fr", "de", "pl"]
pretty_name: Home Assistant Requests V2
size_categories:
- 10K<n<100K
---
# Home Assistant Requests V2 Dataset
This dataset contains a list of requests and responses for a user interacting with a personal assistant that controls an instance of [Home Assistant](https://www.home-assistant.io/).
The updated V2 of the dataset is now multilingual, containing data in English, German, French, Spanish, and Polish. The dataset also contains multiple "personalities" for the assistant to respond in, such as a formal assistant, a sarcastic assistant, and a friendly assistant. Lastly, the dataset has been updated to fully support modern tool-calling formats.
> NOTE: If you are viewing this dataset on HuggingFace, you can download the "small" dataset variant directly from the "Files and versions" tab.
## Assembling the dataset
The dataset is generated from the different CSV "piles". The "piles" contain different chunks of requests that are assembled into a final context that is presented to the LLM. For example, `piles/<language>/pile_of_device_names.csv` contains only names of various devices to be used as part of context as well as inserted into `piles/<language>/pile_of_templated_actions.csv` and `piles/<language>/pile_of_status_requests.csv`. The logic for assembling the final dataset from the piles is contained in [generate_data.py](./generate_data.py).
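To make the pile idea concrete, here is a minimal sketch of how device names might be substituted into templated actions. It is not the logic in `generate_data.py`: the column names (`device_name`, `description`, `device_type`, `service`, `phrase`), the `<device_name>` placeholder, and the helpers `load_rows` and `fill_templates` are all assumptions made for illustration, and the two piles are inlined as strings rather than read from `piles/<language>/`.

```python
import csv
import io
import random

# Hypothetical stand-ins for two pile files; the real column names may differ.
DEVICE_NAMES_CSV = """\
device_name,description
light.kitchen,Kitchen Light
fan.bedroom,Bedroom Fan
"""

TEMPLATED_ACTIONS_CSV = """\
device_type,service,phrase
light,turn_on,Turn on the <device_name>
fan,turn_off,Please switch off the <device_name>
"""


def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def fill_templates(devices, templates, rng):
    """Pair each template with a device of the matching type and fill the placeholder."""
    examples = []
    for template in templates:
        # The part of the entity ID before the dot is treated as the device type.
        candidates = [
            d for d in devices
            if d["device_name"].split(".")[0] == template["device_type"]
        ]
        if not candidates:
            continue  # no device of this type in the pile
        device = rng.choice(candidates)
        examples.append({
            "request": template["phrase"].replace("<device_name>", device["description"]),
            "service": f'{template["device_type"]}.{template["service"]}',
            "target": device["device_name"],
        })
    return examples


examples = fill_templates(
    load_rows(DEVICE_NAMES_CSV),
    load_rows(TEMPLATED_ACTIONS_CSV),
    random.Random(0),  # seeded so repeated runs pick the same devices
)
for example in examples:
    print(example)
```

Keeping device names in their own pile means one list of names can be reused across every action and status template, which is why the piles are stored separately and only combined at generation time.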
### Prepare environment
Start by installing system dependencies:
`sudo apt-get install python3-dev`
Then create a Python virtual environment and install the required libraries:
```
python3 -m venv .generate_data
source .generate_data/bin/activate
pip3 install -r requirements.txt
```
### Generating the dataset from piles
`python3 generate_data.py --train --test --small --language english german french spanish polish`
Supported dataset splits are `--test`, `--train`, and `--sample`.
Arguments to set the train dataset size are `--small`, `--medium`, `--large`, and `--xl`.
Languages can be enabled using `--language english german french spanish polish`.
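The flags above can be summarized as an `argparse` sketch. This is an illustration of the documented command line only, not the parser in `generate_data.py`. The function name `build_parser`, the `size` destination, the mutual exclusivity of the size flags, and the `english` default are assumptions.

```python
import argparse

LANGUAGES = ["english", "german", "french", "spanish", "polish"]


def build_parser():
    """Build a parser mirroring the flags documented in this README."""
    parser = argparse.ArgumentParser(
        description="Sketch of the documented generate_data.py flags"
    )

    # Dataset splits: any combination may be requested.
    for split in ("train", "test", "sample"):
        parser.add_argument(f"--{split}", action="store_true",
                            help=f"generate the {split} split")

    # Train dataset size: assumed here to be mutually exclusive.
    size_group = parser.add_mutually_exclusive_group()
    for size in ("small", "medium", "large", "xl"):
        size_group.add_argument(f"--{size}", dest="size",
                                action="store_const", const=size,
                                help=f"use the {size} train dataset size")

    # One or more languages; the default is an assumption.
    parser.add_argument("--language", nargs="+", choices=LANGUAGES,
                        default=["english"],
                        help="languages to include in the dataset")
    return parser


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(
    ["--train", "--test", "--small", "--language", "english", "german"]
)
print(args)
```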
Provider:
gieljnssns



