Hy9n0t1c/Home-Assistant-Requests-V2
Source: Hugging Face. Updated 2026-03-21; indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/Hy9n0t1c/Home-Assistant-Requests-V2
---
license: mit
task_categories:
- question-answering
- text-generation
tags:
- automation
- home
- assistant
language: ["en", "es", "fr", "de", "pl"]
pretty_name: Home Assistant Requests V2
size_categories:
- 10K<n<100K
---
# Home Assistant Requests V2 Dataset
This dataset contains a list of requests and responses for a user interacting with a personal assistant that controls an instance of [Home Assistant](https://www.home-assistant.io/).
The updated V2 of the dataset is now multilingual, containing data in English, German, French, Spanish, and Polish. The dataset also contains multiple "personalities" for the assistant to respond in, such as a formal assistant, a sarcastic assistant, and a friendly assistant. Lastly, the dataset has been updated to fully support modern tool-calling formats.
> NOTE: If you are viewing this dataset on HuggingFace, you can download the "small" dataset variant directly from the "Files and versions" tab.
## Assembling the dataset
The dataset is generated from a set of CSV "piles". Each pile holds one kind of request fragment, and the fragments are assembled into the final context that is presented to the LLM. For example, `piles/<language>/pile_of_device_names.csv` contains only the names of various devices. These names are used as part of the context and are also inserted into the entries of `piles/<language>/pile_of_templated_actions.csv` and `piles/<language>/pile_of_status_requests.csv`. The logic for assembling the final dataset from the piles is contained in [generate_data.py](./generate_data.py).
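To make the pile idea concrete, here is a minimal sketch of how device names could be substituted into templated actions. It is not the logic of `generate_data.py`: the column names (`device_name`, `description`, `device_type`, `service`, `phrase`), the `<device_name>` placeholder, and the helper functions are all assumptions made for illustration, and the real pile files may be laid out differently.

```python
import csv
import io
import random

# Hypothetical stand-ins for two pile files. The real column names and
# placeholder syntax used in the repository may differ.
DEVICE_NAMES_CSV = """device_name,description
light.kitchen,Kitchen Light
fan.bedroom,Bedroom Fan
"""

TEMPLATED_ACTIONS_CSV = """device_type,service,phrase
light,turn_on,Turn on the <device_name>
fan,turn_off,Switch off the <device_name>
"""


def load_pile(text):
    """Parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def expand_templates(devices, templates, rng):
    """Fill each template with a randomly chosen device of the matching type."""
    examples = []
    for template in templates:
        # The device type is taken from the prefix of the entity id (assumed format).
        candidates = [
            d for d in devices
            if d["device_name"].split(".")[0] == template["device_type"]
        ]
        if not candidates:
            continue  # no device of this type, skip the template
        device = rng.choice(candidates)
        examples.append({
            "request": template["phrase"].replace("<device_name>", device["description"]),
            "service": f'{template["device_type"]}.{template["service"]}',
            "target": device["device_name"],
        })
    return examples


devices = load_pile(DEVICE_NAMES_CSV)
templates = load_pile(TEMPLATED_ACTIONS_CSV)
examples = expand_templates(devices, templates, random.Random(0))
for example in examples:
    print(example)
```

Keeping device names separate from the phrase templates means a small number of templates can be combined with many devices, which is presumably why the data is split into piles.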
### Prepare environment
Start by installing system dependencies:
`sudo apt-get install python3-dev`
Then create a Python virtual environment and install the required libraries:
```
python3 -m venv .generate_data
source .generate_data/bin/activate
pip3 install -r requirements.txt
```
### Generating the dataset from piles
`python3 generate_data.py --train --test --small --language english german french spanish polish`
Supported dataset splits are `--test`, `--train`, and `--sample`.
Arguments to set the train dataset size are `--small`, `--medium`, `--large`, and `--xl`.
Select languages with `--language`, followed by the languages to include, for example `--language english german french spanish polish`.
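If the generator is driven from another script, the command line can be built from the documented flags. The sketch below is one possible wrapper and not part of the repository: `build_command` is a hypothetical helper, and it assumes exactly one size flag is passed per run, which the text above does not state.

```python
import shlex

# Flag values as documented above.
SPLITS = ("train", "test", "sample")
SIZES = ("small", "medium", "large", "xl")
LANGUAGES = ("english", "german", "french", "spanish", "polish")


def build_command(splits, size, languages):
    """Return the argv list for generate_data.py (hypothetical helper)."""
    for split in splits:
        if split not in SPLITS:
            raise ValueError(f"unknown split: {split}")
    if size not in SIZES:
        raise ValueError(f"unknown size: {size}")
    for language in languages:
        if language not in LANGUAGES:
            raise ValueError(f"unknown language: {language}")

    argv = ["python3", "generate_data.py"]
    argv += [f"--{split}" for split in splits]
    argv.append(f"--{size}")  # assumption: one size flag per run
    argv += ["--language", *languages]
    return argv


command = build_command(["train", "test"], "small", LANGUAGES)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root, inside the virtual environment created earlier.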
Provider: Hy9n0t1c



