jiebi/RFCAlign
Source: Hugging Face · Updated 2026-04-13 · Indexed 2026-04-26
Download link:
https://hf-mirror.com/datasets/jiebi/RFCAlign
Description:
---
license: mit
task_categories:
- text-classification
- text-generation
language:
- en
tags:
- code
---
Synthetic data generated from IETF mailing lists.
The generated synthetic data was used to train retrieval models.
The synthetic data was generated with https://github.com/cheop-byeon/synthetic-data-kit, a toolkit derived from https://github.com/meta-llama/synthetic-data-kit.
The dataset can be downloaded with the provided script `download_RFCAlign.py`. Each topic is named after the corresponding IETF WG (working group).
## RFCAlign Dataset
### Basic Download
```bash
# Download entire RFCAlign dataset
python download_RFCAlign.py
```
### Download Whole Repository with `huggingface-cli`
```bash
# Install CLI (if needed)
pip install -U "huggingface_hub[cli]"
# Download full RFCAlign repository to local folder
huggingface-cli download jiebi/RFCAlign --repo-type dataset --local-dir ./dataset/RFCAlign
```
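
If you prefer not to run the script, `huggingface-cli download` can also fetch part of the repository: its `--include` option takes glob patterns matched against file paths in the repo. The sketch below wraps this in a hypothetical shell function, `rfcalign_fetch`, which is not part of this dataset's tooling. It assumes the `<split>/<topic>.jsonl` layout described under Parameter Options, and `HF_CLI` is an assumed variable that only overrides the executable name (for example `HF_CLI=echo` for a dry run).

```bash
# Hypothetical helper (not part of download_RFCAlign.py): fetch one split folder,
# or one <split>/<topic>.jsonl file, using huggingface-cli's --include filter.
# HF_CLI is an assumed override for the executable; it defaults to huggingface-cli.
rfcalign_fetch() {
  local split="${1:-}" topic="${2:-}" pattern
  if [ -z "$split" ]; then
    echo "usage: rfcalign_fetch <split> [topic]" >&2
    return 2
  fi
  if [ -n "$topic" ]; then
    pattern="$split/$topic.jsonl"   # a single topic file
  else
    pattern="$split/*"              # every file under the split folder
  fi
  "${HF_CLI:-huggingface-cli}" download jiebi/RFCAlign \
    --repo-type dataset \
    --include "$pattern" \
    --local-dir ./dataset/RFCAlign
}
```

Under that layout assumption, `rfcalign_fetch qwen_verbose ace` should request the same file as `python download_RFCAlign.py --split qwen_verbose --topic ace`, and `rfcalign_fetch llama_verbose` the whole `llama_verbose` folder.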
### Parameter Options
```bash
python download_RFCAlign.py [--split <value>] [--topic <value>] [--no-download]
```
#### `--split`
Top-level folder to download.
Allowed values:
- `llama_non-verbose`
- `llama_verbose`
- `qwen_non-verbose`
- `qwen_verbose`
#### `--topic`
Optional topic file name, **without** the `.jsonl` extension.
Rules:
- Must be used together with `--split`
- Downloads only one file: `<split>/<topic>.jsonl`

Examples:
- `--topic ace`
- `--topic quic`
- `--topic tls`
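
Because `--topic` selects exactly one file per call, fetching several topics means calling the script once per topic. A minimal sketch as a hypothetical shell function, `rfcalign_fetch_topics`; `PYTHON` is an assumed variable for choosing the interpreter (for example `python3`) and defaults to `python` as in the commands on this card.

```bash
# Hypothetical helper: --topic takes a single value, so run the script once per topic.
# PYTHON is an assumed interpreter override; it defaults to python.
rfcalign_fetch_topics() {
  local split="${1:-}" topic
  if [ -z "$split" ] || [ "$#" -lt 2 ]; then
    echo "usage: rfcalign_fetch_topics <split> <topic> [topic ...]" >&2
    return 2
  fi
  shift
  for topic in "$@"; do
    # Stop at the first failed download instead of silently continuing.
    "${PYTHON:-python}" download_RFCAlign.py --split "$split" --topic "$topic" || return 1
  done
}
```

For example, `rfcalign_fetch_topics qwen_verbose ace quic tls` downloads three topic files from the `qwen_verbose` split, one script call each.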
#### `--no-download`
Only inspect the remote repository structure and the local folder tree.
No files are downloaded.
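
The `--split` and `--topic` rules above can be expressed as a small argument check. This is a hypothetical helper, `rfcalign_path`, not part of `download_RFCAlign.py`: it accepts the four documented split values, rejects an unknown split or a topic given without a split, and prints the local path the result is expected to occupy, assuming the script mirrors the repository layout under `./dataset/RFCAlign/`.

```bash
# Hypothetical helper: validate split/topic per the documented rules and print the
# expected local target, assuming the repo layout is mirrored under ./dataset/RFCAlign/.
rfcalign_path() {
  local split="${1:-}" topic="${2:-}"
  case "$split" in
    llama_non-verbose|llama_verbose|qwen_non-verbose|qwen_verbose) ;;
    "")
      # No split: --topic is not allowed, and the target is the whole dataset.
      if [ -n "$topic" ]; then
        echo "--topic must be used together with --split" >&2
        return 2
      fi
      echo "./dataset/RFCAlign"
      return 0 ;;
    *)
      echo "unknown split: $split" >&2
      return 2 ;;
  esac
  if [ -n "$topic" ]; then
    echo "./dataset/RFCAlign/$split/$topic.jsonl"   # split + topic: one file
  else
    echo "./dataset/RFCAlign/$split"                # split only: one folder
  fi
}
```

With this sketch, `rfcalign_path qwen_verbose ace` prints `./dataset/RFCAlign/qwen_verbose/ace.jsonl`, while `rfcalign_path "" ace` fails, matching the rule that `--topic` must be used together with `--split`.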
### Usage Examples
```bash
# 1) Inspect only (no download)
python download_RFCAlign.py --no-download
# 2) Download full RFCAlign dataset
python download_RFCAlign.py
# 3) Download one split folder only
python download_RFCAlign.py --split qwen_verbose
# 4) Download one specific file only
python download_RFCAlign.py --split qwen_verbose --topic ace
# 5) Another one-file download example
python download_RFCAlign.py --split llama_non-verbose --topic tls
```
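
This card does not document the record schema of the topic files. Since they are `.jsonl` (JSON Lines) files, one line should hold one JSON record, so a quick way to see which fields a downloaded file carries is to parse its first line. A minimal sketch as a hypothetical shell function, `rfcalign_fields`, assuming that first record is a JSON object and that a Python interpreter is available (`PYTHON` is an assumed override, defaulting to `python`).

```bash
# Hypothetical helper: print the field names of the first record in a topic file.
# Assumes JSON Lines with one JSON object per line; no particular schema is assumed.
rfcalign_fields() {
  local file="${1:-}"
  if [ ! -s "$file" ]; then
    echo "missing or empty file: $file" >&2
    return 1
  fi
  head -n 1 "$file" |
    "${PYTHON:-python}" -c 'import json, sys; print("\n".join(sorted(json.loads(sys.stdin.readline()))))'
}
```

For example, after example 4 above, `rfcalign_fields ./dataset/RFCAlign/qwen_verbose/ace.jsonl` should list the keys of the first `ace` record, one per line.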
### Notes
- Download target directory: `./dataset/RFCAlign/`
- Full download: all available files under all split folders
- Split download: only files under selected split
- Split + topic download: only one `.jsonl` file
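
To confirm what a full, split, or split + topic download actually produced, a short inventory of the target directory helps. The sketch below is a hypothetical shell function, `rfcalign_summary`, that lists every `.jsonl` file as `<split>/<topic>` with its line count; for JSON Lines the line count equals the record count, assuming every record ends with a newline and there are no blank lines.

```bash
# Hypothetical helper: list each downloaded topic file as "<split>/<topic> <records>".
# Assumes JSON Lines, i.e. one record per newline-terminated line.
rfcalign_summary() {
  local root="${1:-./dataset/RFCAlign}" f rel n
  root="${root%/}"   # drop a trailing slash so the prefix strip below is exact
  if [ ! -d "$root" ]; then
    echo "no such directory: $root" >&2
    return 1
  fi
  find "$root" -type f -name '*.jsonl' | sort | while IFS= read -r f; do
    rel="${f#"$root"/}"   # <split>/<topic>.jsonl relative to the root
    n=$(wc -l < "$f")     # newline count == record count for JSON Lines
    printf '%s %d\n' "${rel%.jsonl}" "$((n))"
  done
}
```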

The data was used for retrieval tasks: for training (https://github.com/cheop-byeon/FlagEmbedding) and for evaluation (https://github.com/cheop-byeon/mteb-R2Gen).

Provided by: jiebi



