jiebi/RFCAlign

Source: Hugging Face, updated 2026-04-13, added to catalog 2026-04-26
Download link:
https://hf-mirror.com/datasets/jiebi/RFCAlign
Resource description:
- License: MIT
- Task categories: text-classification, text-generation
- Language: en
- Tags: code

Synthetic data generated from IETF mailing lists. The generated synthetic data was used to train retrieval models. It was produced with https://github.com/cheop-byeon/synthetic-data-kit, a toolkit derived from https://github.com/meta-llama/synthetic-data-kit.

The dataset can be downloaded with the provided script `download_RFCAlign.py`. Each topic is the name of the corresponding IETF working group (WG).

## RFCAlign Dataset

### Basic Download

```bash
# Download entire RFCAlign dataset
python download_RFCAlign.py
```

### Download Whole Repository with `huggingface-cli`

```bash
# Install CLI (if needed)
pip install -U "huggingface_hub[cli]"

# Download full RFCAlign repository to local folder
huggingface-cli download jiebi/RFCAlign --repo-type dataset --local-dir ./dataset/RFCAlign
```

### Parameter Options

```bash
python download_RFCAlign.py [--split <value>] [--topic <value>] [--no-download]
```

#### `--split`

Top-level folder to download. Allowed values:

- `llama_non-verbose`
- `llama_verbose`
- `qwen_non-verbose`
- `qwen_verbose`

#### `--topic`

Optional topic file name, given **without** the `.jsonl` extension. Rules:

- Must be used together with `--split`.
- Downloads only one file: `<split>/<topic>.jsonl`.

Examples:

- `--topic ace`
- `--topic quic`
- `--topic tls`

#### `--no-download`

Inspect the remote repository structure and the local folder tree only. No files are downloaded.
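The `--split` and `--topic` rules can be summarized as a small path-selection function. The sketch below is a hypothetical helper (`rfcalign_pattern` is an illustrative name, not part of `download_RFCAlign.py`) and assumes the script selects files exactly as documented. It prints the repository path pattern that a given option combination selects, and rejects `--topic` without `--split` or an unknown split.

```bash
# Hypothetical helper, not part of download_RFCAlign.py: it only encodes
# the documented --split/--topic rules and prints the path pattern they select.
rfcalign_pattern() {
  local split="$1" topic="$2"

  if [ -z "$split" ]; then
    # --topic without --split is not allowed.
    if [ -n "$topic" ]; then
      echo "error: --topic must be used together with --split" >&2
      return 1
    fi
    # Neither option given: the whole repository.
    echo "*"
    return 0
  fi

  # Only the four documented split folders are accepted.
  case "$split" in
    llama_non-verbose|llama_verbose|qwen_non-verbose|qwen_verbose) ;;
    *)
      echo "error: unknown split '$split'" >&2
      return 1
      ;;
  esac

  if [ -n "$topic" ]; then
    # Split + topic: exactly one file (topic is given without .jsonl).
    echo "${split}/${topic}.jsonl"
  else
    # Split only: every file under that folder.
    echo "${split}/*"
  fi
}

# Possible use as an alternative to the script (requires network access):
#   huggingface-cli download jiebi/RFCAlign --repo-type dataset \
#     --include "$(rfcalign_pattern qwen_verbose ace)" --local-dir ./dataset/RFCAlign
```

Passing the printed pattern to `--include` of `huggingface-cli download` should approximate a split or split + topic download without the script, but this is an assumption about equivalence, not a description of how `download_RFCAlign.py` is implemented.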
### Usage Examples

```bash
# 1) Inspect only (no download)
python download_RFCAlign.py --no-download

# 2) Download full RFCAlign dataset
python download_RFCAlign.py

# 3) Download one split folder only
python download_RFCAlign.py --split qwen_verbose

# 4) Download one specific file only
python download_RFCAlign.py --split qwen_verbose --topic ace

# 5) Another one-file download example
python download_RFCAlign.py --split llama_non-verbose --topic tls
```

### Notes

- Download target directory: `./dataset/RFCAlign/`
- Full download: all available files under all split folders
- Split download: only files under the selected split
- Split + topic download: only one `.jsonl` file

The data was used for retrieval tasks: for training (https://github.com/cheop-byeon/FlagEmbedding) and for evaluation (https://github.com/cheop-byeon/mteb-R2Gen).
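After a download, the layout described in the Notes can be checked locally. The sketch below is a hypothetical helper (`rfcalign_summary` is an illustrative name). It assumes files sit at `<root>/<split>/<topic>.jsonl` and follow the usual JSON Lines convention of one record per newline-terminated line. It does not read any record fields, since the card does not document them.

```bash
# Hypothetical helper: summarise a local RFCAlign download.
# Assumes the layout from the Notes (<root>/<split>/<topic>.jsonl) and the
# usual JSONL convention of one record per newline-terminated line.
rfcalign_summary() {
  local root="${1:-./dataset/RFCAlign}"
  local f split topic records

  for f in "$root"/*/*.jsonl; do
    # If nothing matches, the shell leaves the pattern unexpanded; skip it.
    [ -e "$f" ] || continue
    split=$(basename "$(dirname "$f")")
    topic=$(basename "$f" .jsonl)
    # wc -l counts newline characters; tr strips the padding some wc add.
    records=$(wc -l < "$f" | tr -d ' ')
    printf '%s\t%s\t%s\n' "$split" "$topic" "$records"
  done
}
```

For example, `rfcalign_summary ./dataset/RFCAlign` prints one tab-separated line per downloaded file (split, topic, line count). This makes it easy to confirm that a split or split + topic download fetched only what the Notes describe.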
Provider:
jiebi