
michael0402/lakequest

Source: Hugging Face. Updated 2026-04-14. Indexed 2026-04-26.
Download link:
https://hf-mirror.com/datasets/michael0402/lakequest
Resource description:
---
configs:
- config_name: bank_corpus
  data_files:
  - split: train
    path: data/bank/train.jsonl
  default: true
- config_name: drug_corpus
  data_files:
  - split: train
    path: data/drug/train.jsonl
- config_name: aiml_questions
  data_files:
  - split: train
    path: data/questions/aiml/train.jsonl
- config_name: bank_questions
  data_files:
  - split: train
    path: data/questions/bank/train.jsonl
- config_name: drug_questions
  data_files:
  - split: train
    path: data/questions/drug/train.jsonl
- config_name: raw_assets_index
  data_files:
  - split: train
    path: data/raw_index/train.jsonl
- config_name: aiml
  data_files:
  - split: validation
    path: benchmark/v1/aiml/qa_records/validation.jsonl
  - split: test
    path: benchmark/v1/aiml/qa_records/test.jsonl
- config_name: bank
  data_files:
  - split: validation
    path: benchmark/v1/bank/qa_records/validation.jsonl
  - split: test
    path: benchmark/v1/bank/qa_records/test.jsonl
- config_name: drug
  data_files:
  - split: validation
    path: benchmark/v1/drug/qa_records/validation.jsonl
  - split: test
    path: benchmark/v1/drug/qa_records/test.jsonl
---

# LakeQuest

Unified Hugging Face dataset repository for LakeQuest source assets and benchmark release artifacts.

## Configurations

Source/intermediate subsets:

- `bank_corpus`: normalized bank corpus records
- `drug_corpus`: normalized drug corpus records
- `aiml_questions`: AI/ML question rows (normalized)
- `bank_questions`: bank question rows
- `drug_questions`: drug question rows
- `raw_assets_index`: index of raw corpus bundles stored in this repo

Final benchmark subsets:

- `aiml`: benchmark v1 QA records (validation/test)
- `bank`: benchmark v1 QA records (validation/test)
- `drug`: benchmark v1 QA records (validation/test)

## Raw Asset Layout

Raw source assets used by the release pipeline are stored as compressed bundles under:

- `raw/bundles/raw_corpus_bank.tar.gz`
- `raw/bundles/raw_corpus_drug.tar.gz`
- `raw/bundles/manifest.json`

`build_release.py` downloads these raw corpus bundle files and extracts them into a local cache.
Question inputs are loaded from `data/questions/*/train.jsonl`.

## Benchmark Release Layout

Final benchmark release files are stored under:

- `benchmark/v1/aiml/`
- `benchmark/v1/bank/`
- `benchmark/v1/drug/`

Parquet files remain available under `benchmark/v1/`.

Each domain includes:

- `qa_records/{validation,test}.parquet`
- `provenance_records/{validation,test}.parquet`
- `corpus_objects.parquet`
- `split_entities.parquet`
- `manifest.json`

## Load Examples

```python
from datasets import load_dataset

bank_corpus = load_dataset("michael0402/lakequest", "bank_corpus", split="train")
bank_questions = load_dataset("michael0402/lakequest", "bank_questions", split="train")
bank_benchmark_test = load_dataset("michael0402/lakequest", "bank", split="test")
```
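
The raw bundles and per-domain Parquet files described in the layout sections are not exposed as `load_dataset` configurations, so they would have to be fetched by path. The following is a minimal sketch of one way to do that, not the actual `build_release.py` logic. The helper names (`benchmark_files`, `extract_bundle`, `fetch_raw_bundle`) and the `cache_dir` parameter are hypothetical. It assumes the `huggingface_hub` package is installed for the download step, and that the repository paths match the layout listed above.

```python
import tarfile
from pathlib import Path

REPO_ID = "michael0402/lakequest"
BENCHMARK_DOMAINS = ("aiml", "bank", "drug")
RAW_BUNDLE_DOMAINS = ("bank", "drug")  # only these two bundles are listed in the README


def benchmark_files(domain: str) -> list[str]:
    """Repo-relative paths of the release files the README lists for one domain."""
    if domain not in BENCHMARK_DOMAINS:
        raise ValueError(f"unknown benchmark domain: {domain!r}")
    base = f"benchmark/v1/{domain}"
    per_split = [
        f"{base}/{kind}/{split}.parquet"
        for kind in ("qa_records", "provenance_records")
        for split in ("validation", "test")
    ]
    return per_split + [
        f"{base}/corpus_objects.parquet",
        f"{base}/split_entities.parquet",
        f"{base}/manifest.json",
    ]


def extract_bundle(archive: Path, dest: Path) -> list[str]:
    """Extract a .tar.gz bundle into dest and return the member names.

    Member names that would resolve outside dest are rejected before extraction.
    """
    dest.mkdir(parents=True, exist_ok=True)
    root = dest.resolve()
    with tarfile.open(archive, "r:gz") as tar:
        members = tar.getmembers()
        for member in members:
            target = (root / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name!r}")
        tar.extractall(dest, members=members)
    return [member.name for member in members]


def fetch_raw_bundle(domain: str, cache_dir: Path) -> Path:
    """Download one raw corpus bundle from the Hub and extract it under cache_dir."""
    if domain not in RAW_BUNDLE_DOMAINS:
        raise ValueError(f"no raw bundle listed for domain: {domain!r}")
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    archive = hf_hub_download(
        repo_id=REPO_ID,
        filename=f"raw/bundles/raw_corpus_{domain}.tar.gz",
        repo_type="dataset",
    )
    dest = cache_dir / domain
    extract_bundle(Path(archive), dest)
    return dest
```

With network access, `fetch_raw_bundle("bank", Path("./lakequest_cache"))` would download and unpack the bank bundle. Each path returned by `benchmark_files("bank")` could be passed to `hf_hub_download` in the same way.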
Provided by:
michael0402