yasserrmd/TOON-Unstructured-Structured
收藏Hugging Face2025-11-11 更新2025-11-15 收录
下载链接:
https://hf-mirror.com/datasets/yasserrmd/TOON-Unstructured-Structured
下载链接
链接失效反馈官方服务:
资源简介:
TOON-Unstructured-Structured数据集是一个经过验证和清理的版本,基于原始的MasterControlAIML/JSON-Unstructured-Structured数据集。它使用TOON规范重新格式化,适用于大型语言模型的紧凑、标记高效的数据序列化格式。该数据集可用于训练或评估模式感知LLM,基准测试序列化效率,研究数据压缩与标记成本之间的权衡,以及实验基于提示的解析器和结构化数据合成。
TOON-Unstructured-Structured dataset is a validated and cleaned version of the original MasterControlAIML/JSON-Unstructured-Structured dataset. It is reformatted using the TOON specification, a compact and token-efficient serialization format optimized for LLMs. The dataset can be used for training or evaluating schema-aware LLMs, benchmarking serialization efficiency, studying the trade-offs between data compression and token cost, and experimenting with prompt-based parsers and structured data synthesis.
提供机构:
yasserrmd



