
JaydenTeoh/manhattan

Source: Hugging Face. Updated 2026-03-02. Indexed 2026-03-29.
Download link:
https://hf-mirror.com/datasets/JaydenTeoh/manhattan
Description:
---
pretty_name: NextLat Manhattan World Model Tokenized
license: mit
task_categories:
- text-generation
language:
- en
configs:
- config_name: default
  data_files:
  - split: train
    path: train/*.parquet
  - split: heldout
    path: heldout/*.parquet
tags:
- world_models
size_categories:
- 1B<n<10B
---

# NextLat Manhattan Tokenized

Tokenized Manhattan random-walk dataset for world modeling evaluation of autoregressive models.

Each example stores a single pretokenized sequence in `input_ids` (list of `int32`), where the sequence format is:

`<start_node> <end_node> <direction_1> ... <direction_n> end`

## Dataset structure

- `train/*.parquet`: training split
- `heldout/*.parquet`: heldout/eval split
- `manifest.json`: split sizes and export metadata
- `tokenizer_meta.json`: tokenizer vocabulary/id metadata

## Columns

- `input_ids` (`Sequence[int32]`): tokenized traversal sequence

## Extra artifacts included

The following files are provided at repository root for decoding and graph constraints:

- `node_and_direction_to_neighbor.pkl`
- `shortest_paths.pkl`
- `tokenizer.pkl`
- `tokenizer.pt`
- `valid_turns.pkl`
- `all_pairs.pkl`

## Usage

```python
from datasets import load_dataset

ds = load_dataset("JaydenTeoh/manhattan")
print(ds)
print(ds["train"][0]["input_ids"][:20])
```

## Notes

- Sequences are pretokenized.
- `heldout` is a trajectory-level heldout split for validation.

## Citation

If you use this dataset, please cite the NextLat project and the original dataset source.

```bibtex
@misc{teoh2025nextlatentpredictiontransformers,
  title={Next-Latent Prediction Transformers Learn Compact World Models},
  author={Jayden Teoh and Manan Tomar and Kwangjun Ahn and Edward S. Hu and Pratyusha Sharma and Riashat Islam and Alex Lamb and John Langford},
  year={2025},
  eprint={2511.05963},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2511.05963},
}

@misc{vafa2024evaluatingworldmodelimplicit,
  title={Evaluating the World Model Implicit in a Generative Model},
  author={Keyon Vafa and Justin Y. Chen and Ashesh Rambachan and Jon Kleinberg and Sendhil Mullainathan},
  year={2024},
  eprint={2406.03689},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2406.03689},
}
```
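## Example: reading the sequence layout

The sequence format described above (`<start_node> <end_node> <direction_1> ... <direction_n> end`) can be illustrated with a small parser. This is only a sketch, assuming the ids in `input_ids` have already been decoded to string tokens. The helper names `split_walk` and `follow_walk`, the direction labels `"E"` and `"N"`, and the toy neighbor map keyed by `(node, direction)` are all hypothetical. The real layout of `node_and_direction_to_neighbor.pkl` and of the tokenizer files is not documented here and may differ.

```python
from typing import Dict, List, Optional, Tuple


def split_walk(tokens: List[str]) -> Tuple[str, str, List[str]]:
    """Split decoded tokens into (start_node, end_node, directions).

    Assumes the layout '<start_node> <end_node> <dir_1> ... <dir_n> end'.
    """
    if len(tokens) < 3 or tokens[-1] != "end":
        raise ValueError("expected '<start_node> <end_node> <dir>... end'")
    return tokens[0], tokens[1], tokens[2:-1]


def follow_walk(
    start: str,
    directions: List[str],
    neighbors: Dict[Tuple[str, str], str],
) -> Optional[str]:
    """Follow directions from start; return the final node, or None if a move is invalid.

    `neighbors` is a hypothetical (node, direction) -> neighbor mapping.
    """
    node = start
    for direction in directions:
        nxt = neighbors.get((node, direction))
        if nxt is None:
            return None  # this direction is not available at this node
        node = nxt
    return node


# Toy graph and sequence, made up for illustration (not taken from the dataset).
toy_neighbors = {("0", "E"): "1", ("1", "N"): "2"}
toy_tokens = ["0", "2", "E", "N", "end"]

start, goal, directions = split_walk(toy_tokens)
reached = follow_walk(start, directions, toy_neighbors)
print(start, goal, directions, reached == goal)
```

On a real example, the string tokens would first have to be recovered from `input_ids` using the tokenizer files shipped with the repository, and the neighbor map would come from the provided pickle once its structure has been inspected.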
Provided by:
JaydenTeoh