MetaSyn
Source: arXiv. Indexed: 2025-09-30
Download link:
https://github.com/facebookresearch/dlrm_datasets
Description:
MetaSyn is an open-source, large-scale synthetic dataset whose index access patterns resemble those of Meta's production embedding tables. It contains hundreds of embedding tables whose hash sizes and pooling factors are very large and vary widely. So that results can be reproduced on GPUs with 10 GB of memory, the embedding dimension of each table is set to either 16 or 32. The dataset targets the task of embedding table sharding.
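To make the sharding task concrete, the following is a minimal sketch of how a table's memory footprint follows from its hash size and dimension, and how tables might be placed on GPUs under a 10 GB budget. The table values, the `Table` class, and the `greedy_shard` function are hypothetical and not part of the dataset or its repository; 32-bit float weights are an assumption. The greedy rule balances memory only and ignores pooling factors, which in practice also affect lookup cost.

```python
from dataclasses import dataclass

BYTES_PER_FP32 = 4  # assumption: embedding weights stored as 32-bit floats


@dataclass
class Table:
    name: str
    hash_size: int  # number of rows in the embedding table
    dim: int        # embedding dimension (16 or 32 in MetaSyn)

    @property
    def nbytes(self) -> int:
        # Memory footprint of the weights: rows * dimension * bytes per value.
        return self.hash_size * self.dim * BYTES_PER_FP32


def greedy_shard(tables, num_gpus, budget_bytes):
    """Place each table, largest first, on the currently least-loaded GPU."""
    loads = [0] * num_gpus
    plan = [[] for _ in range(num_gpus)]
    for t in sorted(tables, key=lambda t: t.nbytes, reverse=True):
        gpu = min(range(num_gpus), key=loads.__getitem__)
        if loads[gpu] + t.nbytes > budget_bytes:
            raise MemoryError(f"{t.name} does not fit within the budget")
        loads[gpu] += t.nbytes
        plan[gpu].append(t.name)
    return plan, loads


# Hypothetical tables for illustration only; not taken from MetaSyn.
tables = [
    Table("t0", 10_000_000, 32),
    Table("t1", 4_000_000, 16),
    Table("t2", 6_000_000, 32),
    Table("t3", 1_000_000, 16),
]
plan, loads = greedy_shard(tables, num_gpus=2, budget_bytes=10 * 1024**3)
print(plan)   # table names assigned to each GPU
print(loads)  # bytes of embedding weights per GPU
```

A memory-balanced placement like this is only a baseline; a realistic sharder would also weigh each table's pooling factor and access pattern when estimating per-GPU cost.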
Provided by:
Meta



