MILP-Evolve
Hosted on ModelScope (魔搭社区). Updated 2025-12-05; indexed 2025-07-26.
Download link: https://modelscope.cn/datasets/microsoft/MILP-Evolve
Resource description:
# Dataset for Towards Foundation Models for Mixed Integer Linear Programming
MILP-Evolve is a large-scale dataset of Mixed Integer Linear Programming (MILP) problem classes and instances. It is generated using an LLM-based evolutionary framework capable of producing a diverse set of MILP classes with unlimited instances. The dataset is designed to facilitate research in developing foundation models for MILP that generalize across problem classes. It supports multiple learning tasks, including integrality gap prediction, learning to branch, and aligning MILP instances with natural language descriptions. Our source code can be found on [GitHub](https://github.com/microsoft/OptiGuide/tree/main/milp-evolve).
It was proposed in the paper [Towards Foundation Models for Mixed Integer Linear Programming](https://huggingface.co/papers/2410.08288).
## MILP-Evolve MILP Classes
<b>MILP Classes Code Representation</b>
The seed and generated MILP classes can be found at `./milp_code/[evolve/seed]_tab1` and `./milp_code/[evolve/seed]_tab2`, which correspond to Tables 1 and 2 of our [paper](#citation), respectively.
<b>MILP Instances for each class</b>
One can set different seeds to generate multiple instances for each class by changing the `seed = 42` line in each class's code file to `seed = <SEED>`. We adopt the [MPS format](https://en.wikipedia.org/wiki/MPS_(format)) and provide up to 1000 instances per class in the `./instances` folder.
<i>Due to the large size of the MILP-Evolve dataset, we compressed the MPS format data into `./instances/tab1_compressed` and `./instances/tab2_compressed`. You can follow the instructions below to extract the data, replacing every `[1/2]` with `1` or `2`.</i>
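The seed substitution described above can be scripted. The following is a minimal sketch, assuming each class file contains a single `seed = <integer>` assignment on its own line; the helper name `with_seed` and the example file contents are hypothetical and not part of the released code.

```python
import re

# Matches a `seed = <int>` assignment at the start of a line (optionally indented).
SEED_LINE = re.compile(r"^([ \t]*seed[ \t]*=[ \t]*)\d+", flags=re.MULTILINE)


def with_seed(source: str, new_seed: int) -> str:
    """Return `source` with the first `seed = <int>` line set to `new_seed`."""
    updated, count = SEED_LINE.subn(
        lambda m: f"{m.group(1)}{new_seed}", source, count=1
    )
    if count == 0:
        raise ValueError("no `seed = <int>` line found")
    return updated


# Hypothetical class file contents, for illustration only.
example = "import random\nseed = 42\nrandom.seed(seed)\n"

# One source variant per seed; each variant would be saved and run separately.
variants = [with_seed(example, s) for s in range(3)]
```

Each returned string can be written to a new file and executed to produce one instance of the class. How the class code writes its output is defined by the code itself and is not assumed here.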
```
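mkdir -p tab[1/2]  # Ensure the target directory exists
find tab[1/2]_compressed -type f -name 'milp_*.tar.gz' | while read -r file; do
    # Extract each archive into the matching table directory
    tar -xzf "$file" -C tab[1/2]
done
rm -r tab[1/2]_compressed  # Remove the archives once extraction is done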
```
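If a shell is not available, the same extraction can be done with Python's standard library. This is a sketch under the directory layout described above; the function name `extract_table` is hypothetical, and unlike the shell snippet it leaves the compressed directory in place.

```python
import tarfile
from pathlib import Path


def extract_table(instances_dir: Path, table: int) -> int:
    """Extract every milp_*.tar.gz under tab<table>_compressed into tab<table>.

    Returns the number of archives extracted.
    """
    src = instances_dir / f"tab{table}_compressed"
    dst = instances_dir / f"tab{table}"
    dst.mkdir(parents=True, exist_ok=True)  # Same role as `mkdir -p`
    count = 0
    for archive in sorted(src.rglob("milp_*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(dst)
        count += 1
    return count
```

For example, `extract_table(Path("./instances"), 1)` would unpack the Table 1 archives into `./instances/tab1`.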
## Learning Dataset
<b>Integrality Gap Prediction</b>
One can generate the integrality gap prediction dataset from the MILP instances. We provide an example integrality gap dataset in `gap_data_example.tar.gz`.
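For orientation, the integrality gap compares the objective value of the LP relaxation with the best integer-feasible objective value. The sketch below uses one common relative-gap convention; it is an assumption for illustration, and the normalization used to build `gap_data_example.tar.gz` may differ. The function name and the `eps` guard are hypothetical.

```python
def integrality_gap(lp_value: float, ip_value: float, eps: float = 1e-9) -> float:
    """Relative gap between the LP relaxation value and the integer optimum.

    Assumed convention: |ip - lp| / max(|ip|, eps). The `eps` term only
    guards against division by zero when the integer optimum is 0.
    """
    return abs(ip_value - lp_value) / max(abs(ip_value), eps)


# A maximization instance whose LP relaxation overestimates the optimum.
gap = integrality_gap(lp_value=110.0, ip_value=100.0)
```

Both objective values would come from a MILP solver run on an instance; no particular solver is assumed here.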
<b>Language-MILP Contrastive Learning</b>
An example language-MILP contrastive learning dataset can be found in `language_data_example.tar.gz`.
<i>Due to the large size of the learning to branch dataset, we do not upload the dataset in this Hugging Face repo. Instead, we provide the source code to generate the data in our GitHub, which is based on the [Ecole](https://github.com/ds4dm/ecole/blob/master/examples/branching-imitation/example.ipynb) implementation.</i>
You can follow our [GitHub](https://github.com/microsoft/OptiGuide/tree/main/milp-evolve) to generate more data for different learning tasks, as well as to split the above dataset into train/val/test for each learning task.
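As a rough illustration of a reproducible train/val/test split, the sketch below shuffles a list of item names with a fixed seed and cuts it by fraction. The 80/10/10 fractions, the function name `split_items`, and the example class names are assumptions. The actual split protocol is defined by the scripts in the GitHub repository.

```python
import random


def split_items(items, fractions=(0.8, 0.1, 0.1), seed=0):
    """Shuffle deterministically and cut into train/val/test lists."""
    ordered = sorted(items)  # Sort first so the result ignores input order
    random.Random(seed).shuffle(ordered)
    n_train = int(len(ordered) * fractions[0])
    n_val = int(len(ordered) * fractions[1])
    return {
        "train": ordered[:n_train],
        "val": ordered[n_train:n_train + n_val],
        "test": ordered[n_train + n_val:],  # Remainder goes to test
    }


# Hypothetical class names, for illustration only.
classes = [f"milp_{i}" for i in range(10)]
splits = split_items(classes)
```

Passing class names rather than individual instance files keeps all instances of a class in the same split, which matters when the goal is to measure generalization to unseen problem classes.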
## Citation
You can find our paper on [arXiv](https://arxiv.org/abs/2410.08288) and [OpenReview](https://openreview.net/forum?id=6yENDA7J4G). Please cite our paper when using the dataset:
```latex
@inproceedings{li2024towards,
author = {Li, Sirui and Kulkarni, Janardhan and Wu, Cathy and Menache, Ishai and Li, Beibin},
title = {Towards Foundation Models for Mixed Integer Linear Programming},
booktitle = {The Thirteenth International Conference on Learning Representations},
year = {2025}
}
```
## Additional Information
### Dataset Curators
The dataset was curated by the research team behind the MILP-Evolve framework. Specific names and affiliations will be provided upon publication.
### Licensing Information
This dataset is licensed under the [CDLA-2.0](https://cdla.dev/permissive-2-0/).
### Contributions
Thanks to the entire MILP-Evolve team for their efforts in creating and releasing this dataset.
This project welcomes contributions and suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us
the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide
a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions
provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
For more information see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or
contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
## Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft
trademarks or logos is subject to and must follow
[Microsoft's Trademark & Brand Guidelines](https://www.microsoft.com/en-us/legal/intellectualproperty/trademarks/usage/general).
Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship.
Any use of third-party trademarks or logos is subject to those third parties' policies.
Provider: maas
Created: 2025-07-22



