Resource overview:
<h1 id="readme">README</h1>
<p>The <strong>TBHubbard dataset</strong> is a collection of tight-binding descriptions of metal-organic frameworks (MOFs). The structures are taken from the QMOF database. First-principles calculations are performed to obtain the electronic density, which is then projected onto a localized atomic basis set using PAOFLOW. The data collection is divided into two sub-sets: <code>tight_binding_model</code> and <code>extended_hubbard_model</code>.</p>
<h2 id="downloading-the-dataset">Downloading the dataset</h2>
<p>To facilitate the download of this massive dataset, we provide the <code>2-DOWNLOAD.sh</code> script. Instead of downloading the compressed files from this page one by one, we suggest you download the script first and use it to obtain all the other files.</p>
<pre><code class="language-Shell">$ ./2-DOWNLOAD.sh -h
Usage: ./2-DOWNLOAD.sh -s {TB, EH} -c {aria2c, wget, curl} -d destination/
</code></pre>
<ul>
<li>You can use the <code>-c</code> option to select a download client from these options: <a href="https://aria2.github.io/">aria2c</a>, <a href="https://www.gnu.org/software/wget/">wget</a> and <a href="https://curl.se/">curl</a>. For faster downloads supporting multiple parallel connections, we recommend <code>aria2c</code>.</li>
<li>To download only the Tight-Binding (<code>TB</code>) or the Extended Hubbard (<code>EH</code>) sub-set, please use the <code>-s</code> option.</li>
<li>The script, by default, downloads all files to a destination folder named <code>TBHubbard/</code>. You can change the destination folder using the <code>-d</code> option.</li>
</ul>
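<p>When the download is driven from a Python workflow, the options above can be assembled programmatically. The following is a minimal sketch, not part of the dataset tooling: the helper names <code>build_download_command</code> and <code>run_download</code> are hypothetical, and it assumes that the script sits in the current directory, is executable, and falls back to its own defaults for any option that is left out.</p>
<pre><code class="language-Python">import subprocess

# Values taken from the usage line of 2-DOWNLOAD.sh.
SUBSETS = ("TB", "EH")
CLIENTS = ("aria2c", "wget", "curl")


def build_download_command(subset=None, client=None, destination=None,
                           script="./2-DOWNLOAD.sh"):
    """Assemble the argument list for 2-DOWNLOAD.sh from its documented options."""
    command = [script]
    if subset is not None:
        if subset not in SUBSETS:
            raise ValueError(f"subset must be one of {SUBSETS}")
        command += ["-s", subset]
    if client is not None:
        if client not in CLIENTS:
            raise ValueError(f"client must be one of {CLIENTS}")
        command += ["-c", client]
    if destination is not None:
        command += ["-d", destination]
    return command


def run_download(**options):
    """Run the script; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_download_command(**options), check=True)
</code></pre>
<p>For example, <code>run_download(subset="TB", client="aria2c")</code> would request only the Tight-Binding sub-set with <code>aria2c</code> and leave the destination folder at the script default.</p>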
<p>After downloading the files, use the following commands to decompress them:</p>
<pre><code class="language-Shell">cat tight_binding_model.tar.bz2.part-* | tar -I pbzip2 -xvf -
cat extended_hubbard_model.tar.bz2.part-* | tar -I pbzip2 -xvf -
</code></pre>
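<p>On systems where <code>pbzip2</code> is not installed, the same extraction can be sketched with the Python standard library alone. This is an illustrative alternative rather than the recommended route, and it is single-threaded, so it will be slower than <code>pbzip2</code>. The names <code>ChainedParts</code> and <code>extract_parts</code> are hypothetical, and the sketch assumes that sorting the part file names alphabetically gives the correct order, as the <code>cat</code> commands above also assume.</p>
<pre><code class="language-Python">import bz2
import glob
import io
import tarfile


class ChainedParts(io.RawIOBase):
    """Read-only stream that presents several part files as one continuous file."""

    def __init__(self, paths):
        super().__init__()
        self._paths = iter(paths)
        self._current = None

    def readable(self):
        return True

    def readinto(self, buffer):
        if len(buffer) == 0:
            return 0
        while True:
            if self._current is None:
                path = next(self._paths, None)
                if path is None:
                    return 0  # every part has been consumed
                self._current = open(path, "rb")
            count = self._current.readinto(buffer)
            if count:
                return count
            # The current part is exhausted: close it and move to the next one.
            self._current.close()
            self._current = None

    def close(self):
        if self._current is not None:
            self._current.close()
            self._current = None
        super().close()


def extract_parts(archive_name, destination="."):
    """Extract e.g. 'tight_binding_model.tar.bz2' from its '.part-*' pieces."""
    parts = sorted(glob.glob(glob.escape(archive_name) + ".part-*"))
    if not parts:
        raise FileNotFoundError(f"no part files found for {archive_name}")
    with ChainedParts(parts) as raw:
        # BZ2File can read archives made of several concatenated bzip2 streams.
        with bz2.BZ2File(raw) as decompressed:
            # Mode "r|" reads the uncompressed tar stream strictly sequentially.
            with tarfile.open(fileobj=decompressed, mode="r|") as archive:
                archive.extractall(destination)
    return parts
</code></pre>
<p>Calling <code>extract_parts("tight_binding_model.tar.bz2")</code> in the download folder streams the parts without first writing a joined copy to disk.</p>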
<h2 id="sub-sets">Sub-sets</h2>
<h3 id="tight-binding-model">Tight-Binding Model</h3>
<p>The <strong>Tight-Binding Model</strong> offers a comprehensive dataset for 10,435 metal-organic frameworks (MOFs), providing key electronic structure data. The electronic density for each MOF is projected onto a localized atomic basis set, generating a tight-binding lattice Hamiltonian. This allows for the study of the electronic properties and interactions within the MOF structures. Additionally, Smooth Overlap of Atomic Positions (SOAP) descriptors are computed for 20,375 MOFs, enriching the dataset with detailed topology information about the local atomic environments.</p>
<ul>
<li><p><strong><code>tb_dft/</code></strong>: This directory contains the Quantum ESPRESSO (QE) calculations used for the tight-binding projections. It includes all relevant input and output files, the tight-binding Hamiltonian, and detailed results from PAOFLOW projections (e.g., <code>arry.pkl</code>, <code>paoflow.out</code>). The <code>bader.out</code>, <code>ACF.dat</code>, and other related files provide further insights into the charge distribution and electronic structure. SCF calculation outputs such as <code>rho.cube</code> and <code>scf</code> files are also included to allow for a deeper understanding of the electronic density. For detailed instructions on the tight-binding projection workflow, please refer to the <a href="tight_binding_model/README.md"><strong><code>tight_binding_model/README.md</code></strong></a>.</p>
</li>
<li><p><strong><code>soap_of_mofs/</code></strong>: This folder includes the SOAP descriptors, which describe the local atomic environments within the MOFs. SOAP descriptors come in two variations: <strong>SOAP-3 Å</strong> and <strong>SOAP-5 Å</strong>. These descriptors capture the atomic structure at different length scales, offering both detailed and broader topological information. Each file is named by appending the suffix <code>_soap.npz</code> to the MOF name and contains the descriptors for that MOF. For further information on computing SOAP descriptors, please refer to the <a href="tight_binding_model/scripts/compute_soap-descriptors/README.md"><strong><code>tight_binding_model/scripts/compute_soap-descriptors/README.md</code></strong></a>.</p>
</li>
<li><p><strong><code>scripts/</code></strong>: A collection of helper tools for visualizing the data and generating necessary inputs for further analysis. These scripts make it easier to manipulate, visualize, and utilize the tight-binding and SOAP data for subsequent computational studies and modeling. To learn how to compute tight-binding embeddings from QE, check the <a href="tight_binding_model/scripts/compute_tight-binding_embeddings_from_qe/README.md"><strong><code>tight_binding_model/scripts/compute_tight-binding_embeddings_from_qe/README.md</code></strong></a>. To set up and run SCF calculations with QE, follow the instructions in <a href="tight_binding_model/scripts/setup_qe_scf/README.md"><strong><code>tight_binding_model/scripts/setup_qe_scf/README.md</code></strong></a>.</p>
</li>
</ul>
<p>For more detailed guidance, please refer to the appropriate <code>README.md</code> files in each directory.</p>
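<p>The array names stored in the <code>_soap.npz</code> files and the layout of <code>arry.pkl</code> are not described on this page, so a reasonable first step is to inspect a file before relying on its contents. The sketch below assumes only that NumPy is installed; the helper names <code>summarize_npz</code> and <code>summarize_pickle</code> are hypothetical, and unpickling <code>arry.pkl</code> may additionally require the packages that were used to write it.</p>
<pre><code class="language-Python">import pickle

import numpy as np


def summarize_npz(path):
    """Return {array_name: (shape, dtype)} for every array stored in an .npz file."""
    summary = {}
    with np.load(path) as archive:
        for name in archive.files:
            array = archive[name]
            summary[name] = (array.shape, str(array.dtype))
    return summary


def summarize_pickle(path):
    """Load a pickle file and describe its top-level structure."""
    with open(path, "rb") as handle:
        # Only unpickle files that come from a source you trust.
        obj = pickle.load(handle)
    if isinstance(obj, dict):
        return {key: type(value).__name__ for key, value in obj.items()}
    return type(obj).__name__
</code></pre>
<p>For a MOF file named with the <code>_soap.npz</code> suffix, <code>summarize_npz</code> lists whatever arrays the file actually holds, which avoids guessing key names.</p>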
<h3 id="extended-hubbard-model">Extended Hubbard Model</h3>
<p>The <strong>Extended Hubbard Model</strong> sub-set contains electronic structure calculations for 242 MOFs. The electronic density is projected onto a localized atomic basis set, providing a tight-binding lattice Hamiltonian for each MOF. A set of 428 calculations is also provided for the self-consistent computation of the Hubbard parameters U and V of these 242 MOFs. The set is divided according to the manifold chosen for U and V: d and s orbitals correspond to <code>ds_perturbations</code>, and d and p orbitals correspond to <code>dp_perturbations</code>. Together, the tight-binding projection and the Hubbard parameters define the Extended Hubbard model lattice Hamiltonian.</p>
<ul>
<li><strong><code>dp_perturbations</code></strong> and <strong><code>ds_perturbations</code></strong>: QE calculations with the tight-binding projection, including input and output files, as well as the tight-binding Hamiltonian (<code>arry.pkl</code>). The inputs and outputs of the Hubbard parameter computation (<code>hp.p</code>) are also provided for the dp and ds perturbations in the corresponding folder (<code>Hubbard_parameters_full.dat</code> and <code>Hubbard_parameters_nn.dat</code>).</li>
<li><strong><code>extend_hubb_data.json</code></strong>: Tabulated properties containing the main QE input and output information for each MOF, divided into dp and ds perturbations.</li>
<li><strong><code>scripts/</code></strong>: helper tools for visualization of main properties and input generation.</li>
</ul>
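<p>The internal layout of <code>extend_hubb_data.json</code> is not specified here beyond its division into dp and ds perturbations, so the sketch below only reports the top-level structure of the file and makes no assumption about key names. The helper name <code>peek_json</code> is hypothetical.</p>
<pre><code class="language-Python">import json


def peek_json(path, limit=5):
    """Report the top-level type, size and a small sample of a JSON file."""
    with open(path, "r", encoding="utf-8") as handle:
        data = json.load(handle)
    if isinstance(data, dict):
        # For an object, sample the first few keys.
        return {"type": "dict", "size": len(data), "sample": list(data)[:limit]}
    if isinstance(data, list):
        # For an array, sample the first few items.
        return {"type": "list", "size": len(data), "sample": data[:limit]}
    return {"type": type(data).__name__, "size": None, "sample": data}
</code></pre>
<p>Running <code>peek_json("extend_hubb_data.json")</code> shows whether the file is keyed by MOF name or by perturbation type before any further parsing is written.</p>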
<h2 id="license">License</h2>
<p>All dataset files are distributed under the <a href="https://cdla.dev/permissive-2-0/"><code>CDLA-Permissive-2.0</code></a> license, while the source code files are distributed under the <a href="https://opensource.org/license/bsd-3-clause"><code>BSD-3-Clause</code></a> license.</p>
<p><strong>Copyright (c) 2025, International Business Machines. All rights reserved.</strong></p>