PaECTER embeddings for PATSTAT (USPTO and EPO)
Source: DataCite Commons | Updated: 2026-03-30 | Indexed: 2025-04-16
Download link:
https://edmond.mpg.de/citation?persistentId=doi:10.17617/3.BGRPMI
Description:
<p><strong>PaECTER embeddings for all DocDB patent families</strong></p>
<p>This data is based on <u>PATSTAT Autumn 2025</u>.</p>
<p>File <code>"EPO_PaECTER_embeddings.parquet"</code> contains embeddings for all DocDB patent families with an EPO member (<strong>4.2M</strong>).</p>
<p>File <code>"USPTO_PaECTER_embeddings.parquet"</code> contains embeddings for all DocDB patent families with a USPTO member (<strong>9.9M</strong>).</p>
<p>For each patent family, we encoded one representative member, selected by applying the following rules in order:</p>
<ul>
<li>The member has an English abstract.</li>
<li><strong>Authority Priority rule:</strong> EP &gt; WO &gt; US &gt; CN &gt; JP &gt; KR &gt; GB &gt; CA &gt; TW &gt; DE &gt; ES &gt; RU &gt; AU &gt; FR &gt; MX &gt; UA &gt; NZ
<br>(for authorities not listed, the following rule applies)
</li>
<li>within authority, the oldest publication date takes precedence and the lower publication number breaks ties</li>
</ul>
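<p>The selection rules above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the field names (<code>authority</code>, <code>publn_nr</code>, <code>publn_date</code>, <code>has_english_abstract</code>) and the sample family are hypothetical, publication numbers are compared as strings, and returning <code>None</code> for a family without any English abstract is an assumption, since the description does not cover that case.</p>

```python
from datetime import date

# Authority order taken from the description; authorities not listed rank last.
AUTHORITY_PRIORITY = ["EP", "WO", "US", "CN", "JP", "KR", "GB", "CA", "TW",
                      "DE", "ES", "RU", "AU", "FR", "MX", "UA", "NZ"]
RANK = {authority: i for i, authority in enumerate(AUTHORITY_PRIORITY)}


def select_representative(members):
    """Pick one family member: English abstract required, then authority
    priority, then oldest publication date, then lower publication number."""
    candidates = [m for m in members if m["has_english_abstract"]]
    if not candidates:
        return None  # assumption: this case is not specified in the description
    return min(
        candidates,
        key=lambda m: (
            RANK.get(m["authority"], len(RANK)),  # unlisted authorities share the last rank
            m["publn_date"],                      # oldest publication first
            m["publn_nr"],                        # assumption: compared as strings
        ),
    )


# Hypothetical family, for illustration only.
family = [
    {"authority": "US", "publn_nr": "7654321", "publn_date": date(2010, 2, 2), "has_english_abstract": True},
    {"authority": "EP", "publn_nr": "1234567", "publn_date": date(2011, 5, 4), "has_english_abstract": False},
    {"authority": "WO", "publn_nr": "2009123456", "publn_date": date(2009, 10, 15), "has_english_abstract": True},
    {"authority": "WO", "publn_nr": "2009123455", "publn_date": date(2009, 10, 15), "has_english_abstract": True},
]
representative = select_representative(family)
```

<p>In this sample the EP member is skipped because it lacks an English abstract, WO outranks US, and the two WO publications share a date, so the lower publication number is chosen.</p>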
<p>Both files have the same structure:</p>
<ol>
<li>Index: DocDB family ID</li>
<li>Column "embedding": a list of float32 values</li>
<li>Column "publno": the publication number of the representative document</li>
</ol>
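<p>Given this structure, one way to read a file and compare two families is sketched below. It assumes the files were written so that <code>pandas.read_parquet</code> restores the DocDB family ID as the DataFrame index, and that pandas and a Parquet engine such as pyarrow are installed. The helper names and the family IDs in the usage comments are hypothetical. The files hold millions of rows, so loading one in full may need a lot of memory.</p>

```python
import math


def load_embeddings(path):
    """Read one of the Parquet files into a DataFrame.

    Assumption: the DocDB family ID comes back as the index, with the
    columns "embedding" and "publno" as described above.
    """
    import pandas as pd  # imported lazily so the helper below works without pandas

    return pd.read_parquet(path)


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    return dot / (norm_a * norm_b)


# Usage (the family IDs below are placeholders, not real entries):
# df = load_embeddings("EPO_PaECTER_embeddings.parquet")
# print(df.loc[12345678, "publno"])
# print(cosine_similarity(df.loc[12345678, "embedding"],
#                         df.loc[23456789, "embedding"]))
```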
Provider:
Edmond
Created:
2024-12-17



