
PaECTER embeddings for PATSTAT (USPTO and EPO)

Source: DataCite Commons. Updated: 2026-03-30. Indexed: 2025-04-16.
Download link:
https://edmond.mpg.de/citation?persistentId=doi:10.17617/3.BGRPMI
Resource description:
<p><strong>PaECTER embeddings for all DocDB patent families</strong></p>
<p>This data is based on <u>PATSTAT Autumn 2025</u>.</p>
<p>File <code>"EPO_PaECTER_embeddings.parquet"</code> contains embeddings for all DocDB patent families with an EPO member (<strong>4.2M</strong>).</p>
<p>File <code>"USPTO_PaECTER_embeddings.parquet"</code> contains embeddings for all DocDB patent families with a USPTO member (<strong>9.9M</strong>).</p>
<p>We encoded one representative member of each patent family, selected according to the following rules:</p>
<ul>
<li>The member has an English abstract.</li>
<li><strong>Authority priority rule:</strong> EP > WO > US > CN > JP > KR > GB > CA > TW > DE > ES > RU > AU > FR > MX > UA > NZ <br>(for authorities not listed, the following rule applies)</li>
<li>Within an authority, the oldest publication date takes precedence, and the lower publication number breaks ties.</li>
</ul>
<p>Both files have the same structure:</p>
<ol>
<li>Index: DocDB family ID</li>
<li>Column "embedding": a list of float32 values</li>
<li>Column "publno": the publication number of the representative document</li>
</ol>
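The selection rules above can be sketched as a sort key. This is a minimal illustration, not the dataset authors' code: the `Publication` record, its field names, and the `pick_representative` helper are hypothetical. It assumes that authorities missing from the priority list all share the lowest rank, and that publication numbers are compared as plain strings.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional

# Authority order copied from the description; a lower index means higher priority.
AUTHORITY_PRIORITY = [
    "EP", "WO", "US", "CN", "JP", "KR", "GB", "CA", "TW",
    "DE", "ES", "RU", "AU", "FR", "MX", "UA", "NZ",
]
RANK = {authority: i for i, authority in enumerate(AUTHORITY_PRIORITY)}
# Assumption: every authority not in the list shares one rank below all listed ones.
UNLISTED_RANK = len(AUTHORITY_PRIORITY)


@dataclass(frozen=True)
class Publication:
    """Hypothetical record for one member of a DocDB patent family."""
    authority: str
    publication_number: str
    publication_date: date
    has_english_abstract: bool


def selection_key(pub: Publication):
    # Order by authority priority, then oldest publication date,
    # then lower publication number (string comparison, an assumption).
    return (
        RANK.get(pub.authority, UNLISTED_RANK),
        pub.publication_date,
        pub.publication_number,
    )


def pick_representative(family: Iterable[Publication]) -> Optional[Publication]:
    """Return the family member that would be encoded, or None if no member qualifies."""
    candidates = [pub for pub in family if pub.has_english_abstract]
    if not candidates:
        return None
    return min(candidates, key=selection_key)
```

Under these assumptions, a family with a US member, an EP member lacking an English abstract, and two WO members published on the same day resolves to the WO member with the lower publication number.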
Provider:
Edmond
Created:
2024-12-17