
patccat: A classifier for patent claims

Indexed by NIAID Data Ecosystem on 2026-05-02
Download link:
https://zenodo.org/record/6395307
Description:
Data version: 3.3.0

Authors:
- Bernhard Ganglmair (University of Mannheim, Department of Economics, and ZEW Mannheim)
- W. Keith Robinson (Wake Forest University, School of Law)
- Michael Seeligson (Southern Methodist University, Cox School of Business)

Contents:
1. Notes on Data Construction
2. Citation and Code
3. Description of the Data Files
3.1. File List
3.2. List of Variables for Files with Claim-Level Information
3.3. List of Variables for Files with Patent-Level Information
4. Coming Soon!

1. Notes on Data Construction

This is version 3.3.0 of the patccat data (patent claim classification by algorithmic text analysis). Patent claims define an invention. A patent application is required to have one or more claims that distinctly claim the subject matter which the patent applicant regards as her invention or discovery. We construct a classifier of patent claims that identifies three distinct claim types: process claims, product claims, and product-by-process claims. For this classification, we combine information obtained from both the preamble and the body of a claim. The preamble is a general description of the invention (e.g., a method, an apparatus, or a device), whereas the body identifies steps and elements (specifying in detail the invention laid out in the preamble) that the applicant is claiming as the invention. The combination of the preamble type and the body type provides us with a more detailed and more accurate classification of claims than other approaches in the literature. This approach also accounts for unconventional drafting approaches. We eventually validate our classification using close to 10,000 manually classified claims.

The data files contain the results of our classification. We provide claim-level information for each independent claim of U.S. utility patents granted between 1836 and 2020. We also provide patent-level information, i.e., the counts of different claim types for a given patent.
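The relationship between the claim-level and the patent-level information can be sketched as a simple aggregation. The snippet below is a minimal illustration, not the authors' code: it uses the variable names and claim type codes listed in Sections 3.2 and 3.3, and it assumes that the patent number is the part of PatentClaim before the hyphen and that claimType is stored as an integer-like value. The function name `patent_level_counts` and the example rows are hypothetical.

```python
# Claim type codes as given in the variable list:
# 0 = no type, 1 = process, 2 = product, 3 = product-by-process.
COUNT_COLUMNS = {
    0: "noCategory",
    1: "processClaims",
    2: "productClaims",
    3: "prodByProcessClaims",
}


def patent_level_counts(claim_rows):
    """Aggregate claim-level rows into per-patent counts of claim types.

    claim_rows: iterable of dicts with the keys "PatentClaim" and "claimType".
    Returns a dict mapping the patent number to a dict of counts.
    """
    counts = {}
    for row in claim_rows:
        # Assumption: the patent number is the part before the hyphen.
        patent_id = row["PatentClaim"].split("-")[0]
        record = counts.setdefault(
            patent_id,
            {"claims": 0, **{name: 0 for name in COUNT_COLUMNS.values()}},
        )
        record[COUNT_COLUMNS[int(row["claimType"])]] += 1
        # "claims" is the sum of the four claim types (0, 1, 2, and 3).
        record["claims"] += 1
    return counts


# Made-up rows for illustration only; they are not taken from the data files.
EXAMPLE_ROWS = [
    {"PatentClaim": "01234567-0001", "claimType": "1"},
    {"PatentClaim": "01234567-0008", "claimType": "2"},
    {"PatentClaim": "01234567-0015", "claimType": "2"},
    {"PatentClaim": "07654321-0001", "claimType": "0"},
]

if __name__ == "__main__":
    for patent_id, record in patent_level_counts(EXAMPLE_ROWS).items():
        print(patent_id, record)
```

The patent-level files already contain these counts, so a sketch like this is mainly useful as a consistency check after filtering the claim-level files.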
For a detailed description of our classification approach, please take a look at the accompanying paper (Ganglmair, Robinson, and Seeligson 2022).

2. Citation and Code

Please cite the following paper when using the data in your own work:

Ganglmair, Bernhard, W. Keith Robinson, and Michael Seeligson (2022): "The Rise of Process Claims: Evidence from a Century of U.S. Patents," unpublished manuscript available at https://papers.ssrn.com/abstract=4069994.

In the paper, we document the use of process claims in the U.S. over the last century, using the patccat data. We show an increase in the annual share of process claims of about 25 percentage points (from below 10% in 1920). This rise in process intensity of patents is not limited to a few patent classes, but we observe it across a broad spectrum of technologies. Process intensity varies by applicant type: companies file more process-intense patents than individuals, and U.S. applicants file more process-intense patents than foreign applicants. We further show that patents with higher process intensity are more valuable but are not necessarily cited more often. Last, process claims are on average shorter than product claims (with the gap narrowing since the 1970s).

We would love to see how other researchers use the data and eventually learn from it. If you have a discussion paper or a publication in which you use the data, please send us a copy at patccat.data@gmail.com.

We will post the R code used to construct the data on GitHub with the next data version (version 3.4.0). Contact us at b.ganglmair@gmail.com if you would like to take a look at an earlier version of the code.

3. Description of the Data Files

The data files contain claim-level information for independent claims of 10,140,848 U.S. utility patents granted between 1836 and 2020. The files further contain patent-level information for U.S. utility patents.

3.1. File List

- claims-patccat-v3-3-sample.csv: claim-level information for independent claims of a sample of 1000 patents issued between 1976 and 2020
- claims-patccat-v3-3-1836-1919.csv: claim-level information for independent claims of 1,038,041 patents issued between 1836 and 1919
- claims-patccat-v3-3-1920-2020.csv: claim-level information for independent claims of 9,102,807 patents issued between 1920 and 2020
- patents-patccat-v3-3-sample.csv: patent-level information for a sample of 1000 patents issued between 1976 and 2020
- patents-patccat-v3-3-1836-1919.csv: patent-level information for 1,038,041 patents issued between 1836 and 1919
- patents-patccat-v3-3-1920-2020.csv: patent-level information for 9,102,807 patents issued between 1920 and 2020

3.2. List of Variables for Files with Claim-Level Information

For detailed descriptions, see the appendix in Ganglmair, Robinson, and Seeligson (2022).

- PatentClaim: patent claim identifier; 8-digit patent number and 4-digit claim number (Ex: 01234567-0001)
- singleLine: =1 if claim is published in single-line format
- singleReformat: outcome code of reformatting of single-line claims
- Jepson: =1 if claim is a Jepson claim
- JepsonReformat: outcome code of reformatting of Jepson claims
- inBegin: =1 if claim begins with the word "in"
- wordsPreamble: number of words in the claim preamble
- wordsBody: number of words in the claim body
- dependentClaims: number of dependent claims that refer to this independent claim
- isMeansPreamble: =1 if term "means" is used in the preamble
- isMeansBody: =1 if term "means" is used in the body
- isMeans: =1 if term "means" is used anywhere in the claim (~ means-plus-function claim)
- processPreamble: =1 if terms "method" or "process" are used in the preamble
- processBody: =1 if terms "method" or "process" are used in the body
- processSimple: =1 if terms "method" or "process" are used anywhere in the claim (for simple approach of process claim classification)
- claimType: claim type of full classification (1 = process; 2 = product; 3 = product-by-process; 0 = no type)
- preambleType: preamble type
- preambleTerm: keyword used to classify preamble type
- preambleTermAlt: alternative keyword (if preambleTerm were not used)
- preambleTextStub: first 15 words of the preamble
- bodyType: body type
- bodyLinesStep: number of steps in the body
- bodyLinesElement: number of elements in the body
- bodyLinesTotal: total number of identified lines in the body
- label: 2-character label of the preamble-body combination; classification table maps label to claim type

3.3. List of Variables for Files with Patent-Level Information

For detailed descriptions, see the appendix in Ganglmair, Robinson, and Seeligson (2022).

- patent_id: U.S. patent number (8-digit patent number)
- claims: number of independent claims (the sum of the four claim types: 0, 1, 2, and 3)
- noCategory: number of claims without a classified type
- processClaims: number of process claims
- productClaims: number of product claims
- prodByProcessClaims: number of product-by-process claims
- firstClaim: type of the first claim (1 = process; 2 = product; 3 = product-by-process; 0 = no type)
- simpleProcessClaims: number of process claims by simple approach (terms "method" or "process" anywhere in the claim)
- simpleProcessPreamble: number of process claims by simple approach (terms "method" or "process" in the preamble)
- meansClaims: number of means-plus-function claims
- meansFirst: =1 if first claim is a means-plus-function claim
- JepsonClaims: number of Jepson claims
- JepsonFirst: =1 if first claim is a Jepson claim

Note: The following variables/fields are currently empty (March 30, 2020); we will populate these variables/fields with data version 3.4.0.
- preambleTerm
- preambleTermAlt
- preambleTextStub
- bodyLinesStep
- bodyLinesElement
- bodyLinesTotal

Note: We will release the data for patents issued in 2021 with data version 3.4.0.

4. Coming Soon!

We are working on a number of extensions of the patccat data.
- With data version 3.4.0, we plan to release data for all published U.S. patent applications (2001 through 2021).
- In late spring/early summer 2022, we will release data for patents issued by the European Patent Office (EPO). [Update, March 28, 2023: see https://doi.org/10.5281/zenodo.7776092]
- In late spring/early summer 2022, we will release data for patents issued by the Canadian Intellectual Property Office (CIPO).
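
As a usage note for the claim-level files described in Section 3.2, the sketch below shows one way to split the PatentClaim identifier and to compare the simple approach (processSimple) with the full classification (claimType). It is an illustration under assumptions, not part of the patccat code: we assume the CSV files are comma-delimited with a header row that matches the variable names, and that the indicator and type columns hold integer values. The helper names `parse_patent_claim` and `simple_vs_full` and the sample rows are hypothetical.

```python
import csv
import io
import re

# Format stated in the variable list: 8-digit patent number, a hyphen,
# and a 4-digit claim number (Ex: 01234567-0001).
PATENT_CLAIM_RE = re.compile(r"(\d{8})-(\d{4})")


def parse_patent_claim(identifier):
    """Split a PatentClaim identifier into (patent number, claim number)."""
    match = PATENT_CLAIM_RE.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"unexpected PatentClaim format: {identifier!r}")
    # Keep the patent number as a string to preserve leading zeros.
    return match.group(1), int(match.group(2))


def simple_vs_full(rows):
    """Count rows where processSimple agrees with claimType == 1 (process).

    Returns (number of agreeing rows, total number of rows).
    """
    agree = 0
    total = 0
    for row in rows:
        full_is_process = int(row["claimType"]) == 1
        simple_is_process = int(row["processSimple"]) == 1
        total += 1
        if full_is_process == simple_is_process:
            agree += 1
    return agree, total


# Made-up rows for illustration only; they are not taken from the data files.
SAMPLE_CSV = """PatentClaim,processSimple,claimType
01234567-0001,1,1
01234567-0009,1,2
07654321-0001,0,2
"""

SAMPLE_ROWS = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

if __name__ == "__main__":
    print(parse_patent_claim(SAMPLE_ROWS[1]["PatentClaim"]))
    print(simple_vs_full(SAMPLE_ROWS))
```

To run the same comparison on a real file, the in-memory sample could be replaced by an open file handle passed to `csv.DictReader`, provided the delimiter and header assumptions above hold for the downloaded files.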
Created:
2025-03-11